The difference is already visible in the naive overlapping coverage computation. Because fewer molecules are sequenced in the PE150 dataset, yielding fewer reads, off-target reads have a greater impact on overall coverage. When diagnostic coverage is compared between the two read lengths, the difference is substantial, simply because a larger share of the sequencing data is informative at the shorter read length.
This means that the nucleotides in the middle of the molecule are sequenced twice. While this adds to raw coverage, it does not provide additional information on the sequenced genome. Only coverage resulting from different library molecules increases diagnostic sensitivity.
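The effect of overlapping reads can be illustrated with a small calculation. The following is a minimal sketch, not part of the original analysis: the function name `pair_coverage`, the 220 bp insert size and the 75 bp comparison read length are assumptions chosen for illustration. It separates the raw bases a read pair contributes from the unique bases of the library molecule it actually covers.

```python
def pair_coverage(read_length: int, insert_size: int) -> dict:
    """Raw, unique and doubly sequenced bases for one read pair.

    Assumes (for illustration) that both reads start at opposite ends
    of the insert and that bases read past the insert end (adapter
    read-through) are trimmed and do not count.
    """
    per_read = min(read_length, insert_size)  # bases of each read on the insert
    raw = 2 * per_read                        # bases counted by naive coverage
    unique = min(insert_size, raw)            # distinct insert positions covered
    return {"raw": raw, "unique": unique, "overlap": raw - unique}


# Hypothetical 220 bp insert, sequenced with two read lengths
for read_length in (75, 150):
    print(read_length, pair_coverage(read_length, 220))
```

Under these assumptions, a PE150 pair on a 220 bp insert yields 300 raw bases but only 220 unique ones; the remaining 80 are the doubly sequenced middle of the molecule. A 75 bp pair on the same insert has no overlap at all.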
One could argue that a longer read length generates more output from the same number of input molecules, and that the insert size should therefore be increased accordingly to make use of this extra data. For a PE150 sequencing run, for instance, the insert size would have to be above 300 bp, i.e. more than twice the read length, so that the two reads no longer overlap.
However, the average size of human coding exons is only 160 bp. Including flanking intronic regions of diagnostic significance (e.g., due to splice variants) of about 30 bp on either side, the average target of interest is about 220 bp long. Larger insert sizes lead to a higher proportion of the sequencing data falling outside the region of interest ("off-target" reads), hence wasting sequencing capacity (see figure 2).
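The arithmetic behind this argument can be sketched as follows. This is a deliberately simplified model, not the computation behind figure 2: it assumes every insert fully spans one average target, so that any insert bases beyond the target length are off-target. Real capture data are more variable. The function name `off_target_fraction` is hypothetical.

```python
EXON_BP = 160    # average human coding exon, as stated in the text
FLANK_BP = 30    # flanking intronic region on each side
TARGET_BP = EXON_BP + 2 * FLANK_BP  # average target of interest: 220 bp


def off_target_fraction(insert_size: int, target: int = TARGET_BP) -> float:
    """Share of insert bases outside the target.

    Simplifying assumption: the insert fully contains the target, so
    the excess length (insert_size - target) is entirely off-target.
    """
    return max(0, insert_size - target) / insert_size


# Hypothetical insert sizes for illustration
for insert_size in (220, 300, 440):
    print(insert_size, round(off_target_fraction(insert_size), 3))
```

In this model, an insert matched to the 220 bp target wastes nothing, while a 300 bp insert, the minimum needed to avoid read overlap at PE150, already places 80 of its 300 bases outside the region of interest.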