Research: Exome

The exome comprises all known coding exons of the human genome. Although it makes up only 1-2% of the genome, an estimated 85% of all known disease-causing mutations are located in these regions. Targeted exome analysis is therefore often a reasonable approach. Applications of exome sequencing in research are diverse, ranging from population genetics and hereditary disease research to tumor diagnostics.

CeGaT has broad expertise in exome sequencing and offers a comprehensive service from DNA extraction (from almost any source) to data analysis.

Protocols may be easily adjusted to the specific requirements of your samples. For samples with starting amounts as low as 200 ng of DNA, we offer a low-input protocol; we have also developed a protocol optimized for the extremely fragmented DNA extracted from FFPE-derived tissues.

For data analysis, our clients benefit from our longstanding experience in human genetics and our expertise in tumor diagnostics:

  • When we receive samples from several family members, we can filter for different inheritance models depending on the pedigree of the affected family. This includes, for example, the identification of de novo mutations from a trio (parents and affected child), the identification of compound-heterozygous variants from a trio, or the identification of variants shared by several affected family members.
  • When we receive tumor tissue along with corresponding normal tissue, we perform a comparative analysis of tumor vs. normal tissue for the targeted identification of somatic mutations in the tumor tissue. A tumor content of only 20% is sufficient for these analyses; however, a higher tumor content improves the detection of subclonal mutations.
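
The trio-based filters mentioned above can be illustrated with a minimal sketch. It is not CeGaT's pipeline: the genotype encoding (0, 1, 2 alternate alleles), the variant tuple layout, the `gene_of` mapping, and both function names are hypothetical and chosen only for illustration. The sketch also assumes that a variant missing from a parent's call set is homozygous reference there, which in practice should be confirmed by checking coverage at that position.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: number of alternate alleles per sample.
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def de_novo_candidates(
    child: Dict[Variant, int],
    mother: Dict[Variant, int],
    father: Dict[Variant, int],
) -> List[Variant]:
    """Return variants carried by the child but by neither parent."""
    candidates = []
    for variant, child_gt in child.items():
        if child_gt == 0:
            continue
        # Assumption: a variant absent from a parent's calls is 0 (reference).
        if mother.get(variant, 0) == 0 and father.get(variant, 0) == 0:
            candidates.append(variant)
    return sorted(candidates)


def compound_het_genes(
    child: Dict[Variant, int],
    mother: Dict[Variant, int],
    father: Dict[Variant, int],
    gene_of: Dict[Variant, str],
) -> List[str]:
    """Return genes in which the child is heterozygous for at least one
    variant inherited only from the mother and one only from the father."""
    maternal_genes, paternal_genes = set(), set()
    for variant, child_gt in child.items():
        gene = gene_of.get(variant)
        if child_gt != 1 or gene is None:
            continue
        mother_gt = mother.get(variant, 0)
        father_gt = father.get(variant, 0)
        if mother_gt == 1 and father_gt == 0:
            maternal_genes.add(gene)
        elif father_gt == 1 and mother_gt == 0:
            paternal_genes.add(gene)
    return sorted(maternal_genes & paternal_genes)
```

Both functions return candidates only; real de novo and compound-heterozygous calls normally need additional quality and coverage checks in all three samples.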

When processing your sequencing order, we focus on delivering high-quality, reliable results. Each project is led by a scientist and supervised by a project manager, who will be your contact person throughout the project. Upon completion of your project, you will receive a detailed project report covering sample QC, important laboratory parameters, and the bioinformatic evaluation, including explanations.

Technical information

Technical performance

Coding regions and flanking intronic regions (+/-10 bp) are enriched using in-solution technology from Agilent (SureSelectXT Human All Exon V6) to guarantee an even distribution of reads.

High-throughput sequencing is carried out on the Illumina HiSeq platform with 2×100 bp reads.

The sequencing depth can be adjusted individually to the customer's needs. As a guide, we offer two standard sequencing depths (10 – 12 gigabases or 18 – 20 gigabases). The resulting average coverage when sequencing high-molecular-weight DNA is shown in the table below.

Sequencing volume (gigabases)   Average coverage   Cov <10 (bp, %)   Cov <30 (bp, %)
10 – 12                         100 – 120x         2.5%              ~10%
18 – 20                         200 – 220x         1.6%              ~4.5%
Table 1: Expected average coverage when sequencing high-molecular-weight DNA. With highly fragmented, low-molecular-weight DNA (e.g. DNA from FFPE tissue), lower coverage values are expected at the same sequencing depth. The columns Cov <10 (bp, %) and Cov <30 (bp, %) give the percentage of bases in the sequenced region that are covered by fewer than 10 or fewer than 30 reads, respectively.
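
To make the table columns concrete, the following minimal sketch computes the same three figures from a list of per-base read depths. The function name `coverage_summary` and the input format (one integer depth per target base) are assumptions for illustration; they do not describe how the values in Table 1 were produced.

```python
from typing import Dict, Sequence


def coverage_summary(depths: Sequence[int]) -> Dict[str, float]:
    """Summarize per-base depths over a target region.

    depths: one read-depth value per base of the target region (hypothetical
    input, e.g. extracted from a per-base depth report).
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    total_bases = len(depths)
    below_10 = sum(1 for depth in depths if depth < 10)
    below_30 = sum(1 for depth in depths if depth < 30)
    return {
        "mean_coverage": sum(depths) / total_bases,
        "pct_below_10": 100.0 * below_10 / total_bases,
        "pct_below_30": 100.0 * below_30 / total_bases,
    }
```

For a toy region of ten bases with depths `[0, 5, 20, 40, 100, 150, 200, 120, 90, 75]`, the mean coverage is 80, with 20% of bases below 10x and 30% below 30x.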

We process your sequencing data so that you can start the genetic evaluation of your samples directly. This includes the alignment of reads against the reference genome, variant calling, and the annotation of variant lists using external databases. The annotated lists can be opened directly (e.g. in Microsoft Excel), filtered, sorted and further evaluated.
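
As an alternative to a spreadsheet, an annotated tab-separated list can also be filtered with a short script. The sketch below keeps only rare variants. The column name `population_frequency`, the 1% threshold, and the function name are hypothetical; the actual column headers in a delivered file may differ and should be checked first. Treating an empty frequency field as "not observed" is also an assumption.

```python
import csv
from typing import Dict, Iterable, List


def filter_rare_variants(
    tsv_lines: Iterable[str],
    max_frequency: float = 0.01,
    frequency_column: str = "population_frequency",  # hypothetical header
) -> List[Dict[str, str]]:
    """Return rows whose population frequency is at most max_frequency."""
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    kept = []
    for row in reader:
        raw_value = (row.get(frequency_column) or "").strip()
        # Assumption: an empty field means the variant was not observed
        # in the reference population, so it is treated as frequency 0.
        frequency = float(raw_value) if raw_value else 0.0
        if frequency <= max_frequency:
            kept.append(row)
    return kept
```

The function accepts any iterable of lines, so it can be called with an open file handle, for example `filter_rare_variants(open("sample.tsv", newline=""))`, where `sample.tsv` is a placeholder file name.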

When we receive samples from several family members, we can filter for different inheritance models on request. The most suitable filtering strategy will be discussed at the beginning of the project.

If you wish to perform additional analyses, data is provided for each intermediate step (raw data, aligned reads, variant calls). You can use this data in your further analyses, for example, in programs such as Ingenuity Variant Analysis.
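
For custom downstream analyses, the variant calls can be read with a few lines of code. The following minimal sketch parses the eight fixed columns defined by the VCF format and ignores header lines. It is a simplified illustration with a hypothetical function name, not a full VCF parser; sample genotype columns and INFO decoding are left out.

```python
from typing import Dict, Iterable, Iterator

# The eight mandatory columns of a VCF data line, in order.
VCF_FIXED_COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


def iter_vcf_records(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dictionary per variant line, skipping '#' header lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < len(VCF_FIXED_COLUMNS):
            raise ValueError(f"expected at least 8 tab-separated fields: {line!r}")
        record: Dict[str, object] = dict(zip(VCF_FIXED_COLUMNS, fields[:8]))
        record["POS"] = int(fields[1])  # positions are 1-based integers
        yield record
```

The function works on any iterable of text lines, so a compressed file can be read with `gzip.open(path, "rt")` from the Python standard library, where `path` is a placeholder.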

Delivery content

Our delivery includes the following services:

  • Raw data in FastQ format (*.fastq.gz)
    You will receive the sequencing data as it comes from the sequencer. Technical adapter sequences are already removed.
  • Aligned reads in BAM format (*.bam)
    Reads will be aligned against the most recent version of the human genome (currently hg19) using the Burrows-Wheeler Aligner (bwa v.0.7.2). We will remove sequences that may lead to incorrect results, such as PCR duplicates or sequences that cannot be clearly assigned to a unique genomic region.
  • Variant lists (*.vcf)
    On the basis of the aligned reads, we will determine differences between the genome of your sample and the reference genome. To minimize the risk of false negative variants, we apply highly sensitive settings. This leads to a relatively large number of detected variants, which may be further reduced by filtering during later analysis steps depending on your requirements.
  • Annotated variant lists (*.tsv)
    The detected variants will be compared to various databases to enable data interpretation (e.g. prioritization). You will receive information about the chromosomal position of a detected variant, its functional classification, position and sequence alterations (always referring to the most affected transcript), as well as information about the frequency of the variant in the total population. Position specifications in HGVS nomenclature allow for direct transfer of information, e.g. for publications.
  • Detailed project report
    Our project report includes information about the results of sample quality controls, kits and protocols used in the laboratory, as well as the programs and versions used during data processing. You will also receive statistical information about detected variants.
  • Optional: When receiving samples from several family members, we perform a filtering for different inheritance models. These include identifying de novo mutations from a trio, identifying compound-heterozygous variants from a trio, and identifying common variants for multiple affected family members.

Further information

Data transmission: Data will be available for download from a secured in-house server.

Processing time: 6-8 weeks

Sample storage: The remaining DNA will be stored for 3 months after data delivery.

Start your project now

Please do not hesitate to contact us – we are happy to design an individual concept for your project.

If possible, please provide us with information about the sample material, the number of samples and the preferred sequencing depth.
