DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery. Among other applications, DNA sequencing is used to identify:

  • The complete DNA sequence of an organism’s genome (Whole Genome Seq, WGS)
  • The precise DNA sequence of all the protein-coding genes in a genome (Whole Exome Seq, WES) or of a specific group of genes (Gene Panel Seq)
  • The binding sites of a DNA-associated protein (ChIP-seq)
  • The DNA regions in the genome associated with regulatory activity (DNase-seq, FAIRE-seq)
  • The sequences that act as transcriptional enhancers (STARR-seq)

In principle, DNA molecules are end-repaired and tagged with specific “barcoded” oligo adapters. Several DNA libraries are usually pooled (multiplexing) and sequenced in a single assay. A read (or two reads in paired-end mode) is produced for every DNA fragment, and the reads for each sample are compiled and then aligned against a reference sequence.
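The pooling step relies on the barcode to assign each read back to its sample. A minimal sketch of that idea follows, assuming for simplicity that the barcode sits at the start of each read (on many platforms the index is read separately). The barcode sequences, sample names, and the `demultiplex` helper are hypothetical and are not taken from any specific kit or tool.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample map; real barcodes depend on the library kit.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}


def demultiplex(reads, barcodes, barcode_len=4):
    """Group reads by the barcode found at the start of each read.

    Reads whose barcode is not in the map go to "undetermined".
    The barcode itself is trimmed off before the read is stored.
    """
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(tag, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Toy pooled run: two samples plus one read with an unreadable barcode.
pooled = ["ACGTGGATTC", "TGCACCGTAA", "ACGTTTAGGC", "NNNNACGTAC"]
demuxed = demultiplex(pooled, BARCODES)
```

Production demultiplexers usually also tolerate a small number of barcode mismatches; this sketch requires an exact match.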

Several DNA-Seq applications can be carried out in GGC, each requiring different types of library preparation and sequencing modes. For WGS and WES, paired-end sequencing with long reads (150 bp) is usually performed at variable depths (20-100x) depending on the application. For ChIP-seq, FAIRE-seq and DNase-seq, single-end sequencing (75 bp) is usually performed, but this may change depending on the requirements of each specific project.
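As a rough guide to how depth, read length, and sequencing mode interact, average depth can be approximated as total sequenced bases divided by the size of the target region. The sketch below uses that approximation to estimate how many fragments are needed for a given depth. The function name and the genome size constant are assumptions for illustration, and the estimate ignores duplicates, unmapped reads, and uneven coverage.

```python
import math


def reads_needed(target_depth, region_size_bp, read_length_bp, paired_end=True):
    """Estimate the number of fragments needed to reach a target average depth.

    Uses depth ~= (fragments * bases per fragment) / region size.
    With paired_end=True the result is a number of read pairs.
    """
    bases_per_fragment = read_length_bp * (2 if paired_end else 1)
    return math.ceil(target_depth * region_size_bp / bases_per_fragment)


# Assumed approximate size of the human genome, for illustration only.
HUMAN_GENOME_BP = 3_100_000_000

# Read pairs for 30x WGS with 2 x 150 bp reads.
pairs_30x = reads_needed(30, HUMAN_GENOME_BP, 150)
```

In practice more reads are usually requested than this estimate suggests, since not every read maps uniquely or falls on target.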

Whole Genome Seq (WGS)

WGS is the most comprehensive method for analyzing the genome. It has largely been used as a research tool, but the utility of patients’ WGS in a clinical setting is being actively researched, either in personalized medicine or as a tool to guide therapeutic intervention. Common WGS applications are:

  • Identification of all the variants of the studied organism (single-nucleotide polymorphisms (SNPs) or indels)
  • De novo sequencing of
  • Identification of novel disease-related variants

Whole Exome Seq (WES)

Targeted sequencing of all the exons (the protein-coding regions of the human genome). Although exons represent about 2% of the genome, they contain ~85% of known disease-related variants. As WES focuses only on exons, it is mainly used for:

  • Finding mutations in genes already known to cause disease
  • Identifying novel disease-related gene mutations
  • Personalized medicine
  • Predictive testing of at-risk family members

Gene Panel Seq

Targeted sequencing of a select set of genes or gene regions that have known or suspected associations with a disease or a phenotype under study. Gene panels can be purchased with predesigned content (cancer, inherited disorders, cardiac conditions, autism etc.) or custom designed to include genomic regions of interest. Compared to broader approaches such as Whole Genome or Whole Exome Sequencing, Gene Panel Seq offers the user:

  • Lower cost per sample
  • More samples per sequencing run
  • A smaller, more manageable data set, making analysis easier
  • Higher depth (e.g. 500x–1000x or higher), allowing identification of variants at low allele frequencies (down to 5%)
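The link between depth and detectable allele frequency can be illustrated with a simple binomial model: at a given depth, each read covering a site carries the variant with probability equal to the allele frequency. The sketch below computes the chance of seeing at least a minimum number of variant-supporting reads. The threshold of 5 supporting reads and the function name are assumptions for illustration; the model ignores sequencing errors and treats reads as independent.

```python
from math import comb


def prob_at_least(k, depth, vaf):
    """P(X >= k) for X ~ Binomial(depth, vaf).

    X is the number of reads carrying the variant at a site covered
    by `depth` reads, when the variant allele frequency is `vaf`.
    """
    p_less = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_less


# Chance of seeing >= 5 variant reads for a 5% variant at several depths.
for depth in (30, 100, 500, 1000):
    print(depth, round(prob_at_least(5, depth, 0.05), 3))
```

Under this model the detection probability rises steeply with depth, which is why low-frequency variants are generally sought with deep panel sequencing rather than with standard-depth WGS.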

Chromatin Immunoprecipitation Sequencing (ChIP-seq)

In chromatin immunoprecipitation (ChIP) protocols, a DNA-bound protein is immunoprecipitated using a specific antibody. The bound DNA is coprecipitated, purified, and sequenced. Massively parallel sequencing then allows the user to identify the binding sites of the protein of interest across the whole genome without any a priori hypothesis.

DNase I hypersensitive sites Sequencing (DNase-seq)

DNase-seq provides an accurate representation of the location of regulatory proteins in the genome. DNA-protein complexes are treated with DNase I, followed by DNA extraction and sequencing. DNase-seq signal is higher at promoter regions, and DNase-seq has been shown to have better sensitivity than FAIRE-seq even at non-promoter regions.

Formaldehyde-Assisted Isolation of Regulatory Elements Sequencing (FAIRE-seq)

Like DNase-seq, FAIRE-seq is used to identify areas of open chromatin in the genome but, owing to differences in the protocol, it can be applied to any cell type. FAIRE-seq is based on the fact that formaldehyde cross-linking is more efficient in nucleosome-bound DNA than in nucleosome-depleted regions of the genome. The method separates out the non-cross-linked DNA, which is usually found in open chromatin, and this DNA is then sequenced.

Self-Transcribing Active Regulatory Region Sequencing (STARR-seq)

A method to assay enhancer activity in a direct, quantitative, and genome-wide manner. DNA fragments are cloned downstream of a core promoter, into the 3′ UTR of a reporter gene. Active enhancers transcribe themselves and become part of the resulting reporter transcripts, which are used to create a reporter library that is then sequenced. The main uses of STARR-seq are enhancer identification, validation and characterization, sequencing of active enhancers, and identification of dynamic changes in enhancer activity induced by signaling.
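One common way to express the quantitative readout is as the enrichment of reporter transcript reads over reads from the input library for the same region. The sketch below shows one possible calculation, a log2 ratio of library-size-normalised counts. The function name, the pseudocount, and the example numbers are assumptions for illustration and do not describe a specific analysis pipeline.

```python
import math


def log2_enrichment(rna_count, input_count, rna_total, input_total, pseudocount=1.0):
    """log2 ratio of normalised reporter RNA reads to input DNA reads.

    Counts are scaled to reads per million of their own library, and a
    pseudocount is added so regions with zero reads do not cause errors.
    """
    rna_cpm = (rna_count + pseudocount) / rna_total * 1e6
    input_cpm = (input_count + pseudocount) / input_total * 1e6
    return math.log2(rna_cpm / input_cpm)


# Hypothetical region: 7 reporter reads vs 1 input read, equal library sizes.
score = log2_enrichment(7, 1, 1_000_000, 1_000_000)
```

Regions with a clearly positive score would be candidate active enhancers under this simplified model; a real analysis would also assess statistical significance across replicates.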