Genome analysis

Surveying the genome for exposed regions accessible for active transcription versus those bound tightly into heterochromatin can be an essential first step to understanding the relationship between chromatin structure and function in different contexts. To take a snapshot of genomic architecture, researchers may use one of three methods: DNase-seq, MNase-seq, or ATAC-seq. DNase-seq and ATAC-seq map exposed regions of DNA, whereas MNase-seq maps regions protected by nucleosomes. It is important to keep in mind that these methods provide snapshots of a dynamic process, often averaged across thousands of cells. If a particular region is changing dynamically, or differs between cells within the population, the data from different methods may appear to conflict. Single-cell analysis methods are being developed to resolve these challenges.



DNase-seq

DNase-seq uses DNase I to digest exposed regions of the genome, whereas nucleosome-bound DNA is protected from digestion. The small fragments generated by DNase digestion are then sequenced and mapped to the genome to identify regions of open, accessible chromatin, which are often associated with active transcription.



Advantages

•Most established and practiced method

•DNase cutting bias is well-understood

•Can be adapted to inversely examine protected genomic regions, called DNase footprinting, to identify transcription factor and nucleosome binding sites. However, it is important to use naked DNA as a control for such experiments as DNase I cutting bias can lead to false conclusions.

•Possible to adapt for single-cell analysis 



Limitations

•Technically difficult to master, especially in optimizing digestion conditions for a given cell type/number

•Requires millions of cells, and may be challenging for analysis of rare patient samples



MNase-seq

In contrast to DNase-seq, MNase-seq uses micrococcal nuclease (MNase), from Staphylococcus aureus, to digest exposed genomic regions. Protected DNA bound to nucleosomes is then recovered and sequenced.
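As a rough illustration of how recovered fragments can be turned into nucleosome positions, the sketch below keeps fragments close to the ~147 bp wrapped around one nucleosome and reports each fragment midpoint as an approximate dyad. The function name, the tuple layout, and the 130–170 bp window are assumptions made for this example, not part of any standard pipeline.

```python
def dyad_positions(fragments, min_len=130, max_len=170):
    """Estimate nucleosome dyads from MNase-protected fragments.

    fragments: iterable of (chromosome, start, end), 0-based, end-exclusive.
    Only fragments whose length falls in [min_len, max_len] are kept
    (an assumed window around the ~147 bp nucleosome footprint).
    """
    dyads = []
    for chrom, start, end in fragments:
        length = end - start
        if min_len <= length <= max_len:
            # The fragment midpoint approximates the nucleosome centre.
            dyads.append((chrom, start + length // 2))
    return dyads


fragments = [
    ("chr1", 1000, 1147),  # mono-nucleosome sized, kept
    ("chr1", 2000, 2050),  # too short, discarded
    ("chr1", 3000, 3160),  # within the window, kept
]
dyads = dyad_positions(fragments)
print(dyads)
```

Piling up these midpoints across many cells gives an occupancy profile; the size window would need tuning to the digestion level of a given experiment.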


Advantages

•Common and well established in many cell types of many species, from yeast to humans, with some standardization of digestion and data analysis.

•Can be used in combination with chromatin immunoprecipitation (ChIP-seq) to study regulatory factors that bind to nucleosomes

•Can be adapted to generate base-pair resolution mapping

•Can be adapted to examine nucleosome positioning and DNA methylation state using nucleosome occupancy and methylome sequencing (NOMe-seq)



Limitations

•Requires large numbers of cells (10–20 million)

•Sequence-specific bias toward digestion of AT-rich regions (although most enzymes used in chromatin accessibility assays exhibit similar biases), as well as unknown biases that may skew results

•Single-cell analysis not possible yet



ATAC-seq

Established in 2013, the assay for transposase-accessible chromatin using sequencing (ATAC-seq) inserts sequencing adapters directly into accessible DNA. It uses a hyperactive mutant Tn5 transposase that is preloaded with adapters to simultaneously fragment the genome and tag it with those adapters (a process called tagmentation). The tagged fragments are then amplified by PCR and sequenced by NGS. The frequency of sequences in a region correlates with open chromatin conformation.
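The last point, that read frequency in a region tracks chromatin openness, can be sketched as a simple binned count of insertion sites. This is a minimal illustration only: the function name, the input format, and the 100 bp bin width are assumptions for the example, and real analyses typically work from aligned reads and dedicated peak callers.

```python
from collections import Counter


def binned_insertion_counts(insertion_sites, bin_size=100):
    """Count insertion events per fixed-width genomic bin.

    insertion_sites: iterable of (chromosome, position) tuples, 0-based.
    Returns a Counter keyed by (chromosome, bin_start).
    """
    counts = Counter()
    for chrom, pos in insertion_sites:
        bin_start = (pos // bin_size) * bin_size
        counts[(chrom, bin_start)] += 1
    return counts


# Hypothetical insertion sites: three fall in chr1:100-200.
sites = [("chr1", 105), ("chr1", 150), ("chr1", 199), ("chr1", 950), ("chr2", 20)]
counts = binned_insertion_counts(sites, bin_size=100)
for (chrom, start), n in sorted(counts.items()):
    print(chrom, start, start + 100, n)
```

Bins with many insertions relative to their neighbours are candidates for accessible regions.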



Advantages

•Easiest method: no sonication, phenol-chloroform extraction, antibodies (ChIP-seq), or enzymatic digestion (DNase-seq, MNase-seq) are required

•Fastest method: <3 hours compared to up to a 4-day protocol

•Best signal-to-noise ratio

•Only 50,000 or fewer cells required (500–50,000 recommended)

•Single-nucleotide resolution possible

•Single-cell analysis is possible with adapted protocols utilizing flow cytometry/microfluidics 



Limitations

•More expensive, as it requires a kit from Illumina (Nextera DNA Library Preparation Kit)

•Least established method and requires optimization of cell number and lysis conditions for specific cell types, tissues, and organisms to achieve ideal fragment distributions

•Cell number defines the quality of the data, with too few or too many cells resulting in over- or under-transposition, respectively, which can skew results
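One way to check whether a library has a reasonable fragment distribution is to summarize the share of short, nucleosome-free fragments against mono-nucleosome fragments. The sketch below is illustrative only: the function name and the cutoffs (under 100 bp and 180–247 bp) are assumptions chosen for the example and should be adjusted to the experiment.

```python
def summarize_fragments(lengths, nfr_max=100, mono_range=(180, 247)):
    """Return the fraction of fragments in each assumed size class.

    lengths: list of fragment lengths in bp.
    nfr_max and mono_range are illustrative cutoffs, not fixed standards.
    """
    total = len(lengths)
    if total == 0:
        raise ValueError("no fragments supplied")
    nfr = sum(1 for n in lengths if n < nfr_max)
    mono = sum(1 for n in lengths if mono_range[0] <= n <= mono_range[1])
    return {
        "nucleosome_free": nfr / total,
        "mono_nucleosome": mono / total,
        "other": (total - nfr - mono) / total,
    }


lengths = [50, 60, 80, 200, 210, 400, 120, 90]
summary = summarize_fragments(lengths)
print(summary)
```

A library dominated by very short fragments may indicate over-transposition, while one dominated by long fragments may indicate under-transposition.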


Figure 2: ATAC-seq protocol.


Chromosome conformation techniques 

We can assess the three-dimensional chromatin architecture with chromatin contact mapping to reveal physical interactions between distant genomic regions. This type of mapping is made possible by the advent of chromatin conformation capture (3C) and subsequent methods developed based on this approach. Each of these approaches has particular strengths for particular applications, but selecting a method for a specific purpose can be challenging due to the sheer variety of methodologies.

Figure 3: Chromosome conformation techniques. Various steps of 3C, 4C, 5C, ChIA-PET, and Hi-C.


Chromatin conformation capture (3C)

3C uses formaldehyde cross-linking to lock the three-dimensional chromatin structure in place, followed by restriction enzyme digestion and ligation of the cross-linked fragments. The resulting ligation products are then analyzed by qPCR and sequencing to identify where distant DNA regions are connected. This approach for analyzing 3D chromatin structure and interactions in vivo was first developed in 2002 (Dekker et al., 2002), and has since become the foundation for a host of related techniques that have been developed to achieve greater scale, throughput, or specificity.


Circularized chromosome conformation capture (4C)

4C enables identification of previously unknown DNA regions that interact with a locus of interest, which makes 4C ideal for discovering novel interactions within a specific region (Dekker, 2006).


4C helpful hints

Choose the right restriction enzymes. More frequent cutters (i.e., enzymes with four-bp recognition sites) are better for local interactions between the region of interest and nearby sequences on the same chromosome (van de Werken et al., 2012).
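To see why cutter frequency matters, the sketch below locates recognition sites in a sequence and compares the expected spacing of a four-bp site with a six-bp site. The recognition sequences for DpnII (GATC) and HindIII (AAGCTT) are well known; the function names and the toy sequence are made up for this example, and the spacing estimate assumes equal base frequencies.

```python
def site_positions(sequence, site):
    """Return 0-based start positions of every occurrence of a recognition site."""
    sequence = sequence.upper()
    site = site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def expected_spacing(site):
    """Expected distance between sites, assuming random sequence with equal base usage."""
    return 4 ** len(site)


toy_sequence = "AAGATCTTAAGCTTGGGATCC"  # hypothetical sequence for illustration
print(site_positions(toy_sequence, "GATC"))    # DpnII, four-bp cutter
print(site_positions(toy_sequence, "AAGCTT"))  # HindIII, six-bp cutter
print(expected_spacing("GATC"), expected_spacing("AAGCTT"))
```

Under these assumptions a four-bp cutter produces fragments about 16 times shorter on average than a six-bp cutter, which is what gives finer resolution around the region of interest.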


Optimize cross-linking. Lower formaldehyde concentrations promote undesirable region-of-interest self-ligations, but also prevent DNA "hairballs" that hinder restriction enzyme cutting. High formaldehyde concentrations lower self-ligation events but increase hairballs. An optimal formaldehyde concentration should be chosen for the specific experimental situation to balance these considerations. 1% formaldehyde treatment for 10 min is a good starting point for most experiments (van de Werken et al., 2012).


Carbon copy chromosome conformation capture (5C)

5C generates a library of any ligation products from DNA regions that associate with the target loci, which are then analyzed by NGS. 5C is ideal when great detail about all the interactions in a given region is needed, for example when diagramming a detailed interaction matrix of a particular chromosome. However, 5C is not truly genome-wide, since each 5C primer must be designed individually, so it is best suited to specific regions (Dostie and Dekker, 2007).


5C helpful hints

Select the right restriction enzyme. Choosing an enzyme that functions efficiently under your specific experimental conditions is essential. For example, BamHI is not recommended for most experiments due to inefficiency under 3C conditions (Dostie et al., 2007).


Optimize primer design. 5C uses two primers: a forward 5C primer that binds upstream of the ligation site, and a reverse primer that binds immediately downstream. Primer length should be adjusted so that the annealing temperature is about 65°C to allow primers to anneal exactly with their restriction fragments. Ensure that 5C primers are synthesized with a phosphate at the 5' end for ligation.
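The length adjustment described above can be sketched as a search for the shortest primer that reaches the target temperature. This example uses a common rough melting-temperature approximation, Tm = 64.9 + 41 × (GC − 16.4) / N, as a stand-in for annealing temperature; dedicated primer design tools use more accurate nearest-neighbour models. The function names, the length limits, and the assumption that the primer ends at the 3' end of the restriction fragment are all illustrative.

```python
def approx_tm(primer):
    """Rough melting temperature (°C) from GC count and length."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(primer)


def shortest_primer(fragment_end, target_tm=65.0, min_len=18, max_len=40):
    """Return the shortest suffix of fragment_end whose approximate Tm reaches target_tm.

    fragment_end: sequence ending at the restriction site (assumed primer 3' end).
    Returns None if no length in [min_len, max_len] reaches the target.
    """
    fragment_end = fragment_end.upper()
    longest = min(max_len, len(fragment_end))
    for length in range(min_len, longest + 1):
        candidate = fragment_end[-length:]
        if approx_tm(candidate) >= target_tm:
            return candidate
    return None


gc_rich = "GC" * 20  # toy sequences for illustration only
at_rich = "AT" * 20
print(shortest_primer(gc_rich))
print(shortest_primer(at_rich))
```

The AT-only toy sequence never reaches the target under this approximation, which signals that a different fragment end or design strategy would be needed.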


Use a control template. This will control for differences in primer efficiency. A control library constructed from the entire genomic region under study is recommended. If this library is not constructed, then researchers should be aware that interaction frequencies will be less precise.
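One simple way to apply a control library is to divide each observed count by the count for the same primer pair in the control, so that pairs with efficient primers are not over-represented. The sketch below assumes this ratio-based normalization and a dictionary input keyed by primer pair; both the function name and the data are hypothetical.

```python
def normalized_interactions(sample_counts, control_counts):
    """Divide sample counts by control-library counts for each primer pair.

    Pairs that are missing from the control, or have a zero control count,
    are skipped because their primer efficiency cannot be estimated.
    """
    normalized = {}
    for pair, observed in sample_counts.items():
        control = control_counts.get(pair, 0)
        if control > 0:
            normalized[pair] = observed / control
    return normalized


# Hypothetical read counts per (forward primer, reverse primer) pair.
sample = {("F1", "R1"): 200, ("F1", "R2"): 50, ("F2", "R1"): 30}
control = {("F1", "R1"): 100, ("F1", "R2"): 100}
result = normalized_interactions(sample, control)
print(result)
```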

Chromatin interaction analysis by paired-end tag sequencing (ChIA-PET)

ChIA-PET takes aspects of ChIP and 3C to analyze the interplay of distant DNA regions through a particular protein.

ChIA-PET is best used for discovery experiments involving a protein of interest and unknown DNA binding targets. Transcription factor binding sites, for example, are best studied with ChIA-PET since this technique requires the DNA to be bound by the transcription factor in vivo for the interaction to be called (Fullwood et al., 2009).


ChIA-PET helpful hints

Overlap PET tags to reduce background. As with most 3C technologies, background noise is a technical challenge. In ChIA-PET particularly, noise can make it difficult to find long-range interactions with the locus of interest. A useful way to overcome this is to call a long-range interaction only when PETs overlap at both ends of the region.
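The overlap requirement can be sketched as a filter that keeps a PET only if at least one other PET overlaps it at both the head and the tail anchor. The tuple layout and function names are assumptions for this example, which handles intra-chromosomal PETs only and uses a simple pairwise comparison rather than an optimized interval index.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def supported_pets(pets):
    """Keep PETs whose head AND tail both overlap those of another PET.

    pets: list of (chromosome, head_start, head_end, tail_start, tail_end).
    """
    supported = []
    for i, (chrom_a, h1s, h1e, t1s, t1e) in enumerate(pets):
        for j, (chrom_b, h2s, h2e, t2s, t2e) in enumerate(pets):
            if i == j or chrom_a != chrom_b:
                continue
            if overlaps(h1s, h1e, h2s, h2e) and overlaps(t1s, t1e, t2s, t2e):
                supported.append(pets[i])
                break
    return supported


pets = [
    ("chr1", 100, 150, 5000, 5050),
    ("chr1", 120, 170, 5020, 5070),  # overlaps the first PET at both ends
    ("chr1", 130, 180, 9000, 9050),  # overlaps at the head only
]
kept = supported_pets(pets)
print(kept)
```

The third PET shares a head anchor with the others but has no partner at its tail, so it is treated as background.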


ChIP-loop

ChIP-loop is a mix of ChIP and 3C that employs antibodies targeted to proteins suspected to bind a DNA region of interest. ChIP-loop is ideal to find out if two known DNA regions interact via a protein of interest. ChIP-loop is also well suited to confirmation of suspected interactions, but not the discovery of novel ones (Horike et al., 2005).


ChIP-loop helpful hints

Avoid non-native loops. The biggest issue encountered with ChIP-loop is the formation of non-native loops during DNA concentration, before ligation occurs. A simple way to avoid this is to choose a protocol that performs the precipitation after the ligation step (Simonis et al., 2007).


Validate ChIP-loop interactions. Another challenge in ChIP-loop can be accurate quantitation of ligation products. 3C technologies, especially ChIP-loop, often capture random interactions. To combat this, consider performing a ChIP experiment in parallel and using it to validate the ChIP-loop interactions. If a DNA-protein-DNA interaction identified by ChIP-loop is indeed real, then both DNA-protein interactions should also appear in the ChIP data (Simonis et al., 2007).
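This validation step can be sketched as an interval check: a loop is kept only if both of its anchors overlap a peak from the parallel ChIP experiment. The data layout and function names are assumptions for the example, and the peak coordinates are invented.

```python
def overlaps_any(start, end, peaks):
    """True if the half-open interval [start, end) overlaps any (start, end) peak."""
    return any(start < p_end and p_start < end for p_start, p_end in peaks)


def validate_loops(loops, chip_peaks):
    """Keep loops whose two anchors both overlap a ChIP peak.

    loops: list of (chromosome, a_start, a_end, b_start, b_end).
    chip_peaks: dict mapping chromosome -> list of (start, end) peaks.
    """
    validated = []
    for chrom, a_start, a_end, b_start, b_end in loops:
        peaks = chip_peaks.get(chrom, [])
        if overlaps_any(a_start, a_end, peaks) and overlaps_any(b_start, b_end, peaks):
            validated.append((chrom, a_start, a_end, b_start, b_end))
    return validated


chip_peaks = {"chr7": [(1000, 1200), (50000, 50300)]}
loops = [
    ("chr7", 1100, 1300, 50100, 50200),  # both anchors supported by ChIP
    ("chr7", 1100, 1300, 80000, 80200),  # second anchor has no ChIP peak
]
confirmed = validate_loops(loops, chip_peaks)
print(confirmed)
```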


Hi-C

Hi-C amplifies ligation products from the entire genome and assesses their frequencies by high-throughput sequencing. Hi-C is a great choice when broad coverage of the entire genome is required and resolution is not of great concern, for example when mapping genome-wide changes in chromosome structure in tumor cells (Lieberman-Aiden et al., 2009).


Hi-C helpful hints

Optimize library amplification. Hi-C library amplification must generate enough product for analysis, while avoiding PCR artefacts. To do this, the PCR cycle number should be optimized (in the range of 9–15 cycles). If enough product (50 ng of DNA) cannot be produced, multiple PCR reactions should be pooled rather than increasing the cycle number; five reactions are usually sufficient (Belton et al., 2012).

Balance read lengths. As with any sequencing experiment, high-quality reads are paramount. Reads must be long enough to map interactions, but not so long that they pass through the ligation junction into the partner fragment. Therefore, 50 bp reads are optimal in most cases (Belton et al., 2012).
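When reads do run through the ligation junction, one common remedy is to truncate them at the junction before mapping. The sketch below assumes a HindIII-based library in which the filled-in, blunt-ligated ends produce the junction sequence AAGCTAGCTT; other enzymes give different junctions (for example GATCGATC for DpnII), and the function name is hypothetical.

```python
def trim_at_junction(read, junction="AAGCTAGCTT"):
    """Truncate a read at the ligation junction, keeping the first half of the site.

    If the junction is absent, the read is returned unchanged.
    """
    index = read.upper().find(junction)
    if index == -1:
        return read
    # Keep sequence up to the midpoint of the junction (the first fragment's end).
    return read[: index + len(junction) // 2]


chimeric = "ACGTACGTAAGCTAGCTTGGGG"  # toy read crossing a junction
clean = "ACGTACGTACGTACGT"           # toy read without a junction
print(trim_at_junction(chimeric))
print(trim_at_junction(clean))
```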

Choose an appropriate bin size. This is critical for data analysis. Bin size should be inversely proportional to the number of expected interactions in a region. Use smaller bins for more frequent intra-chromosomal interactions and larger bins for less frequent inter-chromosomal interactions (Belton et al., 2012).
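The effect of bin size can be sketched by building a contact matrix from read pairs at two resolutions. This is a minimal illustration for a single region with made-up coordinates; the function name and input format are assumptions, and real Hi-C pipelines also normalize the matrix.

```python
def contact_matrix(pairs, region_length, bin_size):
    """Build a symmetric contact matrix for one region.

    pairs: list of (position_1, position_2), 0-based, within the region.
    Returns a list of lists with one row and column per bin.
    """
    n_bins = -(-region_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts
    return matrix


pairs = [(10, 250), (20, 260), (110, 120)]
fine = contact_matrix(pairs, region_length=300, bin_size=100)
coarse = contact_matrix(pairs, region_length=300, bin_size=300)
print(fine)
print(coarse)
```

Smaller bins resolve where contacts occur but leave each cell with fewer counts, whereas larger bins pool sparse contacts into more robust totals.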



Capture-C

Capture-C uses a combination of 3C and oligonucleotide capture technology (OCT), together with high-throughput sequencing, to study hundreds of loci at once. Capture-C is ideal when both high resolution and genome-wide scale are required, for example when analyzing the functional effect of every disease-associated SNP in the genome on local chromatin structure (Hughes et al., 2014).


Capture-C helpful hints

Carefully choose probe positions. It's best to position probes close to the restriction enzyme sites, even overlapping when possible (Hughes et al., 2014).

Keep libraries complex. Maintaining library complexity is the top priority. A complex library means more high-quality interactions in the output. For this reason, anything that could decrease library complexity should be avoided, such as a Hi-C biotin capture (Hughes et al., 2014).

Watch for false interactions in duplicated regions. The mapping process can simulate strong interactions between duplicated regions (such as pseudogenes) that are actually artefacts (Hughes et al., 2014).




References

Belton JM, McCord RP, Gibcus JH, Naumova N, Zhan Y and Dekker J (2012). Hi-C: a comprehensive technique to capture the conformation of genomes. Methods, 58, 268-276.


Dekker J, Rippe K, Dekker M and Kleckner N (2002). Capturing chromosome conformation. Science, 295, 1306-1311.


Dekker J (2006). The three ‘C’s of chromosome conformation capture: controls, controls, controls. Nat Methods, 3, 17-21.


Dostie J and Dekker J (2007). Mapping networks of physical interactions between genomic elements using 5C technology. Nat Protoc, 2, 988-1002.


Dostie J, Zhan Y and Dekker J (2007). Chromosome conformation capture carbon copy technology. Curr Protoc Mol Biol, Chapter 21, Unit 21.14.


Fullwood MJ, et al. (2009). An oestrogen-receptor-alpha-bound human chromatin interactome. Nature, 462, 58-64.


Horike S, Cai S, Miyano M, Cheng JF and Kohwi-Shigematsu T (2005). Loss of silent-chromatin looping and impaired imprinting of DLX5 in Rett syndrome. Nat Genet, 37, 31-40.


Lieberman-Aiden E, et al. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326, 289-293.


Hughes JR (August 2014). Email interview.


Hughes JR, et al. (2014). Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat Genet, 46, 205-212.


Simonis M, Kooren J and de Laat W (2007). An evaluation of 3C-based methods to capture DNA interactions. Nat Methods, 4, 895-901.


van de Werken H, de Vree PJ, Splinter E, Holwerda SJ, Klous P, de Wit E and de Laat W (2012). 4C technology: protocols and data analysis. Methods Enzymol, 513, 89-112.