Hi-C sequencing is a next-generation sequencing (NGS) method used to study the three-dimensional (3D) structure of genomes by analyzing the arrangement and interactions of chromatin within the nucleus.
It helps in understanding how DNA is organized inside the nucleus and how it functions.
The method was first described by Erez Lieberman-Aiden and colleagues in 2009.
It provides detailed information about the interactions between different regions of the genome.
Hi-C sequencing is widely used in studying gene expression, gene regulation, and chromatin folding.
The genome of most organisms is extremely long when fully stretched out but must be compacted to fit into the tiny space of the nucleus (or, in organisms without a nucleus, the cell itself).
This compaction is made possible because DNA associates with different proteins to form a complex called chromatin, which is organized in a regulated 3D structure.
Understanding how DNA is arranged and interacts inside the cell is important for many biological processes such as DNA replication, cell division, and transcriptional regulation.
Two main methods are used to study DNA organization: microscopy and molecular methods.
Microscopy can show the shape and position of chromosomes in a single cell, but it can follow only a limited number of specific DNA sequences at a time.
Molecular methods like 3C and Hi-C sequencing offer more detailed information about how different parts of the genome interact based on their 3D organization.
Principle of Hi-C Sequencing
Hi-C sequencing works by studying how different regions of DNA physically interact with each other.
Because the genome is folded and compacted in the nucleus, distant DNA regions can come into close proximity.
Hi-C sequencing identifies these interacting regions by chemically crosslinking them, cutting the DNA, and then reconnecting the pieces.
This process creates hybrid DNA fragments that are sequenced and analyzed to study chromatin interactions.
The Hi-C sequencing process starts with DNA crosslinking to preserve the spatial interactions of chromatin.
The crosslinked DNA is then digested using restriction enzymes to cut it into fragments.
The ends of these fragments are made blunt and labeled with biotin.
DNA fragments are ligated, forming hybrid fragments where interacting regions are joined together.
These hybrid DNA fragments are then prepared for sequencing by cutting them into smaller pieces.
Biotinylated fragments are selectively enriched using streptavidin-coated magnetic beads.
The enriched fragments are amplified using PCR (polymerase chain reaction).
Finally, the amplified DNA is sequenced to identify which regions of the genome were physically interacting within the nucleus.
Process of Hi-C Sequencing
Crosslinking:
The first step in Hi-C sequencing is treating cells with formaldehyde to crosslink DNA regions that are physically close in 3D space.
Formaldehyde preserves the interactions between different genome regions and maintains the natural 3D structure of chromatin.
Cell Lysis and Digestion:
After crosslinking, cells are lysed to release chromatin.
The sample is washed to remove non-crosslinked proteins and then digested using sequence-specific restriction enzymes.
Restriction endonucleases cut DNA at specific recognition sites, typically leaving single-stranded overhangs (sticky ends).
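As a rough illustration of the digestion step, the following Python sketch cuts a toy sequence at every occurrence of a recognition site. HindIII (A^AGCTT) is used here only as an example enzyme; the function name `digest`, its parameters, and the toy sequence are assumptions made for illustration, and only the top strand is modeled.

```python
def digest(sequence, site="AAGCTT", cut_offset=1):
    """Split `sequence` at every occurrence of the recognition `site`.

    `cut_offset` is the position inside the site where the top strand is cut
    (HindIII cuts A^AGCTT, so the offset is 1). Hypothetical helper.
    """
    sequence = sequence.upper()
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)

    fragments = []
    previous = 0
    for cut in cut_positions:
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments


# Toy sequence containing two HindIII sites (illustrative only).
toy_sequence = "GGGAAGCTTCCCCAAGCTTTT"
fragments = digest(toy_sequence)
print(fragments)  # ['GGGA', 'AGCTTCCCCA', 'AGCTTTT']
```

Joining the fragments in order gives back the original sequence, which is a quick sanity check that no bases were lost during the simulated cut.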
Biotin Labeling:
The overhanging DNA ends are filled in to make them blunt, and biotin-labeled nucleotides are incorporated during this fill-in step.
Biotin acts as a marker that helps in capturing and isolating ligated DNA fragments in later steps.
Proximity Ligation:
The labeled DNA fragments are ligated together to form hybrid DNA molecules containing sequences from different genomic regions.
This ligation captures long-range interactions between distant DNA regions within the nucleus.
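Filling in the overhangs and then ligating the blunt ends leaves a characteristic sequence at each ligation junction. The sketch below derives that junction for a palindromic recognition site cut to leave a 5' overhang. The enzymes shown (HindIII, A^AGCTT, and MboI, ^GATC) are examples only, and the helper names are hypothetical. Some analysis pipelines search reads for this junction sequence, but the details differ between tools.

```python
def ligation_junction(site, cut_offset):
    """Sequence created when two filled-in ends of `site` are blunt-ligated.

    Assumes a palindromic site and a 5' overhang (cut in the first half of
    the site). This is an illustrative helper, not a library function.
    """
    n = len(site)
    if cut_offset * 2 > n:
        raise ValueError("this sketch only handles 5' overhangs")
    left_end = site[: n - cut_offset]  # end of the upstream fragment after fill-in
    right_end = site[cut_offset:]      # start of the downstream fragment after fill-in
    return left_end + right_end


def has_junction(read, site, cut_offset):
    """Return True if the read contains the expected ligation junction."""
    return ligation_junction(site, cut_offset) in read.upper()


print(ligation_junction("AAGCTT", 1))  # AAGCTAGCTT
print(ligation_junction("GATC", 0))    # GATCGATC
print(has_junction("ttgaAAGCTAGCTTcc", "AAGCTT", 1))  # True
```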
Reverse Crosslinking and Protein Degradation:
After ligation, the crosslinking is reversed, and proteins associated with DNA are degraded, leaving behind purified DNA fragments.
Library Preparation:
The ligated DNA is further fragmented into smaller pieces suitable for library preparation.
Biotin-labeled fragments are isolated using magnetic beads coated with streptavidin, enriching for interaction-relevant DNA.
These fragments are then amplified using PCR and prepared for sequencing.
Sequencing:
The selected DNA fragments are sequenced using NGS platforms such as Illumina.
Paired-end sequencing is performed to sequence both ends of each DNA fragment, helping to identify interacting genomic regions.
Data Analysis:
Hi-C data analysis consists of preprocessing and downstream analysis.
Preprocessing starts with FASTQ files and includes removing low-quality reads and mapping reads to a reference genome.
The two reads of each pair are aligned independently, since they often originate from different parts of the genome.
Reads are filtered to eliminate noise and experimental artifacts.
The reads are grouped into genomic bins and normalized to correct for biases like uneven sequencing coverage.
Downstream analysis involves extracting biological insights such as identifying chromatin compartments, TADs (topologically associating domains), and chromatin loops.
It also detects structural genome variations like rearrangements and provides a detailed understanding of 3D genome organization.
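The binning and normalization steps of preprocessing can be sketched as follows. This is a minimal, pure-Python illustration for a single chromosome. The function names, the toy read pairs, and the 1 kb bin size are assumptions, and the normalization shown is a simple matrix-balancing loop in the spirit of iterative correction, not the implementation used by any particular tool.

```python
def bin_contacts(pairs, n_bins, bin_size):
    """Count read pairs per pair of genomic bins (symmetric contact matrix)."""
    matrix = [[0.0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1
    return matrix


def iterative_correction(matrix, n_iter=50):
    """Rescale rows and columns until all non-empty bins have equal coverage."""
    n = len(matrix)
    balanced = [row[:] for row in matrix]
    for _ in range(n_iter):
        row_sums = [sum(row) for row in balanced]
        covered = [s for s in row_sums if s > 0]
        if not covered:
            break
        mean_sum = sum(covered) / len(covered)
        # Bins with above-average coverage get a factor > 1 and are scaled down.
        factors = [s / mean_sum if s > 0 else 1.0 for s in row_sums]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= factors[i] * factors[j]
    return balanced


# Toy mapped positions (bp) of read pairs on one chromosome; illustrative only.
pairs = [(100, 200), (150, 1200), (300, 2500), (1100, 1900),
         (1500, 2100), (2200, 2900), (1300, 1700), (50, 1800)]
raw = bin_contacts(pairs, n_bins=3, bin_size=1000)
balanced = iterative_correction(raw)
row_sums = [sum(row) for row in balanced]
print(raw)
print(row_sums)
```

After balancing, every bin has the same total number of contacts, which removes the uneven coverage between bins while keeping the matrix symmetric.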
Chromosome Conformation Capture (3C)
Chromosome Conformation Capture (3C) is a genomic proximity ligation technique used to study interactions between any two specific DNA loci using PCR-based detection methods.
The 3C process includes cross-linking DNA and proteins to preserve chromatin structure, cutting DNA with restriction enzymes, and re-ligating the DNA fragments.
The ligated DNA products, which represent interacting regions, are detected by PCR or sequencing.
A major limitation of 3C is its low throughput: it can only study interactions between two predefined loci at a time and is not suitable for genome-wide analysis.
To overcome this, several improved versions of 3C were developed, including 4C, 5C, and Hi-C, which analyze interactions at broader genomic scales.
All 3C-based methods share the same initial steps (crosslinking, digestion, and ligation) but differ in how they detect DNA interactions.
4C (Circular Chromosome Conformation Capture) identifies genome-wide interactions for a specific region of interest.
In 4C, the ligation products undergo a second restriction digestion and are then re-ligated, forming small circular DNA molecules.
Inverse PCR is then used to amplify the circular fragments, followed by sequencing to identify interacting regions.
5C (Chromosome Conformation Capture Carbon Copy) allows the study of interactions among multiple genomic regions simultaneously.
It has higher throughput than 3C and 4C due to its use of multiplexed ligation-mediated amplification for detection.
Hi-C (High-throughput Chromosome Conformation Capture) is a genome-wide extension of 3C that can, in principle, capture pairwise chromatin interactions across the entire genome.
In Hi-C, the ends of the DNA fragments are labeled with biotin before ligation, allowing the ligation junctions to be isolated and interactions to be mapped at high resolution.
Advantages of Hi-C Sequencing
Hi-C sequencing surveys chromatin interactions across the whole genome in a single experiment, without requiring predefined target loci.
It offers detailed information about chromatin interactions and the structural arrangement of the genome.
Hi-C has high resolution and can detect both long-range and short-range chromatin interactions.
It can reveal interactions between distant DNA regions that are not adjacent in the linear DNA sequence.
Hi-C can detect complex structural features such as Topologically Associating Domains (TADs) and chromatin loops, which are essential for understanding gene expression and overall genome function.
Limitations of Hi-C Sequencing
Hi-C requires a large amount of starting material due to its multiple experimental steps, including crosslinking, digestion, and ligation.
Hi-C data analysis is complex and challenging, involving multiple steps such as read alignment, pairing, filtering, and normalization.
Since Hi-C is performed on fixed cells, usually pooled from a large population, it gives a population-averaged snapshot and cannot track dynamic changes in chromatin over time or differences between individual cells.
Although single-cell Hi-C methods exist, the data are often sparse, which makes it difficult to detect rare chromatin interactions.
Hi-C captures only pairwise interactions because it relies on ligation between two DNA regions, limiting its ability to study complex, multi-way chromatin structures.
Hi-C data is prone to biases introduced during sequencing and experimental steps, which can affect data interpretation.
While Hi-C can in theory reach high resolution, doing so requires very deep sequencing because the number of possible pairwise interactions is vast. The cost often makes such depth impractical, so the resolution achieved in actual experiments is frequently lower.
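A back-of-the-envelope calculation shows why resolution is so costly. The sketch below counts the number of distinct bin pairs for a given bin size, treating the genome as one continuous sequence and using a round, assumed genome size of 3 billion base pairs (roughly the size of the human genome). The function name is hypothetical.

```python
def n_bin_pairs(genome_size_bp, bin_size_bp):
    """Number of distinct bin pairs (including a bin paired with itself)."""
    n_bins = -(-genome_size_bp // bin_size_bp)  # ceiling division
    return n_bins * (n_bins + 1) // 2


GENOME_SIZE = 3_000_000_000  # assumed round figure, for illustration only

for bin_size in (1_000_000, 100_000, 10_000, 1_000):
    pairs = n_bin_pairs(GENOME_SIZE, bin_size)
    print(f"{bin_size:>9} bp bins: {pairs:,} possible bin pairs")
```

Making the bins ten times smaller increases the number of possible bin pairs roughly a hundredfold, so the sequencing depth needed to fill the contact matrix grows much faster than the resolution itself.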
Applications of Hi-C Sequencing
Hi-C sequencing is useful for understanding the 3D structure of genomes in various organisms, helping reveal how DNA is organized inside the nucleus.
Hi-C can be used to compare the 3D genome structures of different samples, providing insights into genome structure changes in different biological conditions and offering evolutionary information.
It helps in identifying A/B genomic compartments, with compartment A being generally active and linked to gene expression, and compartment B being generally inactive and associated with gene silencing.
Hi-C can be used to study interactions between genes and repetitive sequences in the genome, helping understand how repetitive DNA elements influence gene expression.
It is used to study chromatin changes related to diseases, aiding in the understanding of disease mechanisms and the identification of potential therapeutic targets.
Hi-C sequencing helps identify the hierarchical organization of chromatin, from large A/B compartments to smaller units like topologically associating domains (TADs) and chromatin loops, which is important for understanding genome interactions and gene regulation.
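One common way to call A/B compartments is to divide each contact by the average contact at the same genomic distance, correlate the resulting rows, and take the sign of the leading eigenvector. The pure-Python sketch below follows that outline on a four-bin toy matrix. All function names and the toy data are assumptions, and power iteration stands in for a proper eigendecomposition. The sign of the eigenvector is arbitrary, so deciding which group is compartment A needs outside information such as gene density.

```python
import math


def observed_over_expected(matrix):
    """Divide each contact by the mean contact at the same genomic distance."""
    n = len(matrix)
    oe = [[0.0] * n for _ in range(n)]
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected = sum(diagonal) / len(diagonal)
        for i in range(n - d):
            value = matrix[i][i + d] / expected if expected > 0 else 0.0
            oe[i][i + d] = oe[i + d][i] = value
    return oe


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def leading_eigenvector(sym, n_iter=200):
    """Power iteration on a symmetric matrix, from a fixed non-uniform start."""
    n = len(sym)
    v = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(n_iter):
        w = [sum(sym[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def compartment_signal(matrix):
    """Leading eigenvector of the correlation matrix of observed/expected rows."""
    oe = observed_over_expected(matrix)
    n = len(oe)
    corr = [[pearson(oe[i], oe[j]) for j in range(n)] for i in range(n)]
    return leading_eigenvector(corr)


# Toy matrix: bins 0-1 and bins 2-3 form two groups that contact each other
# more often within a group than between groups, on top of a distance decay.
labels = "AABB"
toy = [[(2.0 if labels[i] == labels[j] else 1.0) / (1 + abs(i - j))
        for j in range(4)] for i in range(4)]
signal = compartment_signal(toy)
print([round(x, 3) for x in signal])
```

In this toy example, bins 0 and 1 receive one sign and bins 2 and 3 the other, matching the two groups built into the matrix.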
References
3C sequencing (Hi-C sequencing) – CD Genomics. (n.d.). Retrieved from https://rna.cd-genomics.com/chromosome-conformation-capture-sequencing-3c-seq.html
Barutcu, A. R., Fritz, A. J., Zaidi, S. K., van Wijnen, A. J., Lian, J. B., Stein, J. L., Nickerson, J. A., Imbalzano, A. N., & Stein, G. S. (2016). C-ing the Genome: A Compendium of Chromosome Conformation Capture Methods to Study Higher-Order Chromatin Organization. Journal of Cellular Physiology, 231(1), 31–35. https://doi.org/10.1002/jcp.25062
Belton, J., McCord, R. P., Gibcus, J. H., Naumova, N., Zhan, Y., & Dekker, J. (2012). Hi–C: A comprehensive technique to capture the conformation of genomes. Methods, 58(3), 268–276. https://doi.org/10.1016/j.ymeth.2012.05.001
Hi-C Sequencing Data Analysis: Introduction, methods, and Protocol – CD Genomics. (n.d.). Retrieved from https://bioinfo.cd-genomics.com/hi-c-sequencing-data-analysis-introduction-methods-and-protocol.html
Hi-C/3C-Seq/Capture-C. (n.d.). Retrieved from https://www.illumina.com/science/sequencing-method-explorer/kits-and-arrays/hi-c-3c-seq-capture-c.html
Lieberman-Aiden, E., Van Berkum, N. L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369
Overview of Hi-C sequencing – CD genomics. (n.d.). Retrieved from https://www.cd-genomics.com/resource-hic-sequencing.html
Pal, K., Forcato, M., & Ferrari, F. (2018). Hi-C analysis: from data generation to integration. Biophysical Reviews, 11(1), 67–78. https://doi.org/10.1007/s12551-018-0489-1
Smith, C. (2022, September 8). Chromatin analysis methods. Retrieved from https://www.biocompare.com/Editorial-Articles/589725-Chromatin-Analysis-Methods/
Van Berkum, N. L., Lieberman-Aiden, E., Williams, L., Imakaev, M., Gnirke, A., Mirny, L. A., Dekker, J., & Lander, E. S. (2010). Hi-C: a method to study the three-dimensional architecture of genomes. Journal of Visualized Experiments, (39), 1869. https://doi.org/10.3791/1869