Starting point of DNA replication mystery solved

12 May 2025

The question of where DNA replication starts in the human genome has finally been addressed, and the conclusion is that the start sites are largely random.

Research from the Earlham Institute and University of Oxford generated enormous long-read DNA sequences and used halogenated nucleotides to reveal locations where the double-stranded helix teases apart to begin copying, with around 80 per cent of the sites having never been reported before.

As replication start sites are thought to be hotspots for copying errors and mutations, the findings are likely to impact future research into genetic diseases, such as cancer.

Creating an accurate copy of the genome - the entire length of DNA in a cell - is the fundamental step before cells can divide, which is the basis of all life on Earth. Errors in that duplication process lead to genetic changes that underpin evolution, but also give rise to diseases.

There has been a long-standing question about where exactly the process of replication begins in the human genome.

Dr Conrad Nieduszynski, study author and group leader at the Earlham Institute, said: “This is important because identifying these sites is a first step in understanding how genetic changes arise - be they beneficial or deleterious.”

DNA exists as a double helix made of four nucleotide bases - Adenine, Cytosine, Guanine, and Thymine. They pair up such that Adenine always pairs with Thymine, and Cytosine with Guanine.

When replication begins, a team of molecules ‘reads’ the DNA bases on the strand they are copying and fishes out the corresponding base from the surroundings.
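The pairing rule can be sketched in a few lines of Python. This is a toy illustration only: the function name `copy_strand` is hypothetical, and the sketch simply pairs base for base, ignoring the fact that the two strands of real DNA run in opposite directions.

```python
# Pairing rule from the text: A with T, C with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def copy_strand(template: str) -> str:
    """Return the base that pairs with each base of the template, in order."""
    return "".join(PAIRS[base] for base in template.upper())


print(copy_strand("ATGCCG"))  # TACGGC
```

Copying the copy returns the original sequence, which is why one strand is enough to rebuild the other.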

The researchers were able to record the process of replication in the DNA itself using a modified DNA base called BrdU.

When it’s present, the replication machinery can choose BrdU over Thymine, leaving a clear signature in the DNA that can be seen by using long-read sequencing approaches.

“This gives us the dynamics of how a single DNA molecule has been replicated based on the pattern of where BrdU appears,” said Dr Nieduszynski.
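To make the idea of a BrdU "pattern" concrete, here is a toy Python sketch. It assumes, purely for illustration, that a read has already been reduced to a string covering only its Thymine positions, with `B` where BrdU was detected and `T` where ordinary Thymine was. The function `brdu_fraction`, the window size, and the example read are all hypothetical and are not the study's analysis pipeline.

```python
def brdu_fraction(calls: str, window: int) -> list[float]:
    """Fraction of 'B' (BrdU) calls in consecutive, non-overlapping windows."""
    fractions = []
    for start in range(0, len(calls), window):
        chunk = calls[start:start + window]
        fractions.append(chunk.count("B") / len(chunk))
    return fractions


# Hypothetical read: no BrdU at the ends, BrdU-rich in the middle.
read_calls = "TTTT" + "TBTB" + "BBBB" + "BTBT" + "TTTT"
print(brdu_fraction(read_calls, 4))  # [0.0, 0.5, 1.0, 0.5, 0.0]
```

How such a profile rises and falls along a single molecule is the kind of pattern Dr Nieduszynski describes; interpreting it in terms of replication dynamics is beyond this sketch.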

The group found starting points in previously hypothesised regions of the genome - areas where they expected to see DNA replication originating. But around 80 per cent of the starting points were dispersed across the genome, suggesting the process is largely random.

Previous approaches have generally relied on averaging data for millions of cells, which means potential start sites can be easily hidden or overlooked.
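A small Python sketch shows why averaging can hide dispersed sites. The numbers are invented for illustration and are not data from the study: ten hypothetical molecules each start replication in one genome "bin", and the cut-off `min_molecules` is an assumed stand-in for the level at which a site stands out in population-averaged data.

```python
from collections import Counter

# Hypothetical data: the genome bin where replication started on each of
# ten single molecules. Two molecules share bin 50; the rest are scattered.
start_bins = [50, 50, 5, 15, 25, 35, 45, 65, 75, 85]

counts = Counter(start_bins)

# Population view: a bin stands out only if several molecules use it.
min_molecules = 2  # assumed cut-off, for illustration only
population_sites = sorted(b for b, c in counts.items() if c >= min_molecules)

# Single-molecule view: every observed start site is kept.
single_molecule_sites = sorted(counts)

print(population_sites)            # [50]
print(len(single_molecule_sites))  # 9
```

In this toy case the averaged view reports one site, while the molecule-by-molecule view keeps all nine, including the eight that each occur only once.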

“Our approach looks at single DNA molecules from individual cells,” explained Dr Nieduszynski.

“When we looked at the replication start sites we’d identified, we discovered only 20 per cent - a fifth of them - had been reported previously. The majority were newly discovered with this approach.

“Yet this 80 per cent are responsible for replicating most of the genome, so they’re really important.”

In order to understand the process of replication across the whole genome, the group needed to generate extremely long reads of DNA.

A read is essentially an unbroken stretch of DNA that has been recorded, base by base, by a sequencing platform.

Creating reads that were long enough meant overcoming a number of technical hurdles, with optimisations from the Technical Genomics Group at the Earlham Institute. In the end, they were able to produce reads up to around 800,000 bases in length.

“Long reads are generally somewhere in the tens of thousands of bases long,” explained Dr Karim Gharbi, Head of Technical Genomics at the Earlham Institute. “By generating reads in the hundreds of thousands of bases, we’ve been able to map out the replication start sites in unprecedented detail.”

Errors in DNA replication are the cause of a number of health issues, from cancer to premature ageing. Learning more about where replication starts and stops could reveal new insights about how errors occur - and potential strategies to prevent or correct them.

The evolutionary reason why replication begins at seemingly random, rather than concentrated, sites is unclear.

“There are vast areas of the genome that don’t seem to code for anything, but they’re still copied,” said Dr Nieduszynski.

“It’s possible the 20 per cent of sites previously reported represent important regions where you want to bring in more careful regulation.

“That could mean you see the tightest regulation over replication start sites in the most delicate regions of the genome - places where copying errors could be particularly harmful, or even fatal.”

This work was supported by the Biotechnology and Biological Sciences Research Council (BBSRC), part of UKRI, through core funding for the Earlham Institute and funding for the strategic research programme Cellular Genomics.

Sequencing and analysis were enabled by Transformative Genomics, a BBSRC-supported National Bioscience Research Infrastructure at the Earlham Institute.

Notes to editors.

The paper "Most human DNA replication initiation is dispersed throughout the genome with only a minority within previously identified initiation zones" is published in Genome Biology.

 

About the Earlham Institute

The Earlham Institute is a hub of life science research, training, and innovation focused on understanding the natural world through the lens of genomics.

Embracing the full breadth of life on Earth, our scientists specialise in developing and testing the latest tools and approaches needed to decode living systems and make predictions about biology.

The Earlham Institute is based within the Norwich Research Park and is one of eight institutes that receive strategic funding from the Biotechnology and Biological Sciences Research Council (BBSRC), part of UKRI, as well as support from other research funders.

Earlham Institute / earlhaminst.bsky.social