
Researchers from the Earlham Institute and the University of Oxford generated enormous long-read DNA sequences and used halogenated nucleotides to reveal the locations where the double-stranded helix teases apart to begin copying. Around 80 per cent of these sites had never been reported before.
As replication start sites are thought to be hotspots for copying errors and mutations, the findings are likely to impact future research into genetic diseases, such as cancer.
Creating an accurate copy of the genome - the entire length of DNA in a cell - is the fundamental step before cells can divide, which is the basis of all life on Earth. Errors in that duplication process lead to genetic changes that underpin evolution, but also give rise to diseases.
There has been a long-standing question about where exactly the process of replication begins in the human genome.
Dr Conrad Nieduszynski, study author and group leader at the Earlham Institute, said: “This is important because identifying these sites is a first step in understanding how genetic changes arise - be they beneficial or deleterious.”
DNA exists as a double helix made of four nucleotide bases - adenine, cytosine, guanine, and thymine. They pair up such that adenine always pairs with thymine, and cytosine with guanine.
When replication begins, a team of molecules ‘reads’ the DNA bases on the strand it is copying and fishes out the corresponding base from the surroundings.
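The pairing rule can be illustrated with a short Python sketch. It is a toy model of the rule only, not of the replication machinery, and the name `copy_strand` is made up for this example.

```python
# Pairing rule from the text: A pairs with T, and C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def copy_strand(template: str) -> str:
    """Return, position by position, the base that pairs with each template base.

    Strand direction is ignored here; this only shows which base is
    'fished out' opposite each base being read.
    """
    return "".join(PAIRING[base] for base in template.upper())


print(copy_strand("ATCG"))  # TAGC
```

Each base on the template determines exactly one partner, which is why a single strand carries enough information to rebuild the other.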
The researchers were able to record the process of replication in the DNA itself using a modified DNA base called BrdU (bromodeoxyuridine).
When it’s present, the replication machinery can choose BrdU over thymine, leaving a clear signature in the DNA that can be seen using long-read sequencing approaches.
“This gives us the dynamics of how a single DNA molecule has been replicated based on the pattern of where BrdU appears,” said Dr Nieduszynski.
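As a rough illustration of what a BrdU pattern along a single molecule might look like in data, the Python sketch below summarises one read in fixed-size windows. The encoding is an assumption made for this example: the read is a string in which `B` marks a position where BrdU was detected in place of thymine. The function name and window size are likewise hypothetical, and this is not the analysis used in the study.

```python
def brdu_fraction_by_window(read: str, window: int) -> list[float]:
    """For each window along the read, return the share of thymine
    positions that carry BrdU ('B') instead of thymine ('T')."""
    fractions = []
    for start in range(0, len(read), window):
        chunk = read[start:start + window]
        brdu_sites = chunk.count("B")
        thymine_sites = chunk.count("T") + brdu_sites
        # Windows with no thymine positions carry no information; report 0.0.
        fractions.append(brdu_sites / thymine_sites if thymine_sites else 0.0)
    return fractions


print(brdu_fraction_by_window("ABCBATCT", window=4))  # [1.0, 0.0]
```

In this toy read, every thymine position in the first window carries BrdU and none in the second does. Variation of this kind along one molecule is the sort of pattern the quote refers to.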
The group found starting points in previously hypothesised regions of the genome - areas where they expected to see DNA replication originating. But around 80 per cent of the starting points were dispersed across the genome, suggesting the process is largely random.
Previous approaches have generally relied on averaging data for millions of cells, which means potential start sites can be easily hidden or overlooked.
“Our approach looks at single DNA molecules from individual cells,” explained Dr Nieduszynski.
“When we looked at the replication start sites we’d identified, we discovered only 20 per cent - a fifth of them - had been reported previously. The majority were newly discovered with this approach.
“Yet this 80 per cent are responsible for replicating most of the genome, so they’re really important.”