On the origin of errors: the causes and consequences of mistakes during DNA replication

While the basic idea is straightforward, the actual machinery governing the process of replication remains something of an enigma. We know it’s imperfect but we don’t know why. What causes mistakes to occur during the copying process, and are these errors predictable?

Professor Conrad Nieduszynski and his Group at the Earlham Institute are investigating how DNA replication impacts genome stability - work that could reveal the origins of genomic and phenotypic variation that can lead to death and disease.

All cells contain a complete copy of the organism's DNA, the genetic blueprint of life, packaged into chromosomes. Since any new cells need a copy of the genetic material, the chromosomes must be completely and accurately replicated before the cell can divide.

In reality, we’ve come to learn that complete and accurate replication is difficult to achieve. A variety of errors can and do occur. Some of these mistakes are easily tidied up by housekeeping proteins but, inevitably, others are missed.

Understanding how and why errors arise, and the resulting consequences for the organism, is a key area of research for Professor Conrad Nieduszynski.

“We’re trying to achieve an element of predictability when it comes to replication errors,” explains Nieduszynski. “If you've got a particular number of DNA copies or bases in a row, why do they sometimes increase - becoming hundreds of copies and causing disease - and sometimes they're just fine?

“My Group does fundamental research but the potential applications are widespread, partly because we’re organism agnostic. We’re looking to understand where cellular pathways can go wrong and cause disease, which could lead to insights about cancer just as easily as plant health or the emergence of antimicrobial resistance.”

Prof Nieduszynski's Group at the Institute explores the impact of DNA replication on genome evolution and stability.

Pauses and breaks

Problems occur when the molecular machinery responsible for copying the DNA encounters an obstacle. This can cause things to slow down or briefly pause, which creates a narrow window of opportunity for mistakes to be made.

“The pausing of DNA replication leads to the accumulation of fragile, single-stranded DNA that is prone to base damage, slippage and double-strand breaks,” explains Nieduszynski. “Therefore, fork pausing is a major source of replicative errors, including point mutations, expansion/contraction of repeats, deletions and translocations.”

Although several checks and balances mean these obstacles normally get spotted and fixed during replication, when things do go wrong the consequences can be catastrophic for the cell.

These rare but serious events are difficult to detect since most DNA replication is regular. Spotting them is something of a 'needle in a haystack' problem for researchers.

Nieduszynski’s Group has developed a high-throughput DNA sequencing technology that enables them to study the kinetics of DNA replication ‘in vivo’ on single molecules.

“This technology allows us to rapidly search for the 'needle in the haystack' and identify key molecular events, such as the slowing down or pausing of the DNA replication machinery,” he explains.

The Group is now applying this approach to determine what DNA sequences create obstacles to the DNA replication machinery, which protein factors assist in overcoming these hurdles, and how exactly pauses link to errors during the copying process.

“These approaches allow us to identify and characterise rare DNA replication mistakes, prioritising what determines and causes these mistakes and their resulting consequences,” says Nieduszynski.

“This is so important because a single DNA replication error on one chromosome in a single cell division has the potential to be harmless, lead to a subtle effect, or potentially be detrimental to the organism.”

Hiding in plain sight

Errors in DNA replication occur on single molecules in individual cells; however, these errors are hidden from view in genomic approaches that look at data from populations of several million cells.

“Recently, we developed the first single-molecule DNA sequencing method for the study of genome replication that can detect important events hidden in population data,” explains Nieduszynski.

“We’ve advanced high-resolution single-molecule approaches to analyse DNA replication, based on Oxford Nanopore sequencing and advanced computational analysis. This has allowed us to go in and look at a single molecule without needing amplification like usual genome sequencing methods.

“We’re bypassing the problem of heterogeneity created when there are lots of different molecules with a multitude of events happening at once. Now, we can simplify and assess what’s going on one molecule at a time. It’s still a high-throughput sequencing method that looks at hundreds of thousands of molecules in total but we've got the best of both worlds with the ultimate resolution.”
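The point about population data hiding rare events can be illustrated with a small toy calculation. This is only a sketch under stated assumptions, not the Group's analysis method: the fork speeds, the pause threshold and the function name `find_paused_molecules` are all hypothetical values chosen for illustration.

```python
from statistics import mean


def find_paused_molecules(fork_speeds, threshold):
    """Return the indices of molecules whose fork speed is below the threshold."""
    return [i for i, speed in enumerate(fork_speeds) if speed < threshold]


# Hypothetical toy data: 9,990 molecules replicate at a typical speed
# and 10 molecules carry a paused (very slow) replication fork.
NORMAL_SPEED = 2.0  # arbitrary units, an assumption for illustration
PAUSED_SPEED = 0.1  # arbitrary units, an assumption for illustration
fork_speeds = [NORMAL_SPEED] * 9990 + [PAUSED_SPEED] * 10

# Population view: a single average over all molecules.
population_average = mean(fork_speeds)

# Single-molecule view: inspect every molecule individually.
paused = find_paused_molecules(fork_speeds, threshold=1.0)

print(f"population average speed: {population_average:.3f}")
print(f"molecules flagged as paused: {len(paused)}")
```

In this toy data the population average stays at about 1.998, almost indistinguishable from the normal speed of 2.0, while the molecule-by-molecule check picks out all 10 paused molecules.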

Their approach will allow them to locate and measure the duration of replication fork pausing sites throughout the genome. This will systematically determine the nature of naturally occurring 'difficult-to-replicate' sequences.

One particular obstacle they want to investigate is the barrier created by short tandem repeats, both at the level of replication fork progress and repeat copy number stability.

Alongside this, the Group will be cataloguing the role of accessory proteins that support replication as it navigates its way through these more challenging regions of DNA.

“Together, these experiments will provide the first high-resolution, whole-genome view of DNA replication fork progression on single molecules.”

DNA replication produces two DNA molecules ahead of cell division, each containing both newly made and inherited DNA.

Dr Conrad Nieduszynski, Group Leader in DNA Replication at the Earlham Institute

Technology development

A crucial benefit of Prof Nieduszynski conducting his research at the Earlham Institute is the ability to collaborate with scientists who are pushing the boundaries of the technology needed to address these challenges.

“With innovation over the last few years, we've been able to get to a point where we can start to study these processes on single molecules, rather than having to study them in populations of thousands or millions of molecules or cells,” says Nieduszynski.

“We’re carrying out our research with Nanopore sequencing, working alongside the Earlham Institute’s Genomics Pipelines Group and developing technology algorithms with the Leggett Group, while Post-doc Sathish Thiyagarajan in my Group is doing bioinformatics data analysis.

“As a next step, it would be really exciting for us to combine studying at a genomic scale with single cells from the Macaulay Group’s research and single molecules with my lab - as each has its strengths for different types of biological questions.”

Fast changes

Understanding, and perhaps one day predicting, the emergence of replication errors has potential in the treatment and prevention of human diseases. Thanks to the organism-agnostic approach, the research could eventually lead to the use of single-molecule analysis for genotyping.

This has the potential to help in areas of crop development, where applying these methods to study genome variation in different isolates of food staples - such as wheat and bananas - could result in findings of industrial or commercial value as agricultural breeding tools.

“Normally, we think of evolution as a very slow process and therefore genetic changes are at a similar pace,” says Nieduszynski. “Although, when disease occurs in the duration of someone's lifespan, these genetic changes happen fast.

“That's our motivation - to expect the equivalent happening in natural populations or in agricultural populations where there could be important phenotypes that impact the organism’s health.”
