Non-coding RNA in Eukaryotes

Finding functional loci within an ocean of non-functional DNA

Project Summary.

Led by: Wilfried Haerty

Protein-coding genes occupy about 1% of the human genome. Within the remaining non-coding part of our genome, a further 7% to 9% of the sequence is also likely to be functional, given its conservation across species and significant evidence of selection. These functional non-coding sequences, scattered within an ocean of non-functional DNA, are significantly enriched in variants directly associated with traits and disease in humans and in model organisms, and are generally involved in regulating gene expression. Many of these sequences are transcribed into long non-coding RNAs (lncRNAs), and to date tens of thousands of lncRNAs have been annotated in both animals and plants. Only a handful of these loci have been characterized so far, some of which have proved fundamental to an organism's biology. Little is known about the vast majority of lncRNAs, including what proportion of these loci are truly functional and important for an organism's biology.

Our work focuses on using computational approaches to annotate lncRNAs and characterize their features in order to identify those that are likely to be functional. Using comparative genomics and population genetics, we aim to assess the potential impact of mutations within these loci and, through collaborations, to validate them experimentally.

Impact statement.

Identifying functional non-coding sequences within any genome is essential for understanding how genes are regulated and, most importantly, for deciphering how mutations within these sequences affect an organism's biology. Thus far, tens of thousands of long non-coding RNAs have been annotated in eukaryotic genomes. Our work focuses on identifying potentially functional lncRNAs by integrating large-scale multi-omic data sets (RNA-Seq, CAGE-Seq, ChIP-Seq, single nucleotide variants, genome sequences, and variants from genome-wide association studies) in humans, fish, worms and plants. This integration allows us to prioritize loci for experimental validation using genetic engineering. The workflow makes it possible to identify functional loci and to investigate the fitness impact of mutations within these lncRNAs.