1.
Nat Commun ; 15(1): 5278, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38937428

ABSTRACT

Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we develop CapTrap-seq, a cDNA library preparation method, which combines the Cap-trapping strategy with oligo(dT) priming to detect 5' capped, full-length transcripts. In our study, we evaluate the performance of CapTrap-seq alongside other widely used RNA-seq library preparation protocols in human and mouse tissues, employing both ONT and PacBio sequencing technologies. To explore the quantitative capabilities of CapTrap-seq and its accuracy in reconstructing full-length RNA molecules, we implement a capping strategy for synthetic RNA spike-in sequences that mimics the natural 5'cap formation. Our benchmarks, incorporating the Long-read RNA-seq Genome Annotation Assessment Project (LRGASP) data, demonstrate that CapTrap-seq is a competitive, platform-agnostic RNA library preparation method for generating full-length transcript sequences.


Subject(s)
Gene Library , Sequence Analysis, RNA , Animals , Humans , Mice , Sequence Analysis, RNA/methods , High-Throughput Nucleotide Sequencing/methods , RNA/genetics , RNA Caps/genetics
3.
bioRxiv ; 2023 Jun 18.
Article in English | MEDLINE | ID: mdl-37398314

ABSTRACT

Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we developed CapTrap-seq, a cDNA library preparation method, which combines the Cap-trapping strategy with oligo(dT) priming to detect 5'capped, full-length transcripts, together with the data processing pipeline LyRic. We benchmarked CapTrap-seq and other popular RNA-seq library preparation protocols in a number of human tissues using both ONT and PacBio sequencing. To assess the accuracy of the transcript models produced, we introduced a capping strategy for synthetic RNA spike-in sequences that mimics the natural 5'cap formation in RNA spike-in molecules. We found that the vast majority (up to 90%) of transcript models that LyRic derives from CapTrap-seq reads are full-length. This makes it possible to produce highly accurate annotations with minimal human intervention.

4.
Life Sci Alliance ; 6(1), 2023 01.
Article in English | MEDLINE | ID: mdl-36283702

ABSTRACT

Most mitochondrial proteins are encoded by nuclear genes, synthesized in the cytosol and targeted into the organelle. To characterize the spatial organization of mitochondrial gene products in zebrafish (Danio rerio), we sequenced RNA from different cellular fractions. Our results confirmed the presence of nuclear-encoded mRNAs in the mitochondrial fraction, which, in unperturbed conditions, are mainly transcripts encoding large proteins with specific properties, like transmembrane domains. To further explore the principles of mitochondrial protein compartmentalization in zebrafish, we quantified the transcriptomic changes for each subcellular fraction triggered by the chchd4a -/- mutation, which disrupts mitochondrial protein import. Our results indicate that the proteostatic stress further restricts the population of transcripts on the mitochondrial surface, allowing only the largest and most evolutionarily conserved proteins to be synthesized there. We also show that many nuclear-encoded mitochondrial transcripts translated by the cytosolic ribosomes remain resistant to the global translation shutdown. Thus, vertebrates, in contrast to yeast, are not likely to use localized translation to facilitate synthesis of mitochondrial proteins under proteostatic stress conditions.


Subject(s)
Genes, Mitochondrial , Zebrafish , Animals , Zebrafish/genetics , Mitochondrial Proteins/genetics , Mitochondrial Proteins/metabolism , Mitochondria/genetics , Mitochondria/metabolism , RNA, Messenger/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Nuclear Proteins/genetics
5.
Nucleic Acids Res ; 51(D1): D942-D949, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36420896

ABSTRACT

GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subject(s)
Computational Biology , Genome, Human , Humans , Animals , Mice , Molecular Sequence Annotation , Computational Biology/methods , Genome, Human/genetics , Transcriptome/genetics , Gene Expression Profiling , Databases, Genetic
6.
Elife ; 10, 2021 07 20.
Article in English | MEDLINE | ID: mdl-34292154

ABSTRACT

Mitochondria are organelles with their own genomes, but they rely on the import of nuclear-encoded proteins that are translated by cytosolic ribosomes. Therefore, it is important to understand whether failures in the mitochondrial uptake of these nuclear-encoded proteins can cause proteotoxic stress and identify response mechanisms that may counteract it. Here, we report that upon impairments in mitochondrial protein import, high-risk precursor and immature forms of mitochondrial proteins form aberrant deposits in the cytosol. These deposits then cause further cytosolic accumulation and consequently aggregation of other mitochondrial proteins and disease-related proteins, including α-synuclein and amyloid β. This aggregation triggers a cytosolic protein homeostasis imbalance that is accompanied by specific molecular chaperone responses at both the transcriptomic and protein levels. Altogether, our results provide evidence that mitochondrial dysfunction, specifically protein import defects, contributes to impairments in protein homeostasis, thus revealing a possible molecular mechanism by which mitochondria are involved in neurodegenerative diseases.


Subject(s)
Alzheimer Disease/metabolism , Cytosol/metabolism , Mitochondria/metabolism , Mitochondrial Proteins/metabolism , Protein Aggregates , Proteostasis , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Alzheimer Disease/genetics , Animals , Caenorhabditis elegans/genetics , Caenorhabditis elegans/metabolism , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans Proteins/metabolism , Databases, Genetic , Heat-Shock Proteins/genetics , Heat-Shock Proteins/metabolism , Humans , Mitochondria/genetics , Mitochondrial Proteins/genetics , Molecular Chaperones/genetics , Molecular Chaperones/metabolism , Protein Transport , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics
7.
Methods Mol Biol ; 2254: 133-159, 2021.
Article in English | MEDLINE | ID: mdl-33326074

ABSTRACT

Metazoan genomes produce thousands of long-noncoding RNAs (lncRNAs), of which just a small fraction have been well characterized. Understanding their biological functions requires accurate annotations, or maps of the precise location and structure of genes and transcripts in the genome. Current lncRNA annotations are limited by compromises between quality and size, with many gene models being fragmentary or uncatalogued. To overcome this, the GENCODE consortium has developed RNA capture long-read sequencing (CLS), an approach combining targeted RNA capture with third-generation long-read sequencing. CLS provides accurate annotations at high-throughput rates. It eliminates the need for noisy transcriptome assembly from short reads, and requires minimal manual curation. The full-length transcript models produced are of quality comparable to present-day manually curated annotations. Here we describe a detailed CLS protocol, from probe design through long-read sequencing to creation of final annotations.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation/methods , RNA, Long Noncoding/genetics , Animals , Computational Biology/methods , Data Curation , Sequence Analysis, RNA
8.
Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subject(s)
COVID-19/prevention & control , Computational Biology/methods , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , RNA, Long Noncoding/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Transcription, Genetic/genetics
9.
Pharmacol Res ; 161: 105249, 2020 11.
Article in English | MEDLINE | ID: mdl-33068730

ABSTRACT

The molecular complexity of human breast cancer (BC) renders the clinical management of the disease challenging. Long non-coding RNAs (lncRNAs) are promising biomarkers for BC patient stratification, early detection, and disease monitoring. Here, we identified the involvement of the long intergenic non-coding RNA 01087 (LINC01087) in breast oncogenesis. LINC01087 appeared significantly downregulated in triple-negative BCs (TNBCs) and upregulated in the luminal BC subtypes in comparison to mammary samples from cancer-free women and matched normal cancer pairs. Interestingly, deregulation of LINC01087 made it possible to accurately distinguish between luminal and TNBC specimens, independently of the clinicopathological parameters, and of the histological and TP53 or BRCA1/2 mutational status. Moreover, increased expression of LINC01087 predicted a better prognosis in luminal BCs, while TNBC tumors that harbored lower levels of LINC01087 were associated with reduced relapse-free survival. Furthermore, bioinformatics analyses were performed on TNBC and luminal BC samples and suggested that the putative tumor suppressor activity of LINC01087 may rely on interferences with pathways involved in cell survival, proliferation, adhesion, invasion, inflammation and drug sensitivity. Altogether, these data suggest that the assessment of LINC01087 deregulation could represent a novel, specific and promising biomarker not only for the diagnosis and prognosis of luminal BC subtypes and TNBCs, but also as a predictive biomarker of pharmacological interventions.


Subject(s)
Biomarkers, Tumor/metabolism , Breast Neoplasms/metabolism , RNA, Long Noncoding/metabolism , Triple Negative Breast Neoplasms/metabolism , Biomarkers, Tumor/genetics , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , MCF-7 Cells , Neoplasm Metastasis , Neoplasm Recurrence, Local , Progression-Free Survival , Protein Interaction Maps , RNA, Long Noncoding/genetics , Signal Transduction , Time Factors , Transcriptome , Triple Negative Breast Neoplasms/drug therapy , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/pathology
10.
NPJ Genom Med ; 4: 31, 2019.
Article in English | MEDLINE | ID: mdl-31814998

ABSTRACT

The developmental and epileptic encephalopathies (DEE) are a group of rare, severe neurodevelopmental disorders, where even the most thorough sequencing studies leave 60-65% of patients without a molecular diagnosis. Here, we explore the incompleteness of transcript models used for exome and genome analysis as one potential explanation for a lack of current diagnoses. Therefore, we have updated the GENCODE gene annotation for 191 epilepsy-associated genes, using human brain-derived transcriptomic libraries and other data to build 3,550 putative transcript models. Our annotations increase the transcriptional 'footprint' of these genes by over 674 kb. Using SCN1A as a case study, due to its close phenotype/genotype correlation with Dravet syndrome, we screened 122 people with Dravet syndrome or a similar phenotype with a panel of exon sequences representing eight established genes and identified two de novo SCN1A variants that, through improved gene annotation, are now found to reside within our exons. These two (from 122 screened people, 1.6%) molecular diagnoses carry significant clinical implications. Furthermore, we identified a Dravet syndrome-associated SCN1A variant, previously classified as intronic, that now lies within a deeply conserved exon. Our findings illustrate the potential gains of thorough gene annotation in improving diagnostic yields for genetic disorders.

11.
J Cell Sci ; 132(8), 2019 04 26.
Article in English | MEDLINE | ID: mdl-31028152

ABSTRACT

The production of newly synthesized proteins is vital for all cellular functions and is a determinant of cell growth and proliferation. The synthesis of polypeptide chains from mRNA molecules requires sophisticated machineries and mechanisms that need to be tightly regulated, and adjustable to current needs of the cell. Failures in the regulation of translation contribute to the loss of protein homeostasis, which can have deleterious effects on cellular function and organismal health. Unsurprisingly, the regulation of translation appears to be a crucial element in stress response mechanisms. This review provides an overview of mechanisms that modulate cytosolic protein synthesis upon cellular stress, with a focus on the attenuation of translation in response to mitochondrial stress. We then highlight links between mitochondrion-derived reactive oxygen species and the attenuation of reversible cytosolic translation through the oxidation of ribosomal proteins at their cysteine residues. We also discuss emerging concepts of how cellular mechanisms to stress are adapted, including the existence of alternative ribosomes and stress granules, and the regulation of co-translational import upon organelle stress.


Subject(s)
Mitochondria/metabolism , Protein Biosynthesis , Ribosomes/metabolism , Cell Growth Processes , Cysteine/metabolism , Humans , Mitochondria/genetics , Oxidative Stress , Proteostasis , Reactive Oxygen Species/metabolism , Ribosomal Proteins/metabolism , Ribosomes/genetics , Signal Transduction
13.
Nucleic Acids Res ; 47(D1): D766-D773, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357393

ABSTRACT

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.


Subject(s)
Databases, Genetic , Genome, Human/genetics , Genomics , Pseudogenes/genetics , Animals , Computational Biology , Humans , Internet , Mice , Molecular Sequence Annotation , Software
14.
PLoS Genet ; 14(11): e1007743, 2018 11.
Article in English | MEDLINE | ID: mdl-30457989

ABSTRACT

Development and function of tissues and organs are powered by the activity of mitochondria. In humans, inherited genetic mutations that lead to progressive mitochondrial pathology often manifest during infancy and can lead to death, reflecting the indispensable nature of mitochondrial biogenesis and function. Here, we describe a zebrafish mutant for the gene mia40a (chchd4a), the life-essential homologue of the evolutionarily conserved Mia40 oxidoreductase which drives the biogenesis of cysteine-rich mitochondrial proteins. We report that mia40a mutant animals undergo progressive cellular respiration defects and develop enlarged mitochondria in skeletal muscles before their ultimate death at the larval stage. We generated a deep transcriptomic and proteomic resource that allowed us to identify abnormalities in the development and physiology of endodermal organs, in particular the liver and pancreas. We identify the acinar cells of the exocrine pancreas to be severely affected by mutations in the MIA pathway. Our data contribute to a better understanding of the molecular, cellular and organismal effects of mitochondrial deficiency, important for the accurate diagnosis and future treatment strategies of mitochondrial diseases.

15.
Nat Rev Genet ; 19(9): 535-548, 2018 09.
Article in English | MEDLINE | ID: mdl-29795125

ABSTRACT

Gene maps, or annotations, enable us to navigate the functional landscape of our genome. They are a resource upon which virtually all studies depend, from single-gene to genome-wide scales and from basic molecular biology to medical genetics. Yet present-day annotations suffer from trade-offs between quality and size, with serious but often unappreciated consequences for downstream studies. This is particularly true for long non-coding RNAs (lncRNAs), which are poorly characterized compared to protein-coding genes. Long-read sequencing technologies promise to improve current annotations, paving the way towards a complete annotation of lncRNAs expressed throughout a human lifetime.


Subject(s)
Chromosome Mapping , Gene Expression Profiling , Genome, Human , RNA, Long Noncoding , Transcriptome/physiology , Genome-Wide Association Study , Humans , RNA, Long Noncoding/biosynthesis , RNA, Long Noncoding/genetics
16.
Int J Oncol ; 52(3): 656-678, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29286103

ABSTRACT

Acute myeloid leukemia (AML) is the most common and severe form of acute leukemia diagnosed in adults. Owing to its heterogeneity, AML is divided into classes associated with different treatment outcomes and specific gene expression profiles. Based on previous studies on AML, in this study, we designed and generated an AML-array containing 900 oligonucleotide probes complementary to human genes implicated in hematopoietic cell differentiation and maturation, proliferation, apoptosis and leukemic transformation. The AML-array was used to hybridize 118 samples from 33 patients with AML of the M1 and M2 subtypes of the French-American-British (FAB) classification and 15 healthy volunteers (HV). Rigorous analysis of the microarray data revealed that 83 genes were differentially expressed between the patients with AML and the HV, including genes not yet discussed in the context of AML pathogenesis. The most overexpressed genes in AML were STMN1, KITLG, CDK6, MCM5, KRAS, CEBPA, MYC, ANGPT1, SRGN, RPLP0, ENO1 and SET, whereas the most underexpressed genes were IFITM1, LTB, FCN1, BIRC3, LYZ, ADD3, S100A9, FCER1G, PTRPE, CD74 and TMSB4X. The overexpression of the CPA3 gene was specific for AML with mutated NPM1 and FLT3. Although the microarray-based method was insufficient to differentiate between any other AML subgroups, quantitative PCR approaches enabled us to identify 3 genes (ANXA3, S100A9 and WT1) whose expression can be used to discriminate between the 2 studied AML FAB subtypes. The expression levels of the ANXA3 and S100A9 genes were increased, whereas those of WT1 were decreased in the AML-M2 compared to the AML-M1 group. We also examined the association between the STMN1, CAT and ABL1 genes, and the FLT3 and NPM1 mutation status. FLT3+/NPM1- AML was associated with the highest expression of STMN1, and ABL1 was upregulated in FLT3+ AML and CAT in FLT3- AML, irrespective of the NPM1 mutation status. Moreover, our results indicated that CAT and WT1 gene expression levels correlated with the response to therapy. CAT expression was highest in patients who remained longer under complete remission, whereas WT1 expression increased with treatment resistance. On the whole, this study demonstrates that the AML-array can potentially serve as a first-line screening tool, and may be helpful for the diagnosis of AML, whereas the differentiation between AML subgroups can be more successfully performed with PCR-based analysis of a few marker genes.


Subject(s)
Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Biomarkers, Tumor/genetics , Gene Expression Profiling/methods , Leukemia, Myeloid, Acute/genetics , Oligonucleotide Array Sequence Analysis/methods , Adolescent , Adult , Aged , Catalase/genetics , Catalase/metabolism , Drug Resistance, Neoplasm/genetics , Female , Humans , Leukemia, Myeloid, Acute/diagnosis , Leukemia, Myeloid, Acute/drug therapy , Leukemia, Myeloid, Acute/pathology , Male , Middle Aged , Mutation , Nucleophosmin , Prognosis , Real-Time Polymerase Chain Reaction/methods , Remission Induction/methods , Sequence Analysis, RNA/methods , Treatment Outcome , WT1 Proteins/genetics , WT1 Proteins/metabolism , Young Adult
17.
Nat Genet ; 49(12): 1731-1740, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29106417

ABSTRACT

Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation/methods , RNA, Long Noncoding/genetics , Animals , Gene Expression Profiling/methods , Genomics/methods , Humans , Mice , Open Reading Frames/genetics , Reproducibility of Results
19.
Nat Commun ; 7: 12339, 2016 08 17.
Article in English | MEDLINE | ID: mdl-27531712

ABSTRACT

Long non-coding RNAs (lncRNAs) constitute a large, yet mostly uncharacterized fraction of the mammalian transcriptome. Such characterization requires a comprehensive, high-quality annotation of their gene structure and boundaries, which is currently lacking. Here we describe RACE-Seq, an experimental workflow designed to address this based on RACE (rapid amplification of cDNA ends) and long-read RNA sequencing. We apply RACE-Seq to 398 human lncRNA genes in seven tissues, leading to the discovery of 2,556 on-target, novel transcripts. About 60% of the targeted loci are extended in either 5' or 3', often reaching genomic hallmarks of gene boundaries. Analysis of the novel transcripts suggests that lncRNAs are as long, have as many exons and undergo as much alternative splicing as protein-coding genes, contrary to current assumptions. Overall, we show that RACE-Seq is an effective tool to annotate an organism's deep transcriptome, and compares favourably to other targeted sequencing techniques.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Polymerase Chain Reaction/methods , RNA, Long Noncoding/genetics , Sequence Analysis, RNA/methods , Exons/genetics , Genetic Loci , Humans , Molecular Sequence Annotation , Organ Specificity/genetics , Proof of Concept Study , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA Splice Sites/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Transcriptome/genetics