Search | VHL Regional Portal

A joint NCBI and EMBL-EBI transcript set for clinical genomics and research.

Morales, Joannella; Pujar, Shashikant; Loveland, Jane E; Astashyn, Alex; Bennett, Ruth; Berry, Andrew; Cox, Eric; Davidson, Claire; Ermolaeva, Olga; Farrell, Catherine M; Fatima, Reham; Gil, Laurent; Goldfarb, Tamara; Gonzalez, Jose M; Haddad, Diana; Hardy, Matthew; Hunt, Toby; Jackson, John; Joardar, Vinita S; Kay, Michael; Kodali, Vamsi K; McGarvey, Kelly M; McMahon, Aoife; Mudge, Jonathan M; Murphy, Daniel N; Murphy, Michael R; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Thibaud-Nissen, Françoise; Threadgold, Glen; Vatsan, Anjana R; Wallin, Craig; Webb, David; Flicek, Paul; Birney, Ewan; Pruitt, Kim D; Frankish, Adam; Cunningham, Fiona; Murphy, Terence D.

Nature ; 604(7905): 310-315, 2022 04.

Article in English | MEDLINE | ID: mdl-35388217

ABSTRACT

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE1 and RefSeq2 launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.

Subject(s)

Computational Biology , Databases, Genetic , Genomics , Genome , Humans , Information Dissemination , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States

Locus Reference Genomic: reference sequences for the reporting of clinically relevant sequence variants.

MacArthur, Jacqueline A L; Morales, Joannella; Tully, Ray E; Astashyn, Alex; Gil, Laurent; Bruford, Elspeth A; Larsson, Pontus; Flicek, Paul; Dalgleish, Raymond; Maglott, Donna R; Cunningham, Fiona.

Nucleic Acids Res ; 42(Database issue): D873-8, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24285302

ABSTRACT

Locus Reference Genomic (LRG; http://www.lrg-sequence.org/) records contain internationally recognized stable reference sequences designed specifically for reporting clinically relevant sequence variants. Each LRG is contained within a single file consisting of a stable 'fixed' section and a regularly updated 'updatable' section. The fixed section contains stable genomic DNA sequence for a genomic region, essential transcripts and proteins for variant reporting and an exon numbering system. The updatable section contains mapping information, annotation of all transcripts and overlapping genes in the region and legacy exon and amino acid numbering systems. LRGs provide a stable framework that is vital for reporting variants, according to Human Genome Variation Society (HGVS) conventions, in genomic DNA, transcript or protein coordinates. To enable translation of information between LRG and genomic coordinates, LRGs include mapping to the human genome assembly. LRGs are compiled and maintained by the National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI). LRG reference sequences are selected in collaboration with the diagnostic and research communities, locus-specific database curators and mutation consortia. Currently >700 LRGs have been created, of which >400 are publicly available. The aim is to create an LRG for every locus with clinical implications.

Subject(s)

Databases, Genetic , Genetic Variation , Genome, Human , Exons , Genetic Loci , Genomics/standards , Humans , Internet , Proteins/genetics , RNA, Messenger/chemistry , Reference Standards

Locus Reference Genomic sequences: an improved basis for describing human DNA variants.

Dalgleish, Raymond; Flicek, Paul; Cunningham, Fiona; Astashyn, Alex; Tully, Raymond E; Proctor, Glenn; Chen, Yuan; McLaren, William M; Larsson, Pontus; Vaughan, Brendan W; Béroud, Christophe; Dobson, Glen; Lehväslaiho, Heikki; Taschner, Peter Em; den Dunnen, Johan T; Devereau, Andrew; Birney, Ewan; Brookes, Anthony J; Maglott, Donna R.

Genome Med ; 2(4): 24, 2010 Apr 15.

Article in English | MEDLINE | ID: mdl-20398331

ABSTRACT

As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specific purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-file record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)-approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants affecting human health. Further information can be found on the LRG web site: http://www.lrg-sequence.org.

The completion of the Mammalian Gene Collection (MGC).

Temple, Gary; Gerhard, Daniela S; Rasooly, Rebekah; Feingold, Elise A; Good, Peter J; Robinson, Cristen; Mandich, Allison; Derge, Jeffrey G; Lewis, Jeanne; Shoaf, Debonny; Collins, Francis S; Jang, Wonhee; Wagner, Lukas; Shenmen, Carolyn M; Misquitta, Leonie; Schaefer, Carl F; Buetow, Kenneth H; Bonner, Tom I; Yankie, Linda; Ward, Ming; Phan, Lon; Astashyn, Alex; Brown, Garth; Farrell, Catherine; Hart, Jennifer; Landrum, Melissa; Maidak, Bonnie L; Murphy, Michael; Murphy, Terence; Rajput, Bhanu; Riddick, Lillian; Webb, David; Weber, Janet; Wu, Wendy; Pruitt, Kim D; Maglott, Donna; Siepel, Adam; Brejova, Brona; Diekhans, Mark; Harte, Rachel; Baertsch, Robert; Kent, Jim; Haussler, David; Brent, Michael; Langton, Laura; Comstock, Charles L G; Stevens, Michael; Wei, Chaochun; van Baren, Marijke J; Salehi-Ashtiani, Kourosh.

Genome Res ; 19(12): 2324-33, 2009 Dec.

Article in English | MEDLINE | ID: mdl-19767417

ABSTRACT

Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.

Subject(s)

Cloning, Molecular/methods , Computational Biology/methods , DNA, Complementary/genetics , Gene Library , Genes/genetics , Mammals/genetics , Animals , DNA/biosynthesis , Humans , Mice , National Institutes of Health (U.S.) , Rats , Reverse Transcriptase Polymerase Chain Reaction , United States

The genome sequence of taurine cattle: a window to ruminant biology and evolution.

Elsik, Christine G; Tellam, Ross L; Worley, Kim C; Gibbs, Richard A; Muzny, Donna M; Weinstock, George M; Adelson, David L; Eichler, Evan E; Elnitski, Laura; Guigó, Roderic; Hamernik, Debora L; Kappes, Steve M; Lewin, Harris A; Lynn, David J; Nicholas, Frank W; Reymond, Alexandre; Rijnkels, Monique; Skow, Loren C; Zdobnov, Evgeny M; Schook, Lawrence; Womack, James; Alioto, Tyler; Antonarakis, Stylianos E; Astashyn, Alex; Chapple, Charles E; Chen, Hsiu-Chuan; Chrast, Jacqueline; Câmara, Francisco; Ermolaeva, Olga; Henrichsen, Charlotte N; Hlavina, Wratko; Kapustin, Yuri; Kiryutin, Boris; Kitts, Paul; Kokocinski, Felix; Landrum, Melissa; Maglott, Donna; Pruitt, Kim; Sapojnikov, Victor; Searle, Stephen M; Solovyev, Victor; Souvorov, Alexandre; Ucla, Catherine; Wyss, Carine; Anzola, Juan M; Gerlach, Daniel; Elhaik, Eran; Graur, Dan; Reese, Justin T; Edgar, Robert C.

Science ; 324(5926): 522-8, 2009 Apr 24.

Article in English | MEDLINE | ID: mdl-19390049

ABSTRACT

To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.

Subject(s)

Biological Evolution , Genome , Alternative Splicing , Animals , Animals, Domestic , Cattle , Evolution, Molecular , Female , Genetic Variation , Humans , Male , MicroRNAs/genetics , Molecular Sequence Data , Proteins/genetics , Sequence Analysis, DNA , Species Specificity , Synteny

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL