Search | VHL Regional Portal

1.

Multi-cancer risk stratification based on national health data: a retrospective modelling and validation study.

Jung, Alexander W; Holm, Peter C; Gaurav, Kumar; Hjaltelin, Jessica Xin; Placido, Davide; Mortensen, Laust Hvas; Birney, Ewan; Brunak, Søren; Gerstung, Moritz.

Lancet Digit Health ; 6(6): e396-e406, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38789140

ABSTRACT

BACKGROUND: Health care is experiencing a drive towards digitisation, and many countries are implementing national health data resources. Although a range of cancer risk models exists, the utility on a population level for risk stratification across cancer types has not been fully explored. We aimed to close this gap by evaluating pan-cancer risk models built on electronic health records across the Danish population with validation in the UK Biobank. METHODS: In this retrospective modelling and validation study, data for model development and internal validation were derived from the following Danish health registries: the Central Person Registry, the Danish National Patient Registry, the death registry, the cancer registry, and full-text medical records from secondary care records in the capital region. The development data included adults aged 16-86 years without previous malignant cancers in the time period from Jan 1, 1995, to Dec 31, 2014. The internal validation period was from Jan 1, 2015, to April 10, 2018, and the data included all adults without a previous indication of cancer aged 16-75 years on Dec 31, 2014. The external validation cohort from the UK Biobank included all adults without a previous indication of cancer aged 50-75 years. We used time-dependent Bayesian Cox hazard models built on the combined medical history of Danish individuals. A set of 1392 covariates from available clinical disease trajectories, text-mined basic health factors, and family histories were used to train predictive models of 20 major cancer types. The models were validated on cancer incidence between 2015 and 2018 across Denmark and on individuals in the UK Biobank. The primary outcomes were discrimination and calibration performance. FINDINGS: From the Danish registries, we included 6 732 553 individuals covering 60 million hospital visits, 90 million diagnoses, and a total of 193 million life-years between Jan 1, 1978, and April 10, 2018. Danish registry data covering the period from Jan 1, 2015, to April 10, 2018, were used to internally validate risk models, containing a total of 4 248 491 individuals who remained at risk of a primary malignant cancer diagnosis and 67 401 cancer cases recorded. For the external validation, we evaluated the same time period in the UK Biobank covering 377 004 individuals with 11 486 cancer cases. The predictive performance of the models on Danish data showed good discrimination (concordance index 0·81 [SD 0·08], ranging from 0·66 [95% CI 0·65-0·67] for cervix uteri cancer to 0·91 [0·90-0·92] for liver cancer). Performance was similar on the UK Biobank in a direct transfer when controlling for shifts in the age distribution (concordance index 0·66 [SD 0·08], ranging from 0·55 [95% CI 0·44-0·66] for cervix uteri cancer to 0·78 [0·77-0·79] for lung cancer). Cancer risks were associated, in addition to heritable components, with a broad range of preceding diagnoses and health factors. The best overall performance was seen for cancers of the digestive system (oesophageal, stomach, colorectal, liver, and pancreatic) but also thyroid, kidney, and uterine cancers. INTERPRETATION: Data available in national electronic health databases can be used to approximate cancer risk factors and enable risk predictions in most cancer types. Model predictions generalise between the Danish and UK health-care systems. With the emergence of multi-cancer early detection tests, electronic health record-based risk models could supplement screening efforts. FUNDING: Novo Nordisk Foundation and the Danish Innovation Foundation.
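
The abstract above reports discrimination as a concordance index. As a rough illustration of what that number measures, here is a minimal sketch of Harrell's concordance index for right-censored data. The function name, the toy follow-up times and the risk scores are all hypothetical and are not taken from the study, which may have used a different estimator.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the individual
    with the earlier observed event also has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the two times differ and the
        # shorter time ends in an observed event (not censoring).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up in years, 1 = cancer diagnosed, 0 = censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.5, 0.1]
print(concordance_index(toy_times, toy_events, toy_risks))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the range of 0·55 to 0·91 quoted in the abstract.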

Subject(s)

Electronic Health Records , Neoplasms , Humans , Middle Aged , Aged , Adult , Denmark/epidemiology , Female , Retrospective Studies , Male , Neoplasms/epidemiology , Adolescent , Risk Assessment/methods , Young Adult , Aged, 80 and over , United Kingdom/epidemiology , Registries , Bayes Theorem , Proportional Hazards Models , Risk Factors

2.

A society-wide conversation is needed about germline genome editing using CRISPR.

Birney, Ewan.

Nat Med ; 30(1): 30-32, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38177854

Subject(s)

Clustered Regularly Interspaced Short Palindromic Repeats , Gene Editing , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , CRISPR-Cas Systems/genetics , Germ Cells

3.

Benchmarking of computational methods for m6A profiling with Nanopore direct RNA sequencing.

Maestri, Simone; Furlan, Mattia; Mulroney, Logan; Coscujuela Tarrero, Lucia; Ugolini, Camilla; Dalla Pozza, Fabio; Leonardi, Tommaso; Birney, Ewan; Nicassio, Francesco; Pelizzola, Mattia.

Brief Bioinform ; 25(2)2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38279646

ABSTRACT

N6-methyladenosine (m6A) is the most abundant internal eukaryotic mRNA modification, and is involved in the regulation of various biological processes. Direct Nanopore sequencing of native RNA (dRNA-seq) emerged as a leading approach for its identification. Several software were published for m6A detection and there is a strong need for independent studies benchmarking their performance on data from different species, and against various reference datasets. Moreover, a computational workflow is needed to streamline the execution of tools whose installation and execution remains complicated. We developed NanOlympicsMod, a Nextflow pipeline exploiting containerized technology for comparing 14 tools for m6A detection on dRNA-seq data. NanOlympicsMod was tested on dRNA-seq data generated from in vitro (un)modified synthetic oligos. The m6A hits returned by each tool were compared to the m6A position known by design of the oligos. In addition, NanOlympicsMod was used on dRNA-seq datasets from wild-type and m6A-depleted yeast, mouse and human, and each tool's hits were compared to reference m6A sets generated by leading orthogonal methods. The performance of the tools markedly differed across datasets, and methods adopting different approaches showed different preferences in terms of precision and recall. Changing the stringency cut-offs allowed for tuning the precision-recall trade-off towards user preferences. Finally, we determined that precision and recall of tools are markedly influenced by sequencing depth, and that additional sequencing would likely reveal additional m6A sites. Thanks to the possibility of including novel tools, NanOlympicsMod will streamline the benchmarking of m6A detection tools on dRNA-seq data, improving future RNA modification characterization.
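
The precision-recall trade-off described in this abstract can be sketched with a small example. The code below is an illustrative assumption, not part of NanOlympicsMod: the site identifiers, the scores and the `precision_recall` helper are hypothetical, and it simply compares thresholded tool hits against a reference set.

```python
def precision_recall(scored_hits, reference_sites, cutoff):
    """Return (precision, recall) for hits scoring at or above `cutoff`."""
    called = {site for site, score in scored_hits.items() if score >= cutoff}
    true_positives = len(called & reference_sites)
    # With no calls, precision is undefined; 0.0 is used here by convention.
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(reference_sites) if reference_sites else 0.0
    return precision, recall


# Hypothetical per-site confidence scores from one detection tool.
toy_hits = {
    "tx1:105": 0.95,
    "tx1:230": 0.80,
    "tx2:77": 0.60,
    "tx2:310": 0.40,
    "tx3:12": 0.30,
}
# Hypothetical reference m6A sites from an orthogonal method.
toy_reference = {"tx1:105", "tx1:230", "tx2:310", "tx3:900"}

for cutoff in (0.9, 0.5, 0.3):
    p, r = precision_recall(toy_hits, toy_reference, cutoff)
    print(f"cutoff={cutoff}: precision={p:.2f} recall={r:.2f}")
```

In this toy data, lowering the cutoff raises recall from 0.25 to 0.75 while precision falls from 1.00 to 0.60, which is the kind of tuning the abstract refers to.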

Subject(s)

Adenine/analogs & derivatives , Nanopore Sequencing , Nanopores , Humans , Animals , Mice , RNA/genetics , Benchmarking , Sequence Analysis, RNA/methods

4.

AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences.

Varadi, Mihaly; Bertoni, Damian; Magana, Paulyna; Paramval, Urmila; Pidruchna, Ivanna; Radhakrishnan, Malarvizhi; Tsenkov, Maxim; Nair, Sreenath; Mirdita, Milot; Yeo, Jingi; Kovalevskiy, Oleg; Tunyasuvunakool, Kathryn; Laydon, Agata; Zídek, Augustin; Tomlinson, Hamish; Hariharan, Dhavanthi; Abrahamson, Josh; Green, Tim; Jumper, John; Birney, Ewan; Steinegger, Martin; Hassabis, Demis; Velankar, Sameer.

Nucleic Acids Res ; 52(D1): D368-D375, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37933859

ABSTRACT

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology by amassing over 214 million predicted protein structures, expanding from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases encompassing model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB, from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. We also discuss the improvements and services added since its initial release, including enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer, and improvements in the search engine of AlphaFold DB.

The AlphaFold Protein Structure Database (AlphaFold DB) is a massive digital library of predicted protein structures, with over 214 million entries, marking a 500-times expansion in size since its initial release in 2021. The structures are predicted using Google DeepMind's AlphaFold 2 artificial intelligence (AI) system. Our new report highlights the latest updates we have made to this database. We have added more data on specific organisms and proteins related to global health and expanded to cover almost the complete UniProt database, a primary data resource of protein sequences. We also made it easier for our users to access the data by directly downloading files or using advanced cloud-based tools. Finally, we have also improved how users view and search through these protein structures, making the user experience smoother and more informative. In short, AlphaFold DB has been growing rapidly and has become more user-friendly and robust to support the broader scientific community.

Subject(s)

Artificial Intelligence , Protein Structure, Secondary , Proteome , Amino Acid Sequence , Databases, Protein , Search Engine , Proteins/chemistry

5.

Natural genetic variation quantitatively regulates heart rate and dimension.

Gierten, Jakob; Welz, Bettina; Fitzgerald, Tomas; Thumberger, Thomas; Hummel, Oliver; Leger, Adrien; Weber, Philipp; Hassel, David; Hübner, Norbert; Birney, Ewan; Wittbrodt, Joachim.

bioRxiv ; 2023 Nov 02.

Article in English | MEDLINE | ID: mdl-37693611

ABSTRACT

The polygenic contribution to heart development and function along the health-disease continuum remains unresolved. To gain insight into the genetic basis of quantitative cardiac phenotypes, we utilize highly inbred Japanese rice fish models, Oryzias latipes, and Oryzias sakaizumii. Employing automated quantification of embryonic heart rates as core metric, we profiled phenotype variability across five inbred strains. We observed maximal phenotypic contrast between individuals of the HO5 and the HdrR strain. HO5 showed elevated heart rates associated with embryonic ventricular hypoplasia and impaired adult cardiac function. This contrast served as the basis for genome-wide mapping. In a segregation population of 1192 HO5 x HdrR F2 embryos, we mapped 59 loci (173 genes) associated with heart rate. Experimental validation of the top 12 candidate genes in loss-of-function models revealed their causal and distinct impact on heart rate, development, ventricle size, and arrhythmia. Our study uncovers new diagnostic and therapeutic targets for developmental and electrophysiological cardiac diseases and provides a novel scalable approach to investigate the intricate genetic architecture of the vertebrate heart.

6.

A multilayered approach to the analysis of genetic data from individuals with suspected albinism.

Sergouniotis, Panagiotis I; Michaud, Vincent; Lasseaux, Eulalie; Campbell, Christopher; Plaisant, Claudio; Javerzat, Sophie; Birney, Ewan; Ramsden, Simon C; Black, Graeme C; Arveiler, Benoit.

J Med Genet ; 60(12): 1245-1249, 2023 Nov 27.

Article in English | MEDLINE | ID: mdl-37460203

ABSTRACT

Albinism is a clinically and genetically heterogeneous group of conditions characterised by visual abnormalities and variable degrees of hypopigmentation. Multiple studies have demonstrated the clinical utility of genetic investigations in individuals with suspected albinism. Despite this, the variation in the provision of genetic testing for albinism remains significant. One key issue is the lack of a standardised approach to the analysis of genomic data from affected individuals. For example, there is variation in how different clinical genetic laboratories approach genotypes that involve incompletely penetrant alleles, including the common, 'hypomorphic' TYR c.1205G>A (p.Arg402Gln) [rs1126809] variant. Here, we discuss the value of genetic testing as a frontline diagnostic tool in individuals with features of albinism and propose a practice pattern for the analysis of genomic data from affected families.

Subject(s)

Albinism, Oculocutaneous , Albinism , Humans , Albinism/genetics , Albinism/diagnosis , Albinism, Oculocutaneous/diagnosis , Albinism, Oculocutaneous/genetics , Genetic Testing , Genotype , Alleles

7.

Long-read sequencing of diagnosis and post-therapy medulloblastoma reveals complex rearrangement patterns and epigenetic signatures.

Rausch, Tobias; Snajder, Rene; Leger, Adrien; Simovic, Milena; Giurgiu, Madalina; Villacorta, Laura; Henssen, Anton G; Fröhling, Stefan; Stegle, Oliver; Birney, Ewan; Bonder, Marc Jan; Ernst, Aurelie; Korbel, Jan O.

Cell Genom ; 3(4): 100281, 2023 Apr 12.

Article in English | MEDLINE | ID: mdl-37082141

ABSTRACT

Cancer genomes harbor a broad spectrum of structural variants (SVs) driving tumorigenesis, a relevant subset of which escape discovery using short-read sequencing. We employed Oxford Nanopore Technologies (ONT) long-read sequencing in a paired diagnostic and post-therapy medulloblastoma to unravel the haplotype-resolved somatic genetic and epigenetic landscape. We assembled complex rearrangements, including a 1.55-Mbp chromothripsis event, and we uncover a complex SV pattern termed templated insertion (TI) thread, characterized by short (mostly <1 kb) insertions showing prevalent self-concatenation into highly amplified structures of up to 50 kbp in size. TI threads occur in 3% of cancers, with a prevalence up to 74% in liposarcoma, and frequent colocalization with chromothripsis. We also perform long-read-based methylome profiling and discover allele-specific methylation (ASM) effects, complex rearrangements exhibiting differential methylation, and differential promoter methylation in cancer-driver genes. Our study shows the advantage of long-read sequencing in the discovery and characterization of complex somatic rearrangements.

8.

Sub-cellular level resolution of common genetic variation in the photoreceptor layer identifies continuum between rare disease and common variation.

Currant, Hannah; Fitzgerald, Tomas W; Patel, Praveen J; Khawaja, Anthony P; Webster, Andrew R; Mahroo, Omar A; Birney, Ewan.

PLoS Genet ; 19(2): e1010587, 2023 02.

Article in English | MEDLINE | ID: mdl-36848389

ABSTRACT

Photoreceptor cells (PRCs) are the light-detecting cells of the retina. Such cells can be non-invasively imaged using optical coherence tomography (OCT) which is used in clinical settings to diagnose and monitor ocular diseases. Here we present the largest genome-wide association study of PRC morphology to date utilising quantitative phenotypes extracted from OCT images within the UK Biobank. We discovered 111 loci associated with the thickness of one or more of the PRC layers, many of which had prior associations to ocular phenotypes and pathologies, and 27 with no prior associations. We further identified 10 genes associated with PRC thickness through gene burden testing using exome data. In both cases there was a significant enrichment for genes involved in rare eye pathologies, in particular retinitis pigmentosa. There was evidence for an interaction effect between common genetic variants, VSX2 involved in eye development and PRPH2 known to be involved in retinal dystrophies. We further identified a number of genetic variants with a differential effect across the macular spatial field. Our results suggest a continuum between common and rare variation which impacts retinal structure, sometimes leading to disease.

Subject(s)

Genome-Wide Association Study , Rare Diseases , Humans , Rare Diseases/pathology , Retina/pathology , Photoreceptor Cells , Genetic Variation

9.

Using Nanocompore to Identify RNA Modifications from Direct RNA Nanopore Sequencing Data.

Mulroney, Logan; Birney, Ewan; Leonardi, Tommaso; Nicassio, Francesco.

Curr Protoc ; 3(2): e683, 2023 Feb.

Article in English | MEDLINE | ID: mdl-36840709

ABSTRACT

RNA modifications can alter the behavior of RNA molecules depending on where they are located on the strands. Traditionally, RNA modifications have been detected and characterized by biophysical assays, mass spectrometry, or specific next-generation sequencing techniques, but are limited to specific modifications or are low throughput. Nanopore is a platform capable of sequencing RNA strands directly, which permits transcriptome-wide detection of RNA modifications. RNA modifications alter the nanopore raw signal relative to the canonical form of the nucleotide, and several software tools detect these signal alterations. One such tool is Nanocompore, which compares the ionic current features between two different experimental conditions (i.e., with and without RNA modifications) to detect RNA modifications. Nanocompore is not limited to a single type of RNA modification, has a high specificity for detecting RNA modifications, and does not require model training. To use Nanocompore, the following steps are needed: (i) the data must be basecalled and aligned to the reference transcriptome, then the raw ionic current signals are aligned to the sequences and transformed into a Nanocompore-compatible format; (ii) finally, the statistical testing is conducted on the transformed data and produces a table of p-value predictions for the positions of the RNA modifications. These steps can be executed with several different methods, and thus we have also included two alternative protocols for running Nanocompore. Once the positions of RNA modifications are determined by Nanocompore, users can investigate their function in various metabolic pathways. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol: RNA modification detection by Nanocompore. Alternate Protocol 1: RNA modification detection by Nanocompore with f5c. Alternate Protocol 2: RNA modification detection by Nanocompore using Nextflow.

Subject(s)

Nanopore Sequencing , Nanopores , Nanopore Sequencing/methods , RNA/chemistry , RNA/genetics , RNA/metabolism , Sequence Analysis, RNA , High-Throughput Nucleotide Sequencing/methods

10.

Dynamic, adaptive sampling during nanopore sequencing using Bayesian experimental design.

Weilguny, Lukas; De Maio, Nicola; Munro, Rory; Manser, Charlotte; Birney, Ewan; Loose, Matthew; Goldman, Nick.

Nat Biotechnol ; 41(7): 1018-1025, 2023 Jul.

Article in English | MEDLINE | ID: mdl-36593407

ABSTRACT

Nanopore sequencers can select which DNA molecules to sequence, rejecting a molecule after analysis of a small initial part. Currently, selection is based on predetermined regions of interest that remain constant throughout an experiment. Sequencing efforts, thus, cannot be re-focused on molecules likely contributing most to experimental success. Here we present BOSS-RUNS, an algorithmic framework and software to generate dynamically updated decision strategies. We quantify uncertainty at each genome position with real-time updates from data already observed. For each DNA fragment, we decide whether the expected decrease in uncertainty that it would provide warrants fully sequencing it, thus optimizing information gain. BOSS-RUNS mitigates coverage bias between and within members of a microbial community, leading to improved variant calling; for example, low-coverage sites of a species at 1% abundance were reduced by 87.5%, with 12.5% more single-nucleotide polymorphisms detected. Such data-driven updates to molecule selection are applicable to many sequencing scenarios, such as enriching for regions with increased divergence or low coverage, reducing time-to-answer.
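
The decision rule outlined in this abstract (sequence a fragment only if it is expected to reduce uncertainty enough) can be illustrated with a deliberately simplified sketch. This is not the BOSS-RUNS algorithm: it assumes, for illustration only, that per-position uncertainty is the Shannon entropy of a posterior over the four bases, and that the summed entropy under a fragment is a crude proxy for the expected decrease in uncertainty. The function names and the threshold are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def should_sequence(posteriors, start, end, threshold):
    """Decide whether a fragment covering positions [start, end) is worth
    sequencing in full. Returns (decision, benefit), where benefit is the
    summed entropy of the covered positions (a proxy, see lead-in)."""
    benefit = sum(entropy(p) for p in posteriors[start:end])
    return benefit >= threshold, benefit


# Hypothetical genome of five positions: three well resolved, two unresolved.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
toy_posteriors = [confident, confident, confident, uncertain, uncertain]

print(should_sequence(toy_posteriors, 0, 3, threshold=1.0))  # rejected
print(should_sequence(toy_posteriors, 3, 5, threshold=1.0))  # accepted
```

In a dynamic scheme of the kind the abstract describes, the posteriors would be updated as reads arrive, so a region that starts out worth sequencing would stop qualifying once it is well covered.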

Subject(s)

Nanopore Sequencing , Nanopores , Research Design , Bayes Theorem , Genome , Software , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA

11.

Using machine learning to model older adult inpatient trajectories from electronic health records data.

Herrero-Zazo, Maria; Fitzgerald, Tomas; Taylor, Vince; Street, Helen; Chaudhry, Afzal N; Bradley, John R; Birney, Ewan; Keevil, Victoria L.

iScience ; 26(1): 105876, 2023 Jan 20.

Article in English | MEDLINE | ID: mdl-36691609

ABSTRACT

Electronic Health Records (EHR) data can provide novel insights into inpatient trajectories. Blood tests and vital signs from de-identified patients' hospital admission episodes (AE) were represented as multivariate time-series (MVTS) to train unsupervised Hidden Markov Models (HMM) and represent each AE day as one of 17 states. All HMM states were clinically interpreted based on their patterns of MVTS variables and relationships with clinical information. Visualization differentiated patients progressing toward stable 'discharge-like' states versus those remaining at risk of inpatient mortality (IM). Chi-square tests confirmed these relationships (two states associated with IM; 12 states with ≥1 diagnosis). Logistic Regression and Random Forest (RF) models trained with MVTS data rather than states had higher prediction performances of IM, but results were comparable (best RF model AUC-ROC: MVTS data = 0.85; HMM states = 0.79). ML models extracted clinically interpretable signals from hospital data. The potential of ML to develop decision-support tools for EHR systems warrants investigation.
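
To make the idea of representing each admission day as a hidden state more concrete, here is a minimal Viterbi decoding sketch. It is an assumption-laden toy: the study trained unsupervised HMMs with 17 states on multivariate time series, whereas this example uses two hand-set states, a single discrete daily observation, and hypothetical probabilities, and it shows only the decoding step.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete HMM (log-space)."""
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for arriving in state `s`.
            prev = max(states, key=lambda r: best[-1][r] + math.log(trans_p[r][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical two-state model of daily inpatient status.
states = ["stable", "at_risk"]
start_p = {"stable": 0.6, "at_risk": 0.4}
trans_p = {"stable": {"stable": 0.8, "at_risk": 0.2},
           "at_risk": {"stable": 0.3, "at_risk": 0.7}}
emit_p = {"stable": {"normal": 0.9, "abnormal": 0.1},
          "at_risk": {"normal": 0.2, "abnormal": 0.8}}

days = ["abnormal", "abnormal", "normal", "normal"]
print(viterbi(days, states, start_p, trans_p, emit_p))
# ['at_risk', 'at_risk', 'stable', 'stable']
```

The decoded path is the kind of per-day state sequence that could then be visualised or fed to a downstream classifier, as the abstract describes for the HMM states.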

12.

From genetic variation to precision medicine.

Sergouniotis, Panagiotis I; Fitzgerald, Tomas; Birney, Ewan.

Camb Prism Precis Med ; 1: e7, 2023.

Article in English | MEDLINE | ID: mdl-38550939

ABSTRACT

Genetics has been an important tool for discovering new aspects of biology across life. In humans, there is growing momentum behind the application of this knowledge to drive innovation in clinical care, most notably through developments in precision medicine. Nowhere has the impact of genetics on clinical practice been more striking than in the field of rare disorders. For most of these conditions, individual disease susceptibility is influenced by DNA sequence variation in a single or a small number of genes. In contrast, most common disorders are multifactorial and are caused by a complex interplay of multiple genetic, environmental and stochastic factors. The longstanding division of human disease genetics into rare and common components has obscured the continuum of human traits and echoes aspects of the century-old debate between the Mendelian and biometric views of human genetics. In this article, we discuss the differences in data and concepts between rare and common disease genetics. Opportunities to unify these two areas are noted and the importance of adopting a holistic perspective that integrates diverse genetic and environmental factors is discussed.

13.

GREENER principles for environmentally sustainable computational science.

Lannelongue, Loïc; Aronson, Hans-Erik G; Bateman, Alex; Birney, Ewan; Caplan, Talia; Juckes, Martin; McEntyre, Johanna; Morris, Andrew D; Reilly, Gerry; Inouye, Michael.

Nat Comput Sci ; 3(6): 514-521, 2023 Jun.

Article in English | MEDLINE | ID: mdl-38177425

ABSTRACT

The carbon footprint of scientific computing is substantial, but environmentally sustainable computational science (ESCS) is a nascent field with many opportunities to thrive. To realize the immense green opportunities and continued, yet sustainable, growth of computer science, we must take a coordinated approach to our current challenges, including greater awareness and transparency, improved estimation and wider reporting of environmental impacts. Here, we present a snapshot of where ESCS stands today and introduce the GREENER set of principles, as well as guidance for best practices moving forward.

14.

The contribution of common regulatory and protein-coding TYR variants to the genetic architecture of albinism.

Michaud, Vincent; Lasseaux, Eulalie; Green, David J; Gerrard, Dave T; Plaisant, Claudio; Fitzgerald, Tomas; Birney, Ewan; Arveiler, Benoît; Black, Graeme C; Sergouniotis, Panagiotis I.

Nat Commun ; 13(1): 3939, 2022 07 08.

Article in English | MEDLINE | ID: mdl-35803923

ABSTRACT

Genetic diseases have been historically segregated into rare Mendelian disorders and common complex conditions. Large-scale studies using genome sequencing are eroding this distinction and are gradually unmasking the underlying complexity of human traits. Here, we analysed data from the Genomics England 100,000 Genomes Project and from a cohort of 1313 individuals with albinism aiming to gain insights into the genetic architecture of this archetypal rare disorder. We investigated the contribution of protein-coding and regulatory variants both rare and common. We focused on TYR, the gene encoding tyrosinase, and found that a high-frequency promoter variant, TYR c.-301C>T [rs4547091], modulates the penetrance of a prevalent, albinism-associated missense change, TYR c.1205G>A (p.Arg402Gln) [rs1126809]. We also found that homozygosity for a haplotype formed by three common, functionally-relevant variants, TYR c.[-301C;575C>A;1205G>A], is associated with a high probability of receiving an albinism diagnosis (OR>82). This genotype is also associated with reduced visual acuity and with increased central retinal thickness in UK Biobank participants. Finally, we report how the combined analysis of rare and common variants can increase diagnostic yield and can help inform genetic counselling in families with albinism.
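
The abstract reports an odds ratio (OR>82) for the association between a genotype and an albinism diagnosis. As a reminder of how such a figure is derived from a 2x2 table, here is a minimal sketch with a Woolf (log-based) 95% confidence interval. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the authors' exact method is not stated in the abstract.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio and Woolf 95% confidence interval from a 2x2 table."""
    estimate = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                   + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper


# Hypothetical counts: genotype carriers vs non-carriers among cases and controls.
estimate, lower, upper = odds_ratio(
    exposed_cases=20, unexposed_cases=80,
    exposed_controls=5, unexposed_controls=395,
)
print(f"OR={estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With these toy counts the odds ratio is 19.75; an interval that excludes 1 indicates an association at the conventional 5% level.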

Subject(s)

Albinism, Oculocutaneous , Albinism , Albinism/genetics , Albinism, Oculocutaneous/genetics , Genotype , Humans , Monophenol Monooxygenase/genetics , Mutant Proteins/genetics , Pedigree , Phenotype

15.

Publisher Correction: Genomic reconstruction of the SARS CoV-2 epidemic in England.

Vöhringer, Harald S; Sanderson, Theo; Sinnott, Matthew; De Maio, Nicola; Nguyen, Thuy; Goater, Richard; Schwach, Frank; Harrison, Ian; Hellewell, Joel; Ariani, Cristina V; Gonçalves, Sonia; Jackson, David K; Johnston, Ian; Jung, Alexander W; Saint, Callum; Sillitoe, John; Suciu, Maria; Goldman, Nick; Panovska-Griffiths, Jasmina; Birney, Ewan; Volz, Erik; Funk, Sebastian; Kwiatkowski, Dominic; Chand, Meera; Martincorena, Inigo; Barrett, Jeffrey C; Gerstung, Moritz.

Nature ; 606(7915): E18, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35701578

16.

The Gene Curation Coalition: A global effort to harmonize gene-disease evidence resources.

DiStefano, Marina T; Goehringer, Scott; Babb, Lawrence; Alkuraya, Fowzan S; Amberger, Joanna; Amin, Mutaz; Austin-Tse, Christina; Balzotti, Marie; Berg, Jonathan S; Birney, Ewan; Bocchini, Carol; Bruford, Elspeth A; Coffey, Alison J; Collins, Heather; Cunningham, Fiona; Daugherty, Louise C; Einhorn, Yaron; Firth, Helen V; Fitzpatrick, David R; Foulger, Rebecca E; Goldstein, Jennifer; Hamosh, Ada; Hurles, Matthew R; Leigh, Sarah E; Leong, Ivone U S; Maddirevula, Sateesh; Martin, Christa L; McDonagh, Ellen M; Olry, Annie; Puzriakova, Arina; Radtke, Kelly; Ramos, Erin M; Rath, Ana; Riggs, Erin Rooney; Roberts, Angharad M; Rodwell, Charlotte; Snow, Catherine; Stark, Zornitza; Tahiliani, Jackie; Tweedie, Susan; Ware, James S; Weller, Phillip; Williams, Eleanor; Wright, Caroline F; Yates, Thabo Michael; Rehm, Heidi L.

Genet Med ; 24(8): 1732-1742, 2022 08.

Article in English | MEDLINE | ID: mdl-35507016

ABSTRACT

PURPOSE: Several groups and resources provide information that pertains to the validity of gene-disease relationships used in genomic medicine and research; however, universal standards and terminologies to define the evidence base for the role of a gene in disease and a single harmonized resource were lacking. To tackle this issue, the Gene Curation Coalition (GenCC) was formed. METHODS: The GenCC drafted harmonized definitions for differing levels of gene-disease validity on the basis of existing resources, and performed a modified Delphi survey with 3 rounds to narrow the list of terms. The GenCC also developed a unified database to display curated gene-disease validity assertions from its members. RESULTS: On the basis of 241 survey responses from the genetics community, a consensus term set was chosen for grading gene-disease validity and database submissions. As of December 2021, the database contained 15,241 gene-disease assertions on 4569 unique genes from 12 submitters. When comparing submissions to the database from distinct sources, conflicts in assertions of gene-disease validity ranged from 5.3% to 13.4%. CONCLUSION: Terminology standardization, sharing of gene-disease validity classifications, and resolution of curation conflicts will facilitate collaborations across international curation efforts and in turn, improve consistency in genetic testing and variant interpretation.

Subject(s)

Databases, Genetic , Genomics , Genetic Testing , Genetic Variation , Humans

17.

Selective clonal persistence of human retroviruses in vivo: Radial chromatin organization, integration site, and host transcription.

Melamed, Anat; Fitzgerald, Tomas W; Wang, Yuchuan; Ma, Jian; Birney, Ewan; Bangham, Charles R M.

Sci Adv ; 8(17): eabm6210, 2022 Apr 29.

Article in English | MEDLINE | ID: mdl-35486737

ABSTRACT

The human retroviruses HTLV-1 (human T cell leukemia virus type 1) and HIV-1 persist in vivo as a reservoir of latently infected T cell clones. It is poorly understood what determines which clones survive in the reservoir. We compared >160,000 HTLV-1 integration sites (>40,000 HIV-1 sites) from T cells isolated ex vivo from naturally infected individuals with >230,000 HTLV-1 integration sites (>65,000 HIV-1 sites) from in vitro infection to identify genomic features that determine selective clonal survival. Three statistically independent factors together explained >40% of the observed variance in HTLV-1 clonal survival in vivo: the radial intranuclear position of the provirus, its genomic distance from the centromere, and the intensity of local host genome transcription. The radial intranuclear position of the provirus and its distance from the centromere also explained ~7% of clonal persistence of HIV-1 in vivo. Selection for the intranuclear and intrachromosomal location of the provirus and host transcription intensity favors clonal persistence of human retroviruses in vivo.

18.

A joint NCBI and EMBL-EBI transcript set for clinical genomics and research.

Morales, Joannella; Pujar, Shashikant; Loveland, Jane E; Astashyn, Alex; Bennett, Ruth; Berry, Andrew; Cox, Eric; Davidson, Claire; Ermolaeva, Olga; Farrell, Catherine M; Fatima, Reham; Gil, Laurent; Goldfarb, Tamara; Gonzalez, Jose M; Haddad, Diana; Hardy, Matthew; Hunt, Toby; Jackson, John; Joardar, Vinita S; Kay, Michael; Kodali, Vamsi K; McGarvey, Kelly M; McMahon, Aoife; Mudge, Jonathan M; Murphy, Daniel N; Murphy, Michael R; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Thibaud-Nissen, Françoise; Threadgold, Glen; Vatsan, Anjana R; Wallin, Craig; Webb, David; Flicek, Paul; Birney, Ewan; Pruitt, Kim D; Frankish, Adam; Cunningham, Fiona; Murphy, Terence D.

Nature ; 604(7905): 310-315, 2022 04.

Article in English | MEDLINE | ID: mdl-35388217

ABSTRACT

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE and RefSeq launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.

Subject(s)

Computational Biology , Databases, Genetic , Genomics , Genome , Humans , Information Dissemination , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States

19.

Nanopore ReCappable sequencing maps SARS-CoV-2 5' capping sites and provides new insights into the structure of sgRNAs.

Ugolini, Camilla; Mulroney, Logan; Leger, Adrien; Castelli, Matteo; Criscuolo, Elena; Williamson, Maia Kavanagh; Davidson, Andrew D; Almuqrin, Abdulaziz; Giambruno, Roberto; Jain, Miten; Frigè, Gianmaria; Olsen, Hugh; Tzertzinis, George; Schildkraut, Ira; Wulf, Madalee G; Corrêa, Ivan R; Ettwiller, Laurence; Clementi, Nicola; Clementi, Massimo; Mancini, Nicasio; Birney, Ewan; Akeson, Mark; Nicassio, Francesco; Matthews, David A; Leonardi, Tommaso.

Nucleic Acids Res ; 50(6): 3475-3489, 2022 04 08.

Article in English | MEDLINE | ID: mdl-35244721

ABSTRACT

The SARS-CoV-2 virus has a complex transcriptome characterised by multiple, nested subgenomic RNAs used to express structural and accessory proteins. Long-read sequencing technologies such as nanopore direct RNA sequencing can recover full-length transcripts, greatly simplifying the assembly of structurally complex RNAs. However, these techniques do not detect the 5' cap, thus preventing reliable identification and quantification of full-length, coding transcript models. Here we used Nanopore ReCappable Sequencing (NRCeq), a new technique that can identify capped full-length RNAs, to assemble a complete annotation of SARS-CoV-2 sgRNAs and annotate the location of capping sites across the viral genome. We obtained robust estimates of sgRNA expression across cell lines and viral isolates and identified novel canonical and non-canonical sgRNAs, including one that uses a previously un-annotated leader-to-body junction site. The data generated in this work constitute a useful resource for the scientific community and provide important insights into the mechanisms that regulate the transcription of SARS-CoV-2 sgRNAs.

Subject(s)

COVID-19 , Nanopores , RNA, Guide, Kinetoplastida/chemistry , COVID-19/genetics , Genome, Viral/genetics , Humans , RNA Caps , RNA, Viral/genetics , RNA, Viral/metabolism , SARS-CoV-2/genetics

20.

The Medaka Inbred Kiyosu-Karlsruhe (MIKK) panel.

Fitzgerald, Tomas; Brettell, Ian; Leger, Adrien; Wolf, Nadeshda; Kusminski, Natalja; Monahan, Jack; Barton, Carl; Herder, Cathrin; Aadepu, Narendar; Gierten, Jakob; Becker, Clara; Hammouda, Omar T; Hasel, Eva; Lischik, Colin; Lust, Katharina; Sokolova, Natalia; Suzuki, Risa; Tsingos, Erika; Tavhelidse, Tinatini; Thumberger, Thomas; Watson, Philip; Welz, Bettina; Khouja, Nadia; Naruse, Kiyoshi; Birney, Ewan; Wittbrodt, Joachim; Loosli, Felix.

Genome Biol ; 23(1): 59, 2022 02 21.

Article in English | MEDLINE | ID: mdl-35189950

ABSTRACT

BACKGROUND: Unraveling the relationship between genetic variation and phenotypic traits remains a fundamental challenge in biology. Mapping variants underlying complex traits while controlling for confounding environmental factors is often problematic. To address this, we establish a vertebrate genetic resource specifically to allow for robust genotype-to-phenotype investigations. The teleost medaka (Oryzias latipes) is an established genetic model system with a long history of genetic research and a high tolerance to inbreeding from the wild. RESULTS: Here we present the Medaka Inbred Kiyosu-Karlsruhe (MIKK) panel: the first near-isogenic panel of 80 inbred lines in a vertebrate model derived from a wild founder population. Inbred lines provide fixed genomes that are a prerequisite for the replication of studies, studies which vary both the genetics and environment in a controlled manner, and functional testing. The MIKK panel will therefore enable phenotype-to-genotype association studies of complex genetic traits while allowing for careful control of interacting factors, with numerous applications in genetic research, human health, drug development, and fundamental biology. CONCLUSIONS: Here we present a detailed characterization of the genetic variation across the MIKK panel, which provides a rich and unique genetic resource to the community by enabling large-scale experiments for mapping complex traits.

Subject(s)

Oryzias , Animals , Genome , Inbreeding , Oryzias/genetics , Phenotype
