Search | VHL Regional Portal

Guidelines for releasing a variant effect predictor.

Livesey, Benjamin J; Badonyi, Mihaly; Dias, Mafalda; Frazer, Jonathan; Kumar, Sushant; Lindorff-Larsen, Kresten; McCandlish, David M; Orenbuch, Rose; Shearer, Courtney A; Muffley, Lara; Foreman, Julia; Glazer, Andrew M; Lehner, Ben; Marks, Debora S; Roth, Frederick P; Rubin, Alan F; Starita, Lea M; Marsh, Joseph A.

ArXiv ; 2024 Apr 16.

Article in English | MEDLINE | ID: mdl-38699161

ABSTRACT

Computational methods for assessing the likely impacts of mutations, known as variant effect predictors (VEPs), are widely used in the assessment and interpretation of human genetic variation, as well as in other applications like protein engineering. Many different VEPs have been released to date, and there is tremendous variability in their underlying algorithms and outputs, and in the ways in which the methodologies and predictions are shared. This leads to considerable challenges for end users in knowing which VEPs to use and how to use them. Here, to address these issues, we provide guidelines and recommendations for the release of novel VEPs. Emphasising open-source availability, transparent methodologies, clear variant effect score interpretations, standardised scales, accessible predictions, and rigorous training data disclosure, we aim to improve the usability and interpretability of VEPs, and promote their integration into analysis and evaluation pipelines. We also provide a large, categorised list of currently available VEPs, aiming to facilitate the discovery and encourage the usage of novel methods within the scientific community.

Deep generative modeling of the human proteome reveals over a hundred novel genes involved in rare genetic disorders.

Orenbuch, Rose; Kollasch, Aaron W; Spinner, Hansen D; Shearer, Courtney A; Hopf, Thomas A; Franceschi, Dinko; Dias, Mafalda; Frazer, Jonathan; Marks, Debora S.

Res Sq ; 2024 Jan 04.

Article in English | MEDLINE | ID: mdl-38260496

ABSTRACT

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome [1-6]. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data [7] and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders [8] from potentially healthy individuals [9]. popEVE identifies 442 genes in patients in this developmental disorder cohort, including evidence of 123 novel genetic disorders, many without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. A majority of these variants are close to interacting partners in 3D complexes. Preliminary analyses on child exomes indicate that popEVE can identify candidate variants without the need for inheritance labels. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable.

Deep generative modeling of the human proteome reveals over a hundred novel genes involved in rare genetic disorders.

Orenbuch, Rose; Kollasch, Aaron W; Spinner, Hansen D; Shearer, Courtney A; Hopf, Thomas A; Franceschi, Dinko; Dias, Mafalda; Frazer, Jonathan; Marks, Debora S.

medRxiv ; 2023 Nov 28.

Article in English | MEDLINE | ID: mdl-38076790

ABSTRACT

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders from potentially healthy individuals. popEVE identifies 442 genes in a cohort of developmental disorder cases, including evidence of 119 novel genetic disorders without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable. An interactive web viewer and downloads are available at pop.evemodel.org.

Evaluation of Homology-Independent CRISPR-Cas9 Off-Target Assessment Methods.

Chaudhari, Hemangi G; Penterman, Jon; Whitton, Holly J; Spencer, Sarah J; Flanagan, Nicole; Lei Zhang, Maria C; Huang, Elaine; Khedkar, Aditya S; Toomey, J Mike; Shearer, Courtney A; Needham, Alexander W; Ho, Tony W; Kulman, John D; Cradick, T J; Kernytsky, Andrew.

CRISPR J ; 3(6): 440-453, 2020 12.

Article in English | MEDLINE | ID: mdl-33346710

ABSTRACT

The ability to alter genomes specifically by CRISPR-Cas gene editing has revolutionized biological research, biotechnology, and medicine. Broad therapeutic application of this technology, however, will require thorough preclinical assessment of off-target editing by homology-based prediction coupled with reliable methods for detecting off-target editing. Several off-target site nomination assays exist, but careful comparison is needed to ascertain their relative strengths and weaknesses. In this study, HEK293T cells were treated with Streptococcus pyogenes Cas9 and eight guide RNAs with varying levels of predicted promiscuity in order to compare the performance of three homology-independent off-target nomination methods: the cell-based assay GUIDE-seq and the biochemical assays CIRCLE-seq and SITE-seq. The three methods were benchmarked by sequencing 75,000 homology-nominated sites using hybrid capture followed by high-throughput sequencing, providing the most comprehensive assessment of such methods to date. The three methods performed similarly in nominating sequence-confirmed off-target sites, but with large differences in the total number of sites nominated. When combined with homology-dependent nomination methods and confirmation by sequencing, all three off-target nomination methods provide a comprehensive assessment of off-target activity. GUIDE-seq's low false-positive rate and the high correlation of its signal with observed editing highlight its suitability for nominating off-target sites for ex vivo CRISPR-Cas therapies.

Subject(s)

Gene Editing/ethics , Gene Editing/methods , Gene Editing/trends , Artifacts , CRISPR-Cas Systems/genetics , Clustered Regularly Interspaced Short Palindromic Repeats , Genome, Human/genetics , Genomic Instability/genetics , HEK293 Cells , High-Throughput Nucleotide Sequencing/methods , Humans , RNA, Guide, Kinetoplastida/genetics , Streptococcus pyogenes/genetics , Streptococcus pyogenes/pathogenicity

Uncovering biomarker genes with enriched classification potential from Hallmark gene sets.

Targonski, Colin A; Shearer, Courtney A; Shealy, Benjamin T; Smith, Melissa C; Feltus, F Alex.

Sci Rep ; 9(1): 9747, 2019 07 05.

Article in English | MEDLINE | ID: mdl-31278367

ABSTRACT

Given the complex relationship between gene expression and phenotypic outcomes, computationally efficient approaches are needed to sift through large high-dimensional datasets in order to identify biologically relevant biomarkers. In this report, we describe a method of identifying the most salient biomarker genes in a dataset, which we call "candidate genes", by evaluating the ability of gene combinations to classify samples from a dataset, which we call "classification potential". Our algorithm, Gene Oracle, uses a neural network to test user-defined gene sets for polygenic classification potential and then uses a combinatorial approach to further decompose selected gene sets into candidate and non-candidate biomarker genes. We tested this algorithm on curated gene sets from the Molecular Signatures Database (MSigDB) quantified in RNAseq gene expression matrices obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) data repositories. First, we identified which MSigDB Hallmark subsets have significant classification potential for both the TCGA and GTEx datasets. Then, we identified the most discriminatory candidate biomarker genes in each Hallmark gene set and provide evidence that the improved biomarker potential of these genes may be due to reduced functional complexity.

Subject(s)

Biomarkers, Tumor/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Oncogenes , Algorithms , Computational Biology/methods , Databases, Genetic , Gene Expression Profiling , Gene Ontology , Humans

Poor feed efficiency in sheep is associated with several structural abnormalities in the community metabolic network of their ruminal microbes.

Patil, Rocky D; Ellison, Melinda J; Wolff, Sara M; Shearer, Courtney; Wright, Anna M; Cockrum, Rebecca R; Austin, Kathy J; Lamberson, William R; Cammack, Kristi M; Conant, Gavin C.

J Anim Sci ; 96(6): 2113-2124, 2018 Jun 04.

Article in English | MEDLINE | ID: mdl-29788417

ABSTRACT

Ruminant animals have a symbiotic relationship with the microorganisms in their rumens. In this relationship, rumen microbes efficiently degrade complex plant-derived compounds into smaller digestible compounds, a process that is very likely associated with host animal feed efficiency. The resulting simpler metabolites can then be absorbed by the host and converted into other compounds by host enzymes. We used a microbial community metabolic network inferred from shotgun metagenomics data to assess how this metabolic system differs between animals that are able to turn ingested feedstuffs into body mass with high efficiency and those that are not. We conducted shotgun sequencing of microbial DNA from the rumen contents of 16 sheep that differed in their residual feed intake (RFI), a measure of feed efficiency. Metagenomic reads from each sheep were mapped onto a database-derived microbial metabolic network, which was linked to the sheep metabolic network by interface metabolites (metabolites transferred from microbes to host). No single enzyme was identified as being significantly different in abundance between the low and high RFI animals (P > 0.05, Wilcoxon test). However, when we analyzed the metabolic network as a whole, we found several differences between efficient and inefficient animals. Microbes from low RFI (efficient) animals use a suite of enzymes closer in network space to the host's reactions than those of the high RFI (inefficient) animals. Similarly, low RFI animals have microbial metabolic networks that, on average, contain reactions using shorter carbon chains than do those of high RFI animals, potentially allowing the host animals to extract metabolites more efficiently. Finally, the efficient animals possess community networks with greater Shannon diversity among their enzymes than do inefficient ones. Thus, our system approach to the ruminal microbiome identified differences attributable to feed efficiency in the structure of the microbes' community metabolic network that were undetected at the level of individual microbial taxa or reactions.

Subject(s)

Animal Feed/analysis , Gastrointestinal Microbiome , Metabolic Networks and Pathways , Metagenomics , Sheep/physiology , Animals , Female , Rumen/metabolism , Rumen/microbiology , Sheep/microbiology
