Results 1 - 20 of 31
1.
medRxiv ; 2024 May 29.
Article in English | MEDLINE | ID: mdl-38854034

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
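
As a rough illustration of the kind of record collected in phenopacket-store, the sketch below builds a minimal phenopacket-like JSON document from plain Python dictionaries. The field names are meant to follow version 2 of the GA4GH Phenopacket Schema; the identifiers, the creator name, the timestamp, the ontology version string, and the helper function `make_minimal_phenopacket` are hypothetical placeholders and are not taken from phenopacket-store.

```python
import json


def make_minimal_phenopacket(packet_id, subject_id, hpo_terms):
    """Build a minimal phenopacket-like dict (hypothetical helper).

    hpo_terms is a list of (term_id, label) tuples.
    """
    return {
        "id": packet_id,
        "subject": {"id": subject_id},
        # One phenotypic feature per ontology term.
        "phenotypicFeatures": [
            {"type": {"id": term_id, "label": label}}
            for term_id, label in hpo_terms
        ],
        "metaData": {
            "created": "2024-01-01T00:00:00Z",   # placeholder timestamp
            "createdBy": "example-curator",      # placeholder curator
            "resources": [
                {
                    "id": "hp",
                    "name": "human phenotype ontology",
                    "url": "http://purl.obolibrary.org/obo/hp.owl",
                    "version": "unspecified",    # placeholder version
                    "namespacePrefix": "HP",
                    "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
                }
            ],
            "phenopacketSchemaVersion": "2.0",
        },
    }


packet = make_minimal_phenopacket(
    "example-1", "patient-1", [("HP:0001250", "Seizure")]
)
print(json.dumps(packet, indent=2))
```

A real phenopacket would normally carry more detail (for example variant interpretations and a disease diagnosis) and should be checked with a schema-aware validator rather than relied on in this hand-built form.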

3.
Sci Rep ; 14(1): 8842, 2024 04 17.
Article in English | MEDLINE | ID: mdl-38632317

ABSTRACT

Sarcopenia is a serious systemic disease that reduces overall survival. Transcatheter aortic valve implantation (TAVI) is selectively performed in patients with severe aortic stenosis who are not indicated for open cardiac surgery due to severe polymorbidity. Artificial intelligence-assisted body composition assessment from available CT scans appears to be a simple tool to stratify these patients into low and high risk based on estimates of future all-cause mortality. In our study, preprocedural CT scans of patients undergoing TAVI were segmented at the level of the third lumbar vertebra using a neural network (AutoMATiCA). The obtained parameters (area and density of skeletal muscles and intramuscular, visceral, and subcutaneous adipose tissue) were analyzed using Cox univariate and multivariable models for continuous and categorical variables to assess the relation of selected variables with all-cause mortality. A total of 866 patients were included (values given as median (interquartile range)): age 79.7 (74.9-83.3) years; BMI 28.9 (25.9-32.6) kg/m². Survival analysis was performed on all automatically obtained parameters of muscle and fat density and area. Skeletal muscle index (SMI in cm²/m²), visceral adipose tissue density (VAT in HU), and subcutaneous adipose tissue density (SAT in HU) predicted all-cause mortality in patients after TAVI, expressed as hazard ratio (HR) with 95% confidence interval (CI): SMI HR 0.986, 95% CI (0.975-0.996); VAT HR 1.015, 95% CI (1.002-1.028); and SAT HR 1.014, 95% CI (1.004-1.023), all p < 0.05. Automatic body composition assessment can identify higher all-cause mortality risk in patients after TAVI, which may be useful in preoperative clinical reasoning and stratification of patients.


Subject(s)
Sarcopenia , Humans , Aged , Artificial Intelligence , Adipose Tissue , Muscle, Skeletal , Subcutaneous Fat , Body Composition/physiology , Retrospective Studies
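
The hazard ratios in the abstract above are close to 1, which can make them look negligible. Assuming they are reported per one unit of a continuous covariate (the abstract does not state this explicitly), a Cox model implies that the hazard ratio for a difference of several units is the per-unit value raised to that power. The sketch below applies this to the reported SMI hazard ratio; the 10-unit difference and the function name are hypothetical and chosen only for illustration.

```python
import math


def hazard_ratio_for_difference(hr_per_unit, delta):
    """Scale a per-unit hazard ratio to a covariate difference of `delta` units.

    In a Cox model the log hazard is linear in the covariate, so
    HR(delta) = exp(delta * log(HR_per_unit)) = HR_per_unit ** delta.
    """
    return math.exp(delta * math.log(hr_per_unit))


smi_hr_per_unit = 0.986   # SMI hazard ratio reported in the abstract
delta_units = 10          # hypothetical difference of 10 cm²/m² in SMI

hr_10 = hazard_ratio_for_difference(smi_hr_per_unit, delta_units)
print(f"HR for a {delta_units}-unit higher SMI: {hr_10:.3f}")
```

Under that assumption, a patient whose SMI is 10 units higher would have an estimated hazard roughly 13% lower, all else being equal.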
4.
Hum Genet ; 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38170232

ABSTRACT

Variants which disrupt splicing are a frequent cause of rare disease that have been under-ascertained clinically. Accurate and efficient methods to predict a variant's impact on splicing are needed to interpret the growing number of variants of unknown significance (VUS) identified by exome and genome sequencing. Here, we present the results of the CAGI6 Splicing VUS challenge, which invited predictions of the splicing impact of 56 variants ascertained clinically and functionally validated to determine splicing impact. The performance of 12 prediction methods, along with SpliceAI and CADD, was compared on the 56 functionally validated variants. The maximum accuracy achieved was 82% from two different approaches, one weighting SpliceAI scores by minor allele frequency, and one applying the recently published Splicing Prediction Pipeline (SPiP). SPiP performed optimally in terms of sensitivity, while an ensemble method combining multiple prediction tools and information from databases exceeded all others for specificity. Several challenge methods equalled or exceeded the performance of SpliceAI, with ultimate choice of prediction method likely to depend on experimental or clinical aims. One quarter of the variants were incorrectly predicted by at least 50% of the methods, highlighting the need for further improvements to splicing prediction methods for successful clinical application.

5.
medRxiv ; 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-37503093

ABSTRACT

Objective: Large Language Models such as GPT-4 previously have been applied to differential diagnostic challenges based on published case reports. Published case reports have a sophisticated narrative style that is not readily available from typical electronic health records (EHR). Furthermore, even if such a narrative were available in EHRs, privacy requirements would preclude sending it outside the hospital firewall. We therefore tested a method for parsing clinical texts to extract ontology terms and programmatically generating prompts that by design are free of protected health information. Materials and Methods: We investigated different methods to prepare prompts from 75 recently published case reports. We transformed the original narratives by extracting structured terms representing phenotypic abnormalities, comorbidities, treatments, and laboratory tests and creating prompts programmatically. Results: Performance of all of these approaches was modest, with the correct diagnosis ranked first in only 5.3-17.6% of cases. The performance of the prompts created from structured data was substantially worse than that of the original narrative texts, even if additional information was added following manual review of term extraction. Moreover, different versions of GPT-4 demonstrated substantially different performance on this task. Discussion: The sensitivity of the performance to the form of the prompt and the instability of results over two GPT-4 versions represent important current limitations to the use of GPT-4 to support diagnosis in real-life clinical settings. Conclusion: Research is needed to identify the best methods for creating prompts from typically available clinical data to support differential diagnostics.

6.
Bioinformatics ; 39(12), 2023 12 01.
Article in English | MEDLINE | ID: mdl-38001031

ABSTRACT

MOTIVATION: Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts. RESULTS: Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT) leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show an increase of 10% in recall on scientific publications and 20% increase in recall on EHR records (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches. AVAILABILITY AND IMPLEMENTATION: Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.


Subject(s)
Algorithms , Language , Humans , Sequence Alignment , Electronic Health Records , Publications
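
The k-mer screening step mentioned in the TBLAT abstract above can be illustrated with a small sketch. This is not the TBLAT or Fenominal implementation: the function names, the choice of k = 3, the 0.3 threshold, and the tiny term list are all assumptions made for the example, and the abstract's second step (scoring candidates against typical spelling-error patterns) is not shown.

```python
from collections import Counter


def kmer_counts(text, k=3):
    """Count overlapping character k-mers in a lower-cased string."""
    text = text.lower()
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def shared_kmer_fraction(candidate, term, k=3):
    """Fraction of the term's k-mers (with multiplicity) found in the candidate."""
    term_kmers = kmer_counts(term, k)
    total = sum(term_kmers.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared k-mer.
    shared = sum((term_kmers & kmer_counts(candidate, k)).values())
    return shared / total


def screen(candidate, terms, k=3, min_fraction=0.3):
    """Return the terms that share enough k-mers with the candidate text."""
    return [
        term for term in terms
        if shared_kmer_fraction(candidate, term, k) >= min_fraction
    ]


terms = ["seizure", "scoliosis", "hearing impairment"]
print(screen("siezure", terms))
```

Here the misspelling "siezure" still shares the 3-mers "zur" and "ure" with "seizure", so that term passes the screen, while the unrelated terms share none and are discarded before any more expensive scoring would run.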
7.
Article in English | MEDLINE | ID: mdl-37684057

ABSTRACT

We identified a de novo heterozygous transient receptor potential cation channel subfamily M (melastatin) member 3 (TRPM3) missense variant, p.(Asn1126Asp), in a patient with developmental delay and manifestations of cerebral palsy (CP) using phenotype-driven prioritization analysis of whole-genome sequencing data with Exomiser. The variant is localized in the functionally important ion transport domain of the TRPM3 protein and predicted to impact the protein structure. Our report adds TRPM3 to the list of Mendelian disease-associated genes that can be associated with CP and provides further evidence for the pathogenicity of the variant p.(Asn1126Asp).


Subject(s)
Cerebral Palsy , Intellectual Disability , Nervous System Malformations , TRPM Cation Channels , Humans , Cerebral Palsy/genetics , Intellectual Disability/genetics , Mutation, Missense/genetics , Phenotype , TRPM Cation Channels/genetics
8.
bioRxiv ; 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37398049

ABSTRACT

Numerous factors regulate alternative splicing of human genes at a co-transcriptional level. However, how alternative splicing depends on the regulation of gene expression is poorly understood. We leveraged data from the Genotype-Tissue Expression (GTEx) project to show a significant association of gene expression and splicing for 6874 (4.9%) of 141,043 exons in 1106 (13.3%) of 8314 genes with substantially variable expression in ten GTEx tissues. About half of these exons demonstrate higher inclusion with higher gene expression, and half demonstrate higher exclusion, with the observed direction of coupling being highly consistent across different tissues and in external datasets. The exons differ with respect to sequence characteristics, enriched sequence motifs, RNA polymerase II binding, and inferred transcription rate of downstream introns. The exons were enriched for hundreds of isoform-specific Gene Ontology annotations, suggesting that the coupling of expression and alternative splicing described here may provide an important gene regulatory mechanism that might be used in a variety of biological contexts. In particular, higher inclusion exons could play an important role during cell division.

9.
PLoS One ; 18(5): e0285433, 2023.
Article in English | MEDLINE | ID: mdl-37196000

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) is a standards-setting organization that is developing a suite of coordinated standards for genomics. The GA4GH Phenopacket Schema is a standard for sharing disease and phenotype information that characterizes an individual person or biosample. The Phenopacket Schema is flexible and can represent clinical data for any kind of human disease including rare disease, complex disease, and cancer. It also allows consortia or databases to apply additional constraints to ensure uniform data collection for specific goals. We present phenopacket-tools, an open-source Java library and command-line application for construction, conversion, and validation of phenopackets. Phenopacket-tools simplifies construction of phenopackets by providing concise builders, programmatic shortcuts, and predefined building blocks (ontology classes) for concepts such as anatomical organs, age of onset, biospecimen type, and clinical modifiers. Phenopacket-tools can be used to validate the syntax and semantics of phenopackets as well as to assess adherence to additional user-defined requirements. The documentation includes examples showing how to use the Java library and the command-line tool to create and validate phenopackets. We demonstrate how to create, convert, and validate phenopackets using the library or the command-line application. Source code, API documentation, comprehensive user guide and a tutorial can be found at https://github.com/phenopackets/phenopacket-tools. The library can be installed from the public Maven Central artifact repository and the application is available as a standalone archive. The phenopacket-tools library helps developers implement and standardize the collection and exchange of phenotypic and other clinical data for use in phenotype-driven genomic diagnostics, translational research, and precision medicine applications.


Subject(s)
Neoplasms , Software , Humans , Genomics , Databases, Factual , Gene Library
10.
Adv Genet (Hoboken) ; 4(1): 2200016, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36910590

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) is developing a suite of coordinated standards for genomics for healthcare. The Phenopacket is a new GA4GH standard for sharing disease and phenotype information that characterizes an individual person, linking that individual to detailed phenotypic descriptions, genetic information, diagnoses, and treatments. A detailed example is presented that illustrates how to use the schema to represent the clinical course of a patient with retinoblastoma, including demographic information, the clinical diagnosis, phenotypic features and clinical measurements, an examination of the extirpated tumor, therapies, and the results of genomic analysis. The Phenopacket Schema, together with other GA4GH data and technical standards, will enable data exchange and provide a foundation for the computational analysis of disease and phenotype information to improve our ability to diagnose and conduct research on all types of disorders, including cancer and rare diseases.

12.
Genome Med ; 14(1): 44, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35484572

ABSTRACT

Structural variants (SVs) are implicated in the etiology of Mendelian diseases but have been systematically underascertained owing to sequencing technology limitations. Long-read sequencing enables comprehensive detection of SVs, but approaches for prioritization of candidate SVs are needed. Structural variant Annotation and analysis (SvAnna) assesses all classes of SVs and their intersection with transcripts and regulatory sequences, relating predicted effects on gene function with clinical phenotype data. SvAnna places 87% of deleterious SVs in the top ten ranks. The interpretable prioritizations offered by SvAnna will facilitate the widespread adoption of long-read sequencing in diagnostic genomics. SvAnna is available at https://github.com/TheJacksonLaboratory/SvAnna.


Subject(s)
Genomics , Base Sequence , Chromosome Mapping , Humans , Sequence Analysis, DNA , Virulence
13.
Hum Mutat ; 43(8): 1071-1081, 2022 08.
Article in English | MEDLINE | ID: mdl-35391505

ABSTRACT

Rare disease diagnostics and disease gene discovery have been revolutionized by whole-exome and genome sequencing but identifying the causative variant(s) from the millions in each individual remains challenging. The use of deep phenotyping of patients and reference genotype-phenotype knowledge, alongside variant data such as allele frequency, segregation, and predicted pathogenicity, has proved an effective strategy to tackle this issue. Here we review the numerous tools that have been developed to automate this approach and demonstrate the power of such an approach on several thousand diagnosed cases from the 100,000 Genomes Project. Finally, we discuss the challenges that need to be overcome if we are going to improve detection rates and help the majority of patients that still remain without a molecular diagnosis after state-of-the-art genomic interpretation.


Subject(s)
Exome , Rare Diseases , Exome/genetics , Genomics , Humans , Phenotype , Rare Diseases/diagnosis , Rare Diseases/genetics , Exome Sequencing
15.
Am J Hum Genet ; 108(9): 1564-1577, 2021 09 02.
Article in English | MEDLINE | ID: mdl-34289339

ABSTRACT

A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5' and 3' ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information-content of wild-type and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation. We curated a comprehensive collection of disease-associated splice-altering variants at positions outside of the highly conserved AG/GT dinucleotides at the termini of introns. SQUIRLS trains two random-forest classifiers for the donor and for the acceptor and combines their outputs by logistic regression to yield a final score. We show that SQUIRLS transcends previous state-of-the-art accuracy in classifying splice variants as assessed by rank analysis in simulated exomes, and is significantly faster than competing methods. SQUIRLS provides tabular output files for incorporation into diagnostic pipelines for exome and genome analysis, as well as visualizations that contextualize predicted effects of variants on splicing to make it easier to interpret splice variants in diagnostic settings.


Subject(s)
Algorithms , Data Curation/methods , Genetic Diseases, Inborn/genetics , RNA Splice Sites , RNA Splicing , Software , Base Sequence , Computational Biology/methods , Exome , Exons , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/pathology , High-Throughput Nucleotide Sequencing , Humans , Introns , Mutation , Exome Sequencing
16.
Int J Pediatr Otorhinolaryngol ; 140: 110499, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33234331

ABSTRACT

Waardenburg syndrome (WS) is a clinically and genetically heterogeneous group of inherited disorders manifesting with sensorineural hearing loss and pigmentary anomalies. Here we present two Caucasian families with novel variants in EDNRB and SOX10 representing both ends of the phenotype spectrum in WS. The c.521G>A variant in EDNRB identified in Family 1 leads to disruption of the cysteine disulfide bridge between extracellular segments of endothelin receptor type B and causes a relatively mild phenotype of WS type II with low penetrance. The novel nonsense variant c.900C>A in SOX10 detected in Family 2 leads to PCWH syndrome and was found to be lethal.


Subject(s)
Waardenburg Syndrome , Humans , Mutation , Phenotype , Receptor, Endothelin B/genetics , SOXE Transcription Factors/genetics , Syndrome , Waardenburg Syndrome/genetics
17.
Nucleic Acids Res ; 49(D1): D1207-D1217, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33264411

ABSTRACT

The Human Phenotype Ontology (HPO, https://hpo.jax.org) was launched in 2008 to provide a comprehensive logical standard to describe and computationally analyze phenotypic abnormalities found in human disease. The HPO is now a worldwide standard for phenotype exchange. The HPO has grown steadily since its inception due to considerable contributions from clinical experts and researchers from a diverse range of disciplines. Here, we present recent major extensions of the HPO for neurology, nephrology, immunology, pulmonology, newborn screening, and other areas. For example, the seizure subontology now reflects the International League Against Epilepsy (ILAE) guidelines and these enhancements have already shown clinical validity. We present new efforts to harmonize computational definitions of phenotypic abnormalities across the HPO and multiple phenotype ontologies used for animal models of disease. These efforts will benefit software such as Exomiser by improving the accuracy and scope of cross-species phenotype matching. The computational modeling strategy used by the HPO to define disease entities and phenotypic features and distinguish between them is explained in detail. We also report on recent efforts to translate the HPO into indigenous languages. Finally, we summarize recent advances in the use of HPO in electronic health record systems.


Subject(s)
Biological Ontologies , Computational Biology/methods , Databases, Factual , Disease/genetics , Genome , Phenotype , Software , Animals , Disease Models, Animal , Genotype , Humans , Infant, Newborn , International Cooperation , Internet , Neonatal Screening/methods , Pharmacogenetics/methods , Terminology as Topic
18.
Am J Hum Genet ; 107(3): 403-417, 2020 09 03.
Article in English | MEDLINE | ID: mdl-32755546

ABSTRACT

Human Phenotype Ontology (HPO)-based analysis has become standard for genomic diagnostics of rare diseases. Current algorithms use a variety of semantic and statistical approaches to prioritize the typically long lists of genes with candidate pathogenic variants. These algorithms do not provide robust estimates of the strength of the predictions beyond the placement in a ranked list, nor do they provide measures of how much any individual phenotypic observation has contributed to the prioritization result. However, given that the overall success rate of genomic diagnostics is only around 25%-50% or less in many cohorts, a good ranking cannot be taken to imply that the gene or disease at rank one is necessarily a good candidate. Here, we present an approach to genomic diagnostics that exploits the likelihood ratio (LR) framework to provide an estimate of (1) the posttest probability of candidate diagnoses, (2) the LR for each observed HPO phenotype, and (3) the predicted pathogenicity of observed genotypes. LIkelihood Ratio Interpretation of Clinical AbnormaLities (LIRICAL) placed the correct diagnosis within the first three ranks in 92.9% of 384 case reports comprising 262 Mendelian diseases, and the correct diagnosis had a mean posttest probability of 67.3%. Simulations show that LIRICAL is robust to many typically encountered forms of genomic and phenomic noise. In summary, LIRICAL provides accurate, clinically interpretable results for phenotype-driven genomic diagnostics.


Subject(s)
Computational Biology , Databases, Genetic , Genomics , Rare Diseases/diagnosis , Algorithms , Exome/genetics , Humans , Phenotype , Rare Diseases/genetics , Software
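
The likelihood ratio framework referred to in the LIRICAL abstract above rests on a standard identity: posttest odds equal pretest odds multiplied by the likelihood ratios of the individual observations (assuming they are independent). The sketch below shows only that textbook calculation; it is not LIRICAL's implementation, and the pretest probability, the three likelihood ratios, and the function name are hypothetical.

```python
import math


def posttest_probability(pretest_prob, likelihood_ratios):
    """Combine a pretest probability with independent likelihood ratios.

    posttest odds = pretest odds * product of LRs, computed in log space
    so that long lists of very large or very small LRs stay stable.
    """
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest_prob must be strictly between 0 and 1")
    if any(lr <= 0.0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be positive")
    log_odds = math.log(pretest_prob / (1.0 - pretest_prob))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    # Convert log odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical case: 1% pretest probability, two supporting phenotype
# observations (LR 20 and 5) and one that argues against (LR 0.5).
probability = posttest_probability(0.01, [20.0, 5.0, 0.5])
print(f"posttest probability: {probability:.3f}")
```

Because each observation contributes its own factor, the per-feature likelihood ratios also show how much any single phenotype moved the result, which is the kind of interpretability the abstract highlights.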
19.
Genome Biol ; 21(1): 171, 2020 07 13.
Article in English | MEDLINE | ID: mdl-32660516

ABSTRACT

We present Hierarchical Bayesian Analysis of Differential Expression and ALternative Splicing (HBA-DEALS), which simultaneously characterizes differential expression and splicing in cohorts. HBA-DEALS attains state of the art or better performance for both expression and splicing and allows genes to be characterized as having differential gene expression, differential alternative splicing, both, or neither. HBA-DEALS analysis of GTEx data demonstrated sets of genes that show predominant DGE or DAST across multiple tissue types. These sets have pervasive differences with respect to gene structure, function, membership in protein complexes, and promoter architecture.


Subject(s)
Alternative Splicing , Gene Expression , Models, Biological , Sequence Analysis, RNA , Software , Bayes Theorem
20.
Gigascience ; 9(5), 2020 05 01.
Article in English | MEDLINE | ID: mdl-32444882

ABSTRACT

BACKGROUND: Several prediction problems in computational biology and genomic medicine are characterized by both big data and a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge amount of neutral variants or undergo severe restrictions in managing big genomic data. RESULTS: To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in genomic medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version.
CONCLUSIONS: parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at https://github.com/AnacletoLAB/parSMURF.


Subject(s)
Computational Biology/methods , Genetic Predisposition to Disease , Genetic Variation , Genome-Wide Association Study/methods , Software , Algorithms , Databases, Genetic , Genomics/methods , Humans , Machine Learning , Reproducibility of Results