Search | VHL Regional Portal

Validation of a computational phenotype for finding patients eligible for genetic testing for pathogenic PTEN variants across three centers.

Kothari, Cartik; Srivastava, Siddharth; Kousa, Youssef; Izem, Rima; Gierdalski, Marcin; Kim, Dongkyu; Good, Amy; Dies, Kira A; Geisel, Gregory; Morizono, Hiroki; Gallo, Vittorio; Pomeroy, Scott L; Garden, Gwenn A; Guay-Woodford, Lisa; Sahin, Mustafa; Avillach, Paul.

J Neurodev Disord ; 14(1): 24, 2022 03 23.

Article in English | MEDLINE | ID: mdl-35321655

ABSTRACT

BACKGROUND: Computational phenotypes are most often combinations of patient billing codes that are highly predictive of disease using electronic health records (EHR). In the case of rare diseases that can only be diagnosed by genetic testing, computational phenotypes identify patient cohorts for genetic testing and possible diagnosis. This article details the validation of a computational phenotype for PTEN hamartoma tumor syndrome (PHTS) against the EHR of patients at three collaborating clinical research centers: Boston Children's Hospital, Children's National Hospital, and the University of Washington. METHODS: A combination of billing codes from the International Classification of Diseases versions 9 and 10 (ICD-9 and ICD-10) for diagnostic criteria postulated by a research team at Cleveland Clinic was used to identify patient cohorts for genetic testing from the clinical data warehouses at the three research centers. Subsequently, the EHR (including billing codes, clinical notes, and genetic reports) of these patients were reviewed by clinical experts to identify patients with PHTS. RESULTS: The PTEN genetic testing yield of the computational phenotype, the number of patients who needed to be genetically tested for incidence of pathogenic PTEN gene variants, ranged from 82 to 94% at the three centers. CONCLUSIONS: Computational phenotypes have the potential to enable the timely and accurate diagnosis of rare genetic diseases such as PHTS by identifying patient cohorts for genetic sequencing and testing.
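The idea of a computational phenotype as a combination of billing codes can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the criterion names, the placeholder code strings, and the "at least two criteria" threshold are hypothetical and are not the codes or rule used in the study.

```python
from typing import Dict, Set

# Hypothetical mapping from a diagnostic criterion to the billing codes
# that would indicate it. These are placeholders, not real ICD codes.
CRITERIA_CODES: Dict[str, Set[str]] = {
    "criterion_a": {"CODE-A1", "CODE-A2"},
    "criterion_b": {"CODE-B1"},
    "criterion_c": {"CODE-C1", "CODE-C2"},
}

# Assumed threshold: a patient is flagged when codes for at least
# this many distinct criteria appear in their record.
MIN_CRITERIA = 2


def matched_criteria(patient_codes: Set[str]) -> Set[str]:
    """Return the criteria for which the patient has at least one code."""
    return {
        name
        for name, codes in CRITERIA_CODES.items()
        if codes & patient_codes
    }


def select_cohort(patients: Dict[str, Set[str]]) -> Set[str]:
    """Return the IDs of patients who meet the assumed threshold."""
    return {
        patient_id
        for patient_id, codes in patients.items()
        if len(matched_criteria(codes)) >= MIN_CRITERIA
    }
```

In this sketch, a patient with two codes for the same criterion is not flagged, because the rule counts distinct criteria rather than distinct codes.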

Subject(s)

Genetic Testing , Hamartoma Syndrome, Multiple , Electronic Health Records , Hamartoma Syndrome, Multiple/diagnosis , Hamartoma Syndrome, Multiple/genetics , Hamartoma Syndrome, Multiple/pathology , Humans , PTEN Phosphohydrolase/genetics , Phenotype

GenoPheno: cataloging large-scale phenotypic and next-generation sequencing data within human datasets.

Gutiérrez-Sacristán, Alba; De Niz, Carlos; Kothari, Cartik; Kong, Sek Won; Mandl, Kenneth D; Avillach, Paul.

Brief Bioinform ; 22(1): 55-65, 2021 01 18.

Article in English | MEDLINE | ID: mdl-32249310

ABSTRACT

Precision medicine promises to revolutionize treatment, shifting therapeutic approaches from the classical one-size-fits-all to those more tailored to the patient's individual genomic profile, lifestyle and environmental exposures. Yet, to advance precision medicine's main objective (ensuring the optimum diagnosis, treatment and prognosis for each individual), investigators need access to large-scale clinical and genomic data repositories. Despite the vast proliferation of these datasets, locating and obtaining access to many remains a challenge. We sought to provide an overview of available patient-level datasets that contain both genotypic data, obtained by next-generation sequencing, and phenotypic data, and to create a dynamic, online catalog for consultation, contribution and revision by the research community. Datasets included in this review meet six inclusion criteria: they (i) contain data from more than 500 human subjects; (ii) contain both genotypic and phenotypic data from the same subjects; (iii) include whole genome sequencing or whole exome sequencing data; (iv) include at least 100 recorded phenotypic variables per subject; (v) are accessible through a website or collaboration with investigators; and (vi) make access information available in English. Using these criteria, we identified 30 datasets, reviewed them and provided results in the release version of a catalog, which is publicly available through a dynamic Web application and on GitHub. Users can review as well as contribute new datasets for inclusion (Web: https://avillachlab.shinyapps.io/genophenocatalog/; GitHub: https://github.com/hms-dbmi/GenoPheno-CatalogShiny).
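The six inclusion criteria listed in the abstract can be expressed as a simple filter. The following is a minimal sketch only: the `Dataset` record and all of its field names are hypothetical, since the abstract does not describe the catalog's actual schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Dataset:
    # All field names are hypothetical, chosen to mirror the six criteria.
    name: str
    n_subjects: int
    has_genotypic_data: bool
    has_phenotypic_data: bool
    same_subjects: bool
    sequencing_types: FrozenSet[str]   # e.g. {"WGS"} or {"WES"}
    phenotypic_variables_per_subject: int
    access_route: Optional[str]        # "website", "collaboration", or None
    access_info_in_english: bool


def meets_inclusion_criteria(d: Dataset) -> bool:
    """Check a dataset against criteria (i) to (vi) from the abstract."""
    return all([
        d.n_subjects > 500,                                   # (i)
        d.has_genotypic_data and d.has_phenotypic_data
        and d.same_subjects,                                  # (ii)
        bool(d.sequencing_types & {"WGS", "WES"}),            # (iii)
        d.phenotypic_variables_per_subject >= 100,            # (iv)
        d.access_route in {"website", "collaboration"},       # (v)
        d.access_info_in_english,                             # (vi)
    ])
```

Note the asymmetry taken from the abstract's wording: criterion (i) is a strict "more than 500", while criterion (iv) is "at least 100".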

Subject(s)

Databases, Genetic , High-Throughput Nucleotide Sequencing/methods , Phenotype , Precision Medicine/methods , Genetic Predisposition to Disease , Humans , Whole Genome Sequencing/methods

Phelan-McDermid syndrome data network: Integrating patient reported outcomes with clinical notes and curated genetic reports.

Kothari, Cartik; Wack, Maxime; Hassen-Khodja, Claire; Finan, Sean; Savova, Guergana; O'Boyle, Megan; Bliss, Geraldine; Cornell, Andria; Horn, Elizabeth J; Davis, Rebecca; Jacobs, Jacquelyn; Kohane, Isaac; Avillach, Paul.

Am J Med Genet B Neuropsychiatr Genet ; 177(7): 613-624, 2018 10.

Article in English | MEDLINE | ID: mdl-28862395

ABSTRACT

The heterogeneity of patient phenotype data is an impediment to research into the origins and progression of neuropsychiatric disorders. This difficulty is compounded in the case of rare disorders such as Phelan-McDermid Syndrome (PMS) by the paucity of patient clinical data. PMS is a rare syndromic genetic cause of autism and intellectual deficiency. In this paper, we describe the Phelan-McDermid Syndrome Data Network (PMS_DN), a platform that facilitates research into phenotype-genotype correlation and progression of PMS by: a) integrating knowledge of patient phenotypes extracted from Patient Reported Outcomes (PRO) data and clinical notes (two heterogeneous, underutilized sources of knowledge about patient phenotypes) with curated genetic information from the same patient cohort and b) making this integrated knowledge, along with a suite of statistical tools, available free of charge to authorized investigators on a Web portal, https://pmsdn.hms.harvard.edu. PMS_DN is a Patient Centric Outcomes Research Initiative (PCORI) where patients and their families are involved in all aspects of the management of patient data in driving research into PMS. To foster collaborative research, PMS_DN also makes patient aggregates from this knowledge available to authorized investigators using distributed research networks such as the PCORnet PopMedNet. PMS_DN is hosted on a scalable cloud-based environment and complies with all patient data privacy regulations. As of October 31, 2016, PMS_DN integrates high-quality knowledge extracted from the clinical notes of 112 patients and curated genetic reports of 176 patients with preprocessed PRO data from 415 patients.

Subject(s)

Data Mining/methods , Genetic Association Studies/methods , Information Storage and Retrieval/methods , Autism Spectrum Disorder/genetics , Chromosome Deletion , Chromosome Disorders/genetics , Chromosome Disorders/physiopathology , Chromosomes, Human, Pair 22/genetics , Cohort Studies , Databases, Genetic , Female , Humans , Intellectual Disability/genetics , Male , Medical Records , Nerve Tissue Proteins/genetics , Patient Reported Outcome Measures , Phenotype

A database of human exposomes and phenomes from the US National Health and Nutrition Examination Survey.

Patel, Chirag J; Pho, Nam; McDuffie, Michael; Easton-Marks, Jeremy; Kothari, Cartik; Kohane, Isaac S; Avillach, Paul.

Sci Data ; 3: 160096, 2016 10 25.

Article in English | MEDLINE | ID: mdl-27779619

ABSTRACT

The National Health and Nutrition Examination Survey (NHANES) is a population survey implemented by the Centers for Disease Control and Prevention (CDC) to monitor the health of the United States, whose data is publicly available in hundreds of files. This Data Descriptor describes a single unified and universally accessible data file, merging across 255 separate files and stitching data across 4 surveys, encompassing 41,474 individuals and 1,191 variables. The variables consist of phenotype and environmental exposure information on each individual, specifically (1) demographic information, (2) physical exam results (e.g., height, body mass index), (3) laboratory results (e.g., cholesterol, glucose, and environmental exposures), and (4) questionnaire items. Second, the data descriptor describes a dictionary to enable analysts to find variables by category and human-readable description. The datasets are available on DataDryad and a hands-on analytics tutorial is available on GitHub. Through a new big data platform, BD2K Patient Centered Information Commons (http://pic-sure.org), we provide a new way to browse the dataset via a web browser (https://nhanes.hms.harvard.edu) and provide an application programming interface for programmatic access.
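The two operations named in the abstract, merging separate files and stitching surveys together, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `participant_id` key, the `survey` column, and the in-memory list-of-dicts representation are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, object]


def merge_files(files: Iterable[List[Row]],
                key: str = "participant_id") -> Dict[object, Row]:
    """Merge several files from one survey into one record per participant.

    Each file is a list of rows; rows sharing the same key value are
    combined so that every variable ends up in a single record.
    """
    merged: Dict[object, Row] = defaultdict(dict)
    for rows in files:
        for row in rows:
            merged[row[key]].update(row)
    return dict(merged)


def stitch_surveys(surveys: Dict[str, List[List[Row]]],
                   key: str = "participant_id") -> List[Row]:
    """Stack the merged records of every survey into one unified table."""
    unified: List[Row] = []
    for survey_name, files in surveys.items():
        for record in merge_files(files, key).values():
            # Tag each record with the survey it came from (assumed column).
            unified.append({"survey": survey_name, **record})
    return unified
```

A variable that is absent from a given survey is simply missing from that survey's records in this sketch; a real unified file would need an explicit convention for such gaps.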

Subject(s)

Databases, Factual , Nutrition Surveys , Centers for Disease Control and Prevention, U.S. , Environmental Exposure , Humans , United States

A Metadata based Knowledge Discovery Methodology for Seeding Translational Research.

Kothari, Cartik R; Payne, Philip R O.

Stud Health Technol Inform ; 216: 1071, 2015.

Article in English | MEDLINE | ID: mdl-26262370

ABSTRACT

In this paper, we present a semantic, metadata based knowledge discovery methodology for identifying teams of researchers from diverse backgrounds who can collaborate on interdisciplinary research projects: projects in areas that have been identified as high-impact areas at The Ohio State University. This methodology involves the semantic annotation of keywords and the postulation of semantic metrics to improve the efficiency of the path exploration algorithm as well as to rank the results. Results indicate that our methodology can discover groups of experts from diverse areas who can collaborate on translational research projects.

Subject(s)

Databases, Bibliographic/classification , Knowledge Discovery/methods , Metadata/classification , Patient Care Team/organization & administration , Personnel Staffing and Scheduling/organization & administration , Translational Research, Biomedical , Data Mining/methods , Interdisciplinary Studies , Knowledge Bases , Machine Learning , Natural Language Processing , Workforce

Evolutionary characters, phenotypes and ontologies: curating data from the systematic biology literature.

Dahdul, Wasila M; Balhoff, James P; Engeman, Jeffrey; Grande, Terry; Hilton, Eric J; Kothari, Cartik; Lapp, Hilmar; Lundberg, John G; Midford, Peter E; Vision, Todd J; Westerfield, Monte; Mabee, Paula M.

PLoS One ; 5(5): e10708, 2010 May 20.

Article in English | MEDLINE | ID: mdl-20505755

ABSTRACT

BACKGROUND: The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies. METHODOLOGY/PRINCIPAL FINDINGS: We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish, http://zfin.org). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators. 
CONCLUSIONS/SIGNIFICANCE: The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes using ontologies. This is because an ontological representation of the detailed variations in phenotype, whether between mutant or wildtype, among individual humans, or across the diversity of species, requires a process by which a precise combination of terms from domain ontologies is selected and organized according to logical relations. The efficiencies that we have developed in this process will be useful for any attempt to annotate complex phenotypic descriptions using ontologies. We also discuss some ramifications of EQ representation for the domain of systematics.
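As a rough illustration of the Entity-Quality (EQ) formalism described above, an EQ statement can be modeled as a taxon paired with an entity term and a quality term, each drawn from an ontology. The class names, fields, and rendering format below are hypothetical, and any identifiers supplied to them in use would be placeholders rather than real ontology term IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    # A term from some ontology; the fields are illustrative only.
    ontology: str
    term_id: str
    label: str


@dataclass(frozen=True)
class EQStatement:
    taxon: Term    # from a taxonomy ontology
    entity: Term   # from an anatomy ontology
    quality: Term  # from a quality ontology

    def render(self) -> str:
        """Return a human-readable form of the statement."""
        return f"{self.taxon.label}: {self.entity.label} [{self.quality.label}]"
```

Replacing a free-text character state with such a structured record is what makes the description computable: statements can then be grouped or compared by term rather than by string matching.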

Subject(s)

Biological Evolution , Computational Biology/methods , Databases, Genetic , Publications , Systems Biology , Animals , Fishes/growth & development , Phenotype

Phenex: ontological annotation of phenotypic diversity.

Balhoff, James P; Dahdul, Wasila M; Kothari, Cartik R; Lapp, Hilmar; Lundberg, John G; Mabee, Paula; Midford, Peter E; Westerfield, Monte; Vision, Todd J.

PLoS One ; 5(5): e10500, 2010 May 05.

Article in English | MEDLINE | ID: mdl-20463926

ABSTRACT

BACKGROUND: Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. As such, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or integration with other biological knowledge. METHODOLOGY/PRINCIPAL FINDINGS: Here we describe Phenex, a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic similarities and differences using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names. Phenex can be configured to load only those ontologies pertinent to a taxonomic group of interest. The graphical user interface was optimized for evolutionary biologists accustomed to working with lists of taxa, characters, character states, and character-by-taxon matrices. CONCLUSIONS/SIGNIFICANCE: Annotation of phenotypic data using ontologies and globally unique taxonomic identifiers will allow biologists to integrate phenotypic data from different organisms and studies, leveraging decades of work in systematics and comparative morphology.

Subject(s)

Biodiversity , Computational Biology/methods , Software , Biological Evolution , Internet , Phenotype , Semantics
