1.
mSystems ; : e0141523, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38819130

ABSTRACT

Wastewater surveillance has emerged as a crucial public health tool for population-level pathogen surveillance. Supported by funding from the American Rescue Plan Act of 2021, the FDA's genomic epidemiology program, GenomeTrakr, was leveraged to sequence SARS-CoV-2 from wastewater sites across the United States. This initiative required the evaluation, optimization, development, and publication of new methods and analytical tools spanning sample collection through variant analyses. Version-controlled protocols for each step of the process were developed and published on protocols.io. A custom data analysis tool and a publicly accessible dashboard were built to facilitate real-time visualization of the collected data, focusing on the relative abundance of SARS-CoV-2 variants and sub-lineages across different samples and sites throughout the project. From September 2021 through June 2023, a total of 3,389 wastewater samples were collected, with 2,517 undergoing sequencing and submission to NCBI under the umbrella BioProject, PRJNA757291. Sequence data were released with explicit quality control (QC) tags on all sequence records, communicating our confidence in the quality of data. Variant analysis revealed wide circulation of Delta in the fall of 2021 and captured the sweep of Omicron and subsequent diversification of this lineage through the end of the sampling period. This project successfully achieved two important goals for the FDA's GenomeTrakr program: first, contributing timely genomic data for the SARS-CoV-2 pandemic response, and second, establishing both capacity and best practices for culture-independent, population-level environmental surveillance for other pathogens of interest to the FDA. IMPORTANCE: This paper serves two primary objectives. 
First, it summarizes the genomic and contextual data collected during a COVID-19 pandemic response project, which utilized the FDA's laboratory network, traditionally employed for sequencing foodborne pathogens, for sequencing SARS-CoV-2 from wastewater samples. Second, it outlines best practices for gathering and organizing population-level next-generation sequencing (NGS) data collected for culture-free surveillance of pathogens sourced from environmental samples.

2.
Cell Genom ; 4(1): 100466, 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38190108

ABSTRACT

The data-intensive fields of genomics and machine learning (ML) are in an early stage of convergence. Genomics researchers increasingly seek to harness the power of ML methods to extract knowledge from their data; conversely, ML scientists recognize that genomics offers a wealth of large, complex, and well-annotated datasets that can be used as a substrate for developing biologically relevant algorithms and applications. The National Human Genome Research Institute (NHGRI) inquired with researchers working in these two fields to identify common challenges and receive recommendations to better support genomic research efforts using ML approaches. Those included increasing the amount and variety of training datasets by integrating genomic with multiomics, context-specific (e.g., by cell type), and social determinants of health datasets; reducing the inherent biases of training datasets; prioritizing transparency and interpretability of ML methods; and developing privacy-preserving technologies for research participants' data.


Subject(s)
Bioethics , Genomics , Humans , Algorithms , Privacy , Machine Learning
3.
Nature ; 606(7913): 382-388, 2022 06.
Article in English | MEDLINE | ID: mdl-35614220

ABSTRACT

Mitochondria are epicentres of eukaryotic metabolism and bioenergetics. Pioneering efforts in recent decades have established the core protein componentry of these organelles1 and have linked their dysfunction to more than 150 distinct disorders2,3. Still, hundreds of mitochondrial proteins lack clear functions4, and the underlying genetic basis for approximately 40% of mitochondrial disorders remains unresolved5. Here, to establish a more complete functional compendium of human mitochondrial proteins, we profiled more than 200 CRISPR-mediated HAP1 cell knockout lines using mass spectrometry-based multiomics analyses. This effort generated approximately 8.3 million distinct biomolecule measurements, providing a deep survey of the cellular responses to mitochondrial perturbations and laying a foundation for mechanistic investigations into protein function. Guided by these data, we discovered that PIGY upstream open reading frame (PYURF) is an S-adenosylmethionine-dependent methyltransferase chaperone that supports both complex I assembly and coenzyme Q biosynthesis and is disrupted in a previously unresolved multisystemic mitochondrial disorder. We further linked the putative zinc transporter SLC30A9 to mitochondrial ribosomes and OxPhos integrity and established RAB5IF as the second gene harbouring pathogenic variants that cause cerebrofaciothoracic dysplasia. Our data, which can be explored through the interactive online MITOMICS.app resource, suggest biological roles for many other orphan mitochondrial proteins that still lack robust functional characterization and define a rich cell signature of mitochondrial dysfunction that can support the genetic diagnosis of mitochondrial diseases.


Subject(s)
Mitochondria , Mitochondrial Proteins , Cation Transport Proteins , Cell Cycle Proteins , Energy Metabolism , Humans , Mass Spectrometry , Mitochondria/genetics , Mitochondria/metabolism , Mitochondrial Diseases/genetics , Mitochondrial Diseases/metabolism , Mitochondrial Proteins/genetics , Mitochondrial Proteins/metabolism , Transcription Factors , rab5 GTP-Binding Proteins
4.
Am J Respir Cell Mol Biol ; 67(4): 430-437, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35580164

ABSTRACT

Chromosome 17q12-q21 is the most replicated genetic locus for childhood-onset asthma. Polymorphisms in this locus containing ∼10 genes interact with a variety of environmental exposures in the home and outdoors to modify asthma risk. However, the functional basis for these associations and their linkages to the environment have remained enigmatic. Within this extended region, regulation of GSDMB (gasdermin B) expression in airway epithelial cells has emerged as the primary mechanism underlying the 17q12-q21 genome-wide association study signal. Asthma-associated SNPs influence the abundance of GSDMB transcripts as well as the functional properties of GSDMB protein in airway epithelial cells. GSDMB is a member of the gasdermin family of proteins, which regulate pyroptosis and inflammatory responses to microbial infections. The aims of this review are to synthesize recent studies on the relationship of 17q12-q21 SNPs to childhood asthma and the evidence pointing to GSDMB gene expression or protein function as the underlying mechanism and to explore the potential functions of GSDMB that may influence the risk of developing asthma during childhood.


Subject(s)
Asthma , Genome-Wide Association Study , Pore Forming Cytotoxic Proteins/genetics , Asthma/genetics , Asthma/metabolism , Genetic Loci , Genetic Predisposition to Disease , Humans , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Polymorphism, Single Nucleotide
5.
BMC Genomics ; 21(1): 771, 2020 Nov 09.
Article in English | MEDLINE | ID: mdl-33167865

ABSTRACT

BACKGROUND: Deep neural networks (DNN) are a particular case of artificial neural networks (ANN) composed of multiple hidden layers, and have recently gained attention in genome-enabled prediction of complex traits. Yet, few studies in genome-enabled prediction have assessed the performance of DNN compared to traditional regression models. Strikingly, no clear superiority of DNN has been reported so far, and results seem highly dependent on the species and traits of application. Nevertheless, the relatively small datasets used in previous studies, most with fewer than 5000 observations, may have prevented DNN from reaching their full potential. Therefore, the objective of this study was to investigate the impact of the dataset sample size on the performance of DNN compared to Bayesian regression models for genome-enabled prediction of body weight in broilers by sub-sampling 63,526 observations of the training set. RESULTS: Predictive performance of DNN improved as sample size increased, reaching a plateau at about 0.32 of prediction correlation when 60% of the entire training set size was used (i.e., 39,510 observations). Interestingly, DNN showed superior prediction correlation using up to 3% of training set, but poorer prediction correlation after that compared to Bayesian Ridge Regression (BRR) and Bayes Cπ. Regardless of the amount of data used to train the predictive machines, DNN displayed the lowest mean square error of prediction compared to all other approaches. The predictive bias was lower for DNN compared to Bayesian models, across all dataset sizes, with estimates close to one with larger sample sizes. CONCLUSIONS: DNN had worse prediction correlation compared to BRR and Bayes Cπ, but improved mean square error of prediction and bias relative to both Bayesian models for genome-enabled prediction of body weight in broilers. Such findings highlight advantages and disadvantages between predictive approaches depending on the criterion used for comparison. 
Furthermore, the inclusion of more data per se is not a guarantee for the DNN to outperform the Bayesian regression methods commonly used for genome-enabled prediction. Nonetheless, further analysis is necessary to detect scenarios where DNN can clearly outperform Bayesian benchmark models.
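The three comparison criteria named in this abstract (prediction correlation, mean square error of prediction, and predictive bias) can be illustrated with a minimal sketch. The abstract does not define how bias was computed; the sketch assumes the common convention of taking the slope of the regression of observed on predicted values, where a slope near one indicates little inflation or deflation. The function name and inputs are hypothetical.

```python
from math import sqrt


def prediction_metrics(observed, predicted):
    """Return (correlation, mean square error, regression slope).

    The slope of observed on predicted is an ASSUMED definition of
    predictive bias; the abstract does not state the one it used.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of cross-products and squares around the means.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    correlation = cov / sqrt(var_o * var_p)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    slope = cov / var_p
    return correlation, mse, slope
```

Because correlation ignores scale while mean square error does not, a model can rank lower on one criterion and higher on the other, which is consistent with the mixed result the abstract reports.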


Subject(s)
Chickens , Multifactorial Inheritance , Animals , Bayes Theorem , Body Weight , Chickens/genetics , Neural Networks, Computer , Sample Size
6.
AMIA Jt Summits Transl Sci Proc ; 2020: 98-107, 2020.
Article in English | MEDLINE | ID: mdl-32477628

ABSTRACT

Asthma is a prevalent chronic respiratory condition, and acute exacerbations represent a significant fraction of the economic and health-related costs associated with asthma. We present results from a novel study that is focused on modeling asthma exacerbations from data contained in patients' electronic health records. This work makes the following contributions: (i) we develop an algorithm for phenotyping asthma exacerbations from EHRs, (ii) we determine that models learned via supervised learning approaches can predict asthma exacerbations in the near future (AUC ≈ 0.77), and (iii) we develop an approach, based on mixtures of semi-Markov models, that is able to identify subpopulations of asthma patients sharing distinct temporal and seasonal patterns in their exacerbation susceptibility.

7.
AMIA Jt Summits Transl Sci Proc ; 2020: 477-486, 2020.
Article in English | MEDLINE | ID: mdl-32477669

ABSTRACT

Diabetes mellitus is the putative cause of a number of pathologies occurring in the bony and soft tissues of the maxillofacial region and is known to exacerbate other oral diseases such as periodontitis. We present the first use of clinical panoramic radiographs for a secondary analysis of disease, with a focus on identifying hotspots in the maxillofacial region that are associated with diabetes. We developed a curated data set using Consensus Landmark Points (CLPs) and used that data to develop an analysis pipeline. This pipeline entailed automatic data cleansing, registration, and intensity normalization. The pipeline was used to process 7280 uncurated images that were subsequently analyzed using pixel-wise methods for a case/control study of patients with a history of diabetes. We detected statistically significant clusters of pixels that demarcated anatomical hotspots specific to the diabetic patients.

8.
Prev Med ; 136: 106061, 2020 07.
Article in English | MEDLINE | ID: mdl-32179026

ABSTRACT

Just under half of the 85.7 million US adults with hypertension have uncontrolled blood pressure using a hypertension threshold of systolic pressure ≥ 140 or diastolic pressure ≥ 90. Uncontrolled hypertension increases risks of death, stroke, heart failure, and myocardial infarction. Guidelines on hypertension management include lifestyle modification such as diet and exercise. In order to improve hypertension control, it is important to identify predictors of lifestyle modification assessment or advice to tailor future interventions using these effective, low-risk interventions. Electronic health record data from 14,360 adult hypertension patients at an academic medical center were analyzed using statistical and machine learning methods to identify predictors and timing of lifestyle modification. Multiple variables were statistically significant in analysis of lifestyle modification documentation at multiple time points. Random Forest was the best machine learning method to classify lifestyle modification documentation at any time with Area Under the Receiver Operator Curve (AUROC) 0.831. Logistic regression was the best machine learning method for classifying lifestyle modification documentation at ≤3 months with an AUROC of 0.685. Analyzing narrative and coded data from electronic health records can improve understanding of timing of lifestyle modification and patient, clinic and provider characteristics that are correlated with or predictive of documentation of lifestyle modification for hypertension. This information can inform improvement efforts in hypertension care processes, treatment implementation, and ultimately hypertension control.
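As a reading aid for the AUROC values quoted in this abstract, the following minimal sketch computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive case is scored above a randomly chosen negative case, with ties counted as one half. It is a generic illustration, not the study's evaluation code, and the function name is hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are 0/1; scores are classifier outputs (higher = more positive).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 0.831 and 0.685 in context.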


Subject(s)
Electronic Health Records , Hypertension , Adult , Documentation , Humans , Hypertension/prevention & control , Life Style , Machine Learning
9.
J Comput Biol ; 27(3): 403-417, 2020 03.
Article in English | MEDLINE | ID: mdl-32053004

ABSTRACT

Advances in systems biology have made clear the importance of network models for capturing knowledge about complex relationships in gene regulation, metabolism, and cellular signaling. A common approach to uncovering biological networks involves performing perturbations on elements of the network, such as gene knockdown experiments, and measuring how the perturbation affects some reporter of the process under study. In this article, we develop context-specific nested effects models (CSNEMs), an approach to inferring such networks that generalizes nested effects models (NEMs). The main contribution of this work is that CSNEMs explicitly model the participation of a gene in multiple contexts, meaning that a gene can appear in multiple places in the network. Biologically, the representation of regulators in multiple contexts may indicate that these regulators have distinct roles in different cellular compartments or cell cycle phases. We present an evaluation of the method on simulated data as well as on data from a study of the sodium chloride stress response in Saccharomyces cerevisiae.


Subject(s)
Saccharomyces cerevisiae/drug effects , Sodium Chloride/pharmacology , Systems Biology/methods , Algorithms , Gene Expression Regulation, Fungal/drug effects , Gene Regulatory Networks/drug effects , Models, Genetic , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics
10.
PLoS One ; 15(2): e0228105, 2020.
Article in English | MEDLINE | ID: mdl-32023271

ABSTRACT

The use of natural language data for animal population surveillance represents a valuable opportunity to gather information about potential disease outbreaks, emerging zoonotic diseases, or bioterrorism threats. In this study, we evaluate machine learning methods for conducting syndromic surveillance using free-text veterinary necropsy reports. We train a system to detect if a necropsy report from the Wisconsin Veterinary Diagnostic Laboratory contains evidence of gastrointestinal, respiratory, or urinary pathology. We evaluate the performance of several machine learning algorithms including deep learning with a long short-term memory network. Although no single algorithm was superior, random forest using feature vectors of TF-IDF statistics ranked among the top-performing models with F1 scores of 0.923 (gastrointestinal), 0.960 (respiratory), and 0.888 (urinary). This model was applied to over 33,000 necropsy reports and was used to describe temporal and spatial features of diseases within a 14-year period, exposing epidemiological trends and detecting a potential focus of gastrointestinal disease from a single submitting producer in the fall of 2016.
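The feature representation and the evaluation metric mentioned in this abstract can be sketched in a few lines. The abstract does not state the exact tokenization or TF-IDF weighting that was used, so the variant below (whitespace tokens, relative term frequency, unsmoothed logarithmic inverse document frequency) is an assumption, and both function names are hypothetical. The random forest classifier itself is not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Map each document to a {term: tf-idf weight} dictionary.

    ASSUMED variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Under this weighting, a term that occurs in every report receives weight zero, so the classifier's attention shifts toward terms that distinguish one report from another.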


Subject(s)
Gastrointestinal Diseases/pathology , Lung Diseases/pathology , Machine Learning , Urologic Diseases/pathology , Animals , Area Under Curve , Deep Learning , Logistic Models , ROC Curve , Supervised Machine Learning
11.
J Surg Res ; 246: 160-169, 2020 02.
Article in English | MEDLINE | ID: mdl-31586890

ABSTRACT

BACKGROUND: A major roadblock to reducing the mortality of colorectal cancer (CRC) is the lack of prompt detection and treatment, and a simple blood test is likely to have higher compliance than all of the current methods. The purpose of this report is to examine the utility of a mass spectrometry-based blood serum protein biomarker test for detection of CRC. MATERIALS AND METHODS: Blood was drawn from individuals (n = 213) before colonoscopy or from patients with nonmetastatic CRC (n = 50) before surgery. Proteins were isolated from the serum of patients using targeted liquid chromatography-tandem mass spectrometry. We designed a machine-learning statistical model to assess these proteins. RESULTS: When considered individually, over 70% of the selected biomarkers showed significance by Mann-Whitney testing for distinguishing cancer-bearing cases from cancer-free cases. Using machine-learning methods, peptides derived from epidermal growth factor receptor and leucine-rich alpha-2-glycoprotein 1 were consistently identified as highly predictive for detecting CRC from cancer-free cases. A five-marker panel consisting of leucine-rich alpha-2-glycoprotein 1, epidermal growth factor receptor, inter-alpha-trypsin inhibitor heavy-chain family member 4, hemopexin, and superoxide dismutase 3 performed the best with 70% specificity at over 89% sensitivity (area under the curve = 0.86) in the validation set. For distinguishing regional from localized cancers, cross-validation within the training set showed that a panel of four proteins consisting of CD44 molecule, GC-vitamin D-binding protein, C-reactive protein, and inter-alpha-trypsin inhibitor heavy-chain family member 3 yielded the highest performance (area under the curve = 0.75). CONCLUSIONS: The minimally invasive blood biomarker panels identified here could serve as screening/detection alternatives for CRC in a human population and are potentially useful for staging of existing cancer.


Subject(s)
Biomarkers, Tumor/blood , Colorectal Neoplasms/diagnosis , Early Detection of Cancer/methods , Lymphatic Metastasis/diagnosis , Mass Screening/methods , Adult , Aged , Aged, 80 and over , Colectomy , Colonoscopy , Colorectal Neoplasms/blood , Colorectal Neoplasms/pathology , Colorectal Neoplasms/surgery , Cross-Sectional Studies , Feasibility Studies , Female , Humans , Lymphatic Metastasis/pathology , Male , Mass Spectrometry/methods , Middle Aged , Neoplasm Staging , Pilot Projects , Prospective Studies , ROC Curve
12.
PLoS Comput Biol ; 15(6): e1006758, 2019 06.
Article in English | MEDLINE | ID: mdl-31246951

ABSTRACT

Many biological studies involve either (i) manipulating some aspect of a cell or its environment and then simultaneously measuring the effect on thousands of genes, or (ii) systematically manipulating each gene and then measuring the effect on some response of interest. A common challenge that arises in these studies is to explain how genes identified as relevant in the given experiment are organized into a subnetwork that accounts for the response of interest. The task of inferring a subnetwork is typically dependent on the information available in publicly available, structured databases, which suffer from incompleteness. However, a wealth of potentially relevant information resides in the scientific literature, such as information about genes associated with certain concepts of interest, as well as interactions that occur among various biological entities. We contend that by exploiting this information, we can improve the explanatory power and accuracy of subnetwork inference in multiple applications. Here we propose and investigate several ways in which information extracted from the scientific literature can be used to augment subnetwork inference. We show that we can use literature-extracted information to (i) augment the set of entities identified as being relevant in a subnetwork inference task, (ii) augment the set of interactions used in the process, and (iii) support targeted browsing of a large inferred subnetwork by identifying entities and interactions that are closely related to concepts of interest. We use this approach to uncover the pathways involved in interactions between a virus and a host cell, and the pathways that are regulated by a transcription factor associated with breast cancer. Our experimental results demonstrate that these approaches can provide more accurate and more interpretable subnetworks. 
Integer program code, background network data, and pathfinding code are available at https://github.com/Craven-Biostat-Lab/subnetwork_inference.


Subject(s)
Computational Biology/methods , Data Mining/methods , Gene Regulatory Networks/genetics , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Databases, Genetic , HIV , HIV Infections/genetics , HIV Infections/virology , Humans
13.
Respir Res ; 20(1): 115, 2019 Jun 10.
Article in English | MEDLINE | ID: mdl-31182091

ABSTRACT

BACKGROUND: Single birth cohort studies have been the basis for many discoveries about early life risk factors for childhood asthma but are limited in scope by sample size and characteristics of the local environment and population. The Children's Respiratory and Environmental Workgroup (CREW) was established to integrate multiple established asthma birth cohorts and to investigate asthma phenotypes and associated causal pathways (endotypes), focusing on how they are influenced by interactions between genetics, lifestyle, and environmental exposures during the prenatal period and early childhood. METHODS AND RESULTS: CREW is funded by the NIH Environmental influences on Child Health Outcomes (ECHO) program, and consists of 12 individual cohorts and three additional scientific centers. The CREW study population is diverse in terms of race, ethnicity, geographical distribution, and year of recruitment. We hypothesize that there are phenotypes in childhood asthma that differ based on clinical characteristics and underlying molecular mechanisms. Furthermore, we propose that asthma endotypes and their defining biomarkers can be identified based on personal and early life environmental risk factors. CREW has three phases: 1) to pool and harmonize existing data from each cohort, 2) to collect new data using standardized procedures, and 3) to enroll new families during the prenatal period to supplement and enrich extant data and enable unified systems approaches for identifying asthma phenotypes and endotypes. CONCLUSIONS: The overall goal of CREW program is to develop a better understanding of how early life environmental exposures and host factors interact to promote the development of specific asthma endotypes.


Subject(s)
Asthma/diagnosis , Asthma/epidemiology , Environmental Exposure/analysis , Life Style , Population Surveillance/methods , Adolescent , Asthma/genetics , Child , Child, Preschool , Cohort Studies , Environmental Exposure/prevention & control , Female , Humans , Infant , Male , Young Adult
14.
Bioinformatics ; 35(15): 2657-2659, 2019 08 01.
Article in English | MEDLINE | ID: mdl-30534948

ABSTRACT

SUMMARY: Understanding the regulatory roles of non-coding genetic variants has become a central goal for interpreting results of genome-wide association studies. The regulatory significance of the variants may be interrogated by assessing their influence on transcription factor binding. We have developed atSNP Search, a comprehensive web database for evaluating motif matches to the human genome with both reference and variant alleles and assessing the overall significance of the variant alterations on the motif matches. Convenient search features, comprehensive search outputs and a useful help menu are key components of atSNP Search. atSNP Search enables convenient interpretation of regulatory variants by statistical significance testing and composite logo plots, which are graphical representations of motif matches with the reference and variant alleles. Existing motif-based regulatory variant discovery tools only consider a limited pool of variants due to storage or other limitations. In contrast, atSNP Search users can test more than 37 billion variant-motif pairs with marginal significance in motif matches or match alteration. Computational evidence from atSNP Search, when combined with experimental validation, may help with the discovery of underlying disease mechanisms. AVAILABILITY AND IMPLEMENTATION: atSNP Search is freely available at http://atsnp.biostat.wisc.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Software , Genetic Variation , Humans , Protein Binding , Transcription Factors
15.
Res Comput Mol Biol ; 10812: 194-210, 2018 Apr.
Article in English | MEDLINE | ID: mdl-30680375

ABSTRACT

Advances in systems biology have made clear the importance of network models for capturing knowledge about complex relationships in gene regulation, metabolism, and cellular signaling. A common approach to uncovering biological networks involves performing perturbations on elements of the network, such as gene knockdown experiments, and measuring how the perturbation affects some reporter of the process under study. In this paper, we develop context-specific nested effects models (CSNEMs), an approach to inferring such networks that generalizes nested effects models (NEMs). The main contribution of this work is that CSNEMs explicitly model the participation of a gene in multiple contexts, meaning that a gene can appear in multiple places in the network. Biologically, the representation of regulators in multiple contexts may indicate that these regulators have distinct roles in different cellular compartments or cell cycle phases. We present an evaluation of the method on simulated data as well as on data from a study of the sodium chloride stress response in Saccharomyces cerevisiae.

16.
PLoS Comput Biol ; 13(6): e1005466, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28570593

ABSTRACT

Various types of biological knowledge describe networks of interactions among elementary entities. For example, transcriptional regulatory networks consist of interactions among proteins and genes. Current knowledge about the exact structure of such networks is highly incomplete, and laboratory experiments that manipulate the entities involved are conducted to test hypotheses about these networks. In recent years, various automated approaches to experiment selection have been proposed. Many of these approaches can be characterized as active machine learning algorithms. Active learning is an iterative process in which a model is learned from data, hypotheses are generated from the model to propose informative experiments, and the experiments yield new data that is used to update the model. This review describes the various models, experiment selection strategies, validation techniques, and successful applications described in the literature; highlights common themes and notable distinctions among methods; and identifies likely directions of future research and open problems in the area.
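The iterative process described in this abstract (learn a model, propose an informative experiment, observe the result, update the model) can be sketched with a deliberately small toy problem. The example below is not drawn from the review: it assumes a hypothetical one-dimensional setting in which an experiment at dose `x` returns `True` once `x` reaches an unknown threshold, and it uses uncertainty sampling, only one of the many selection strategies such a review covers. All names are illustrative.

```python
def fit_threshold(labeled, candidates):
    """Model step: estimate the threshold as the midpoint between the
    largest negative and the smallest positive result seen so far."""
    negatives = [x for x, outcome in labeled.items() if not outcome]
    positives = [x for x, outcome in labeled.items() if outcome]
    low = max(negatives, default=min(candidates))
    high = min(positives, default=max(candidates))
    return (low + high) / 2


def active_learning(candidates, run_experiment, n_rounds):
    """Repeat: fit model, pick the most uncertain candidate, run it, update."""
    labeled = {}
    for _ in range(n_rounds):
        pool = [x for x in candidates if x not in labeled]
        if not pool:
            break
        threshold = fit_threshold(labeled, candidates)
        # Uncertainty sampling: the untested candidate closest to the
        # current decision boundary is the one the model knows least about.
        query = min(pool, key=lambda x: abs(x - threshold))
        labeled[query] = run_experiment(query)
    return fit_threshold(labeled, candidates), labeled
```

In this toy setting the loop behaves like a bisection search, so a handful of experiments localizes the boundary that exhaustive testing would need every candidate to find.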


Subject(s)
Computational Biology/methods , Supervised Machine Learning , Algorithms , Gene Regulatory Networks , Metabolic Networks and Pathways , Research Design
17.
Surgery ; 161(4): 1113-1121, 2017 04.
Article in English | MEDLINE | ID: mdl-27989606

ABSTRACT

BACKGROUND: Parathyroidectomy offers the only cure for primary hyperparathyroidism, but today only 50% of primary hyperparathyroidism patients are referred for operation, in large part, because the condition is widely under-recognized. The diagnosis of primary hyperparathyroidism can be especially challenging with mild biochemical indices. Machine learning is a collection of methods in which computers build predictive algorithms based on labeled examples. With the aim of facilitating diagnosis, we tested the ability of machine learning to distinguish primary hyperparathyroidism from normal physiology using clinical and laboratory data. METHODS: This retrospective cohort study used a labeled training set and 10-fold cross-validation to evaluate accuracy of the algorithm. Measures of accuracy included area under the receiver operating characteristic curve, precision (sensitivity), and positive and negative predictive value. Several different algorithms and ensembles of algorithms were tested using the Weka platform. Among 11,830 patients managed operatively at 3 high-volume endocrine surgery programs from March 2001 to August 2013, 6,777 underwent parathyroidectomy for confirmed primary hyperparathyroidism, and 5,053 control patients without primary hyperparathyroidism underwent thyroidectomy. Test-set accuracies for machine learning models were determined using 10-fold cross-validation. Age, sex, and serum levels of preoperative calcium, phosphate, parathyroid hormone, vitamin D, and creatinine were defined as potential predictors of primary hyperparathyroidism. Mild primary hyperparathyroidism was defined as primary hyperparathyroidism with normal preoperative calcium or parathyroid hormone levels. RESULTS: After testing a variety of machine learning algorithms, Bayesian network models proved most accurate, classifying correctly 95.2% of all primary hyperparathyroidism patients (area under receiver operating characteristic = 0.989). 
Omitting parathyroid hormone from the model did not decrease the accuracy significantly (area under receiver operating characteristic = 0.985). In mild disease cases, however, the Bayesian network model classified correctly 71.1% of patients with normal calcium and 92.1% with normal parathyroid hormone levels preoperatively. Bayesian networking and AdaBoost improved the accuracy of all parathyroid hormone patients to 97.2% cases (area under receiver operating characteristic = 0.994), and 91.9% of primary hyperparathyroidism patients with mild disease. This was significantly improved relative to Bayesian networking alone (P < .0001). CONCLUSION: Machine learning can diagnose accurately primary hyperparathyroidism without human input even in mild disease. Incorporation of this tool into electronic medical record systems may aid in recognition of this under-diagnosed disorder.
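The evaluation procedure named in the METHODS (10-fold cross-validation with sensitivity and predictive values) can be sketched generically. This is not the Weka workflow the authors used; the fold assignment and function names below are assumptions for illustration. In standard usage, sensitivity is the same quantity as recall, and precision is the same quantity as positive predictive value, so the sketch reports them under those names.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; each sample is held out exactly once.

    ASSUMED fold assignment: sample i goes to fold i mod k.
    """
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        yield train_idx, test_idx


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, positive and negative predictive value for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "ppv": ratio(tp, tp + fp),          # also called precision
        "npv": ratio(tn, tn + fn),
    }
```

Predictions collected from the ten held-out folds can be pooled and passed to `diagnostic_metrics`, so that every patient contributes to the estimate without ever being scored by a model that was trained on them.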


Subject(s)
Algorithms , Hyperparathyroidism, Primary/diagnosis , Machine Learning , Parathyroid Hormone/blood , Quality Improvement , Bayes Theorem , Case-Control Studies , Databases, Factual , Female , Humans , Hyperparathyroidism, Primary/surgery , Male , Middle Aged , Parathyroidectomy/methods , Predictive Value of Tests , ROC Curve , Sensitivity and Specificity , Severity of Illness Index
18.
Surgery ; 160(6): 1666-1674, 2016 12.
Article in English | MEDLINE | ID: mdl-27769659

ABSTRACT

BACKGROUND: Many studies have evaluated predictors of postoperative complications, yet little is known about the development of multiple complications. The goal of this study was to assess complication timing in cascades of multiple complications and the risk of future complications given a patient's first complication. METHODS: This study includes 30-day, postoperative complications from the American College of Surgeons National Surgical Quality Improvement Program for all patients who underwent major inpatient and outpatient operative procedures from 2005-2013. The timing and sequencing of complications were evaluated using χ2 analysis and pairwise comparisons. RESULTS: More severe postoperative complications (cardiac arrest or myocardial infarction, renal insufficiency or failure, stroke, intubation, septic shock, coma) had the greatest impact on the risk for developing further complications, increasing the relative risk of developing future, specific, severe complications by more than 40-fold. These more severe complications occur within a few days of other complications (whether as a preceding factor or an outcome), while less severe complications, such as surgical site infection and urinary tract infection, are linked less tightly to complication cascades. CONCLUSION: This analysis highlights both the risk for secondary complications after an initial complication and when those future complications are likely to occur. Physicians can use this information to target interventions to prevent high-risk complications.
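The "more than 40-fold" figure in the RESULTS is a relative risk. As a minimal sketch of that quantity, the function below divides the risk of a later complication among patients who had a given first complication by the risk among patients who did not. The function name and the counts in the usage note are hypothetical and are not taken from the study.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return risk_exposed / risk_unexposed
```

For example, with made-up counts, `relative_risk(40, 100, 1, 100)` compares a 40% risk with a 1% risk and gives a ratio of about 40.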


Subject(s)
Postoperative Complications/epidemiology , Adolescent , Adult , Aged , Female , Health Status , Humans , Male , Middle Aged , Prevalence , Quality Improvement , Retrospective Studies , Risk Factors , Time Factors , United States/epidemiology , Young Adult
19.
J Virol ; 90(18): 8115-31, 2016 09 15.
Article in English | MEDLINE | ID: mdl-27384650

ABSTRACT

Herpes simplex virus 1 (HSV-1) most commonly causes recrudescent labial ulcers; however, it is also the leading cause of infectious blindness in developed countries. Previous research in animal models has demonstrated that the severity of HSV-1 ocular disease is influenced by three main factors: host innate immunity, host immune response, and viral strain. We have previously shown that mixed infection with two avirulent HSV-1 strains (OD4 and CJ994) results in recombinants with a wide range of ocular disease phenotype severity. Recently, we developed a quantitative trait locus (QTL)-based computational approach (vQTLmap) to identify viral single nucleotide polymorphisms (SNPs) predicted to influence the severity of the ocular disease phenotypes. We have now applied vQTLmap to identify HSV-1 SNPs associated with corneal neovascularization and mean peak percentage weight loss (MPWL) using 65 HSV-1 OD4-CJ994 recombinants. The vQTLmap analysis using Random Forest for neovascularization identified phenotypically meaningful nonsynonymous SNPs in the ICP4, UL41 (VHS), UL42, UL46 (VP11/12), UL47 (VP13/14), UL48 (VP22), US3, US4 (gG), US6 (gD), and US7 (gI) coding regions. The ICP4 gene was previously identified as a corneal neovascularization determinant, validating the vQTLmap method. Further analysis detected an epistatic interaction for neovascularization between a segment of the unique long (UL) region and a segment of the inverted repeat short (IRS)/unique short (US) region. Ridge regression was used to identify MPWL-associated nonsynonymous SNPs in the UL1 (gL), UL2, UL4, UL49 (VP22), UL50, and ICP4 coding regions. The data provide additional insights into virulence gene and epistatic interaction discovery in HSV-1. IMPORTANCE: Herpes simplex virus 1 (HSV-1) typically causes recurrent cold sores; however, it is also the leading source of infectious blindness in developed countries. Corneal neovascularization is critical for the progression of blinding ocular disease, and weight loss is a measure of infection severity. Previous HSV-1 animal virulence studies have shown that the severity of ocular disease is partially due to the viral strain. In the current study, we used a recently described computational quantitative trait locus (QTL) approach in conjunction with 65 HSV-1 recombinants to identify viral single nucleotide polymorphisms (SNPs) involved in neovascularization and weight loss. Neovascularization SNPs were identified in the ICP4, VHS, UL42, VP11/12, VP13/14, VP22, gG, US3, gD, and gI genes. Further analysis revealed an epistatic interaction between the UL and US regions. MPWL-associated SNPs were detected in the UL1 (gL), UL2, UL4, VP22, UL50, and ICP4 genes. This approach will facilitate future HSV virulence studies.


Subject(s)
Corneal Neovascularization/pathology , Epistasis, Genetic , Genes, Viral , Herpes Simplex/pathology , Herpesvirus 1, Human/pathogenicity , Virulence Factors/genetics , Weight Loss , Animals , Genetic Loci , Herpes Simplex/virology , Mice , Polymorphism, Single Nucleotide
20.
Ann Surg ; 263(6): 1213-8, 2016 06.
Article in English | MEDLINE | ID: mdl-27167563

ABSTRACT

OBJECTIVES: To evaluate the association between multiple complications and postoperative outcomes and to assess which complications occur together in patients with multiple complications. BACKGROUND: Patients who suffer multiple complications have increased risk of prolonged hospital stay and mortality. However, little is known about what places patients at risk for multiple complications or which complications tend to occur in these patients. METHODS: Surgical patients were identified from the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) database from 2005 to 2011. The frequency of postoperative complications was assessed. Patients with fewer than two complications were compared with patients who had multiple complications using χ² and logistic regression analysis. Relationships among postoperative complications were explored by learning a Bayesian network model. RESULTS: The study population consisted of 470,108 general surgery patients. The overall complication rate was 15% with multiple complications in 27,032 (6%) patients. Patients with multiple complications had worse postoperative outcomes (P < 0.001). The strongest predictors for developing multiple complications were admission from a chronic care facility or nursing home, dependent functional status, and higher American Society of Anesthesiologists Physical Status classification. In patients with multiple complications, the most common complication was sepsis (42%), followed by failure to wean ventilator (31%), and organ space surgical site infection (27%). We found that severe complications were most strongly associated with development of multiple complications. Using a Bayesian network, we were able to identify how strongly associated specific complications were in patients who developed multiple complications. CONCLUSIONS: Almost half (40%) of patients with complications suffer multiple complications. Patient factors such as frailty and comorbidity strongly predict the development of multiple complications. The results of our Bayesian analysis identify targets for interventions aimed at disrupting the cascade of multiple complications in high-risk patients.


Subject(s)
General Surgery , Postoperative Complications/epidemiology , Adolescent , Adult , Aged , Bayes Theorem , Comorbidity , Databases, Factual , Female , Hospital Mortality , Humans , Length of Stay/statistics & numerical data , Male , Middle Aged , Postoperative Complications/mortality , Prognosis , Risk Factors , United States/epidemiology