1.
J Med Internet Res ; 23(11): e32900, 2021 Nov 26.
Article in English | MEDLINE | ID: mdl-34842542

ABSTRACT

BACKGROUND: Multimorbidity clinical risk scores allow clinicians to quickly assess their patients' health for decision making, often for recommendation to care management programs. However, these scores are limited by several issues: existing multimorbidity scores (1) are generally limited to one data group (eg, diagnoses, labs) and may be missing vital information, (2) are usually limited to specific demographic groups (eg, age), and (3) do not formally provide any granularity in the form of more nuanced multimorbidity risk scores to direct clinician attention. OBJECTIVE: Using diagnosis, lab, prescription, procedure, and demographic data from electronic health records (EHRs), we developed a physiologically diverse and generalizable set of multimorbidity risk scores. METHODS: Using EHR data from a nationwide cohort of patients, we developed the total health profile, a set of six integrated risk scores reflecting five distinct organ systems and overall health. We selected the occurrence of an inpatient hospital visitation over a 2-year follow-up window, attributable to specific organ systems, as our risk endpoint. Using a physician-curated set of features, we trained six machine learning models on 794,294 patients to predict the calibrated probability of the aforementioned endpoint, producing risk scores for heart, lung, neuro, kidney, and digestive functions and a sixth score for combined risk. We evaluated the scores using a held-out test cohort of 198,574 patients. RESULTS: Study patients closely matched national census averages, with a median age of 41 years, a median income of $66,829, and racial averages by zip code of 73.8% White, 5.9% Asian, and 11.9% African American. All models were well calibrated and demonstrated strong performance with areas under the receiver operating characteristic curve (AUROCs) of 0.83 for the total health score (THS), 0.89 for heart, 0.86 for lung, 0.84 for neuro, 0.90 for kidney, and 0.83 for digestive functions. There was consistent performance of this scoring system across sexes, diverse patient ages, and zip code income levels. Each model learned to generate predictions by focusing on appropriate clinically relevant patient features, such as heart-related hospitalizations and chronic hypertension diagnosis for the heart model. The THS outperformed the other commonly used multimorbidity scoring systems, specifically the Charlson Comorbidity Index (CCI) and the Elixhauser Comorbidity Index (ECI) overall (AUROCs: THS=0.823, CCI=0.735, ECI=0.649) as well as for every age, sex, and income bracket. Performance improvements were most pronounced for middle-aged and lower-income subgroups. Ablation tests using only diagnosis, prescription, social determinants of health, and lab feature groups, while retaining procedure-related features, showed that the combination of feature groups has the best predictive performance, though only marginally better than the diagnosis-only model on at-risk groups. CONCLUSIONS: Massive retrospective EHR data sets have made it possible to use machine learning to build practical multimorbidity risk scores that are highly predictive, personalizable, intuitive to explain, and generalizable across diverse patient populations.
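An AUROC such as those reported in this abstract can be read as the probability that a randomly chosen patient who reached the endpoint receives a higher risk score than a randomly chosen patient who did not. The following is a minimal Python sketch of that rank-based calculation. The function name `auroc` and the toy labels and scores are illustrative assumptions; they are not the study's code or data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 outcomes (1 = endpoint occurred).
    scores: sequence of predicted risk scores, same length as labels.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks; tied scores share the average of their ranks.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # U statistic for the positive class, normalised by the number of pairs.
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy example (made-up values): three of the four positive/negative
# pairs are ordered correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect separation, which is the scale on which the THS, CCI, and ECI figures above are compared.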


Subjects
Machine Learning; Multimorbidity; Adult; Cohort Studies; Electronic Health Records; Humans; Middle Aged; Retrospective Studies; Risk Factors
2.
Proc Natl Acad Sci U S A ; 115(14): 3686-3691, 2018 Apr 03.
Article in English | MEDLINE | ID: mdl-29555771

ABSTRACT

Reducing premature mortality associated with age-related chronic diseases, such as cancer and cardiovascular disease, is an urgent priority. We report early results using genomics in combination with advanced imaging and other clinical testing to proactively screen for age-related chronic disease risk among adults. We enrolled active, symptom-free adults in a study of screening for age-related chronic diseases associated with premature mortality. In addition to personal and family medical history and other clinical testing, we obtained whole-genome sequencing (WGS), noncontrast whole-body MRI, dual-energy X-ray absorptiometry (DXA), global metabolomics, a new blood test for prediabetes (Quantose IR), echocardiography (ECHO), ECG, and cardiac rhythm monitoring to identify age-related chronic disease risks. Precision medicine screening using WGS and advanced imaging along with other testing among active, symptom-free adults identified a broad set of complementary age-related chronic disease risks associated with premature mortality and strengthened WGS variant interpretation. This and other similarly designed screening approaches anchored by WGS and advanced imaging may have the potential to extend healthy life among active adults through improved prevention and early detection of age-related chronic diseases (and their risk factors) associated with premature mortality.


Subjects
Disease/genetics; Genetic Predisposition to Disease; Image Processing, Computer-Assisted/methods; Mutation; Precision Medicine/methods; Whole Genome Sequencing/methods; Adult; Aged; Aged, 80 and over; Cardiovascular Diseases/diagnostic imaging; Cardiovascular Diseases/genetics; Cardiovascular Diseases/pathology; Disease/classification; Female; High-Throughput Nucleotide Sequencing; Humans; Male; Middle Aged; Neoplasms/diagnostic imaging; Neoplasms/genetics; Neoplasms/pathology; Nervous System Diseases/diagnostic imaging; Nervous System Diseases/genetics; Nervous System Diseases/pathology; Risk Assessment; Sequence Analysis, RNA; Young Adult
3.
Proc Natl Acad Sci U S A ; 114(38): 10166-10171, 2017 Sep 19.
Article in English | MEDLINE | ID: mdl-28874526

ABSTRACT

Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. Individually, for a large fraction of the traits, their predictive accuracy beyond ancestry and demographic information is limited. However, we have developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.


Subjects
Confidentiality; DNA Fingerprinting; Models, Genetic; Phenotype; Whole Genome Sequencing; Adult; Age Factors; Algorithms; Body Size; Cohort Studies; Data Anonymization; Female; Humans; Male; Middle Aged; Pigmentation/genetics; Young Adult
4.
Elife ; 4: e06974, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-26175406

ABSTRACT

The eukaryotic phylum Apicomplexa encompasses thousands of obligate intracellular parasites of humans and animals with immense socio-economic and health impacts. We sequenced nuclear genomes of Chromera velia and Vitrella brassicaformis, free-living non-parasitic photosynthetic algae closely related to apicomplexans. Proteins from key metabolic pathways and from the endomembrane trafficking systems associated with a free-living lifestyle have been progressively and non-randomly lost during adaptation to parasitism. The free-living ancestor contained a broad repertoire of genes many of which were repurposed for parasitic processes, such as extracellular proteins, components of a motility apparatus, and DNA- and RNA-binding protein families. Based on transcriptome analyses across 36 environmental conditions, Chromera orthologs of apicomplexan invasion-related motility genes were co-regulated with genes encoding the flagellar apparatus, supporting the functional contribution of flagella to the evolution of invasion machinery. This study provides insights into how obligate parasites with diverse life strategies arose from a once free-living phototrophic marine alga.


Subjects
Alveolata/genetics; DNA, Algal/chemistry; DNA, Algal/genetics; Evolution, Molecular; Sequence Analysis, DNA; Gene Expression Profiling; Molecular Sequence Data
5.
Proteomics ; 15(15): 2618-28, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25867681

ABSTRACT

Proteomics data can supplement genome annotation efforts, for example by confirming gene models or correcting gene annotation errors. Here, we present a large-scale proteogenomics study of two important apicomplexan pathogens: Toxoplasma gondii and Neospora caninum. We queried proteomics data against a panel of official and alternate gene models generated directly from RNA-Seq data, using several newly generated and some previously published MS datasets for this meta-analysis. We identified a total of 201,996 and 39,953 peptide-spectrum matches for T. gondii and N. caninum, respectively, at a 1% peptide FDR threshold. This equated to the identification of 30,494 distinct peptide sequences and 2,921 proteins (matches to official gene models) for T. gondii, and 8,911 peptides/1,273 proteins for N. caninum following stringent protein-level thresholding. We have also identified 289 and 140 loci for T. gondii and N. caninum, respectively, which mapped to RNA-Seq-derived gene models used in our analysis and are apparently absent from the official annotation (release 10 from EuPathDB) of these species. We present several examples in our study where the RNA-Seq evidence can help correct the current gene model and discover potential new genes. The findings of this study have been integrated into the EuPathDB. The data have been deposited to the ProteomeXchange with identifiers PXD000297 and PXD000298.
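The abstract does not say how the 1% FDR was estimated. A common approach in proteomics is target-decoy searching, and the sketch below assumes it: each match is scored against either a real (target) or a reversed/shuffled (decoy) sequence, and the FDR at a score cutoff is estimated as the number of decoys divided by the number of targets at or above that cutoff. The function name, the input layout, and the toy scores are hypothetical.

```python
def score_cutoff_at_fdr(matches, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    matches: iterable of (score, is_decoy) pairs; higher score = better match.
    The FDR at a cutoff is estimated as decoys / targets among all matches
    scoring at or above that cutoff. Returns None if no cutoff qualifies.
    """
    ordered = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = 0
    decoys = 0
    best_cutoff = None
    i = 0
    while i < len(ordered):
        score = ordered[i][0]
        # Consume every match tied at this score before evaluating the cutoff.
        while i < len(ordered) and ordered[i][0] == score:
            if ordered[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Toy data (made up): one decoy among four matches.
toy = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
print(score_cutoff_at_fdr(toy, max_fdr=0.0))  # 9.0
```

Matches scoring at or above the returned cutoff would be the ones counted as identifications; real pipelines typically add refinements (for example q-values) that this sketch omits.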


Subjects
Genomics/methods; Neospora/genetics; Neospora/metabolism; Proteomics/methods; Toxoplasma/genetics; Toxoplasma/metabolism; Amino Acid Sequence; Apicomplexa/genetics; Apicomplexa/metabolism; Databases, Genetic; Genes, Protozoan/genetics; Molecular Sequence Annotation/methods; Molecular Sequence Data; Peptides/genetics; Peptides/metabolism; Proteome/genetics; Proteome/metabolism; Protozoan Proteins/genetics; Protozoan Proteins/metabolism; Sequence Analysis, RNA/methods; Sequence Homology, Amino Acid; Tandem Mass Spectrometry/methods
6.
Bioinformatics ; 28(12): 1571-8, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22513996

ABSTRACT

MOTIVATION: Gene-model curation creates consensus gene models by combining multiple sources of protein-coding evidence that may be incomplete or inconsistent. To date, manual curation still produces the highest quality models. However, manual curation is too slow and costly to be completed even for the most important organisms. In recent years, machine-learned ensemble gene predictors have become a viable alternative to manual curation. Current approaches make use of signal and genomic region consistency among sources and some voting scheme to resolve conflicts in the evidence. As a further step in that direction, we have developed eCRAIG (ensemble CRAIG), an automated curation tool that combines multiple sources of evidence using global discriminative training. This allows efficient integration of different types of genomic evidence with complex statistical dependencies to directly maximize annotation accuracy. Our method goes beyond previous work in integrating novel non-linear annotation agreement features, as well as combinations of intrinsic features of the target sequence and extrinsic annotation features. RESULTS: We achieved significant improvements over the best ensemble predictors available for Homo sapiens, Caenorhabditis elegans and Arabidopsis thaliana. In particular, eCRAIG achieved a relative mean improvement of 5.1% over Jigsaw, the best published ensemble predictor in all our experiments. AVAILABILITY: The source code and datasets are both available at http://www.seas.upenn.edu/abernal/ecraig.tgz.
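As a point of reference for the voting schemes that the abstract attributes to earlier ensemble approaches, here is a minimal sketch of per-base majority voting across several predictors. It is a hypothetical baseline for illustration only and is not eCRAIG's method, which the abstract describes as global discriminative training. The label alphabet (`E` for exon, `I` for intron), the function name, and the tie-breaking rule are all assumptions.

```python
from collections import Counter


def majority_vote(annotations):
    """Combine equal-length per-base label strings by majority vote.

    annotations: list of strings, one per evidence source, where each
    character labels one base (for example 'E' = exon, 'I' = intron).
    Ties are broken in favour of the earliest source in the list.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    length = len(annotations[0])
    if any(len(a) != length for a in annotations):
        raise ValueError("all annotations must have the same length")

    consensus = []
    for column in zip(*annotations):
        counts = Counter(column)
        top = max(counts.values())
        # First label, in source order, that reaches the top count.
        consensus.append(next(label for label in column if counts[label] == top))
    return "".join(consensus)


# Three toy predictors that disagree at the third and fourth bases.
print(majority_vote(["EEEIII", "EEIIII", "EEEEII"]))  # EEEIII
```

Voting treats every base and every source independently, which is the limitation that motivates learning how much to trust each source in context.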


Subjects
Artificial Intelligence; Computational Biology/methods; Models, Genetic; Algorithms; Animals; Arabidopsis/genetics; Caenorhabditis elegans/genetics; Genomics; Humans
7.
PLoS Comput Biol ; 3(3): e54, 2007 Mar 16.
Article in English | MEDLINE | ID: mdl-17367206

ABSTRACT

Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and can be trained fairly efficiently. However, that type of piecewise training does not optimize prediction accuracy and has difficulty in accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVM) in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns.
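To make the hidden Markov model baseline mentioned at the start of this abstract concrete, the sketch below decodes a toy two-state HMM with the Viterbi algorithm. It is not CRAIG, which the abstract describes as a semi-Markov conditional random field trained with a large-margin algorithm. The two states (`N` noncoding, `C` coding) and every probability are invented for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable hidden-state path for an observation sequence."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    # best[t][s]: best log-probability of any path that ends in state s at t.
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Toy model (all numbers are assumptions): coding regions are GC-rich,
# noncoding regions are AT-rich, and states tend to persist.
STATES = ["N", "C"]
LOG_START = {"N": math.log(0.5), "C": math.log(0.5)}
LOG_TRANS = {
    "N": {"N": math.log(0.9), "C": math.log(0.1)},
    "C": {"N": math.log(0.1), "C": math.log(0.9)},
}
LOG_EMIT = {
    "N": {b: math.log(0.45 if b in "AT" else 0.05) for b in "ACGT"},
    "C": {b: math.log(0.45 if b in "GC" else 0.05) for b in "ACGT"},
}

print("".join(viterbi("AAAAGGGGAAAA", STATES, LOG_START, LOG_TRANS, LOG_EMIT)))
# NNNNCCCCNNNN
```

In this setup the transition and emission tables are fixed separately and only combined at decoding time, which mirrors the piecewise training that the abstract contrasts with discriminative learning.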


Subjects
Artificial Intelligence; Open Reading Frames/genetics; Pattern Recognition, Automated/methods; Proteins/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Algorithms; Discriminant Analysis; Exons; Sensitivity and Specificity
8.
Genome Res ; 12(10): 1556-63, 2002 Oct.
Article in English | MEDLINE | ID: mdl-12368248

ABSTRACT

Draft sequencing is a rapid and efficient method for determining the near-complete sequence of microbial genomes. Here we report a comparative analysis of one complete and two draft genome sequences of the phytopathogenic bacterium, Xylella fastidiosa, which causes serious disease in plants, including citrus, almond, and oleander. We present highlights of an in silico analysis based on a comparison of reconstructions of core biological subsystems. Cellular pathway reconstructions have been used to identify a small number of genes, which are likely to reside within the draft genomes but are not captured in the draft assembly. These represented only a small fraction of all genes and were predominantly large and small ribosomal subunit protein components. By using this approach, some of the inherent limitations of draft sequence can be significantly reduced. Despite the incomplete nature of the draft genomes, it is possible to identify several phage-related genes, which appear to be absent from the draft genomes and not the result of insufficient sequence sampling. This region may therefore identify potential host-specific functions. Based on this first functional reconstruction of a phytopathogenic microbe, we spotlight an unusual respiration machinery as a potential target for biological control. We also predicted and developed a new defined growth medium for Xylella.


Subjects
Genome, Bacterial; Genomics/methods; Proteobacteria/genetics; Sequence Analysis, DNA/methods; Attachment Sites, Microbiological/genetics; Bacteriophages/genetics; Base Composition/genetics; Culture Media/chemistry; Culture Media/metabolism; DNA Repair/genetics; DNA Replication/genetics; DNA, Bacterial/genetics; Genes, Bacterial/genetics; Genes, Bacterial/physiology; Molecular Sequence Data; Open Reading Frames/genetics; Open Reading Frames/physiology; Plasmids/genetics; Protein Biosynthesis/genetics; Proteobacteria/growth & development; Proteobacteria/pathogenicity; Proteobacteria/physiology; Recombination, Genetic/genetics; Species Specificity
9.
Proc Natl Acad Sci U S A ; 99(19): 12403-8, 2002 Sep 17.
Article in English | MEDLINE | ID: mdl-12205291

ABSTRACT

Xylella fastidiosa (Xf) causes wilt disease in plants and is responsible for major economic and crop losses globally. Owing to the public importance of this phytopathogen we embarked on a comparative analysis of the complete genome of Xf pv citrus and the partial genomes of two recently sequenced strains of this species: Xf pv almond and Xf pv oleander, which cause leaf scorch in almond and oleander plants, respectively. We report a reanalysis of the previously sequenced Xf 9a5c (CVC, citrus) strain and the two "gapped" Xf genomes revealing ORFs encoding critical functions in pathogenicity and conjugative transfer. Second, a detailed whole-genome functional comparison was based on the three sequenced Xf strains, identifying the unique genes present in each strain, in addition to those shared between strains. Third, an "in silico" cellular reconstruction of these organisms was made, based on a comparison of their core functional subsystems that led to a characterization of their conjugative transfer machinery, identification of potential differences in their adhesion mechanisms, and highlighting of the absence of a classical quorum-sensing mechanism. This study demonstrates the effectiveness of comparative analysis strategies in the interpretation of genomes that are closely related.


Subjects
Gammaproteobacteria/genetics; Gammaproteobacteria/pathogenicity; Genome, Bacterial; Plant Diseases/microbiology; Bacterial Proteins/genetics; Carbohydrate Metabolism; Citrus/microbiology; Conjugation, Genetic; Evolution, Molecular; Gammaproteobacteria/metabolism; Molecular Sequence Data; Multigene Family; Nerium/microbiology; Open Reading Frames; Prunus/microbiology; Species Specificity; Virulence/genetics
10.
J Bacteriol ; 184(16): 4555-72, 2002 Aug.
Article in English | MEDLINE | ID: mdl-12142426

ABSTRACT

Novel drug targets are required in order to design new defenses against antibiotic-resistant pathogens. Comparative genomics provides new opportunities for finding optimal targets among previously unexplored cellular functions, based on an understanding of related biological processes in bacterial pathogens and their hosts. We describe an integrated approach to identification and prioritization of broad-spectrum drug targets. Our strategy is based on genetic footprinting in Escherichia coli followed by metabolic context analysis of essential gene orthologs in various species. Genes required for viability of E. coli in rich medium were identified on a whole-genome scale using the genetic footprinting technique. Potential target pathways were deduced from these data and compared with a panel of representative bacterial pathogens by using metabolic reconstructions from genomic data. Conserved and indispensable functions revealed by this analysis potentially represent broad-spectrum antibacterial targets. Further target prioritization involves comparison of the corresponding pathways and individual functions between pathogens and the human host. The most promising targets are validated by direct knockouts in model pathogens. The efficacy of this approach is illustrated using examples from metabolism of adenylate cofactors NAD(P), coenzyme A, and flavin adenine dinucleotide. Several drug targets within these pathways, including three distantly related adenylyltransferases (orthologs of the E. coli genes nadD, coaD, and ribF), are discussed in detail.


Subjects
Coenzyme A/biosynthesis; Escherichia coli/metabolism; Flavin-Adenine Dinucleotide/biosynthesis; NADP/biosynthesis; Anti-Bacterial Agents; DNA Footprinting; DNA Transposable Elements; Drug Design; Drug Resistance, Bacterial; Escherichia coli/drug effects; Escherichia coli/genetics; Flavin Mononucleotide/biosynthesis; Genome, Bacterial; Mutagenesis, Insertional; Nicotinamide-Nucleotide Adenylyltransferase/metabolism; Phosphotransferases (Alcohol Group Acceptor)/genetics; Substrate Specificity
11.
J Bacteriol ; 184(7): 2005-18, 2002 Apr.
Article in English | MEDLINE | ID: mdl-11889109

ABSTRACT

We present a complete DNA sequence and metabolic analysis of the dominant oral bacterium Fusobacterium nucleatum. Although not considered a major dental pathogen on its own, this anaerobe facilitates the aggregation and establishment of several other species including the dental pathogens Porphyromonas gingivalis and Bacteroides forsythus. The F. nucleatum strain ATCC 25586 genome was assembled from shotgun sequences and analyzed using the ERGO bioinformatics suite (http://www.integratedgenomics.com). The genome contains 2.17 Mb encoding 2,067 open reading frames, organized on a single circular chromosome with 27% GC content. Despite its taxonomic position among the gram-negative bacteria, several features of its core metabolism are similar to those of gram-positive Clostridium spp., Enterococcus spp., and Lactococcus spp. The genome analysis has revealed several key aspects of the pathways of organic acid, amino acid, carbohydrate, and lipid metabolism. Nine very-high-molecular-weight outer membrane proteins are predicted from the sequence, none of which has been reported in the literature. More than 137 transporters for the uptake of a variety of substrates such as peptides, sugars, metal ions, and cofactors have been identified. Biosynthetic pathways exist for only three amino acids: glutamate, aspartate, and asparagine. The remaining amino acids are imported as such or as di- or oligopeptides that are subsequently degraded in the cytoplasm. A principal source of energy appears to be the fermentation of glutamate to butyrate. Additionally, desulfuration of cysteine and methionine yields ammonia, H₂S, methyl mercaptan, and butyrate, which are capable of arresting fibroblast growth, thus preventing wound healing and aiding penetration of the gingival epithelium. The metabolic capabilities of F. nucleatum revealed by its genome are therefore consistent with its specialized niche in the mouth.


Subjects
Fusobacterium nucleatum/genetics; Genome, Bacterial; Protein Biosynthesis; Transcription, Genetic; Amino Acids/metabolism; Bacterial Outer Membrane Proteins/metabolism; Biological Transport; Cell Division; Coenzymes/metabolism; DNA Repair; DNA Replication; DNA Transposable Elements; DNA, Bacterial/analysis; Drug Resistance, Bacterial; Fusobacterium nucleatum/metabolism; Lipid Metabolism; Lipopolysaccharides/metabolism; Mutagenesis, Insertional; Nucleotides/metabolism; Protons; Signal Transduction/physiology; Virulence
12.
Proc Natl Acad Sci U S A ; 99(1): 443-8, 2002 Jan 08.
Article in English | MEDLINE | ID: mdl-11756688

ABSTRACT

Brucella melitensis is a facultative intracellular bacterial pathogen that causes abortion in goats and sheep and Malta fever in humans. The genome of B. melitensis strain 16M was sequenced and found to contain 3,294,935 bp distributed over two circular chromosomes of 2,117,144 bp and 1,177,787 bp encoding 3,197 ORFs. By using the bioinformatics suite ERGO, 2,487 (78%) ORFs were assigned functions. The origins of replication of the two chromosomes are similar to those of other alpha-proteobacteria. Housekeeping genes, including those involved in DNA replication, transcription, translation, core metabolism, and cell wall biosynthesis, are distributed on both chromosomes. Type I, II, and III secretion systems are absent, but genes encoding sec-dependent, sec-independent, and flagella-specific type III, type IV, and type V secretion systems as well as adhesins, invasins, and hemolysins were identified. Several features of the B. melitensis genome are similar to those of the symbiotic Sinorhizobium meliloti.


Subjects
Brucella melitensis/genetics; Genome, Bacterial; Chromosomes; Fatty Acids/metabolism; Models, Biological; Models, Genetic; Molecular Sequence Data; Open Reading Frames; Protein Biosynthesis; Replication Origin; Sequence Analysis, DNA; Signal Transduction