Search | BVS Regional Portal

Increasing metadata coverage of SRA BioSample entries using deep learning-based named entity recognition.

Klie, Adam; Tsui, Brian Y; Mollah, Shamim; Skola, Dylan; Dow, Michelle; Hsu, Chun-Nan; Carter, Hannah.

Database (Oxford) ; 2021, 2021 04 29.

Article in English | MEDLINE | ID: mdl-33914028

ABSTRACT

High-quality metadata annotations for data hosted in large public repositories are essential for research reproducibility and for conducting fast, powerful and scalable meta-analyses. Currently, a majority of sequencing samples in the National Center for Biotechnology Information's Sequence Read Archive (SRA) are missing metadata across several categories. In an effort to improve the metadata coverage of these samples, we leveraged almost 44 million attribute-value pairs from SRA BioSample to train a scalable, recurrent neural network that predicts missing metadata via named entity recognition (NER). The network was first trained to classify short text phrases according to 11 metadata categories and achieved an overall accuracy and area under the receiver operating characteristic curve of 85.2% and 0.977, respectively. We then applied our classifier to predict 11 metadata categories from the longer TITLE attribute of samples, evaluating performance on a set of samples withheld from model training. Prediction accuracies were high when extracting sample Genus/Species (94.85%), Condition/Disease (95.65%) and Strain (82.03%) from TITLEs, with lower accuracies and lack of predictions for other categories highlighting multiple issues with the current metadata annotations in BioSample. These results indicate the utility of recurrent neural networks for NER-based metadata prediction and the potential for models such as the one presented here to increase metadata coverage in BioSample while minimizing the need for manual curation. Database URL: https://github.com/cartercompbio/PredictMEE.
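The abstract does not spell out how a classifier trained on short phrases is applied to the longer TITLE attribute. One common way to do this is sketched below under stated assumptions; it is not the authors' method. The sketch enumerates short word spans of a title, scores each span with a phrase classifier, and keeps the best-scoring span per category above a confidence threshold. `toy_classifier`, `TOY_LEXICON`, the `"Other"` label, the 0.5 threshold, the three-word span limit and the example title are all hypothetical stand-ins. In practice the scoring function would be the trained recurrent network; consult the PredictMEE repository for the authors' actual approach.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface for a phrase classifier: maps a short phrase to
# (category, confidence). In the paper's setting this role would be played
# by the trained recurrent network; here it is an assumption.
PhraseClassifier = Callable[[str], Tuple[str, float]]


def candidate_spans(title: str, max_len: int = 3) -> List[str]:
    """Enumerate all contiguous word n-grams of 1..max_len tokens."""
    tokens = title.split()
    spans = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            if start + length <= len(tokens):
                spans.append(" ".join(tokens[start:start + length]))
    return spans


def extract_metadata(title: str, classify: PhraseClassifier,
                     threshold: float = 0.5, max_len: int = 3) -> Dict[str, str]:
    """Keep the highest-scoring span per category whose score meets threshold."""
    best: Dict[str, Tuple[str, float]] = {}
    for span in candidate_spans(title, max_len):
        category, score = classify(span)
        if score < threshold:
            continue  # low-confidence spans produce no prediction
        if category not in best or score > best[category][1]:
            best[category] = (span, score)
    return {cat: span for cat, (span, _) in best.items()}


# Hypothetical toy lexicon that stands in for a trained model, only so the
# sketch runs end to end. Scores are made up for illustration.
TOY_LEXICON = {
    "homo sapiens": ("Genus/Species", 0.98),
    "mus musculus": ("Genus/Species", 0.97),
    "hepatocellular carcinoma": ("Condition/Disease", 0.95),
    "c57bl/6": ("Strain", 0.90),
}


def toy_classifier(phrase: str) -> Tuple[str, float]:
    """Case-insensitive lexicon lookup; unknown phrases score 0.0."""
    return TOY_LEXICON.get(phrase.lower(), ("Other", 0.0))


if __name__ == "__main__":
    # Hypothetical sample TITLE, not taken from BioSample.
    title = "RNA-seq of Mus musculus C57BL/6 liver with hepatocellular carcinoma"
    print(extract_metadata(title, toy_classifier))
    # {'Genus/Species': 'Mus musculus', 'Strain': 'C57BL/6',
    #  'Condition/Disease': 'hepatocellular carcinoma'}
```

Thresholding is what lets categories go unpredicted when no span scores confidently. That mirrors, in a simplified way, the "lack of predictions for other categories" the abstract reports.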

Subjects

Deep Learning , Metadata , High-Throughput Nucleotide Sequencing , Reproducibility of Results , Software

Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence.

Liang, Huiying; Tsui, Brian Y; Ni, Hao; Valentim, Carolina C S; Baxter, Sally L; Liu, Guangjian; Cai, Wenjia; Kermany, Daniel S; Sun, Xin; Chen, Jiancong; He, Liya; Zhu, Jie; Tian, Pin; Shao, Hua; Zheng, Lianghong; Hou, Rui; Hewett, Sierra; Li, Gen; Liang, Ping; Zang, Xuan; Zhang, Zhiqi; Pan, Liyan; Cai, Huimin; Ling, Rujuan; Li, Shuhua; Cui, Yongwang; Tang, Shusheng; Ye, Hong; Huang, Xiaoyan; He, Waner; Liang, Wenqing; Zhang, Qing; Jiang, Jianmin; Yu, Wei; Gao, Jianqun; Ou, Wanxing; Deng, Yingmin; Hou, Qiaozhen; Wang, Bei; Yao, Cuichan; Liang, Yan; Zhang, Shu; Duan, Yaou; Zhang, Runze; Gibson, Sarah; Zhang, Charlotte L; Li, Oulan; Zhang, Edward D; Karin, Gabriel; Nguyen, Nathan.

Nat Med ; 25(3): 433-438, 2019 03.

Article in English | MEDLINE | ID: mdl-30742121

ABSTRACT

Artificial intelligence (AI)-based methods have emerged as powerful tools to transform medical care. Although machine learning classifiers (MLCs) have already demonstrated strong performance in image-based diagnoses, analysis of diverse and massive electronic health record (EHR) data remains challenging. Here, we show that MLCs can query EHRs in a manner similar to the hypothetico-deductive reasoning used by physicians and unearth associations that previous statistical methods have not found. Our model applies an automated natural language processing system using deep learning techniques to extract clinically relevant information from EHRs. In total, 101.6 million data points from 1,362,559 pediatric patient visits presenting to a major referral center were analyzed to train and validate the framework. Our model demonstrates high diagnostic accuracy across multiple organ systems and is comparable to experienced pediatricians in diagnosing common childhood diseases. Our study provides a proof of concept for implementing an AI-based system as a means to aid physicians in tackling large amounts of data, augmenting diagnostic evaluations, and to provide clinical decision support in cases of diagnostic uncertainty or complexity. Although this impact may be most evident in areas where healthcare providers are in relative shortage, the benefits of such an AI system are likely to be universal.

Subjects

Deep Learning , Diagnosis, Computer-Assisted , Electronic Health Records , Natural Language Processing , Pediatrics , Adolescent , Artificial Intelligence , Child , Child, Preschool , China , Female , Humans , Infant , Infant, Newborn , Machine Learning , Male , Proof of Concept Study , Reproducibility of Results , Retrospective Studies

Integrative genomic analysis of mouse and human hepatocellular carcinoma.

Dow, Michelle; Pyke, Rachel M; Tsui, Brian Y; Alexandrov, Ludmil B; Nakagawa, Hayato; Taniguchi, Koji; Seki, Ekihiro; Harismendy, Olivier; Shalapour, Shabnam; Karin, Michael; Carter, Hannah; Font-Burgada, Joan.

Proc Natl Acad Sci U S A ; 115(42): E9879-E9888, 2018 10 16.

Article in English | MEDLINE | ID: mdl-30287485

ABSTRACT

Cancer genomics has enabled the exhaustive molecular characterization of tumors and exposed hepatocellular carcinoma (HCC) as among the most complex cancers. This complexity is paralleled by dozens of mouse models that generate histologically similar tumors but have not been systematically validated at the molecular level. Accurate models of the molecular pathogenesis of HCC are essential for biomedical progress; therefore we compared genomic and transcriptomic profiles of four separate mouse models [MUP transgenic, TAK1-knockout, carcinogen-driven diethylnitrosamine (DEN), and Stelic Animal Model (STAM)] with those of 987 HCC patients with distinct etiologies. These four models differed substantially in their mutational load, mutational signatures, affected genes and pathways, and transcriptomes. STAM tumors were most molecularly similar to human HCC, with frequent mutations in Ctnnb1, similar pathway alterations, and high transcriptomic similarity to high-grade, proliferative human tumors with poor prognosis. In contrast, TAK1 tumors better reflected the mutational signature of human HCC and were transcriptionally similar to low-grade human tumors. DEN tumors were least similar to human disease and almost universally carried the Braf V637E mutation, which is rarely found in human HCC. Immune analysis revealed that strain-specific MHC-I genotype can influence the molecular makeup of murine tumors. Thus, different mouse models of HCC recapitulate distinct aspects of HCC biology, and their use should be adapted to specific questions based on the molecular features provided here.

Subjects

Biomarkers, Tumor/genetics , Carcinoma, Hepatocellular/genetics , Gene Expression Profiling , Genomics/methods , Liver Neoplasms, Experimental/genetics , Liver Neoplasms/genetics , Animals , Carcinoma, Hepatocellular/pathology , Disease Models, Animal , Humans , Liver Neoplasms/pathology , Liver Neoplasms, Experimental/pathology , Mice , Mice, Inbred C57BL , Transcriptome