Results 1 - 20 of 29
1.
Mol Carcinog ; 63(1): 120-135, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37750589

ABSTRACT

Head and neck squamous cell carcinomas (HNSCC) remain a poorly understood disease clinically and immunologically. HPV is a known risk factor of HNSCC associated with better outcome, whereas HPV-negative HNSCC are more heterogeneous in outcome. Gene expression signatures have been developed to classify HNSCC into four molecular subtypes (classical, basal, mesenchymal, and atypical). However, the molecular underpinnings of treatment response and the immune landscape for these molecular subtypes are largely unknown. Herein, we describe a comprehensive immune landscape analysis in three independent HNSCC cohorts (>700 patients) using transcriptomics data. We assigned the HPV-negative HNSCC patients into these four molecular subtypes and characterized the tumor microenvironment using a deconvolution method. We determined that atypical and mesenchymal subtypes have greater immune enrichment and exhibit a T-cell exhaustion phenotype, compared to classical and basal subtypes. Further analyses revealed different B cell maturation and antibody isotype enrichment patterns, and distinct immune microenvironment crosstalk in the atypical and mesenchymal subtypes. Taken together, our study suggests that treatments that enhance B cell activity may benefit patients with HNSCC of the atypical subtype. This rationale can be utilized in the design of future precision immunotherapy trials based on the molecular subtypes of HPV-negative HNSCC.


Subject(s)
Head and Neck Neoplasms , Papillomavirus Infections , Humans , Squamous Cell Carcinoma of Head and Neck/genetics , Human Papillomavirus Viruses , Papillomavirus Infections/complications , Papillomavirus Infections/genetics , Head and Neck Neoplasms/genetics , Immunotherapy , Tumor Microenvironment
2.
NPJ Precis Oncol ; 7(1): 68, 2023 Jul 18.
Article in English | MEDLINE | ID: mdl-37464050

ABSTRACT

Preclinical genetically engineered mouse models (GEMMs) of lung adenocarcinoma are invaluable for investigating molecular drivers of tumor formation, progression, and therapeutic resistance. However, histological analysis of these GEMMs requires significant time and training to ensure accuracy and consistency. To achieve a more objective and standardized analysis, we used machine learning to create GLASS-AI, a histological image analysis tool that the broader cancer research community can utilize to grade, segment, and analyze tumors in preclinical models of lung adenocarcinoma. GLASS-AI demonstrates strong agreement with expert human raters while uncovering a significant degree of unreported intratumor heterogeneity. Integrating immunohistochemical staining with high-resolution grade analysis by GLASS-AI identified dysregulation of Mapk/Erk signaling in high-grade lung adenocarcinomas and locally advanced tumor regions. Our work demonstrates the benefit of employing GLASS-AI in preclinical lung adenocarcinoma models and the power of integrating machine learning and molecular biology techniques for studying the molecular pathways that underlie cancer progression.

3.
JCO Clin Cancer Inform ; 6: e2100129, 2022 05.
Article in English | MEDLINE | ID: mdl-35623021

ABSTRACT

PURPOSE: Liver cancer is a global challenge, and disparities exist across multiple domains and throughout the disease continuum. However, liver cancer's global epidemiology and etiology are shifting, and the literature is rapidly evolving, presenting a challenge to the synthesis of knowledge needed to identify areas of research needs and to develop research agendas focusing on disparities. Machine learning (ML) techniques can be used to semiautomate the literature review process and improve efficiency. In this study, we detail our approach and provide practical benchmarks for the development of a ML approach to classify literature and extract data at the intersection of three fields: liver cancer, health disparities, and epidemiology. METHODS: We performed a six-phase process including: training (I), validating (II), confirming (III), and performing error analysis (IV) for a ML classifier. We then developed an extraction model (V) and applied it (VI) to the liver cancer literature identified through PubMed. We present precision, recall, F1, and accuracy metrics for the classifier and extraction models as appropriate for each phase of the process. We also provide the results for the application of our extraction model. RESULTS: With limited training data, we achieved a high degree of accuracy for both our classifier and for the extraction model for liver cancer disparities research literature performed using epidemiologic methods. The disparities concept was the most challenging to accurately classify, and concepts that appeared infrequently in our data set were the most difficult to extract. CONCLUSION: We provide a roadmap for using ML to classify and extract comprehensive information on multidisciplinary literature. Our technique can be adapted and modified for other cancers or diseases where disparities persist.


Subject(s)
Liver Neoplasms , Machine Learning , Humans , Liver Neoplasms/diagnosis , Liver Neoplasms/epidemiology , Liver Neoplasms/therapy
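
The abstract above reports precision, recall, F1, and accuracy for its classifier and extraction models. As a reminder of how these four metrics relate, here is a minimal sketch that computes them from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and accuracy from confusion-matrix counts."""
    # Precision: fraction of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of true positives that the classifier found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Accuracy: fraction of all items labeled correctly.
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts, for illustration only (not taken from the paper).
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```

Accuracy alone can look good on an imbalanced literature set where most articles are irrelevant, which is one reason to report precision, recall, and F1 alongside it.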
5.
Nat Commun ; 13(1): 614, 2022 02 01.
Article in English | MEDLINE | ID: mdl-35105868

ABSTRACT

Distinct lung stem cells give rise to lung adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC). ΔNp63, the p53 family member and p63 isoform, guides the maturation of these stem cells through the regulation of their self-renewal and terminal differentiation; however, the underlying mechanistic role regulated by ΔNp63 in lung cancer development has remained elusive. By utilizing a ΔNp63-specific conditional knockout mouse model and xenograft models of LUAD and LUSC, we found that ΔNp63 promotes non-small cell lung cancer by maintaining the lung stem cells necessary for lung cancer cell initiation and progression in quiescence. ChIP-seq analysis of lung basal cells, alveolar type 2 (AT2) cells, and LUAD reveals robust ΔNp63 regulation of a common landscape of enhancers of cell identity genes. Importantly, one of these genes, BCL9L, is among the enhancer associated genes regulated by ΔNp63 in Kras-driven LUAD and mediates the oncogenic effects of ΔNp63 in both LUAD and LUSC. Accordingly, high BCL9L levels correlate with poor prognosis in LUAD patients. Taken together, our findings provide a unifying oncogenic role for ΔNp63 in both LUAD and LUSC through the regulation of a common landscape of enhancer associated genes.


Subject(s)
Carcinoma, Non-Small-Cell Lung/genetics , Gene Expression Regulation, Neoplastic , Lung Neoplasms/genetics , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/pathology , Animals , Carcinoma, Squamous Cell/genetics , Carcinoma, Squamous Cell/pathology , Cell Line, Tumor , Cell Proliferation , Epithelium , Female , Humans , Lung/pathology , Lung Neoplasms/pathology , Male , Mice , Mice, Knockout
6.
Front Artif Intell ; 4: 754641, 2021.
Article in English | MEDLINE | ID: mdl-34568816

ABSTRACT

The tumor immune microenvironment (TIME) encompasses many heterogeneous cell types that engage in extensive crosstalk among the cancer, immune, and stromal components. The spatial organization of these different cell types in the TIME could be used as a biomarker for predicting drug responses, prognosis, and metastasis. Recently, deep learning approaches have been widely applied to digital histopathology images for cancer diagnosis and prognosis. Furthermore, some recent approaches have attempted to integrate spatial and molecular omics data to better characterize the TIME. In this review, we focus on machine learning-based digital histopathology image analysis methods for characterizing the tumor ecosystem. We consider three different scales of histopathological analysis that machine learning can operate within: whole slide image (WSI)-level, region of interest (ROI)-level, and cell-level. We systematically review the various machine learning methods at these three scales, with a focus on cell-level analysis. We provide a perspective on the workflow for generating cell-level training data sets using immunohistochemistry markers to "weakly-label" the cell types. We describe some common steps in the workflow of preparing the data, as well as some limitations of this approach. Finally, we discuss future opportunities for integrating molecular omics data with digital histopathology images to characterize the tumor ecosystem.

7.
Nucleic Acids Res ; 49(W1): W352-W358, 2021 07 02.
Article in English | MEDLINE | ID: mdl-33950204

ABSTRACT

Searching and reading relevant literature is a routine practice in biomedical research. However, it is challenging for a user to design optimal search queries using all the keywords related to a given topic. As such, existing search systems such as PubMed often return suboptimal results. Several computational methods have been proposed as an effective alternative to keyword-based query methods for literature recommendation. However, those methods require specialized knowledge in machine learning and natural language processing, which can make them difficult for biologists to utilize. In this paper, we propose LitSuggest, a web server that provides an all-in-one literature recommendation and curation service to help biomedical researchers stay up to date with scientific literature. LitSuggest combines advanced machine learning techniques for suggesting relevant PubMed articles with high accuracy. In addition to innovative text-processing methods, LitSuggest offers multiple advantages over existing tools. First, LitSuggest allows users to curate, organize, and download classification results in a single interface. Second, users can easily fine-tune LitSuggest results by updating the training corpus. Third, results can be readily shared, enabling collaborative analysis and curation of scientific literature. Finally, LitSuggest provides an automated personalized weekly digest of newly published articles for each user's project. LitSuggest is publicly available at https://www.ncbi.nlm.nih.gov/research/litsuggest.


Subject(s)
Publications , Software , COVID-19 , Data Curation , Healthcare Disparities , Humans , Internet , Liver Neoplasms/epidemiology , Machine Learning
8.
Bioinformatics ; 37(20): 3681-3683, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33901274

ABSTRACT

SUMMARY: The heterogeneous cell types of the tumor-immune microenvironment (TIME) play key roles in determining cancer progression, metastasis and response to treatment. We report the development of TIMEx, a novel TIME deconvolution method that emphasizes the estimation of infiltrating immune cells for bulk transcriptomics using pan-cancer single-cell RNA-seq signatures. We also implemented a comprehensive, user-friendly web-portal for users to evaluate TIMEx and other deconvolution methods with bulk transcriptomic profiles. AVAILABILITY AND IMPLEMENTATION: The TIMEx web-portal is freely accessible at http://timex.moffitt.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

9.
Top Stroke Rehabil ; 28(2): 81-87, 2021 03.
Article in English | MEDLINE | ID: mdl-32482159

ABSTRACT

BACKGROUND: Accurate prediction of fall likelihood is advantageous for instituting a fall prevention program in rehabilitation facilities. OBJECTIVE: This study was designed to determine the clinical measures that can predict the risk of fall events in a rehabilitation hospital. METHODS: Medical records of 166 patients (114 males and 52 females) who were hospitalized in an adult inpatient unit of a rehabilitation hospital were retrospectively analyzed for this study. As predictor variables for assessing fall risk, demographic data and the following measurements were selectively collected from patients' medical records: Tinetti Performance-Oriented Mobility Assessment-Ambulation (POMA-G), Timed Up and Go test (TUG), 10 m walk test, 2 min walk test, Korean version of the Mini-Mental State Examination (K-MMSE), Korean version of the Modified Barthel Index (KMBI), Berg Balance Scale (BBS), Global Deterioration Scale (GDS), and Morse Fall Scale (Morse FS). RESULTS: The Morse FS, TUG, and age were found to be risk factors for the classification of faller and non-faller groups. CONCLUSION: This study suggests including Morse FS, TUG, and age in the routine initial assessment upon admission in a rehabilitation setting, as key variables for screening the risk of fall. Additionally, the cutoff scores of Morse FS and TUG were observed to be more rigid than in other clinical settings.


Subject(s)
Accidental Falls/statistics & numerical data , Stroke Rehabilitation , Stroke/complications , Accidental Falls/prevention & control , Adult , Aged , Aged, 80 and over , Female , Hospitalization , Humans , Incidence , Male , Mental Status and Dementia Tests , Middle Aged , Postural Balance , Retrospective Studies , Risk Factors , Sensitivity and Specificity , Stroke/physiopathology , Stroke/psychology , Time and Motion Studies , Walking , Young Adult
10.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32770181

ABSTRACT

MOTIVATION: To obtain key information for personalized medicine and cancer research, clinicians and researchers in the biomedical field need to search the biomedical literature for genomic variant information now more than ever before. Due to the various written forms of genomic variants, however, it is difficult to locate the right information from the literature when using a general literature search system. To address the difficulty of locating genomic variant information from the literature, researchers have suggested various solutions based on automated literature-mining techniques. There is, however, no study that summarizes and compares existing tools for genomic variant literature mining in terms of how easily information on genomic variants can be searched for in the literature. RESULTS: In this article, we systematically compared currently available genomic variant recognition and normalization tools as well as the literature search engines that adopted these literature-mining techniques. First, we explain the problems that are caused by the use of non-standard formats of genomic variants in the PubMed literature by considering examples from the literature and show the prevalence of the problem. Second, we review literature-mining tools that address the problem by recognizing and normalizing the various forms of genomic variants in the literature and systematically compare them. Third, we present and compare existing literature search engines that are designed for genomic variant search using these literature-mining techniques. We expect this work to be helpful for researchers who seek information about genomic variants from the literature and for developers who integrate genomic variant information from the literature and beyond.


Subject(s)
Data Mining , Genetic Variation , Precision Medicine , Search Engine , PubMed , Publications
11.
PLoS Comput Biol ; 16(4): e1007617, 2020 04.
Article in English | MEDLINE | ID: mdl-32324731

ABSTRACT

A massive number of biological entities, such as genes and mutations, are mentioned in the biomedical literature. Capturing the semantic relatedness of biological entities is vital to many biological applications, such as protein-protein interaction prediction and literature-based discovery. Concept embeddings, which involve the learning of vector representations of concepts using machine learning models, have been employed to capture the semantics of concepts. To develop concept embeddings, named-entity recognition (NER) tools are first used to identify and normalize concepts from the literature, and then different machine learning models are used to train the embeddings. Despite multiple attempts, existing biomedical concept embeddings generally suffer from suboptimal NER tools, small-scale evaluation, and limited availability. In response, we employed high-performance machine learning-based NER tools for concept recognition and trained our concept embeddings, BioConceptVec, via four different machine learning models on ~30 million PubMed abstracts. BioConceptVec covers over 400,000 biomedical concepts mentioned in the literature and is one of the largest among the publicly available biomedical concept embeddings to date. To evaluate the validity and utility of BioConceptVec, we performed two intrinsic evaluations (identifying related concepts based on drug-gene and gene-gene interactions) and two extrinsic evaluations (protein-protein interaction prediction and drug-drug interaction extraction), collectively using over 25 million instances from nine independent datasets (17 million instances from six intrinsic evaluation tasks and 8 million instances from three extrinsic evaluation tasks), which is, to the best of our knowledge, by far the most comprehensive evaluation to date. The intrinsic evaluation results demonstrate that BioConceptVec consistently has, by a large margin, better performance than existing concept embeddings in identifying similar and related concepts.
More importantly, the extrinsic evaluation results demonstrate that using BioConceptVec with advanced deep learning models can significantly improve performance in downstream bioinformatics studies and biomedical text-mining applications. Our BioConceptVec embeddings and benchmarking datasets are publicly available at https://github.com/ncbi-nlp/BioConceptVec.


Subject(s)
Computational Biology/methods , Data Mining/methods , Deep Learning , Publications , Algorithms , Databases, Protein , Drug Interactions , Electronic Health Records , Humans , Protein Interaction Mapping , PubMed , Semantics
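
The abstract above describes using concept embeddings to identify similar and related concepts. One common way to compare two embedding vectors is cosine similarity; the abstract does not state which measure the authors used, so the sketch below is an assumption for illustration. The concept names and the 3-dimensional toy vectors are hypothetical and are not taken from BioConceptVec.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings):
    """Rank every other concept by cosine similarity to the query concept."""
    scores = [
        (name, cosine_similarity(embeddings[query], vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings; real concept embeddings have far more dimensions.
toy_embeddings = {
    "Gene_A": [1.0, 0.0, 1.0],
    "Gene_B": [1.0, 0.0, 0.9],
    "Drug_X": [0.0, 1.0, 0.0],
}

ranked = most_similar("Gene_A", toy_embeddings)
print(ranked)
```

In this toy example Gene_B ranks above Drug_X because its vector points in nearly the same direction as Gene_A, while Drug_X is orthogonal to it.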
12.
NPJ Genom Med ; 4: 25, 2019.
Article in English | MEDLINE | ID: mdl-31632691

ABSTRACT

Understanding the drivers of research on human genes is critical to the success of efforts to translate genomics into medicine and public health. Using publicly available curated online databases, we sought to identify specific genes that are featured in translational genetic research in comparison to all genomics research publications. Articles in the CDC's Public Health Genomics and Precision Health Knowledge Base were stratified into studies that have moved beyond basic research to population and clinical epidemiologic studies (T1: clinical and population human genome epidemiology research), and studies that evaluate, implement, and assess impact of genes in clinical and public health areas (T2+: beyond bench to bedside). We examined gene counts and numbers of publications within these phases of translation in comparison to all genes from Medline. We highlight those genes that are moving from basic research to clinical and public health translational research, namely in cancer and a few genetic diseases with high penetrance and clinical actionability. Identifying human genes of translational value is an important step towards determining an evidence-based trajectory of the human genome in clinical and public health practice over time.

13.
PLoS Comput Biol ; 14(8): e1006390, 2018 08.
Article in English | MEDLINE | ID: mdl-30102703

ABSTRACT

Manually curating biomedical knowledge from publications is necessary to build a knowledge-based service that provides highly precise and organized information to users. The process of retrieving relevant publications for curation, which is also known as document triage, is usually carried out by querying and reading articles in PubMed. However, this query-based method often obtains unsatisfactory precision and recall on the retrieved results, and it is difficult to manually generate optimal queries. To address this, we propose a machine learning-assisted triage method. We collect previously curated publications from two databases, UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog, and use them as a gold-standard dataset for training deep learning models based on convolutional neural networks. We then use the trained models to classify and rank new publications for curation. For evaluation, we apply our method to the real-world manual curation process of UniProtKB/Swiss-Prot and the GWAS Catalog. We demonstrate that our machine-assisted triage method outperforms the current query-based triage methods, improves efficiency, and enriches curated content. Our method achieves a precision 1.81 and 2.99 times higher than that obtained by the current query-based triage methods of UniProtKB/Swiss-Prot and the GWAS Catalog, respectively, without compromising recall. In fact, our method retrieves many additional relevant publications that the query-based method of UniProtKB/Swiss-Prot could not find. As these results show, our machine learning-based method can make the triage process more efficient and is being implemented in production so that human curators can focus on more challenging tasks to improve the quality of knowledge bases.


Subject(s)
Data Curation/methods , Information Storage and Retrieval/methods , Data Curation/statistics & numerical data , Databases, Genetic , Databases, Protein , Deep Learning , Genomics , Knowledge Bases , Machine Learning , Publications
14.
Nucleic Acids Res ; 46(W1): W530-W536, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29762787

ABSTRACT

The identification and interpretation of genomic variants play a key role in the diagnosis of genetic diseases and related research. These tasks increasingly rely on accessing relevant manually curated information from domain databases (e.g. SwissProt or ClinVar). However, due to the sheer volume of medical literature and high cost of expert curation, curated variant information in existing databases is often incomplete and out-of-date. In addition, the same genetic variant can be mentioned in publications with various names (e.g. 'A146T' versus 'c.436G>A' versus 'rs121913527'). A search in PubMed using only one name usually cannot retrieve all relevant articles for the variant of interest. Hence, to help scientists, healthcare professionals, and database curators find the most up-to-date published variant research, we have developed LitVar for the search and retrieval of standardized variant information. In addition, LitVar uses advanced text mining techniques to compute and extract relationships between variants and other associated entities such as diseases and chemicals/drugs. LitVar is publicly available at https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/LitVar.


Subject(s)
Data Curation/methods , Data Mining/methods , Polymorphism, Single Nucleotide , Search Engine , User-Computer Interface , Genetics, Medical , Genome, Human , Genomics/methods , Humans , Internet , PubMed , Semantics
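
The abstract above notes that one variant can appear under several names ('A146T', 'c.436G>A', 'rs121913527'). The sketch below shows how the surface form of such a mention might be recognized with regular expressions. It is a simplified illustration, not LitVar's method: the pattern set and the function name are assumptions, the patterns cover only simple substitutions, and mapping the three forms to the same variant would additionally require a reference database lookup that is not shown.

```python
import re

# Simplified, hypothetical patterns for three common ways of writing a variant.
VARIANT_PATTERNS = {
    # dbSNP reference SNP identifier, e.g. rs121913527
    "rsid": re.compile(r"^rs\d+$"),
    # Coding-DNA substitution, e.g. c.436G>A
    "cdna_substitution": re.compile(r"^c\.\d+[ACGT]>[ACGT]$"),
    # Single-letter protein substitution, e.g. A146T
    "protein_substitution": re.compile(r"^[A-Z]\d+[A-Z]$"),
}


def classify_variant_mention(mention):
    """Return the label of the first pattern that matches, or 'unknown'."""
    for label, pattern in VARIANT_PATTERNS.items():
        if pattern.match(mention):
            return label
    return "unknown"


for text in ["A146T", "c.436G>A", "rs121913527", "BRCA1"]:
    print(text, "->", classify_variant_mention(text))
```

A keyword search on any one of these strings misses articles that use the others, which is the retrieval gap the abstract describes.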
15.
JMIR Med Inform ; 6(1): e2, 2018 Jan 05.
Article in English | MEDLINE | ID: mdl-29305341

ABSTRACT

BACKGROUND: With the development of artificial intelligence (AI) technology centered on deep-learning, the computer has evolved to a point where it can read a given text and answer a question based on the context of the text. Such a specific task is known as the task of machine comprehension. Existing machine comprehension tasks mostly use datasets of general texts, such as news articles or elementary school-level storybooks. However, no attempt has been made to determine whether an up-to-date deep learning-based machine comprehension model can also process scientific literature containing expert-level knowledge, especially in the biomedical domain. OBJECTIVE: This study aims to investigate whether a machine comprehension model can process biomedical articles as well as general texts. Since there is no dataset for the biomedical literature comprehension task, our work includes generating a large-scale question answering dataset using PubMed and manually evaluating the generated dataset. METHODS: We present an attention-based deep neural model tailored to the biomedical domain. To further enhance the performance of our model, we used a pretrained word vector and biomedical entity type embedding. We also developed an ensemble method of combining the results of several independent models to reduce the variance of the answers from the models. RESULTS: The experimental results showed that our proposed deep neural network model outperformed the baseline model by more than 7% on the new dataset. We also evaluated human performance on the new dataset. The human evaluation result showed that our deep neural model outperformed humans in comprehension by 22% on average. CONCLUSIONS: In this work, we introduced a new task of machine comprehension in the biomedical domain using a deep neural model. 
Since there was no large-scale dataset for training deep neural models in the biomedical domain, we created the new cloze-style datasets Biomedical Knowledge Comprehension Title (BMKC_T) and Biomedical Knowledge Comprehension Last Sentence (BMKC_LS) (together referred to as BioMedical Knowledge Comprehension) using the PubMed corpus. The experimental results showed that the performance of our model is much higher than that of humans. We observed that our model performed consistently better regardless of the degree of difficulty of a text, whereas humans have difficulty when performing biomedical literature comprehension tasks that require expert level knowledge.
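
The abstract above mentions cloze-style datasets built from article titles (BMKC_T) and last sentences (BMKC_LS). In a cloze task, a term in a sentence is hidden and the model must recover it from the surrounding context. The sketch below shows that general idea only; the function name, the placeholder token, and the example text are hypothetical, and the abstract does not describe the authors' actual construction procedure.

```python
def make_cloze_example(context, target_sentence, answer, placeholder="@placeholder"):
    """Hide the first occurrence of `answer` in the target sentence to form a question."""
    if answer not in target_sentence:
        raise ValueError("answer must occur in the target sentence")
    question = target_sentence.replace(answer, placeholder, 1)
    return {"context": context, "question": question, "answer": answer}


# Hypothetical text, for illustration only.
example = make_cloze_example(
    context="Hypothetical abstract body mentioning GENE1 and DRUG1.",
    target_sentence="DRUG1 response depends on GENE1 status",
    answer="GENE1",
)
print(example["question"])
```

Because the question and answer are derived from existing text, examples of this kind can be generated at scale without manual annotation.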

16.
BMC Bioinformatics ; 19(1): 21, 2018 01 25.
Article in English | MEDLINE | ID: mdl-29368597

ABSTRACT

BACKGROUND: Molecular biomarkers that can predict drug efficacy in cancer patients are crucial components for the advancement of precision medicine. However, identifying these molecular biomarkers remains a laborious and challenging task. Next-generation sequencing of patients and preclinical models has increasingly led to the identification of novel gene-mutation-drug relations, and these results have been reported and published in the scientific literature. RESULTS: Here, we present two new computational methods that utilize all the PubMed articles as domain-specific background knowledge to assist in the extraction and curation of gene-mutation-drug relations from the literature. The first method uses the Biomedical Entity Search Tool (BEST) scoring results as some of the features to train the machine learning classifiers. The second method uses not only the BEST scoring results, but also word vectors in a deep convolutional neural network model that are constructed from and trained on numerous documents such as PubMed abstracts and Google News articles. Using the features obtained from both the BEST search engine scores and word vectors, we extract mutation-gene and mutation-drug relations from the literature using machine learning classifiers such as random forest and deep convolutional neural networks. Our methods achieved better results compared with the state-of-the-art methods. We used our proposed features in a simple machine learning model, and obtained F1-scores of 0.96 and 0.82 for mutation-gene and mutation-drug relation classification, respectively. We also developed a deep learning classification model using convolutional neural networks, BEST scores, and the word embeddings that are pre-trained on PubMed or Google News data. Using deep learning, the classification accuracy improved, and F1-scores of 0.96 and 0.86 were obtained for the mutation-gene and mutation-drug relations, respectively.
CONCLUSION: We believe that our computational methods described in this research could be used as an important tool in identifying molecular biomarkers that predict drug responses in cancer patients. We also built a database of these mutation-gene-drug relations that were extracted from all the PubMed abstracts. We believe that our database can prove to be a valuable resource for precision medicine researchers.


Subject(s)
Drug Resistance, Neoplasm/genetics , Search Engine , Antineoplastic Agents/therapeutic use , Databases, Factual , Humans , Mutation , Neoplasms/drug therapy , Neoplasms/genetics , Neoplasms/pathology , Neural Networks, Computer , Precision Medicine
17.
PLoS One ; 13(1): e0190926, 2018.
Article in English | MEDLINE | ID: mdl-29373599

ABSTRACT

Detecting drug-drug interactions (DDIs) is important because information on DDIs can help prevent adverse effects from drug combinations. Since there are many new DDI-related papers published in the biomedical domain, manually extracting DDI information from the literature is a laborious task. However, text mining can be used to find DDIs in the biomedical literature. Among the recently developed neural networks, we use a recursive neural network to improve the performance of DDI extraction. Our recursive neural network model uses a position feature, a subtree containment feature, and an ensemble method to improve the performance of DDI extraction. Compared with the state-of-the-art models, the DDI detection and type classifiers of our model performed 4.4% and 2.8% better, respectively, on the DDIExtraction Challenge'13 test data. We also validated our model on the PK DDI corpus, which consists of two types of DDI data: in vivo DDI and in vitro DDI. Compared with the existing model, our detection classifier performed 2.3% and 6.7% better on in vivo and in vitro data, respectively. The results of our validation demonstrate that our model can automatically extract DDIs better than existing models.


Subject(s)
Data Mining/methods , Drug Interactions , Neural Networks, Computer , Data Mining/statistics & numerical data , Drug-Related Side Effects and Adverse Reactions , Humans , Natural Language Processing , Pharmacokinetics , Publications , Support Vector Machine
18.
Nucleic Acids Res ; 45(D1): D784-D789, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899563

ABSTRACT

Fusion genes are an important class of therapeutic targets and prognostic markers in cancer. ChimerDB is a comprehensive database of fusion genes encompassing analysis of deep sequencing data and manual curation. In this update, the database coverage was enhanced considerably by adding two new modules: The Cancer Genome Atlas (TCGA) RNA-Seq analysis and PubMed abstract mining. ChimerDB 3.0 is composed of three modules: ChimerKB, ChimerPub, and ChimerSeq. ChimerKB is a knowledgebase of 1066 manually curated fusion genes that were compiled from public resources of fusion genes with experimental evidence. ChimerPub includes 2767 fusion genes obtained from text mining of PubMed abstracts. The ChimerSeq module is designed to archive the fusion candidates from deep sequencing data. Importantly, we have analyzed RNA-Seq data of the TCGA project covering 4569 patients in 23 cancer types using two reliable programs, FusionScan and TopHat-Fusion. The new user interface supports diverse search options and graphic representation of fusion gene structure. ChimerDB 3.0 is available at http://ercsb.ewha.ac.kr/fusiongene/.


Subject(s)
Data Mining , Databases, Genetic , Neoplasms/genetics , Oncogene Proteins, Fusion/genetics , Transcriptome , Computational Biology/methods , Gene Expression Profiling/methods , Humans , Software , User-Computer Interface
19.
PLoS One ; 11(10): e0164680, 2016.
Article in English | MEDLINE | ID: mdl-27760149

ABSTRACT

As the volume of publications rapidly increases, searching for relevant information from the literature becomes more challenging. To complement standard search engines such as PubMed, it is desirable to have an advanced search tool that directly returns relevant biomedical entities such as targets, drugs, and mutations rather than a long list of articles. Some existing tools submit a query to PubMed and process retrieved abstracts to extract information at query time, resulting in a slow response time and limited coverage of only a fraction of the PubMed corpus. Other tools preprocess the PubMed corpus to speed up the response time; however, they are not constantly updated, and thus produce outdated results. Further, most existing tools cannot process sophisticated queries such as searches for mutations that co-occur with query terms in the literature. To address these problems, we introduce BEST, a biomedical entity search tool. BEST returns, as a result, a list of 10 different types of biomedical entities including genes, diseases, drugs, targets, transcription factors, miRNAs, and mutations that are relevant to a user's query. To the best of our knowledge, BEST is the only system that processes free text queries and returns up-to-date results in real time including mutation information in the results. BEST is freely accessible at http://best.korea.ac.kr.


Subject(s)
Biomedical Research , Data Mining/methods , Drug Resistance/genetics , Mutation , Publications , User-Computer Interface
20.
Bioinformatics ; 32(18): 2886-8, 2016 09 15.
Article in English | MEDLINE | ID: mdl-27485446

ABSTRACT

We introduce HiPub, a seamless Chrome browser plug-in that automatically recognizes, annotates and translates biomedical entities from texts into networks for knowledge discovery. Using a combination of two different named-entity recognition resources, HiPub can recognize genes, proteins, diseases, drugs, mutations and cell lines in texts, and achieve high precision and recall. HiPub extracts biomedical entity-relationships from texts to construct context-specific networks, and integrates existing network data from external databases for knowledge discovery. It allows users to add additional entities from related articles, as well as user-defined entities for discovering new and unexpected entity-relationships. HiPub provides functional enrichment analysis on the biomedical entity network, and link-outs to external resources to assist users in learning new entities and relations. AVAILABILITY AND IMPLEMENTATION: HiPub and a detailed user guide are available at http://hipub.korea.ac.kr. CONTACT: kangj@korea.ac.kr, aikchoon.tan@ucdenver.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Data Curation , Databases, Factual , Pattern Recognition, Automated , Algorithms , Computational Biology/methods , Genes , Humans , Pharmaceutical Preparations , Proteins , PubMed , Search Engine