Search | BVS Regional Portal

DISEASES 2.0: a weekly updated database of disease-gene associations from text mining and data integration.

Grissa, Dhouha; Junge, Alexander; Oprea, Tudor I; Jensen, Lars Juhl.

Database (Oxford) ; 2022, 2022 03 28.

Article in English | MEDLINE | ID: mdl-35348648

ABSTRACT

The scientific knowledge about which genes are involved in which diseases grows rapidly, which makes it difficult to keep up with new publications and genetics datasets. The DISEASES database aims to provide a comprehensive overview by systematically integrating and assigning confidence scores to evidence for disease-gene associations from curated databases, genome-wide association studies (GWAS) and automatic text mining of the biomedical literature. Here, we present a major update to this resource, which greatly increases the number of associations from all these sources. This is especially true for the text-mined associations, which have increased by at least 9-fold at all confidence cutoffs. We show that this dramatic increase is primarily due to adding full-text articles to the text corpus, secondarily due to improvements to both the disease and gene dictionaries used for named entity recognition, and only to a very small extent due to the growth in number of PubMed abstracts. DISEASES now also makes use of a new GWAS database, Target Illumination by GWAS Analytics, which considerably increased the number of GWAS-derived disease-gene associations. DISEASES itself is also integrated into several other databases and resources, including GeneCards/MalaCards, Pharos/Target Central Resource Database and the Cytoscape stringApp. All data in DISEASES are updated on a weekly basis and are available via a web interface at https://diseases.jensenlab.org, from where they can also be downloaded under open licenses. Database URL: https://diseases.jensenlab.org.

Subjects

Data Mining , Genome-Wide Association Study , Databases, Factual

TIGA: target illumination GWAS analytics.

Yang, Jeremy J; Grissa, Dhouha; Lambert, Christophe G; Bologa, Cristian G; Mathias, Stephen L; Waller, Anna; Wild, David J; Jensen, Lars Juhl; Oprea, Tudor I.

Bioinformatics ; 37(21): 3865-3873, 2021 11 05.

Article in English | MEDLINE | ID: mdl-34086846

ABSTRACT

MOTIVATION: Genome-wide association studies can reveal important genotype-phenotype associations; however, data quality and interpretability issues must be addressed. For drug discovery scientists seeking to prioritize targets based on the available evidence, these issues go beyond the single study. RESULTS: Here, we describe rational ranking, filtering and interpretation of inferred gene-trait associations and data aggregation across studies by leveraging existing curation and harmonization efforts. Each gene-trait association is evaluated for confidence, with scores derived solely from aggregated statistics, linking a protein-coding gene and phenotype. We propose a method for assessing confidence in gene-trait associations from evidence aggregated across studies, including a bibliometric assessment of scientific consensus based on the iCite relative citation ratio, and meanRank scores, to aggregate multivariate evidence. This method, intended for drug target hypothesis generation, scoring and ranking, has been implemented as an analytical pipeline, available as open source, with public datasets of results, and a web application designed for usability by drug discovery scientists. AVAILABILITY AND IMPLEMENTATION: Web application, datasets and source code via https://unmtid-shinyapps.net/tiga/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
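The abstract names meanRank scores as the way multivariate evidence is aggregated but does not define them. As a rough illustration only, the sketch below shows a generic mean-of-ranks aggregation: each evidence variable is ranked separately and the ranks are averaged per association. The variable names (`n_studies`, `citation_score`, `effect_size`), the toy values and the tie handling are assumptions made for this example and are not taken from the TIGA pipeline.

```python
from statistics import mean


def rank_descending(values):
    """Rank values so the largest gets rank 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # average of 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


def mean_rank(evidence):
    """Average the per-variable ranks of each association (lower is better).

    evidence: dict of variable name -> list of values, one per association,
    where a higher value is assumed to mean stronger evidence.
    """
    per_variable = [rank_descending(values) for values in evidence.values()]
    n_associations = len(per_variable[0])
    return [mean(r[i] for r in per_variable) for i in range(n_associations)]


# Hypothetical evidence for four gene-trait associations.
evidence = {
    "n_studies": [5, 2, 2, 1],
    "citation_score": [3.0, 4.5, 1.0, 0.5],
    "effect_size": [1.2, 1.8, 1.1, 1.0],
}
scores = mean_rank(evidence)
best = min(range(len(scores)), key=lambda i: scores[i])
```

In this toy data the second association ranks first on two of the three variables, so it receives the lowest (best) mean rank even though it is tied for second on study count.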

Subjects

Genome-Wide Association Study , Lighting , Genotype , Polymorphism, Single Nucleotide , Phenotype

Alcoholic liver disease: A registry view on comorbidities and disease prediction.

Grissa, Dhouha; Nytoft Rasmussen, Ditlev; Krag, Aleksander; Brunak, Søren; Juhl Jensen, Lars.

PLoS Comput Biol ; 16(9): e1008244, 2020 09.

Article in English | MEDLINE | ID: mdl-32960884

ABSTRACT

Alcohol-related liver disease (ALD) is the cause of more than half of all liver-related deaths. Sustained excess drinking causes fatty liver and alcohol-related steatohepatitis, which may progress to alcoholic liver fibrosis (ALF) and eventually to alcohol-related liver cirrhosis (ALC). Unfortunately, it is difficult to identify patients with early-stage ALD, as these are largely asymptomatic. Consequently, the majority of ALD patients are only diagnosed by the time ALD has reached decompensated cirrhosis, a symptomatic phase marked by the development of complications such as bleeding and ascites. The main goal of this study is to discover relevant upstream diagnoses that help to understand the development of ALD, and to highlight meaningful downstream diagnoses that represent its progression to liver failure. Here, we use data from the Danish health registries covering the entire population of Denmark over nineteen years (1996-2014) to examine whether it is possible to identify patients likely to develop ALF or ALC based on their past medical history. To this end, we explore a knowledge discovery approach by using high-dimensional statistical and machine learning techniques to extract and analyze data from the Danish National Patient Registry. Consistent with the late diagnoses of ALD, we find that ALC is the most common form of ALD in the registry data and that ALC patients have a strong over-representation of diagnoses associated with liver dysfunction. By contrast, we identify a small number of patients diagnosed with ALF who appear to be much less sick than those with ALC. We perform a matched case-control study using the group of patients with ALC as cases and their matched patients with non-ALD as controls. Machine learning models (SVM, RF, LightGBM and NaiveBayes) trained and tested on the set of ALC patients achieve a high performance for data classification (AUC = 0.89). When testing the same trained models on the small set of ALF patients, their performance unsurprisingly drops substantially (AUC = 0.67 for NaiveBayes). The statistical and machine learning results underscore small groups of upstream and downstream comorbidities that accurately detect ALC patients and show promise in prediction of ALF. Some of these groups are conditions either caused by alcohol or caused by malnutrition associated with alcohol overuse. Others are comorbidities related either to trauma and lifestyle or to complications of cirrhosis, such as oesophageal varices. Our findings highlight the potential of this approach to uncover knowledge in registry data related to ALD.
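The classification results above are reported as AUC values. As a reminder of what that metric measures, here is a minimal sketch that computes the area under the ROC curve as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The labels and scores are invented toy values and are unrelated to the registry data or the models in the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = case, 0 = control).
    scores: iterable of model scores, higher meaning "more likely a case".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for three cases and four controls.
labels = [1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported drop from 0.89 to 0.67 should be read.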

Subjects

Liver Diseases, Alcoholic/epidemiology , Liver Diseases, Alcoholic/pathology , Machine Learning , Models, Statistical , Aged , Aged, 80 and over , Comorbidity , Denmark , Female , Humans , Liver Failure/prevention & control , Male , Middle Aged , Registries , Risk Factors

Systems Metabolomics for Prediction of Metabolic Syndrome.

Pujos-Guillot, Estelle; Brandolini, Marion; Pétéra, Mélanie; Grissa, Dhouha; Joly, Charlotte; Lyan, Bernard; Herquelot, Éléonore; Czernichow, Sébastien; Zins, Marie; Goldberg, Marcel; Comte, Blandine.

J Proteome Res ; 16(6): 2262-2272, 2017 06 02.

Article in English | MEDLINE | ID: mdl-28440083

ABSTRACT

The evolution of human health is a continuum of transitions, involving multifaceted processes at multiple levels, and there is an urgent need for integrative biomarkers that can characterize and predict progression toward disease development. The objective of this work was to apply a systems metabolomics approach to predict metabolic syndrome (MetS) development. A case-control design was used within the French occupational GAZEL cohort (n = 112 males: discovery study; n = 94: replication/validation study). Our integrative strategy was to combine untargeted metabolomics with clinical, sociodemographic, and food habit parameters to describe early phenotypes and build multidimensional predictive models. Different models were built from the discriminant variables, and prediction performances were optimized either when reducing the number of metabolites used or when keeping the associated signature. We illustrated that a selected reduced metabolic profile was able to reveal subtle phenotypic differences 5 years before MetS occurrence. Moreover, the resulting metabolomic markers, when combined with clinical characteristics, improved the prediction of disease development. The validation study showed that this predictive performance was specific to the MetS component. This work also demonstrates the value of such an approach for discovering subphenotypes that will need further characterization to be able to shift to molecular reclassification and targeting of MetS.

Subjects

Metabolic Syndrome/diagnosis , Metabolomics/methods , Predictive Value of Tests , Systems Biology/methods , Biomarkers , Case-Control Studies , Disease Progression , France , Humans , Male , Middle Aged , Phenotype

Feature Selection Methods for Early Predictive Biomarker Discovery Using Untargeted Metabolomic Data.

Grissa, Dhouha; Pétéra, Mélanie; Brandolini, Marion; Napoli, Amedeo; Comte, Blandine; Pujos-Guillot, Estelle.

Front Mol Biosci ; 3: 30, 2016.

Article in English | MEDLINE | ID: mdl-27458587

ABSTRACT

Untargeted metabolomics is a powerful phenotyping tool for better understanding biological mechanisms involved in human pathology development and identifying early predictive biomarkers. This approach, based on multiple analytical platforms, such as mass spectrometry (MS), chemometrics and bioinformatics, generates massive and complex data that need appropriate analyses to extract the biologically meaningful information. Despite the various tools available, it is still a challenge to handle such large and noisy datasets with a limited number of individuals without risking overfitting. Moreover, when the objective is focused on the identification of early predictive markers of clinical outcome, a few years before occurrence, it becomes essential to use the appropriate algorithms and workflow to be able to discover subtle effects among this large amount of data. In this context, this work studies a workflow describing the general feature selection process, using knowledge discovery and data mining methodologies to propose advanced solutions for predictive biomarker discovery. The strategy was focused on evaluating a combination of numeric-symbolic approaches for feature selection with the objective of obtaining the best combination of metabolites producing an effective and accurate predictive model. Relying first on numerical approaches, and especially on machine learning methods (SVM-RFE, RF, RF-RFE) and on univariate statistical analyses (ANOVA), a comparative study was performed on an original metabolomic dataset and reduced subsets. As the resampling method, LOOCV was applied to minimize the risk of overfitting. The best k-features obtained with different scores of importance from the combination of these different approaches were compared and allowed determining the variable stabilities using Formal Concept Analysis. The results revealed the value of RF-Gini combined with ANOVA for feature selection, as these two complementary methods allowed selecting the 48 best candidates for prediction. Using linear logistic regression on this reduced dataset enabled us to obtain the best performances in terms of prediction accuracy and number of false positives with a model including 5 top variables. Therefore, these results highlighted the value of feature selection methods and the importance of working on reduced datasets for the identification of predictive biomarkers derived from untargeted metabolomics data.
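To make the combination of univariate ANOVA scoring and leave-one-out cross-validation (LOOCV) more concrete, the following is a minimal sketch, not the authors' workflow. It ranks features by the one-way ANOVA F statistic inside each LOOCV fold, so the held-out sample never influences the selection, and then classifies with a nearest-centroid rule. The nearest-centroid classifier is swapped in here only to keep the example dependency-free (the study itself reports SVM-RFE, random forests and logistic regression), and the data matrix is an invented toy example.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across the label groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    n, k = len(values), len(groups)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(X, y, k):
    """Indices of the k features with the highest F statistic."""
    n_features = len(X[0])
    f_scores = [anova_f([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -f_scores[j])[:k]


def nearest_centroid_predict(X_train, y_train, x, features):
    """Predict the label whose class centroid is closest on the chosen features."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        centroid = [mean(r[j] for r in rows) for j in features]
        dist = sum((x[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y, k):
    """LOOCV accuracy with feature selection repeated inside every fold."""
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        features = top_k_features(X_train, y_train, k)
        if nearest_centroid_predict(X_train, y_train, X[i], features) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical data: 6 samples x 3 features; only feature 0 separates the classes.
X = [
    [1.0, 5.0, 0.3],
    [1.2, 4.0, 0.9],
    [0.9, 6.0, 0.5],
    [3.0, 5.5, 0.4],
    [3.2, 4.5, 0.8],
    [2.9, 5.0, 0.6],
]
y = [0, 0, 0, 1, 1, 1]
selected = top_k_features(X, y, 1)
accuracy = loocv_accuracy(X, y, 1)
```

Repeating the selection inside each fold is the design choice that matters: selecting features once on all samples and then cross-validating would leak information from the held-out sample and overstate accuracy, which is the overfitting risk the abstract refers to.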
