1.
Open Res Eur ; 3: 97, 2023.
Article in English | MEDLINE | ID: mdl-37645489

ABSTRACT

Background: Data management is fast becoming an essential part of scientific practice, driven by open science and FAIR (findable, accessible, interoperable, and reusable) data sharing requirements. Whilst data management plans (DMPs) are clear to data management experts and data stewards, their purpose and creation are often obscure to the producers of the data, who in academic environments are often PhD students. Methods: Within the RNAct EU Horizon 2020 ITN project, we engaged the 10 RNAct early-stage researchers (ESRs) in a training project aimed at formulating a DMP. To do so, we used the Data Stewardship Wizard (DSW) framework and modified the existing Life Sciences Knowledge Model into a simplified version aimed at training young scientists, with computational or experimental backgrounds, in core data management principles. We collected feedback from the ESRs during this exercise. Results: Here, we introduce our new life-sciences training DMP template for young scientists. We report and discuss our experiences as principal investigators (PIs) and ESRs during this project and address the typical difficulties that are encountered in developing and understanding a DMP. Conclusions: We found that the DS-Wizard can also be an appropriate tool for DMP training, to get terminology and concepts across to researchers. Full training additionally requires an upstream step to present basic DMP concepts and a downstream step to publish a dataset in a (public) repository. Overall, the DS-Wizard tool was essential for our DMP training and we hope our efforts can be used in other projects.

2.
Bioinform Adv ; 3(1): vbad081, 2023.
Article in English | MEDLINE | ID: mdl-37431435

ABSTRACT

Motivation: Protein domains can be viewed as building blocks, essential for understanding structure-function relationships in proteins. However, each domain database classifies protein domains using its own methodology. Thus, in many cases, domain models and boundaries differ from one domain database to the other, raising the question of domain definition and enumeration of true domain instances. Results: We propose an automated iterative workflow to assess protein domain classification by cross-mapping domain structural instances between domain databases and by evaluating structural alignments. CroMaSt (for Cross-Mapper of domain Structural instances) will classify all experimental structural instances of a given domain type into four different categories ('Core', 'True', 'Domain-like' and 'Failed'). CroMaSt is developed in Common Workflow Language and takes advantage of two well-known domain databases with wide coverage: Pfam and CATH. It uses the Kpax structural alignment tool with expert-adjusted parameters. CroMaSt was tested with the RNA Recognition Motif domain type and identifies 962 'True' and 541 'Domain-like' structural instances for this domain type. This method solves a crucial issue in domain-centric research and can generate essential information that could be used for synthetic biology and machine-learning approaches of protein domain engineering. Availability and implementation: The workflow and the Results archive for the CroMaSt runs presented in this article are available from WorkflowHub (doi: 10.48546/workflowhub.workflow.390.2). Supplementary information: Supplementary data are available at Bioinformatics Advances online.

3.
J Biomed Semantics ; 14(1): 7, 2023 Jul 01.
Article in English | MEDLINE | ID: mdl-37393296

ABSTRACT

The current rise of Open Science and Reproducibility in the Life Sciences requires the creation of rich, machine-actionable metadata in order to better share and reuse biological digital resources such as datasets, bioinformatics tools, training materials, etc. For this purpose, FAIR principles have been defined for both data and metadata and adopted by large communities, leading to the definition of specific metrics. However, automatic FAIRness assessment is still difficult because computational evaluations frequently require technical expertise and can be time-consuming. As a first step to address these issues, we propose FAIR-Checker, a web-based tool to assess the FAIRness of metadata presented by digital resources. FAIR-Checker offers two main facets: a "Check" module providing a thorough metadata evaluation and recommendations, and an "Inspect" module which assists users in improving metadata quality and therefore the FAIRness of their resource. FAIR-Checker leverages Semantic Web standards and technologies such as SPARQL queries and SHACL constraints to automatically assess FAIR metrics. Users are notified of missing, necessary, or recommended metadata for various resource categories. We evaluate FAIR-Checker in the context of improving the FAIRification of individual resources, through better metadata, as well as analyzing the FAIRness of more than 25 thousand bioinformatics software descriptions.


Subject(s)
Biological Science Disciplines , Pattern Recognition, Automated , Reproducibility of Results , Semantic Web , Computational Biology
4.
Sci Rep ; 13(1): 3643, 2023 03 04.
Article in English | MEDLINE | ID: mdl-36871056

ABSTRACT

The search for an effective drug is still urgent for COVID-19 as no drug with proven clinical efficacy is available. Finding the new purpose of an approved or investigational drug, known as drug repurposing, has become increasingly popular in recent years. We propose here a new drug repurposing approach for COVID-19, based on knowledge graph (KG) embeddings. Our approach learns "ensemble embeddings" of entities and relations in a COVID-19 centric KG, in order to get a better latent representation of the graph elements. Ensemble KG-embeddings are subsequently used in a deep neural network trained for discovering potential drugs for COVID-19. Compared to related works, we retrieve more in-trial drugs among our top-ranked predictions, thus giving greater confidence in our prediction for out-of-trial drugs. For the first time to our knowledge, molecular docking is then used to evaluate the predictions obtained from drug repurposing using KG embedding. We show that Fosinopril is a potential ligand for the SARS-CoV-2 nsp13 target. We also provide explanations of our predictions thanks to rules extracted from the KG and instantiated by KG-derived explanatory paths. Molecular evaluation and explanatory paths bring reliability to our results and constitute new complementary and reusable methods for assessing KG-based drug repurposing.


Subject(s)
COVID-19 , Humans , SARS-CoV-2 , Drug Repositioning , Molecular Docking Simulation , Pattern Recognition, Automated , Reproducibility of Results , Learning
5.
BMC Bioinformatics ; 23(Suppl 2): 433, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36510133

ABSTRACT

BACKGROUND: Automatic functional annotation of proteins is an open research problem in bioinformatics. The growing number of protein entries in public databases, for example in UniProtKB, poses challenges in manual functional annotation. Manual annotation requires expert human curators to search and read related research articles, interpret the results, and assign the annotations to the proteins. Thus, it is a time-consuming and expensive process. Therefore, designing computational tools to perform automatic annotation leveraging the high quality manual annotations that already exist in UniProtKB/SwissProt is an important research problem. RESULTS: In this paper, we extend and adapt the GrAPFI (graph-based automatic protein function inference) (Sarker et al. in BMC Bioinform 21, 2020; Sarker et al., in: Proceedings of 7th international conference on complex networks and their applications, Cambridge, 2018) method for automatic annotation of proteins with gene ontology (GO) terms, renaming it GrAPFI-GO. The original GrAPFI method uses label propagation in a similarity graph where proteins are linked through the domains, families, and superfamilies that they share. Here, we also explore various types of similarity measures based on common neighbors in the graph. Moreover, GO terms are arranged in a hierarchical manner according to semantic parent-child relations. Therefore, we propose an efficient pruning and post-processing technique that integrates both semantic similarity and hierarchical relations between the GO terms. We produce experimental results comparing the GrAPFI-GO method with and without considering common neighbors similarity. We also test the performance of GrAPFI-GO and other annotation tools for GO annotation on a benchmark of proteins with and without the proposed pruning and post-processing procedure.
CONCLUSION: Our results show that the proposed semantic hierarchical post-processing potentially improves the performance of GrAPFI-GO and of other annotation tools as well. Thus, GrAPFI-GO offers an original, efficient and reusable procedure to exploit the semantic relations among the GO terms in order to improve the automatic annotation of protein functions.
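
Illustrative note: the abstract does not detail the pruning procedure, so the following is only a minimal sketch of one generic way parent-child relations can be used to post-process predicted terms, namely dropping any predicted term that is an ancestor of another predicted term. It is not the GrAPFI-GO method. The toy hierarchy, the placeholder term names and the function names are all assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical toy hierarchy: child term -> set of direct parent terms.
# The term names are placeholders, not real GO identifiers.
PARENTS: Dict[str, Set[str]] = {
    "T_root": set(),
    "T_binding": {"T_root"},
    "T_rna_binding": {"T_binding"},
    "T_catalysis": {"T_root"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def prune_to_most_specific(predicted: Set[str],
                           parents: Dict[str, Set[str]]) -> Set[str]:
    """Drop predicted terms that are ancestors of another predicted term."""
    implied: Set[str] = set()
    for term in predicted:
        implied |= ancestors(term, parents)
    return predicted - implied


if __name__ == "__main__":
    predictions = {"T_binding", "T_rna_binding", "T_catalysis"}
    print(sorted(prune_to_most_specific(predictions, PARENTS)))
```

In this toy case the general term is removed because the more specific term already implies it, while unrelated branches of the hierarchy are kept.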


Subject(s)
Computational Biology , Semantics , Humans , Gene Ontology , Molecular Sequence Annotation , Computational Biology/methods , Databases, Protein , Proteins/chemistry
6.
J Biomed Inform ; 135: 104212, 2022 11.
Article in English | MEDLINE | ID: mdl-36182054

ABSTRACT

Machine learning is now an essential part of any biomedical study but its integration into real effective Learning Health Systems, including the whole process of Knowledge Discovery from Data (KDD), is not yet realised. We propose an original extension of the KDD process model that involves an inductive database. We designed for the first time a generic model of Inductive Clinical DataBase (ICDB) aimed at hosting both patient data and learned models. We report experiments conducted on patient data in the frame of a project dedicated to fighting heart failure. The results show how the ICDB approach makes it possible to identify biomarker combinations, specific and predictive of heart fibrosis phenotype, that put forward hypotheses relative to underlying mechanisms. Two main scenarios were considered, a local-to-global KDD scenario and a trans-cohort alignment scenario. This promising proof of concept enables us to draw the contours of a next-generation Knowledge Discovery Environment (KDE).


Subject(s)
Data Mining , Knowledge Discovery , Databases, Factual
7.
J Chem Inf Model ; 62(12): 3107-3122, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35754360

ABSTRACT

Emerging SARS-CoV-2 variants raise concerns about our ability to withstand the COVID-19 pandemic, and therefore, understanding mechanistic differences of those variants is crucial. In this study, we investigate disparities between the SARS-CoV-2 wild type and five variants that emerged in late 2020, focusing on the structure and dynamics of the spike protein interface with the human angiotensin-converting enzyme 2 (ACE2) receptor, by using crystallographic structures and extended analysis of microsecond molecular dynamics simulations. Dihedral angle principal component analysis (PCA) showed strong similarities in the spike receptor binding domain (RBD) dynamics of the Alpha, Beta, Gamma, and Delta variants, in contrast with those of WT and Epsilon. Dynamical perturbation networks and contact PCA identified the peculiar interface dynamics of the Delta variant, which cannot be directly imputed to its specific L452R and T478K mutations since those residues are not in direct contact with the human ACE2 receptor. Our outcome shows that in the Delta variant the L452R and T478K mutations act synergistically on neighboring residues to provoke drastic changes in the spike/ACE2 interface; thus a singular mechanism of action eventually explains why it dominated over preceding variants.


Subject(s)
COVID-19 , SARS-CoV-2 , Angiotensin-Converting Enzyme 2/genetics , Humans , Molecular Dynamics Simulation , Mutation , Pandemics , Protein Binding , SARS-CoV-2/genetics
8.
JACC Cardiovasc Imaging ; 15(2): 193-208, 2022 02.
Article in English | MEDLINE | ID: mdl-34538625

ABSTRACT

OBJECTIVES: This study sought to identify homogeneous echocardiographic phenotypes in community-based cohorts and assess their association with outcomes. BACKGROUND: Asymptomatic cardiac dysfunction leads to a high risk of long-term cardiovascular morbidity and mortality; however, better echocardiographic classification of asymptomatic individuals remains a challenge. METHODS: Echocardiographic phenotypes were identified using K-means clustering in the first generation of the STANISLAS (Yearly non-invasive follow-up of Health status of Lorraine insured inhabitants) cohort (N = 827; mean age: 60 ± 5 years; men: 48%), and their associations with vascular function and circulating biomarkers were also assessed. These phenotypes were externally validated in the Malmö Preventive Project cohort (N = 1,394; mean age: 67 ± 6 years; men: 70%), and their associations with the composite of cardiovascular mortality (CVM) or heart failure hospitalization (HFH) were assessed as well. RESULTS: Three echocardiographic phenotypes were identified as "mostly normal (MN)" (n = 334), "diastolic changes (D)" (n = 323), and "diastolic changes with structural remodeling (D/S)" (n = 170). The D and D/S phenotypes had similar ages, body mass indices, cardiovascular risk factors, vascular impairments, and diastolic function changes. The D phenotype consisted mainly of women and featured increased levels of inflammatory biomarkers, whereas the D/S phenotype consisted predominantly of men and displayed the highest values of left ventricular mass, volume, and remodeling biomarkers. The phenotypes were predicted based on a simple algorithm including e', left ventricular mass and volume (e'VM algorithm). In the Malmö cohort, subgroups derived from the e'VM algorithm were significantly associated with a higher risk of CVM and HFH (adjusted HR in the D phenotype = 1.87; 95% CI: 1.04 to 3.37; adjusted HR in the D/S phenotype = 3.02; 95% CI: 1.71 to 5.34).
CONCLUSIONS: Among asymptomatic, middle-aged individuals, echocardiographic data-driven classification based on the simple e'VM algorithm identified profiles with different long-term HF risk. (4th Visit at 17 Years of Cohort STANISLAS-Stanislas Ancillary Study ESCIF [STANISLASV4]; NCT01391442).


Subject(s)
Echocardiography , Heart Failure , Aged , Female , Heart Failure/diagnostic imaging , Heart Failure/epidemiology , Humans , Incidence , Machine Learning , Male , Middle Aged , Phenotype , Predictive Value of Tests , Prognosis , Stroke Volume , Ventricular Function, Left
9.
PLoS Comput Biol ; 17(8): e1008844, 2021 08.
Article in English | MEDLINE | ID: mdl-34370723

ABSTRACT

Many biological processes are mediated by protein-protein interactions (PPIs). Because protein domains are the building blocks of proteins, PPIs likely rely on domain-domain interactions (DDIs). Several attempts exist to infer DDIs from PPI networks but the produced datasets are heterogeneous and sometimes not accessible, while the PPI interactome data keeps growing. We describe a new computational approach called "PPIDM" (Protein-Protein Interactions Domain Miner) for inferring DDIs using multiple sources of PPIs. The approach is an extension of our previously described "CODAC" (Computational Discovery of Direct Associations using Common neighbors) method for inferring new edges in a tripartite graph. The PPIDM method has been applied to seven widely used PPI resources, using as "Gold-Standard" a set of DDIs extracted from 3D structural databases. Overall, PPIDM has produced a dataset of 84,552 non-redundant DDIs. Statistical significance (p-value) is calculated for each source of PPI and used to classify the PPIDM DDIs in Gold (9,175 DDIs), Silver (24,934 DDIs) and Bronze (50,443 DDIs) categories. Dataset comparison reveals that PPIDM has inferred from the 2017 releases of PPI sources about 46% of the DDIs present in the 2020 release of the 3did database, not counting the DDIs present in the Gold-Standard. The PPIDM dataset contains 10,229 DDIs that are consistent with more than 13,300 PPIs extracted from the IMEx database, and nearly 23,300 DDIs (27.5%) that are consistent with more than 214,000 human PPIs extracted from the STRING database. Examples of newly inferred DDIs covering more than 10 PPIs in the IMEx database are provided. Further exploitation of the PPIDM DDI reservoir includes the inventory of possible partners of a protein of interest and characterization of protein interactions at the domain level in combination with other methods. The result is publicly available at http://ppidm.loria.fr/.
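
Illustrative note: the counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below only re-uses the numbers given above; the variable names are arbitrary, and the STRING-consistent count is the rounded "nearly 23,300" figure, so the resulting share is approximate.

```python
# Tier sizes as reported in the abstract.
gold, silver, bronze = 9_175, 24_934, 50_443

# The three tiers should add up to the reported non-redundant total.
total = gold + silver + bronze

# "Nearly 23,300" DDIs are consistent with STRING; the numerator is rounded,
# so the share computed here is only an approximation of the reported 27.5%.
string_consistent = 23_300
share_percent = 100 * string_consistent / total

if __name__ == "__main__":
    print(f"total DDIs: {total}")
    print(f"share consistent with STRING: about {share_percent:.1f}%")
```

The tier sizes sum to the reported 84,552 DDIs, and the approximate share agrees with the reported percentage to within rounding of the numerator.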


Subject(s)
Protein Interaction Domains and Motifs , Protein Interaction Mapping/statistics & numerical data , Protein Interaction Maps , Algorithms , Computational Biology , Data Mining/statistics & numerical data , Databases, Protein/statistics & numerical data , Humans , Software
10.
Sci Rep ; 11(1): 4202, 2021 02 18.
Article in English | MEDLINE | ID: mdl-33603019

ABSTRACT

The choice of the most appropriate unsupervised machine-learning method for "heterogeneous" or "mixed" data, i.e. with both continuous and categorical variables, can be challenging. Our aim was to examine the performance of various clustering strategies for mixed data using both simulated and real-life data. We conducted a benchmark analysis of "ready-to-use" tools in R comparing 4 model-based (Kamila algorithm, Latent Class Analysis, Latent Class Model [LCM] and Clustering by Mixture Modeling) and 5 distance/dissimilarity-based (Gower distance or Unsupervised Extra Trees dissimilarity followed by hierarchical clustering or Partitioning Around Medoids, K-prototypes) clustering methods. Clustering performances were assessed by Adjusted Rand Index (ARI) on 1000 generated virtual populations consisting of mixed variables using 7 scenarios with varying population sizes, number of clusters, number of continuous and categorical variables, proportions of relevant (non-noisy) variables and degree of variable relevance (low, mild, high). Clustering methods were then applied to the EPHESUS randomized clinical trial data (a heart failure trial evaluating the effect of eplerenone) to illustrate the differences between clustering techniques. The simulations revealed the dominance of K-prototypes, Kamila and LCM models over all other methods. Overall, methods using dissimilarity matrices in classical algorithms such as Partitioning Around Medoids and Hierarchical Clustering had a lower ARI compared to model-based methods in all scenarios. When applying clustering methods to a real-life clinical dataset, LCM showed promising results with regard to differences in (1) clinical profiles across clusters, (2) prognostic performance (highest C-index) and (3) identification of patient subgroups with substantial treatment benefit.
The present findings suggest key differences in clustering performance between the tested algorithms (limited to tools readily available in R). In most of the tested scenarios, model-based methods (in particular the Kamila and LCM packages) and K-prototypes typically performed best in the setting of heterogeneous data.
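
Illustrative note: the Adjusted Rand Index used as the performance measure above compares two partitions of the same items while correcting for chance agreement. The study itself used tools in R; the following is only a minimal pure-Python sketch of the standard pair-counting formula, with a function name and toy labels chosen here for illustration.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree

    # Pair counts within contingency cells, true clusters and predicted clusters.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2           # largest attainable agreement
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings form a single cluster
    return (sum_cells - expected) / (maximum - expected)


if __name__ == "__main__":
    # Same grouping with swapped label names: perfect agreement.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # Groupings that cut across each other: worse than chance.
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because the index depends only on which items are grouped together, renaming the clusters does not change the score, which is why it suits benchmarks where cluster labels are arbitrary.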

11.
Biol Sex Differ ; 11(1): 47, 2020 08 24.
Article in English | MEDLINE | ID: mdl-32831121

ABSTRACT

BACKGROUND: Many patients with heart failure with preserved ejection fraction (HFpEF) are women. Exploring mechanisms underlying the sex differences may improve our understanding of the pathophysiology of HFpEF. Studies focusing on sex differences in circulating proteins in HFpEF patients are scarce. METHODS: A total of 415 proteins were analyzed in 392 HFpEF patients included in The Metabolic Road to Diastolic Heart Failure: Diastolic Heart Failure study (MEDIA-DHF). Sex differences in these proteins were assessed using adjusted logistic regression analyses. The associations between candidate proteins and cardiovascular (CV) death or CV hospitalization (with sex interaction) were assessed using Cox regression models. RESULTS: We found 9 proteins to be differentially expressed between female and male patients. Women expressed more LPL and PLIN1, which are markers of lipid metabolism; more LHB, IGFBP3, and IL1RL2 as markers of transcriptional regulation; and more Ep-CAM as a marker of hemostasis. Women expressed less MMP-3, which is a marker associated with extracellular matrix organization; less NRP1, which is associated with developmental processes; and less ACE2, which is related to metabolism. Sex was not associated with the study outcomes (adj. HR 1.48, 95% CI 0.83-2.63, p = 0.18). CONCLUSION: In chronic HFpEF, assessing sex differences in a wide range of circulating proteins led to the identification of 9 proteins that were differentially expressed between female and male patients. These findings may help further investigations into potential pathophysiological processes contributing to HFpEF.


Subject(s)
Gene Expression Regulation/physiology , Heart Failure/metabolism , Stroke Volume/physiology , Aged , Aged, 80 and over , Biomarkers/blood , Female , Humans , Male , Sex Factors
12.
Eur J Endocrinol ; 183(3): 285-295, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32567559

ABSTRACT

OBJECTIVE: Determining the factors associated with new-onset pre-diabetes and type 2 diabetes mellitus (T2D) is important for improving the current prevention strategies and for a better understanding of the disease. DESIGN: To study the factors (clinical, circulating protein and genetic) associated with new-onset pre-diabetes and T2D in an initially healthy (without diabetes) populational familial cohort with a long follow-up (STANISLAS cohort). METHODS: A total of 1506 participants attended both visit 1 and visit 4, separated by ≈20 years. Over 400 proteins, GWAS and genetic associations were studied using models adjusted for potential confounders. Both prospective (V1 to V4) and cross-sectional (V4) analyses were performed. RESULTS: People who developed pre-diabetes (n = 555) and/or T2D (n = 73) were older, had higher BMI, blood pressure, glucose, LDL cholesterol, and lower eGFR. After multivariable selection, PAPP-A (pappalysin-1) was the only circulating protein associated with the onset of both pre-diabetes and T2D with associations persisting at visit 4 (i.e. ≈20 years later). FGF-21 (fibroblast growth factor 21) was a strong prognosticator for incident T2D in the longitudinal analysis, but not in the cross-sectional analysis. The heritability of the circulating PAPP-A was estimated at 44%. In GWAS analysis, the SNP rs634737 was associated with PAPP-A both at V1 and V4. External replication also showed lower levels of PAPP-A in patients with T2D. CONCLUSIONS: The risk of developing pre-diabetes and T2D increases with age and with features of the metabolic syndrome. Circulating PAPP-A, which has an important genetic component, was associated with both the development and presence of pre-diabetes and T2D.


Subject(s)
Blood Proteins/genetics , Blood Proteins/metabolism , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Genomics/methods , Proteomics/methods , Adult , Cohort Studies , Cross-Sectional Studies , Diabetes Mellitus, Type 2/blood , Female , Humans , Male , Prediabetic State/blood , Prediabetic State/genetics , Prediabetic State/metabolism , Prospective Studies , Risk Factors , Young Adult
13.
Biomarkers ; 25(2): 201-211, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32063068

ABSTRACT

Background: Heart failure with preserved ejection fraction (HFpEF) is a heterogeneous syndrome for which clear evidence of effective therapies is lacking. Understanding which factors determine this heterogeneity may be helped by better phenotyping. An unsupervised statistical approach applied to a large set of biomarkers may identify distinct HFpEF phenotypes. Methods: Relevant proteomic biomarkers were analyzed in 392 HFpEF patients included in Metabolic Road to Diastolic HF (MEDIA-DHF). We performed an unsupervised cluster analysis to define distinct phenotypes. Cluster characteristics were explored with logistic regression. The association between clusters and 1-year cardiovascular (CV) death and/or CV hospitalization was studied using Cox regression. Results: Based on 415 biomarkers, we identified 2 distinct clusters. Clinical variables associated with cluster 2 were diabetes, impaired renal function, loop diuretics and/or betablockers. In addition, 17 biomarkers were more highly expressed in cluster 2 vs. 1. Patients in cluster 2 vs. those in 1 experienced higher rates of CV death/CV hospitalization (adj. HR 1.93, 95% CI 1.12-3.32, p = 0.017). Complex-network analyses linked these biomarkers to immune system activation, signal transduction cascades, cell interactions and metabolism. Conclusion: Unsupervised machine-learning algorithms applied to a wide range of biomarkers identified 2 HFpEF clusters with different CV phenotypes and outcomes.
The identified pathways may provide a basis for future research. Clinical significance: More insight is obtained into the mechanisms related to poor outcome in HFpEF patients since it was demonstrated that biomarkers associated with the high-risk cluster were related to the immune system, signal transduction cascades, cell interactions and metabolism. Biomarkers (and pathways) identified in this study may help select high-risk HFpEF patients, which could be helpful for the inclusion/exclusion of patients in future trials. Our findings may be the basis of investigating therapies specifically targeting these pathways and the potential use of corresponding markers potentially identifying patients with distinct mechanistic bioprofiles most likely to respond to the selected mechanistically targeted therapies.


Subject(s)
Heart Failure/physiopathology , Phenotype , Aged , Biomarkers/analysis , Cluster Analysis , Female , Humans , Machine Learning , Male , Middle Aged , Proteomics , Stroke Volume
14.
Sci Data ; 7(1): 3, 2020 01 02.
Article in English | MEDLINE | ID: mdl-31896797

ABSTRACT

Pharmacogenomics (PGx) studies how individual gene variations impact drug response phenotypes, which makes PGx-related knowledge a key component towards precision medicine. A significant part of the state-of-the-art knowledge in PGx is accumulated in scientific publications, where it is hardly reusable by humans or software. Natural language processing techniques have been developed to guide experts who curate this amount of knowledge. But existing works are limited by the absence of a high quality annotated corpus focusing on PGx domain. In particular, this absence restricts the use of supervised machine learning. This article introduces PGxCorpus, a manually annotated corpus, designed to fill this gap and to enable the automatic extraction of PGx relationships from text. It comprises 945 sentences from 911 PubMed abstracts, annotated with PGx entities of interest (mainly gene variations, genes, drugs and phenotypes), and relationships between those. In this article, we present the corpus itself, its construction and a baseline experiment that illustrates how it may be leveraged to synthesize and summarize PGx knowledge.


Subject(s)
Data Curation , Pharmacogenetics , Supervised Machine Learning , Humans , PubMed
15.
Clin Res Cardiol ; 109(1): 22-33, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31062082

ABSTRACT

BACKGROUND: Hypertension, obesity and diabetes are major and potentially modifiable "risk factors" for cardiovascular diseases. Identification of biomarkers specific to these risk factors may help in understanding the underlying pathophysiological pathways and in developing individual treatment. METHODS: The FIBRO-TARGETS (targeting cardiac fibrosis for heart failure treatment) consortium has merged data from 12 patient cohorts in 1 common database of > 12,000 patients. Three mutually exclusive main phenotypic groups were identified ("cases"): (1) "hypertensive"; (2) "obese"; and (3) "diabetic"; age-sex matched in a 1:2 proportion with "healthy controls" without any of these phenotypes. Proteomic associations were studied using a biostatistical method based on LASSO and confronted with machine-learning and complex network approaches. RESULTS: The case:control distribution by each cardiovascular phenotype was hypertension (50:100), obesity (50:98), and diabetes (36:72). Of the 86 studied proteins, 4 were found to be independently associated with hypertension: GDF-15, LEP, SORT-1 and FABP-2; 3 with obesity: CEACAM-8, LEP and PRELP; and 4 with diabetes: GDF-15, REN, CXCL-1 and SCF. GDF-15 (hypertension + diabetes) and LEP (hypertension + obesity) are shared by 2 different phenotypes. A machine-learning approach confirmed GDF-15, LEP and SORT-1 as discriminant biomarkers for the hypertension group, and LEP plus PRELP for the obesity group. Complex network analyses provided insight on the mechanisms underlying these disease phenotypes where fibrosis may play a central role. CONCLUSION: Patients with "mutually exclusive" phenotypes display distinct bioprofiles that might underpin different biological pathways, potentially leading to fibrosis.
Graphical abstract: Plasma protein biomarkers and their association with mutually exclusive cardiovascular phenotypes: the FIBRO-TARGETS case-control analyses. Patients with "mutually exclusive" phenotypes (obesity, hypertension and diabetes) display distinct protein bioprofiles (decreased or increased expression) that might underpin different biological pathways, potentially leading to fibrosis.


Subject(s)
Diabetes Mellitus/physiopathology , Hypertension/physiopathology , Obesity/physiopathology , Adult , Aged , Biomarkers/blood , Blood Proteins/metabolism , Case-Control Studies , Diabetes Mellitus/blood , Female , Humans , Hypertension/blood , Male , Middle Aged , Obesity/blood , Phenotype , Proteomics , Risk Factors
17.
Gastroenterology ; 158(1): 76-94.e2, 2020 01.
Article in English | MEDLINE | ID: mdl-31593701

ABSTRACT

Since 2010, substantial progress has been made in artificial intelligence (AI) and its application to medicine. AI is explored in gastroenterology for endoscopic analysis of lesions, in detection of cancer, and to facilitate the analysis of inflammatory lesions or gastrointestinal bleeding during wireless capsule endoscopy. AI is also tested to assess liver fibrosis and to differentiate patients with pancreatic cancer from those with pancreatitis. AI might also be used to establish prognoses of patients or predict their response to treatments, based on multiple factors. We review the ways in which AI may help physicians make a diagnosis or establish a prognosis and discuss its limitations, knowing that further randomized controlled studies will be required before the approval of AI techniques by the health authorities.


Subject(s)
Artificial Intelligence , Diagnosis, Computer-Assisted/methods , Gastroenterology/methods , Gastrointestinal Diseases/diagnosis , Liver Diseases/diagnosis , Clinical Decision-Making/methods , Decision Support Systems, Clinical , Decision Trees , Gastrointestinal Diseases/mortality , Gastrointestinal Diseases/therapy , Humans , Liver Diseases/mortality , Liver Diseases/therapy , Prognosis , Treatment Outcome
18.
Int J Lab Hematol ; 41(6): 726-730, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31523903

ABSTRACT

INTRODUCTION: The confirmation time interval for the presence of antiphospholipid antibodies (aPL) has been extended to 12 weeks as epiphenomenal antibodies may disappear after 6 weeks. Our aim was to analyse extended persistence of aPL positivity beyond the 12-week interval. METHODS: We retrospectively analysed our database of 23 856 aPL test samples collected between 2005 and 2017 from 17 367 consecutive patients. Two groups of patients were identified among aPL-positive patients, confirmed at 12 weeks: with or without extended persistence beyond confirmatory testing. Percentages of extended persistence are given according to the initial aPL positivity profiles, and baseline laboratory variables are compared between the two groups. RESULTS: Three hundred and twenty-seven patients confirmed aPL-positive had subsequent testing. The vast majority of them displayed extended persistence in the long term: 89.6% and up to 97.9% for patients with initial triple positivity. In extended persistent positive patients, there were more LA-positive initial samples, and baseline LA test values and IgG aCL titres were higher than in nonpersistent positive patients. CONCLUSION: Data from a large database of an aPL referral laboratory showed that the time interval of 12 weeks defining persistence of aPL positivity was appropriate for the majority of patients. Furthermore, we found baseline features associated with extended persistence.


Subject(s)
Antibodies, Antiphospholipid/blood , Adult , Antiphospholipid Syndrome/blood , Antiphospholipid Syndrome/immunology , Female , Humans , Lupus Coagulation Inhibitor/blood , Male , Middle Aged , Retrospective Studies , Time Factors
19.
Mob DNA ; 10: 18, 2019.
Article in English | MEDLINE | ID: mdl-31073337

ABSTRACT

BACKGROUND: Conjugative spread of antibiotic resistance and virulence genes in bacteria constitutes an important threat to public health. Beyond the well-known conjugative plasmids, recent genome analyses have shown that integrative and conjugative elements (ICEs) are the most widespread conjugative elements, even if their transfer mechanism has been little studied until now. The initiator of conjugation is the relaxase, a protein catalyzing a site-specific nick on the origin of transfer (oriT) of the ICE. Besides canonical relaxases, recent studies revealed non-canonical ones, such as relaxases of the MOBT family that are related to rolling-circle replication proteins of the Rep_trans family. MOBT relaxases are encoded by ICEs of the ICESt3/ICEBs1/Tn916 superfamily, a superfamily widespread in Firmicutes, and frequently conferring antibiotic resistance. RESULTS: Here, we present the first biochemical and structural characterization of a MOBT relaxase: the RelSt3 relaxase encoded by ICESt3 from Streptococcus thermophilus. We identified the oriT region of ICESt3 and demonstrated that RelSt3 is required for its conjugative transfer. The purified RelSt3 protein is a stable dimer that provides a Mn²⁺-dependent single-stranded endonuclease activity. Sequence comparisons of MOBT relaxases led to the identification of MOBT conserved motifs. These motifs, together with the construction of a 3D model of the relaxase domain of RelSt3, allowed us to determine conserved residues of the RelSt3 active site. The involvement of these residues in DNA nicking activity was demonstrated by targeted mutagenesis. CONCLUSIONS: All together, this work argues in favor of MOBT being a full family of non-canonical relaxases. The biochemical and structural characterization of a MOBT member provides new insights on the molecular mechanism of conjugative transfer mediated by ICEs in Gram-positive bacteria. This could be a first step towards conceiving rational strategies to control gene transfer in these bacteria.

20.
BMC Bioinformatics ; 19(Suppl 14): 413, 2018 Nov 20.
Article in English | MEDLINE | ID: mdl-30453875

ABSTRACT

BACKGROUND: Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology), for example. However, many proteins consist of multiple domains, and each domain, or some combination of domains, can be responsible for a particular molecular function. Therefore, identifying which domains should be associated with a specific function is a non-trivial task. RESULTS: We describe a general approach for the computational discovery of associations between different sets of annotations by formalising the problem as a bipartite graph enrichment problem in the setting of a tripartite graph. We call this approach "CODAC" (for COmputational Discovery of Direct Associations using Common Neighbours). As one application of this approach, we describe "GODomainMiner" for associating GO terms with protein domains. We used GODomainMiner to predict GO-domain associations between each of the 3 GO ontology namespaces (MF, BP, and CC) and the Pfam, CATH, and SCOP domain classifications. Overall, GODomainMiner yields average enrichments of 15-, 41- and 25-fold GO-domain associations compared to the existing GO annotations in these 3 domain classifications, respectively. CONCLUSIONS: These associations could potentially be used to annotate many of the protein chains in the Protein Databank and protein sequences in UniProt whose domain composition is known but which currently lack GO annotation.


Subject(s)
Computational Biology/methods , Gene Ontology , Proteins/chemistry , Algorithms , Amino Acid Sequence , Area Under Curve , Databases, Protein , Molecular Sequence Annotation , Protein Domains