Results 1 - 20 of 21
4.
Stud Health Technol Inform ; 281: 506-507, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042623

ABSTRACT

The i2b2 data warehouse could be a useful tool to support the enrollment phase of clinical studies. The aim of this work is to evaluate its performance on two clinical trials. We also developed an i2b2 extension to help suggest eligible patients for a study. The work showed good results in terms of the ability to implement inclusion/exclusion criteria, the number of identified patients who were actually enrolled, and the high number of patients suggested as potentially enrollable.
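The inclusion/exclusion logic mentioned above can be illustrated with a minimal set-based sketch. It assumes each criterion has already been resolved to a set of patient IDs (for example, by a separate warehouse query); the function name and data layout are hypothetical, and this is neither the i2b2 API nor the authors' extension.

```python
def eligible_patients(inclusion, exclusion):
    """Patients who satisfy every inclusion criterion and no exclusion criterion.

    inclusion, exclusion: lists of sets of patient IDs, one set per criterion
    (hypothetical layout; each set is assumed to come from an earlier query).
    """
    if not inclusion:
        raise ValueError("at least one inclusion criterion is required")
    # A patient must appear in every inclusion set...
    candidates = set.intersection(*inclusion)
    # ...and in none of the exclusion sets.
    excluded = set().union(*exclusion)
    return candidates - excluded
```

With inclusion sets `{1, 2, 3, 4}` and `{2, 3, 4, 5}` and an exclusion set `{4}`, only patients 2 and 3 remain eligible.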


Subject(s)
Data Warehousing , Information Storage and Retrieval , Humans
5.
Stud Health Technol Inform ; 258: 21-25, 2019.
Article in English | MEDLINE | ID: mdl-30942706

ABSTRACT

i2b2 and REDCap are two widely adopted solutions, used respectively to facilitate data re-use for research purposes and to manage not-for-profit research studies. REDCap provides design specifications for building a web service that imports data from an external source through a procedure called DDP. In this work we have developed a web service that implements these specifications in order to import data from i2b2. Our approach has been tested with a real REDCap study.


Subject(s)
Data Warehousing , Data Analysis
6.
Stud Health Technol Inform ; 247: 715-719, 2018.
Article in English | MEDLINE | ID: mdl-29678054

ABSTRACT

Medical reports often contain a lot of relevant information in the form of free text. To reuse these unstructured texts for biomedical research, it is important to extract structured data from them. In this work, we adapted a previously developed information extraction system to the oncology domain, to process a set of anatomic pathology reports in the Italian language. The information extraction system relies on a domain ontology, which was adapted and refined in an iterative way. The final output was evaluated by a domain expert, with promising results.


Subject(s)
Information Storage and Retrieval , Language , Natural Language Processing , Biomedical Research , Data Mining , Humans , Italy
7.
Eur Child Adolesc Psychiatry ; 26(11): 1309-1317, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28455596

ABSTRACT

Psychiatric disorders are amongst the most prevalent and impairing conditions in childhood and adolescence. Unfortunately, it is well known that general practitioners (GPs) and other frontline health providers (e.g., child protection workers, public health nurses, and pediatricians) are not adequately trained to address these ubiquitous problems (Braddick et al. Child and Adolescent mental health in Europe: infrastructures, policy and programmes, European Communities, 2009; Levav et al. Eur Child Adolesc Psychiatry 13:395-401, 2004). Advances in technology may offer a solution to this problem in the form of clinical decision support systems (CDSS) designed to help professionals make sound clinical decisions in real time. This paper offers a systematic review of currently available CDSS for child and adolescent mental health disorders, prepared according to the PRISMA-Protocols (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols). The identified studies (n = 5048) were screened against strict eligibility criteria. Ten studies, describing eight original clinical decision support systems for child and adolescent psychiatric disorders, fulfilled the inclusion criteria. Based on this systematic review, there appears to be a need for a new, readily available CDSS for child neuropsychiatric disorders that promotes evidence-based best practices while enabling consideration of national variation in practices by leveraging data re-use to generate predictions regarding treatment outcome, addressing a broader cluster of clinical disorders, and targeting frontline practice environments.


Subject(s)
Adolescent Psychiatry/standards , Child Psychiatry/standards , Decision Support Systems, Clinical/standards , Adolescent , Child , Humans
8.
Stud Health Technol Inform ; 228: 572-6, 2016.
Article in English | MEDLINE | ID: mdl-27577448

ABSTRACT

The i2b2 software is a widely adopted solution for the secondary use of clinical data for clinical research, specifically designed for cohort identification. However, i2b2 still lacks functionalities for data analysis. The aim of this work is to extend the i2b2 framework, enabling clinical researchers to perform statistical analyses and thereby accelerate the process of hypothesis testing. To this end, we have developed a flexible extension of i2b2 able to exploit different statistical engines. Exploiting this extension, we have implemented a first set of applications for basic statistics and survival analysis, accessible through user interfaces designed with special consideration for usability.
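Survival analyses of the kind mentioned above are often summarized with the Kaplan-Meier estimator. The sketch below is a plain-Python illustration, assuming each patient contributes a follow-up time and an event flag (1 = event observed, 0 = censored); the function name is hypothetical, and this is not the extension's code or the interface of any particular statistical engine.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where at least one event occurred.

    times: follow-up times; events: 1 if the event was observed, 0 if censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)  # events per time
    leaving = Counter(times)  # everyone (event or censored) leaving the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve
```

Censored patients stay in the risk set at their own censoring time, which is the usual convention when an event and a censoring coincide.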


Subject(s)
Cohort Studies , Health Information Exchange , Search Engine , Databases, Factual , Humans , Information Storage and Retrieval/methods , Software , User-Computer Interface
9.
BMC Bioinformatics ; 17: 155, 2016 Apr 08.
Article in English | MEDLINE | ID: mdl-27059896

ABSTRACT

BACKGROUND: Understanding the interactions between antibodies and the linear epitopes that they recognize is an important task in the study of immunological diseases. We present a novel computational method for the design of linear epitopes of specified binding affinity to Intravenous Immunoglobulin (IVIg). RESULTS: We show that the method, called Pythia-design, can accurately design peptides with both high and low binding affinity to IVIg. To show this, we experimentally constructed and tested the computationally generated designs. We further show experimentally that these designed peptides are more accurate than those produced by a recent method for the same task. Pythia-design is based on combining random walks with an ensemble of probabilistic support vector machine (SVM) classifiers, and we show that it produces a diverse set of designed peptides, an important property for developing robust sets of candidates for construction. We show that by combining Pythia-design and the method of (PLoS ONE 6(8):e23616, 2011), we are able to produce an even more accurate collection of designed peptides. Analysis of the experimental validation of Pythia-design peptides indicates that binding of IVIg is favored by epitopes that contain tryptophan and cysteine. CONCLUSIONS: Our method, Pythia-design, is able to generate a diverse set of binding and non-binding peptides, and its designs have been experimentally shown to be accurate.


Subject(s)
Computational Biology/methods , Epitopes/chemistry , Immunoglobulins, Intravenous/chemistry , Peptides, Cyclic/chemistry , Citrulline/chemistry , Cysteine/chemistry , Humans , Models, Molecular , Reproducibility of Results , Support Vector Machine , Tryptophan/chemistry
10.
J Biomed Inform ; 57: 369-76, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26325295

ABSTRACT

The increasing prevalence of diabetes and its related complications is raising the need for effective methods to predict patient evolution and to stratify cohorts in terms of risk of developing diabetes-related complications. In this paper, we present a novel approach to the simulation of a type 1 diabetes population, based on Dynamic Bayesian Networks, which combines literature knowledge with data mining of a rich longitudinal cohort of type 1 diabetes patients, the DCCT/EDIC study. In particular, our approach simulates the patient health state and complications through discretized variables. Two types of models are presented, one entirely learned from the data and the other partially driven by literature-derived knowledge. The whole cohort is simulated for fifteen years, and the simulation error (i.e., for each variable, the percentage of patients predicted in the wrong state) is calculated every year on independent test data. For each variable, the percentage of patients predicted in the wrong state remains below 10% for both models over time. Furthermore, the distributions of real and simulated patients largely overlap. Thus, the proposed models are viable tools to support decision making in type 1 diabetes.
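The simulation error defined in this abstract (for each variable, the percentage of patients predicted in the wrong state) reduces to a per-variable mismatch rate between observed and simulated discrete states. The sketch assumes both are given as dictionaries mapping a variable name to one state per patient, in the same patient order; the function name and any variable names used with it are hypothetical, not DCCT/EDIC fields.

```python
def simulation_error(true_states, predicted_states):
    """Percentage of patients whose simulated state differs from the observed one.

    Both arguments map a variable name to a list of discrete states,
    one entry per patient, in the same patient order.
    """
    errors = {}
    for var, observed in true_states.items():
        simulated = predicted_states[var]
        if len(simulated) != len(observed):
            raise ValueError(f"length mismatch for variable {var!r}")
        wrong = sum(o != s for o, s in zip(observed, simulated))
        errors[var] = 100.0 * wrong / len(observed)
    return errors
```

Computing this once per simulated year on held-out patients gives the error trajectory over the simulation horizon.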


Subject(s)
Bayes Theorem , Computer Simulation , Data Mining , Diabetes Complications , Diabetes Mellitus, Type 1 , Humans
11.
Diabetologia ; 58(6): 1363-71, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25740695

ABSTRACT

AIMS/HYPOTHESIS: We selected the most informative protein biomarkers for the prediction of incident cardiovascular disease (CVD) in people with type 2 diabetes. METHODS: In this nested case-control study we measured 42 candidate CVD biomarkers in 1,123 incident CVD cases and 1,187 controls with type 2 diabetes selected from five European centres. Combinations of biomarkers were selected using cross-validated logistic regression models. Model prediction was assessed using the area under the receiver operating characteristic curve (AUROC). RESULTS: Sixteen biomarkers showed univariate associations with incident CVD. The most predictive subset selected by forward selection methods contained six biomarkers: N-terminal pro-B-type natriuretic peptide (OR 1.69 per 1 SD, 95% CI 1.47, 1.95), high-sensitivity troponin T (OR 1.29, 95% CI 1.11, 1.51), IL-6 (OR 1.13, 95% CI 1.02, 1.25), IL-15 (OR 1.15, 95% CI 1.01, 1.31), apolipoprotein C-III (OR 0.79, 95% CI 0.70, 0.88) and soluble receptor for AGE (OR 0.84, 95% CI 0.76, 0.94). The prediction of CVD beyond clinical covariates improved from an AUROC of 0.66 to 0.72 (AUROC for Framingham Risk Score covariates 0.59). In addition to the biomarkers, the most important clinical covariates for improving prediction beyond the Framingham covariates were estimated GFR, insulin therapy and HbA1c. CONCLUSIONS/INTERPRETATION: We identified six protein biomarkers that in combination with clinical covariates improved the prediction of our model beyond the Framingham Score covariates. Biomarkers can contribute to improved prediction of CVD in diabetes but clinical data including measures of renal function and diabetes-specific factors not included in the Framingham Risk Score are also needed.
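The AUROC used to assess the models equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control (the Mann-Whitney formulation), with ties counted as one half. The quadratic-time sketch below assumes binary labels (1 = incident CVD case, 0 = control) and one model score per subject; it is illustrative and not the study's analysis code.

```python
def auroc(scores, labels):
    """AUROC as P(score of a random case > score of a random control), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUROC of 0.5 corresponds to chance ranking and 1.0 to perfect separation, which is the scale on which the reported values (0.59, 0.66, 0.72) sit.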


Subject(s)
Biomarkers/blood , Cardiovascular Diseases/complications , Diabetes Mellitus, Type 2/complications , Aged , Apolipoprotein C-III/blood , Area Under Curve , Cardiovascular Diseases/diagnosis , Case-Control Studies , Diabetes Complications , Diabetes Mellitus, Type 2/diagnosis , Europe , Female , Glomerular Filtration Rate , Glycated Hemoglobin/metabolism , Humans , Insulin/therapeutic use , Interleukin-15/blood , Interleukin-6/blood , Logistic Models , Male , Middle Aged , Natriuretic Peptide, Brain/blood , Peptide Fragments/blood , ROC Curve , Risk Factors , Troponin T/blood
12.
Adv Bioinformatics ; 2015: 382869, 2015.
Article in English | MEDLINE | ID: mdl-25653679

ABSTRACT

Phosphorylation is a protein posttranslational modification. It is responsible for the activation/inactivation of disease-related pathways, thanks to its role as a "molecular switch." The study of phosphorylated proteins has become a key point for proteomic analyses focused on the identification of diagnostic/therapeutic targets. Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the most widely used analytical approach. Although unmodified peptides are automatically identified by consolidated algorithms, phosphopeptides still require automated tools to avoid time-consuming manual interpretation. To improve phosphopeptide identification efficiency, a novel procedure was developed and implemented in a Perl/C tool called PhosphoHunter, which is proposed and evaluated here. It includes a preliminary heuristic step that filters out the MS/MS spectra produced by nonphosphorylated peptides before sequence identification. A method to assess the statistical significance of identified phosphopeptides was also formulated. PhosphoHunter performance was tested on a dataset of 1500 MS/MS spectra and compared with two other tools: Mascot and Inspect. The comparisons showed that sensitivity is a strong point of PhosphoHunter, suggesting that it identifies real phosphopeptides with superior performance. Performance indexes depend on a single parameter (intensity threshold) that users can tune according to the study aim. All three tools localized >90% of phosphosites.

13.
Diabetologia ; 57(8): 1611-22, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24871321

ABSTRACT

AIMS/HYPOTHESIS: Diabetic nephropathy is a major diabetic complication, and diabetes is the leading cause of end-stage renal disease (ESRD). Family studies suggest a hereditary component for diabetic nephropathy. However, only a few genes have been associated with diabetic nephropathy or ESRD in diabetic patients. Our aim was to detect novel genetic variants associated with diabetic nephropathy and ESRD. METHODS: We exploited a novel algorithm, 'Bag of Naive Bayes', whose marker selection strategy is complementary to that of conventional genome-wide association models based on univariate association tests. The analysis was performed on a genome-wide association study of 3,464 patients with type 1 diabetes from the Finnish Diabetic Nephropathy (FinnDiane) Study and subsequently replicated with 4,263 type 1 diabetes patients from the Steno Diabetes Centre, the All Ireland-Warren 3-Genetics of Kidneys in Diabetes UK collection (UK-Republic of Ireland) and the Genetics of Kidneys in Diabetes US Study (GoKinD US). RESULTS: Five genetic loci (WNT4/ZBTB40-rs12137135, RGMA/MCTP2-rs17709344, MAPRE1P2-rs1670754, SEMA6D/SLC24A5-rs12917114 and SIK1-rs2838302) were associated with ESRD in the FinnDiane study. An association between ESRD and rs17709344, tagging the previously identified rs12437854 and located between the RGMA and MCTP2 genes, was replicated in independent case-control cohorts. rs12917114 near SEMA6D was associated with ESRD in the replication cohorts under the genotypic model (p < 0.05), and rs12137135 upstream of WNT4 was associated with ESRD in Steno. CONCLUSIONS/INTERPRETATION: This study supports the previously identified findings on the RGMA/MCTP2 region and suggests novel susceptibility loci for ESRD. This highlights the importance of applying complementary statistical methods to detect novel genetic variants in diabetic nephropathy and, in general, in complex diseases.


Subject(s)
Diabetic Nephropathies/genetics , Genetic Loci , Genetic Predisposition to Disease , Kidney Failure, Chronic/genetics , Adult , Bayes Theorem , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide , White People/genetics
14.
BMC Bioinformatics ; 13 Suppl 14: S6, 2012.
Article in English | MEDLINE | ID: mdl-23095471

ABSTRACT

BACKGROUND: Genome Wide Association Studies represent powerful approaches that aim at disentangling the genetic and molecular mechanisms underlying complex traits. The usual "one-SNP-at-a-time" testing strategy cannot capture the multi-factorial nature of such disorders. We propose a Hierarchical Naïve Bayes classification model that takes into account associations in SNP data characterized by Linkage Disequilibrium. Validation shows that our model achieves classification performance superior to that of the standard Naïve Bayes classifier on simulated and real datasets. METHODS: In the Hierarchical Naïve Bayes implemented, the SNPs mapping to the same region of Linkage Disequilibrium are considered as "details" or "replicates" of the locus, each contributing to the overall effect of the region on the phenotype. A latent variable for each block, which models the "population" of correlated SNPs, can then be used to summarize the available information. Classification is thus performed relying on the conditional probability distributions of the latent variables and on the available SNP data. RESULTS: The developed methodology has been tested on simulated datasets, each composed of 300 cases, 300 controls and a variable number of SNPs. Our approach has also been applied to two real datasets on the genetic bases of Type 1 Diabetes and Type 2 Diabetes generated by the Wellcome Trust Case Control Consortium. CONCLUSIONS: The approach proposed in this paper, called Hierarchical Naïve Bayes, allows the classification of examples for which genetic information on structurally correlated SNPs is available. It improves on Naïve Bayes performance by properly handling within-locus variability.
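For orientation, the standard Naïve Bayes baseline that the hierarchical model is compared against can be sketched over SNP genotypes. This assumes genotypes coded as minor-allele counts (0, 1, 2) and Laplace smoothing; it deliberately does not implement the hierarchical latent-variable structure over LD blocks, and the class name is hypothetical.

```python
import math


class GenotypeNaiveBayes:
    """Standard Naive Bayes over SNP genotypes coded 0/1/2 (minor-allele counts)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing pseudo-count per genotype

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_snps = len(X[0])
        self.log_prior = {}
        self.log_lik = {}  # (class, SNP index, genotype) -> log P(genotype | class)
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            for j in range(n_snps):
                counts = [0, 0, 0]
                for x in rows:
                    counts[x[j]] += 1  # assumes genotype values in {0, 1, 2}
                total = len(rows) + 3 * self.alpha
                for g in range(3):
                    self.log_lik[(c, j, g)] = math.log((counts[g] + self.alpha) / total)
        return self

    def predict(self, x):
        # SNPs are treated as conditionally independent given the class.
        def score(c):
            return self.log_prior[c] + sum(
                self.log_lik[(c, j, g)] for j, g in enumerate(x)
            )

        return max(self.classes, key=score)
```

The independence assumption in `predict` is exactly what LD violates: correlated SNPs in one block are counted as separate evidence, which is the issue the hierarchical model addresses.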


Subject(s)
Bayes Theorem , Diabetes Mellitus, Type 1/genetics , Diabetes Mellitus, Type 2/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Case-Control Studies , Computer Simulation , Humans , Linkage Disequilibrium , Models, Genetic
15.
PLoS One ; 6(8): e23616, 2011.
Article in English | MEDLINE | ID: mdl-21887285

ABSTRACT

The prediction of antibody-protein (antigen) interactions is very difficult due to the huge variability that characterizes the structure of antibodies. The region of the antigen bound by the antibodies is called the epitope. Experimental data indicate that many antibodies react with a panel of distinct epitopes (positive reaction). Challenge 1 of DREAM5 aims to determine whether there exist rules for predicting the reactivity of a peptide/epitope, i.e., its capability to bind to human antibodies. DREAM5 provided a training set of peptides with experimentally identified high and low reactivities to human antibodies. On the basis of this training set, participants in the challenge were asked to develop a predictive model of reactivity. A test set was then provided to evaluate the performance of the models. We developed a logistic regression model to predict peptide reactivity, approaching the challenge as a machine learning problem. The initial features were generated on the basis of the available knowledge and the information reported in the dataset. Our predictive model had the second-best performance in the challenge. We also developed a method, based on a clustering approach, able to generate "in silico" a list of new positive and negative peptide sequences, as requested by the DREAM5 "bonus round" additional challenge. The paper describes the developed model and its results in terms of reactivity prediction, and highlights some open issues concerning the propensity of a peptide to react with human antibodies.
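A logistic regression reactivity model of the general kind described can be sketched with batch gradient descent on the mean log-loss. The feature set here (amino-acid composition), the learning rate, the epoch count and all function names are assumptions for illustration; the paper's actual features were derived from domain knowledge and the challenge dataset and are not reproduced.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(peptide):
    """Fraction of each of the 20 standard amino acids (hypothetical feature set)."""
    n = len(peptide)
    return [peptide.count(a) / n for a in AMINO_ACIDS]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of high reactivity for feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

In practice a regularized solver from a statistics library would replace the hand-written loop; it is spelled out here only to show what the model fits.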


Subject(s)
Immunoglobulins, Intravenous/metabolism , Knowledge Bases , Peptides/metabolism , Amino Acid Sequence , Amino Acids/metabolism , Cluster Analysis , Humans , Models, Molecular , Molecular Sequence Data , Peptides/chemistry , ROC Curve , Reproducibility of Results
16.
BMC Evol Biol ; 11: 159, 2011 Jun 10.
Article in English | MEDLINE | ID: mdl-21663612

ABSTRACT

BACKGROUND: We have recently discovered that the two tryptophans of human β2-microglobulin have distinctive roles within the structure and function of the protein. Deeply buried in the core, Trp95 is essential for folding stability, whereas Trp60, which is solvent-exposed, plays a crucial role in promoting the binding of β2-microglobulin to the heavy chain of the class I major histocompatibility complex (MHCI). We have previously shown that the thermodynamic disadvantage of having Trp60 exposed on the surface is counterbalanced by the perfect fit between it and a cavity within the MHCI heavy chain, which contributes significantly to the functional stabilization of the MHCI. Therefore, based on the distinctive differences between the two tryptophans, we have analysed the evolution of β2-microglobulin with respect to these residues. RESULTS: Having defined the β2-microglobulin protein family, we performed multiple sequence alignments and analysed residue conservation in homologous proteins to generate a phylogenetic tree. Our results indicate that Trp60 is highly conserved, whereas some species have a Leu in position 95; the replacement of Trp95 with Leu destabilizes β2-microglobulin by 1 kcal/mol and accelerates the kinetics of unfolding. Both thermodynamic and kinetic data fit with the crystallographic structure of the Trp95Leu variant, which shows that the hydrophobic cavity of the wild-type protein is completely occupied by Trp95 but only half filled by Leu95. CONCLUSIONS: We have established that the functional Trp60 has been present in the sequence of β2-microglobulin since the evolutionary appearance of proteins responsible for acquired immunity, whereas the structural Trp95 was selected and stabilized, most likely for its capacity to fully occupy an internal cavity of the protein, thereby better stabilizing its folded state.


Subject(s)
Phylogeny , Tryptophan/genetics , Tryptophan/metabolism , beta 2-Microglobulin/genetics , beta 2-Microglobulin/metabolism , Amino Acid Sequence , Amyloid/metabolism , Animals , Crystallography, X-Ray , Humans , Models, Molecular , Molecular Sequence Data , Protein Conformation , Protein Folding , Sequence Alignment , Tryptophan/chemistry , beta 2-Microglobulin/chemistry
17.
BMC Bioinformatics ; 11: 518, 2010 Oct 16.
Article in English | MEDLINE | ID: mdl-20950483

ABSTRACT

BACKGROUND: Mass spectrometry is an essential technique in proteomics, both to identify the proteins of a biological sample and to compare the proteomic profiles of different samples. In both cases, the main phase of the data analysis is the procedure that extracts the significant features from a mass spectrum. Its final output is the so-called peak list, which contains the mass, the charge and the intensity of every detected biomolecule. The main steps of the peak list extraction procedure are usually preprocessing, peak detection, peak selection, charge determination and monoisotoping. RESULTS: This paper describes an original algorithm for peak list extraction from low- and high-resolution mass spectra. It was developed principally to improve the precision of peak extraction in comparison to other reference algorithms. It contains many innovative features, including a sophisticated method for managing overlapping isotopic distributions. CONCLUSIONS: The performance of the basic version of the algorithm and of its optional functionalities has been evaluated on SELDI-TOF, MALDI-TOF and ESI-FTICR ECD mass spectra. Executable files of MassSpec, a MATLAB implementation of the peak list extraction procedure for Windows and Linux systems, can be downloaded free of charge for nonprofit institutions from the following web site: http://aimed11.unipv.it/MassSpec.
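The peak detection and peak selection steps listed above can be illustrated, in a deliberately naive form, by a local-maximum search with an intensity threshold. This is an assumption-laden sketch, not the MassSpec algorithm: it does no preprocessing, charge determination, monoisotoping, or handling of overlapping isotopic distributions, and the function name and threshold parameter are hypothetical.

```python
def pick_peaks(mz, intensity, threshold):
    """Return (m/z, intensity) for local maxima whose intensity exceeds threshold.

    mz, intensity: equal-length sequences describing one spectrum, sorted by m/z.
    """
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")
    peaks = []
    for i in range(1, len(intensity) - 1):
        y = intensity[i]
        # Strict on the left, non-strict on the right, so a flat top is reported once.
        if y > threshold and y > intensity[i - 1] and y >= intensity[i + 1]:
            peaks.append((mz[i], y))
    return peaks
```

Raising `threshold` trades sensitivity for precision, which is the main reason real peak pickers add noise estimation and smoothing before this step.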


Subject(s)
Mass Spectrometry/methods , Proteins/chemistry , Proteomics/methods , Algorithms , Databases, Protein
18.
J Biomed Biotechnol ; 2010: 670125, 2010.
Article in English | MEDLINE | ID: mdl-20625507

ABSTRACT

Protein interactions are crucial in most biological processes. Several in silico methods have recently been developed to predict them. This paper describes a bioinformatics method that combines sequence similarity and structural information to support experimental studies on protein interactions. Given a target protein, the approach selects the most likely interactors among the candidates revealed by experimental techniques but not yet validated in vivo. The sequence and structural information of the in vivo confirmed proteins and complexes are exploited to evaluate the candidate interactors. A score is then calculated to suggest the most likely interactors of the target protein. As an example, we searched for GRB2 interactors. We ranked a set of 46 candidate interactors using the presented method. These candidates were then reduced to 21 through a score threshold chosen by means of a cross-validation strategy. Among them, isoform 1 of MAPK14 was confirmed in silico as a GRB2 interactor. Finally, given a set of already confirmed interactors of GRB2, the accuracy and precision of the approach were 75% and 86%, respectively. In conclusion, the proposed method can be conveniently exploited to select, within a set of potential interactors, the proteins to be investigated experimentally.


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Amino Acid Motifs , Databases, Protein , GRB2 Adaptor Protein/chemistry , GRB2 Adaptor Protein/metabolism , Humans , Hydrogen Bonding , Mitogen-Activated Protein Kinase 1/chemistry , Mitogen-Activated Protein Kinase 1/metabolism , Mitogen-Activated Protein Kinase 14/chemistry , Mitogen-Activated Protein Kinase 14/metabolism , Models, Molecular , Multiprotein Complexes/chemistry , Multiprotein Complexes/metabolism , Protein Binding , Reproducibility of Results , Sequence Alignment
19.
BMC Struct Biol ; 10: 18, 2010 Jun 17.
Article in English | MEDLINE | ID: mdl-20565796

ABSTRACT

BACKGROUND: Topological descriptors, other graph measures, and, in a broader sense, graph-theoretical methods have proven to be powerful tools for biological network analysis. However, the majority of the developed descriptors and graph-theoretical methods cannot take vertex and edge labels into account, e.g., atom and bond types when considering molecular graphs. This feature is important for characterizing biological networks more meaningfully than by considering pure topological information alone. RESULTS: In this paper, we focus on analyzing a special type of biological network, namely biochemical structures. First, we derive entropic measures to calculate the information content of vertex- and edge-labeled graphs and investigate some useful properties thereof. Second, we use these measures, combined with other well-known descriptors, as input to supervised machine learning methods for predicting Ames mutagenicity. Moreover, we investigate the influence of our topological descriptors (measures for unlabeled graphs only vs. measures for labeled graphs) on the prediction performance of the underlying graph classification problem. CONCLUSIONS: Our study demonstrates that applying entropic measures to graphs representing molecules is useful for characterizing such structures meaningfully. For instance, we found that if the measures for determining the structural information content of unlabeled graphs are extended to labeled graphs, the uniqueness of the resulting indices is higher. Because measures that structurally characterize labeled graphs are clearly underrepresented so far, further development of such methods might be valuable for solving problems in biological network analysis.
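The partition-based idea behind such entropic measures can be sketched as the Shannon entropy of a vertex partition, I = -sum_i (|V_i|/|V|) log2(|V_i|/|V|). The classical topological information content is usually computed on the orbits of the graph's automorphism group; as a simplifying assumption, the sketch instead partitions vertices by degree for the unlabeled case and by (label, degree) pairs for the labeled case. Function names are hypothetical and this is not the authors' measure.

```python
import math
from collections import Counter


def partition_entropy(classes):
    """Shannon entropy (bits) of a partition given as one class key per vertex."""
    n = len(classes)
    counts = Counter(classes)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def degree_classes(edges, vertices):
    """Unlabeled case: class key of each vertex is its degree (stand-in for orbits)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return [degree[v] for v in vertices]


def labeled_classes(edges, vertices, labels):
    """Labeled case: combine the vertex label (e.g. atom type) with its degree."""
    deg = degree_classes(edges, vertices)
    return [(labels[v], d) for v, d in zip(vertices, deg)]
```

For the heavy-atom graph of ethanol (C-C-O), the degree partition has two classes, while adding atom labels splits it into three singletons, so the labeled entropy is higher (log2 3, about 1.585 bits, versus about 0.918 bits).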


Subject(s)
Computational Biology/methods , Artificial Intelligence , Entropy , Mutagenicity Tests , Software
20.
PLoS One ; 4(12): e8057, 2009 Dec 15.
Article in English | MEDLINE | ID: mdl-20016828

ABSTRACT

This paper investigates information-theoretic network complexity measures, which have already been used intensively in mathematical and medicinal chemistry, including drug design. Numerous such measures have been developed so far, but many of them lack a meaningful interpretation, for example in terms of which kind of structural information they detect. Therefore, our main contribution is to shed light on the relatedness between some selected information measures for graphs by performing a large-scale analysis using chemical networks. Starting from several sets containing real and synthetic chemical structures represented by graphs, we study the relatedness between a classical (partition-based) complexity measure, called the topological information content of a graph, and some others inferred by a different paradigm, leading to partition-independent measures. Moreover, we evaluate the uniqueness of network complexity measures numerically. In general, high uniqueness is an important and desirable property when designing novel topological descriptors with the potential to be applied to large chemical databases.


Subject(s)
Information Theory , Models, Chemical , Computer Graphics , Entropy , Molecular Structure , Numerical Analysis, Computer-Assisted