2.
J Biomed Inform ; 65: 105-119, 2017 01.
Article in English | MEDLINE | ID: mdl-27919732

ABSTRACT

Electronic health records contain large amounts of longitudinal data that are valuable for biomedical informatics research. The application of machine learning is a promising alternative to manual analysis of such data. However, the complex structure of the data, which includes clinical events that are unevenly distributed over time, poses a challenge for standard learning algorithms. Some approaches to modeling temporal data rely on extracting single values from time series; however, this leads to the loss of potentially valuable sequential information. How to better account for the temporality of clinical data, hence, remains an important research question. In this study, novel representations of temporal data in electronic health records are explored. These representations retain the sequential information, and are directly compatible with standard machine learning algorithms. The explored methods are based on symbolic sequence representations of time series data, which are utilized in a number of different ways. An empirical investigation, using 19 datasets comprising clinical measurements observed over time from a real database of electronic health records, shows that using a distance measure to random subsequences leads to substantial improvements in predictive performance compared to using the original sequences or clustering the sequences. Evidence is moreover provided on the quality of the symbolic sequence representation by comparing it to sequences that are generated using domain knowledge by clinical experts. The proposed method creates representations that better account for the temporality of clinical events, which is often key to prediction tasks in the biomedical domain.
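
The idea of representing each symbolic sequence by its distances to randomly chosen subsequences can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract names neither the discretization scheme nor the distance measure, so the fixed breakpoints, the edit (Levenshtein) distance, and all function names here are assumptions.

```python
import random

def symbolize(values, breakpoints, alphabet="abc"):
    """Map each measurement to a symbol; breakpoints are sorted cut points (assumed)."""
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in values)

def edit_distance(a, b):
    """Levenshtein distance between two symbol strings (assumed distance measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def min_subsequence_distance(seq, pattern):
    """Smallest distance between pattern and any window of seq of the same length."""
    if len(seq) <= len(pattern):
        return edit_distance(seq, pattern)
    windows = (seq[i:i + len(pattern)] for i in range(len(seq) - len(pattern) + 1))
    return min(edit_distance(w, pattern) for w in windows)

def sample_subsequences(sequences, n, length, rng):
    """Draw n random subsequences of the given length from the training sequences."""
    candidates = [s for s in sequences if len(s) >= length]
    patterns = []
    for _ in range(n):
        s = rng.choice(candidates)
        start = rng.randrange(len(s) - length + 1)
        patterns.append(s[start:start + length])
    return patterns

def featurize(sequences, patterns):
    """One fixed-length feature vector per sequence: its distance to each pattern."""
    return [[min_subsequence_distance(s, p) for p in patterns] for s in sequences]
```

The resulting fixed-length vectors can be passed to any standard learner, which is the compatibility property the abstract emphasizes.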


Subject(s)
Algorithms , Electronic Health Records , Machine Learning , Cluster Analysis , Databases, Factual
3.
BMC Med Inform Decis Mak ; 16 Suppl 2: 69, 2016 07 21.
Article in English | MEDLINE | ID: mdl-27459846

ABSTRACT

BACKGROUND: Learning deep representations of clinical events based on their distributions in electronic health records has been shown to allow for subsequent training of higher-performing predictive models compared to the use of shallow, count-based representations. The predictive performance may be further improved by utilizing multiple representations of the same events, which can be obtained by, for instance, manipulating the representation learning procedure. The question, however, remains how to make best use of a set of diverse representations of clinical events - modeled in an ensemble of semantic spaces - for the purpose of predictive modeling. METHODS: Three different ways of exploiting a set of (ten) distributed representations of four types of clinical events - diagnosis codes, drug codes, measurements, and words in clinical notes - are investigated in a series of experiments using ensembles of randomized trees. Here, the semantic space ensembles are obtained by varying the context window size in the representation learning procedure. The proposed method trains a forest wherein each tree is built from a bootstrap replicate of the training set whose entire original feature set is represented in a randomly selected set of semantic spaces - corresponding to the considered data types - of a given context window size. RESULTS: The proposed method significantly outperforms concatenating the multiple representations of the bagged dataset; it also significantly outperforms representing, for each decision tree, only a subset of the features in a randomly selected set of semantic spaces. A follow-up analysis indicates that the proposed method exhibits less diversity while significantly improving average tree performance. It is also shown that the size of the semantic space ensemble has a significant impact on predictive performance and that performance tends to improve as the size increases. 
CONCLUSIONS: The strategy for utilizing a set of diverse distributed representations of clinical events when constructing ensembles of randomized trees has a significant impact on predictive performance. The most successful strategy - significantly outperforming the considered alternatives - involves randomly sampling distributed representations of the clinical events when building each decision tree in the forest.
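
The per-tree sampling described in METHODS might be sketched as below. Only the data preparation is shown, and several details are assumptions rather than facts from the abstract: the nested-dictionary layout of the semantic spaces, the averaging of event vectors into one vector per data type, and every function name. Training the tree itself is left to whatever learner is used.

```python
import random

def sample_tree_view(n_records, spaces_by_window, rng):
    """Pick a bootstrap replicate and one context window size for a single tree."""
    rows = [rng.randrange(n_records) for _ in range(n_records)]
    window = rng.choice(sorted(spaces_by_window))
    return rows, window

def represent(record, spaces):
    """Represent every data type of a record in the given set of semantic spaces.

    record: {data_type: [event, ...]}; spaces: {data_type: {event: vector}}.
    Averaging the event vectors per data type is an assumption made for brevity.
    """
    features = []
    for dtype in sorted(spaces):
        space = spaces[dtype]
        dim = len(next(iter(space.values())))
        vectors = [space[e] for e in record.get(dtype, []) if e in space]
        if vectors:
            features.extend(sum(col) / len(vectors) for col in zip(*vectors))
        else:
            features.extend([0.0] * dim)
    return features

def tree_training_sets(records, labels, spaces_by_window, n_trees, seed=0):
    """Yield (window, X, y) for each tree: all features, one window size per tree."""
    rng = random.Random(seed)
    for _ in range(n_trees):
        rows, window = sample_tree_view(len(records), spaces_by_window, rng)
        X = [represent(records[i], spaces_by_window[window]) for i in rows]
        y = [labels[i] for i in rows]
        yield window, X, y
```

Each tree thus sees the entire feature set, but in the semantic spaces of a single randomly chosen window size, in contrast to concatenating all representations.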


Subject(s)
Decision Trees , Drug-Related Side Effects and Adverse Reactions , Electronic Health Records , Machine Learning , Models, Theoretical , Pharmacovigilance , Humans , Semantics
4.
BMC Med Inform Decis Mak ; 15 Suppl 4: S1, 2015.
Article in English | MEDLINE | ID: mdl-26606038

ABSTRACT

BACKGROUND: The digitization of healthcare data, resulting from the increasingly widespread adoption of electronic health records, has greatly facilitated its analysis by computational methods and thereby enabled large-scale secondary use thereof. This can be exploited to support public health activities such as pharmacovigilance, wherein the safety of drugs is monitored to inform regulatory decisions about sustained use. To that end, electronic health records have emerged as a potentially valuable data source, providing access to longitudinal observations of patient treatment and drug use. A nascent line of research concerns predictive modeling of healthcare data for the automatic detection of adverse drug events, which presents its own set of challenges: it is not yet clear how to represent the heterogeneous data types in a manner conducive to learning high-performing machine learning models. METHODS: Datasets from an electronic health record database are used for learning predictive models with the purpose of detecting adverse drug events. The use and representation of two data types, as well as their combination, are studied: clinical codes, describing prescribed drugs and assigned diagnoses, and measurements. Feature selection is conducted on the various types of data to reduce dimensionality and sparsity, while allowing for an in-depth feature analysis of the usefulness of each data type and representation. RESULTS: Within each data type, combining multiple representations yields better predictive performance compared to using any single representation. The use of clinical codes for adverse drug event detection significantly outperforms the use of measurements; however, there is no significant difference over datasets between using only clinical codes and their combination with measurements. For certain adverse drug events, the combination does, however, outperform using only clinical codes. Feature selection leads to increased predictive performance for both data types, in isolation and combined.
CONCLUSIONS: We have demonstrated how machine learning can be applied to electronic health records for the purpose of detecting adverse drug events and proposed solutions to some of the challenges this presents, including how to represent the various data types. Overall, clinical codes are more useful than measurements and, in specific cases, it is beneficial to combine the two.


Subject(s)
Drug-Related Side Effects and Adverse Reactions/diagnosis , Electronic Health Records , Machine Learning , Pharmacovigilance , Algorithms , Computer Simulation , Databases, Factual , Drug-Related Side Effects and Adverse Reactions/etiology , Forecasting , Humans , Patient Safety
5.
AMIA Annu Symp Proc ; 2015: 1371-80, 2015.
Article in English | MEDLINE | ID: mdl-26958278

ABSTRACT

Using longitudinal data in electronic health records (EHRs) for post-marketing adverse drug event (ADE) detection allows for monitoring patients throughout their medical history. Machine learning methods have been shown to be efficient and effective in screening health records and detecting ADEs. How best to exploit historical data, as encoded by clinical events in EHRs, is, however, not very well understood. In this study, three strategies for handling temporality of clinical events are proposed and evaluated using an EHR database from Stockholm, Sweden. The random forest learning algorithm is applied to predict fourteen ADEs using clinical events collected from different lengths of patient history. The results show that, in general, including longer patient history leads to improved predictive performance, and that assigning weights to events according to time distance from the ADE yields the biggest improvement.
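
Weighting events by their time distance from the ADE could look like the following sketch. The abstract does not give the weighting function, so the reciprocal decay 1 / (1 + days), the input layout, and the function name are all assumptions chosen for illustration.

```python
from collections import defaultdict

def weighted_event_features(events, max_history_days, weight=lambda d: 1.0 / (1.0 + d)):
    """Build one feature per clinical event code, weighted by recency.

    events: iterable of (code, days_before_ade) pairs for one patient.
    Events older than max_history_days are ignored; the default weight
    function is a hypothetical choice, not the one used in the study.
    """
    features = defaultdict(float)
    for code, days in events:
        if 0 <= days <= max_history_days:
            features[code] += weight(days)
    return dict(features)
```

Setting `weight=lambda d: 1.0` reduces this to plain event counts over the chosen history length, which gives a baseline for comparing different lengths of patient history.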


Subject(s)
Algorithms , Drug-Related Side Effects and Adverse Reactions , Electronic Health Records , Machine Learning , Databases, Factual , Humans , Product Surveillance, Postmarketing
6.
J Mol Model ; 19(6): 2679-85, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23479283

ABSTRACT

Uncertainty was introduced into the chemical descriptors of 11 datasets by conformational analysis in order to incorporate three-dimensional information and to investigate the resulting predictive performance of a state-of-the-art machine learning method, random forests, for binary classification tasks. A number of strategies for handling uncertainty in random forests were evaluated. The study showed that when incorporating three-dimensional information as uncertainty into chemical descriptors, the use of uniform probability distributions over the range of possible values, in conjunction with fractional distribution of compounds, clearly outperforms the use of normal distributions as well as sampling from both normal and uniform distributions. The main conclusion of this study is that, even when distributions of uncertain values are provided, the random forest method can generate models that are almost as accurate from the expected values of these distributions alone. Hence, there seems to be little advantage to using the more elaborate methods of incorporating uncertainty in chemical descriptors when using random forests rather than replacing the distributions with single-point values. The results also show that random forest models with similar performance can be generated using three-dimensional descriptor information derived from single (lowest-energy or Corina-derived) conformations.
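
One plausible reading of "fractional distribution of compounds" under a uniform distribution is sketched below: at a tree split, a compound whose descriptor is only known to lie in an interval is sent down both branches with fractional weights. This interpretation, and both function names, are assumptions; the abstract does not spell out the mechanism.

```python
def left_fraction(low, high, threshold):
    """Share of a compound sent to the 'value <= threshold' branch.

    The descriptor is assumed uniformly distributed on [low, high].
    """
    if high <= low:  # degenerate interval: behave like a single-point value
        return 1.0 if low <= threshold else 0.0
    frac = (threshold - low) / (high - low)
    return min(1.0, max(0.0, frac))

def expected_value(low, high):
    """Single-point replacement for a uniform descriptor: its mean."""
    return (low + high) / 2.0
```

The second function corresponds to the simpler alternative that the abstract reports as almost as accurate: replacing each distribution with its expected value before training.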


Subject(s)
Artificial Intelligence , Models, Molecular , Uncertainty , Molecular Conformation , Reproducibility of Results
7.
J Chem Inf Model ; 52(11): 2815-22, 2012 Nov 26.
Article in English | MEDLINE | ID: mdl-23039214

ABSTRACT

Uncertainty was introduced to chemical descriptors of 16 publicly available data sets to various degrees and in various ways in order to investigate the effect on the predictive performance of a state-of-the-art method, decision tree ensembles. A number of strategies to handle uncertainty in decision tree ensembles were evaluated. The main conclusion of the study is that uncertainty to a large extent may be introduced in chemical descriptors without impairing the predictive performance of ensembles and without the predictive performance being significantly reduced from a practical point of view. The investigation further showed that even when distributions of uncertain values were provided, the ensemble method could generate equally effective models from single-point samples from these distributions. Hence, there seems to be no advantage in using more elaborate methods for handling uncertainty in chemical descriptors when using decision tree ensembles as a modeling method for the considered types of introduced uncertainty.


Subject(s)
Algorithms , Proteins/chemistry , Uncertainty , Animals , Bacteria , Databases, Pharmaceutical , Decision Trees , Humans , Hydrophobic and Hydrophilic Interactions , Likelihood Functions , Static Electricity , Surface Properties , Viruses
8.
Future Med Chem ; 3(6): 647-63, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21554073

ABSTRACT

BACKGROUND: Accuracy concerns the ability of a model to make correct predictions, while interpretability concerns to what degree the model allows for human understanding. Models exhibiting the former property are often more complex and opaque, while interpretable models may lack the necessary accuracy. The trade-off between accuracy and interpretability for predictive in silico modeling is investigated. METHOD: A number of state-of-the-art methods for generating accurate models are compared with state-of-the-art methods for generating transparent models. CONCLUSION: Results on 16 biopharmaceutical classification tasks demonstrate that, although the opaque methods generally obtain higher accuracies than the transparent ones, one often only has to pay a quite limited penalty in terms of predictive performance when choosing an interpretable model.


Subject(s)
Drug Discovery/methods , Models, Theoretical , Pharmaceutical Preparations/chemistry , Algorithms , Databases, Factual , Pharmaceutical Preparations/classification
9.
Audiol Neurootol ; 15(3): 175-86, 2010.
Article in English | MEDLINE | ID: mdl-19851064

ABSTRACT

Adult spiral ganglion cells were cultured in chorus to assess the influence of the neurotrophins brain-derived neurotrophic factor, neurotrophin 3 and glial cell line-derived neurotrophic factor (GDNF) on neurite growth and Schwann cell alignment. Over 1500 measurements were collected using each factor at 10 ng/ml and all three in combination. Evaluation was made with GDNF at concentrations of up to 100 ng/ml. Neurite dimensions were assessed at days 5, 7, 9 and 11 using a computer-based program (Axon Analyzer). GDNF had a strong effect on spiral ganglion cell growth almost attaining the level of all three factors in combination. GDNF increased glial cell alignment and nerve bundle formation. Results show the potential of GDNF to maintain and possibly restore auditory nerve integrity.


Subject(s)
Nerve Growth Factors/pharmacology , Spiral Ganglion/cytology , Adult , Animals , Cell Culture Techniques , Glial Cell Line-Derived Neurotrophic Factor/pharmacology , Guinea Pigs , Humans , Neurites/drug effects , Neurites/physiology , Neurotrophin 3/pharmacology , Schwann Cells/cytology , Schwann Cells/drug effects , Schwann Cells/physiology , Spiral Ganglion/drug effects
10.
Mol Divers ; 10(2): 207-12, 2006 May.
Article in English | MEDLINE | ID: mdl-16721627

ABSTRACT

Rule-based ensemble modelling has been used to develop a model with high accuracy and predictive capabilities for distinguishing between four different modes of toxic action for a set of 220 phenols. The model not only predicts the majority class (polar narcotics) well but also the other three classes (weak acid respiratory uncouplers, pro-electrophiles and soft electrophiles) of toxic action despite the severely skewed distribution among the four investigated classes. Furthermore, the investigation also highlights the merits of using ensemble (or consensus) modelling as an alternative to the more traditional development of a single model in order to promote robustness and accuracy with respect to the predictive capability for the derived model.


Subject(s)
Models, Chemical , Models, Statistical , Phenols/toxicity , Quantitative Structure-Activity Relationship , Databases, Factual
11.
J Med Chem ; 46(26): 5781-9, 2003 Dec 18.
Article in English | MEDLINE | ID: mdl-14667231

ABSTRACT

Three different multivariate statistical methods, PLS discriminant analysis, rule-based methods, and Bayesian classification, have been applied to multidimensional scoring data from four different target proteins: estrogen receptor alpha (ERalpha), matrix metalloprotease 3 (MMP3), factor Xa (fXa), and acetylcholine esterase (AChE). The purpose was to build classifiers able to discriminate between active and inactive compounds, given a structure-based virtual screen. Seven different scoring functions were used to generate the scoring matrices. The classifiers were compared to classical consensus scoring and single scoring functions. The classifiers show a superior performance, with rule-based methods being most effective. The precision of correctly predicting an active compound is about 90% for three of the targets and about 25% for acetylcholine esterase. On the basis of these results, a new two-stage approach is suggested for structure-based virtual screening where limited activity information is available.
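
For context, the classical consensus scoring baseline mentioned above is commonly implemented as rank-by-vote: a compound earns one vote from each scoring function that ranks it among the top-scoring compounds. The sketch below assumes that variant, a 10% default cutoff, and that lower scores are better; none of these details is stated in the abstract.

```python
def consensus_votes(score_matrix, top_fraction=0.1, lower_is_better=True):
    """Count, per compound, how many scoring functions rank it in the top fraction.

    score_matrix: one row per compound, one column per scoring function.
    """
    n_compounds = len(score_matrix)
    n_functions = len(score_matrix[0])
    cutoff = max(1, int(n_compounds * top_fraction))
    votes = [0] * n_compounds
    for j in range(n_functions):
        order = sorted(range(n_compounds),
                       key=lambda i: score_matrix[i][j],
                       reverse=not lower_is_better)
        for i in order[:cutoff]:
            votes[i] += 1
    return votes
```

The multivariate classifiers in the study instead take a compound's whole row of the scoring matrix as input features, rather than reducing it to a vote count.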


Subject(s)
Multivariate Analysis , Quantitative Structure-Activity Relationship , Acetylcholinesterase/chemistry , Binding Sites , Estrogen Receptor alpha , Factor Xa/chemistry , Ligands , Matrix Metalloproteinase 3/chemistry , Receptors, Estrogen/chemistry