Results 1 - 18 of 18
1.
Patterns (N Y) ; 3(4): 100445, 2022 Apr 08.
Article in English | MEDLINE | ID: mdl-35465223

ABSTRACT

Clinical trials are crucial for drug development but often face uncertain outcomes due to safety, efficacy, or patient-recruitment problems. We propose the Hierarchical Interaction Network (HINT) to predict clinical trial outcomes. First, HINT encodes multi-modal data (drug molecule, target disease, trial eligibility criteria) into embeddings. Then, HINT trains knowledge-embedding modules using drug pharmacokinetic and historical trial data. Finally, a hierarchical interaction graph connects all of the embeddings to capture their interactions and predict trial outcomes. HINT was trained and validated on 1,160 phase I trials, 4,449 phase II trials, and 3,436 phase III trials. It obtained 0.665, 0.620, and 0.847 F1 scores on separate test sets of 627 phase I, 1,653 phase II, and 1,140 phase III trials, respectively. HINT significantly outperforms the best baseline method on most metrics. The benchmark dataset and codes are released at https://github.com/futianfan/clinical-trial-outcome-prediction.

2.
IEEE Trans Knowl Data Eng ; 34(11): 5459-5471, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36590707

ABSTRACT

The goal of molecular optimization is to generate molecules similar to a target molecule but with better chemical properties. Deep generative models have shown great success in molecule optimization. However, due to the iterative local generation process of deep generative models, the resulting molecules can significantly deviate from the input in molecular similarity and size, leading to poor chemical properties. The key issue here is that existing deep generative models restrict their attention to substructure-level generation without considering the entire molecule as a whole. To address this challenge, we propose Molecule-Level Reward functions (MOLER) to encourage (1) the input and the generated molecule to be similar, and to ensure (2) the generated molecule has a similar size to the input. The proposed method can be combined with various deep generative models. A policy gradient technique is introduced to optimize reward-based objectives with small computational overhead. Empirical studies show that MOLER achieves up to 20.2% relative improvement in success rate over the best baseline method on several properties, including QED, DRD2 and LogP.

3.
Patterns (N Y) ; 2(10): 100328, 2021 Oct 08.
Article in English | MEDLINE | ID: mdl-34693370

ABSTRACT

Thanks to the increasing availability of genomics and other biomedical data, many machine learning algorithms have been proposed for a wide range of therapeutic discovery and development tasks. In this survey, we review the literature on machine learning applications for genomics through the lens of therapeutic development. We investigate the interplay among genomics, compounds, proteins, electronic health records, cellular images, and clinical texts. We identify 22 applications of machine learning in genomics that span the whole therapeutics pipeline, from discovering novel targets, personalizing medicine, and developing gene-editing tools, all the way to facilitating clinical trials and post-market studies. We also pinpoint seven key challenges in this field with potential for expansion and impact. This survey examines recent research at the intersection of machine learning, genomics, and therapeutic development.

4.
Bioinformatics ; 37(18): 2988-2995, 2021 Sep 29.
Article in English | MEDLINE | ID: mdl-33769494

ABSTRACT

MOTIVATION: Thanks to the increasing availability of drug-drug interaction (DDI) datasets and large biomedical knowledge graphs (KGs), accurate detection of adverse DDIs using machine learning models has become possible. However, how to effectively utilize large and noisy biomedical KGs for DDI detection remains largely an open problem. Due to the sheer size of KGs and the amount of noise in them, it is often less beneficial to directly integrate KGs with other smaller but higher quality data (e.g. experimental data). Most existing approaches ignore KGs altogether. Some try to directly integrate KGs with other data via graph neural networks, with limited success. Furthermore, most previous work focuses on binary DDI prediction, whereas multi-typed DDI pharmacological effect prediction is a more meaningful but harder task. RESULTS: To fill the gaps, we propose a new method, SumGNN: knowledge summarization graph neural network, which is enabled by a subgraph extraction module that can efficiently anchor on relevant subgraphs from a KG, a self-attention based subgraph summarization scheme to generate reasoning paths within the subgraph, and a multi-channel knowledge and data integration module that utilizes massive external biomedical knowledge for significantly improved multi-typed DDI predictions. SumGNN outperforms the best baseline by up to 5.54%, and the performance gain is particularly significant in low data relation types. In addition, SumGNN provides interpretable prediction via the generated reasoning paths for each prediction. AVAILABILITY AND IMPLEMENTATION: The code is available in Supplementary Material. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neural Networks, Computer , Pattern Recognition, Automated , Drug Interactions , Machine Learning
5.
J Am Med Inform Assoc ; 28(4): 733-743, 2021 Mar 18.
Article in English | MEDLINE | ID: mdl-33486527

ABSTRACT

OBJECTIVE: We aim to develop a hybrid model for earlier and more accurate predictions for the number of infected cases in pandemics by (1) using patients' claims data from different counties and states that capture local disease status and medical resource utilization; (2) utilizing demographic similarity and geographical proximity between locations; and (3) integrating pandemic transmission dynamics into a deep learning model. MATERIALS AND METHODS: We proposed a spatio-temporal attention network (STAN) for pandemic prediction. It uses a graph attention network to capture spatio-temporal trends of disease dynamics and to predict the number of cases for a fixed number of days into the future. We also designed a dynamics-based loss term for enhancing long-term predictions. STAN was tested using both real-world patient claims data and COVID-19 statistics over time across US counties. RESULTS: STAN outperforms traditional epidemiological models such as susceptible-infectious-recovered (SIR), susceptible-exposed-infectious-recovered (SEIR), and deep learning models on both long-term and short-term predictions, achieving up to 87% reduction in mean squared error compared to the best baseline prediction model. CONCLUSIONS: By combining information from real-world claims data and disease case counts data, STAN can better predict disease status and medical resource utilization.


Subject(s)
Models, Statistical , Neural Networks, Computer , Pandemics , COVID-19/epidemiology , Deep Learning , Epidemiologic Methods , Humans , United States/epidemiology
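
The STAN abstract above compares against compartmental models such as SIR and mentions a dynamics-based loss term. As a rough illustration of the kind of transmission dynamics involved, here is a minimal sketch of a discrete-time SIR simulation. It is not the authors' code: the function names, the forward-Euler update, and the parameter values are assumptions chosen for illustration.

```python
def sir_step(s, i, r, beta, gamma, dt=1.0):
    """Advance the S, I, R compartments by one forward-Euler step of length dt.

    beta is the transmission rate and gamma the recovery rate (both assumed
    constant here; the abstract does not say how STAN parameterizes them).
    """
    n = s + i + r
    new_infections = beta * s * i / n * dt
    new_recoveries = gamma * i * dt
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)


def simulate_sir(s0, i0, r0, beta, gamma, days):
    """Return a list of (S, I, R) tuples, one per day, starting at day 0."""
    trajectory = [(float(s0), float(i0), float(r0))]
    for _ in range(days):
        trajectory.append(sir_step(*trajectory[-1], beta, gamma))
    return trajectory


if __name__ == "__main__":
    # Hypothetical population of 1,000 with 10 initial infections.
    for day, (s, i, r) in enumerate(simulate_sir(990, 10, 0, 0.3, 0.1, 5)):
        print(day, round(s, 2), round(i, 2), round(r, 2))
```

A dynamics-based loss in this spirit could penalize the gap between a network's predicted case counts and the counts implied by such an update, though the exact form STAN uses is not specified in the abstract.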
6.
Bioinformatics ; 36(22-23): 5545-5547, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33275143

ABSTRACT

SUMMARY: Accurate prediction of drug-target interactions (DTI) is crucial for drug discovery. Recently, deep learning (DL) models have shown promising performance for DTI prediction. However, these models can be difficult to use for both computer scientists entering the biomedical field and bioinformaticians with limited DL experience. We present DeepPurpose, a comprehensive and easy-to-use DL library for DTI prediction. DeepPurpose supports training of customized DTI prediction models by implementing 15 compound and protein encoders and over 50 neural architectures, along with providing many other useful features. We demonstrate state-of-the-art performance of DeepPurpose on several benchmark datasets. AVAILABILITY AND IMPLEMENTATION: https://github.com/kexinhuang12345/DeepPurpose. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Deep Learning , Pharmaceutical Preparations , Drug Development , Drug Discovery , Proteins
7.
Bioinformatics ; 37(6): 830-836, 2021 May 05.
Article in English | MEDLINE | ID: mdl-33070179

ABSTRACT

MOTIVATION: Drug-target interaction (DTI) prediction is a foundational task for in-silico drug discovery, which is costly and time-consuming due to the need for experimental search over a large drug compound space. Recent years have witnessed promising progress for deep learning in DTI predictions. However, the following challenges are still open: (i) existing molecular representation learning approaches ignore the sub-structural nature of DTI, and thus produce results that are less accurate and difficult to explain and (ii) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data. RESULTS: We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via: (i) a knowledge-inspired sub-structural pattern mining algorithm and interaction modeling module for more accurate and interpretable DTI prediction and (ii) an augmented transformer encoder to better extract and capture the semantic relations among sub-structures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real-world data and show it improved DTI prediction performance compared to state-of-the-art baselines. AVAILABILITY AND IMPLEMENTATION: The model scripts are available at https://github.com/kexinhuang12345/moltrans. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Drug Development , Pharmaceutical Preparations , Algorithms , Computer Simulation , Drug Discovery
8.
J Am Med Inform Assoc ; 28(3): 444-452, 2021 Mar 01.
Article in English | MEDLINE | ID: mdl-33125051

ABSTRACT

OBJECTIVE: The study sought to test the possibility of differentiating chest x-ray images of coronavirus disease 2019 (COVID-19) against other pneumonia and healthy patients using deep neural networks. MATERIALS AND METHODS: We construct the radiography (x-ray) imaging data from 2 publicly available sources, which include 5508 chest x-ray images across 2874 patients with 4 classes: normal, bacterial pneumonia, non-COVID-19 viral pneumonia, and COVID-19. To identify COVID-19, we propose a FLANNEL (Focal Loss bAsed Neural Network EnsembLe) model, a flexible module to ensemble several convolutional neural network models and fuse with a focal loss for accurate COVID-19 detection on class-imbalanced data. RESULTS: FLANNEL consistently outperforms baseline models on the COVID-19 identification task in all metrics. Compared with the best baseline, FLANNEL shows a higher macro-F1 score, with a 6% relative increase on the COVID-19 identification task, in which it achieves precision of 0.7833 ± 0.07, recall of 0.8609 ± 0.03, and F1 score of 0.8168 ± 0.03. DISCUSSION: Ensemble learning that combines multiple independent basis classifiers can increase the robustness and accuracy. We propose a neural weighing module to learn the importance weight for each base model and combine them via weighted ensemble to get the final classification results. In order to handle the class imbalance challenge, we adapt focal loss to our multiple classification task as the loss function. CONCLUSION: FLANNEL effectively combines state-of-the-art convolutional neural network classification models and tackles class imbalance with focal loss to achieve better performance on COVID-19 detection from x-rays.


Subject(s)
COVID-19/diagnostic imaging , Lung/diagnostic imaging , Neural Networks, Computer , Pneumonia, Viral/diagnostic imaging , Radiography, Thoracic , Algorithms , COVID-19/diagnosis , Diagnosis, Differential , Humans , ROC Curve , Radiography, Thoracic/methods
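
The FLANNEL abstract above relies on a focal loss to cope with class imbalance across four classes. The sketch below shows the commonly used multi-class focal loss, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), for a single example. The abstract does not give FLANNEL's exact formulation, gamma, or class weights, so the function names, the default gamma of 2.0, and the optional `alpha` weights are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one example with true class index `target`.

    gamma down-weights examples the model already classifies confidently;
    alpha is an optional list of per-class weights (assumed, not from the
    abstract). With gamma=0 and alpha=None this reduces to cross-entropy.
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Hypothetical scores for: normal, bacterial, viral, COVID-19.
    scores = [2.0, 0.5, 0.1, -1.0]
    print("easy example:", focal_loss(scores, target=0))
    print("hard example:", focal_loss(scores, target=3))
```

The modulating factor (1 - p_t)^gamma shrinks the loss of well-classified examples, so rare, hard classes contribute relatively more to the gradient.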
9.
Sci Rep ; 10(1): 21092, 2020 Dec 03.
Article in English | MEDLINE | ID: mdl-33273494

ABSTRACT

Molecular interaction networks are powerful resources for molecular discovery. They are increasingly used with machine learning methods to predict biologically meaningful interactions. While deep learning on graphs has dramatically advanced the prediction prowess, current graph neural network (GNN) methods are mainly optimized for prediction on the basis of direct similarity between interacting nodes. In biological networks, however, similarity between nodes that do not directly interact has proved incredibly useful in the last decade across a variety of interaction networks. Here, we present SkipGNN, a graph neural network approach for the prediction of molecular interactions. SkipGNN predicts molecular interactions by not only aggregating information from direct interactions but also from second-order interactions, which we call skip similarity. In contrast to existing GNNs, SkipGNN receives neural messages from two-hop neighbors as well as immediate neighbors in the interaction network and non-linearly transforms the messages to obtain useful information for prediction. To inject skip similarity into a GNN, we construct a modified version of the original network, called the skip graph. We then develop an iterative fusion scheme that optimizes a GNN using both the skip graph and the original graph. Experiments on four interaction networks, including drug-drug, drug-target, protein-protein, and gene-disease interactions, show that SkipGNN achieves superior and robust performance. Furthermore, we show that unlike popular GNNs, SkipGNN learns biologically meaningful embeddings and performs especially well on noisy, incomplete interaction networks.
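
The SkipGNN abstract above describes a skip graph that captures second-order (two-hop) relationships. One plausible way to build such a graph, sketched here, is to link two nodes whenever they share at least one neighbor in the original network. This is an assumption made for illustration: the abstract does not state the exact construction, and `build_skip_graph` is a hypothetical name.

```python
from itertools import combinations


def build_skip_graph(edges):
    """Return the set of skip edges for an undirected graph.

    Two distinct nodes are joined by a skip edge when they share at least
    one neighbor. Nodes must be sortable so that each undirected edge is
    stored once as an ordered pair (a, b) with a < b.
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    skip_edges = set()
    for shared in neighbors.values():
        # Every pair of nodes adjacent to the same node is two hops apart.
        for a, b in combinations(sorted(shared), 2):
            skip_edges.add((a, b))
    return skip_edges


if __name__ == "__main__":
    # Hypothetical path graph: drugA - targetX - drugB - targetY
    path = [("drugA", "targetX"), ("targetX", "drugB"), ("drugB", "targetY")]
    print(sorted(build_skip_graph(path)))
```

Under this construction, a pair that is directly connected can also appear as a skip edge if it shares a neighbor; whether such pairs should be kept is a design choice the abstract leaves open.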

10.
ArXiv ; 2020 Dec 07.
Article in English | MEDLINE | ID: mdl-33330741

ABSTRACT

OBJECTIVE: The COVID-19 pandemic has created many challenges that need immediate attention. Various epidemiological and deep learning models have been developed to predict the COVID-19 outbreak, but all have limitations that affect the accuracy and robustness of the predictions. Our method aims at addressing these limitations and making earlier and more accurate pandemic outbreak predictions by (1) using patients' EHR data from different counties and states that encode local disease status and medical resource utilization condition; (2) considering demographic similarity and geographical proximity between locations; and (3) integrating pandemic transmission dynamics into deep learning models. MATERIALS AND METHODS: We proposed a spatio-temporal attention network (STAN) for pandemic prediction. It uses an attention-based graph convolutional network to capture geographical and temporal trends and predict the number of cases for a fixed number of days into the future. We also designed a physical law-based loss term for enhancing long-term prediction. STAN was tested using both massive real-world patient data and open source COVID-19 statistics provided by Johns Hopkins University across all U.S. counties. RESULTS: STAN outperforms epidemiological modeling methods such as SIR and SEIR and deep learning models on both long-term and short-term predictions, achieving up to 87% lower mean squared error compared to the best baseline prediction model. CONCLUSIONS: By using information from real-world patient data and geographical data, STAN can better capture the disease status and medical resource utilization information and thus provides more accurate pandemic modeling. With pandemic transmission law based regularization, STAN also achieves good long-term prediction performance.

11.
J Am Med Inform Assoc ; 27(7): 1084-1091, 2020 Jul 01.
Article in English | MEDLINE | ID: mdl-32548622

ABSTRACT

OBJECTIVE: Prediction of disease phenotypes and their outcomes is a difficult task. In practice, patients routinely seek second opinions from multiple clinical experts for complex disease diagnosis. Our objective is to mimic such a practice of seeking second opinions by training 2 agents with different focuses: the primary agent studies the most recent visit of the patient to learn the current health status, and then the second-opinion agent considers the entire patient history to obtain a more global view. MATERIALS AND METHODS: Our approach Dr. Agent augments recurrent neural networks with 2 policy gradient agents. Moreover, Dr. Agent is customized with various patient demographics information and learns a dynamic skip connection to focus on the relevant information over time. We trained Dr. Agent to perform 4 clinical prediction tasks on the publicly available MIMIC-III (Medical Information Mart for Intensive Care) database: (1) in-hospital mortality prediction, (2) acute care phenotype classification, (3) physiologic decompensation prediction, and (4) forecasting length of stay. We compared the performance of Dr. Agent against 4 baseline clinical predictive models. RESULTS: Dr. Agent outperforms baseline clinical prediction models across all 4 tasks in terms of all metrics. Compared with the best baseline model, Dr. Agent achieves up to 15% higher area under the precision-recall curve on different tasks. CONCLUSIONS: Dr. Agent can comprehensively model the long-term dependencies of patients' health status while considering patients' demographics using 2 agents, and therefore achieves better prediction performance on different clinical prediction tasks.


Subject(s)
Computer Simulation , Neural Networks, Computer , Prognosis , Referral and Consultation , Deep Learning , Disease Progression , Hospital Mortality , Humans , Length of Stay , Logistic Models
12.
ArXiv ; 2020 Dec 08.
Article in English | MEDLINE | ID: mdl-33758769

ABSTRACT

Accurate prediction of the transmission of epidemic diseases such as COVID-19 is crucial for implementing effective mitigation measures. In this work, we develop a tensor method to predict the evolution of epidemic trends for many regions simultaneously. We construct a 3-way spatio-temporal tensor (location, attribute, time) of case counts and propose a nonnegative tensor factorization with latent epidemiological model regularization named STELAR. Unlike standard tensor factorization methods which cannot predict slabs ahead, STELAR enables long-term prediction by incorporating latent temporal regularization through a system of discrete-time difference equations of a widely adopted epidemiological model. We use latent instead of location/attribute-level epidemiological dynamics to capture common epidemic profile sub-types and improve collaborative learning and prediction. We conduct experiments using both county- and state-level COVID-19 data and show that our model can identify interesting latent patterns of the epidemic. Finally, we evaluate the predictive ability of our method and show superior performance compared to the baselines, achieving up to 21% lower root mean square error and 25% lower mean absolute error for county-level prediction.

13.
J Biomed Inform ; 100: 103326, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31678589

ABSTRACT

The primary goal of a time-to-event estimation model is to accurately infer the occurrence time of a target event. Most existing studies focus on developing new models to effectively utilize the information in the censored observations. In this paper, we propose a model to tackle the time-to-event estimation problem from a completely different perspective. Our model relaxes a fundamental constraint that the target variable, time, is a univariate number which satisfies a partial order. Instead, the proposed model interprets each event occurrence time as a time concept with a vector representation. We hypothesize that the model will be more accurate and interpretable by capturing (1) the relationships between features and time concept vectors and (2) the relationships among time concept vectors. We also propose a scalable framework to simultaneously learn the model parameters and time concept vectors. Rigorous experiments and analysis have been conducted in medical event prediction task on seven gene expression datasets. The results demonstrate the efficiency and effectiveness of the proposed model. Furthermore, similarity information among time concept vectors helped in identifying time regimes, thus leading to a potential knowledge discovery related to the human cancer considered in our experiments.


Subject(s)
Models, Theoretical , Time and Motion Studies , Algorithms
14.
Ther Innov Regul Sci ; 50(6): 801-807, 2016 Nov.
Article in English | MEDLINE | ID: mdl-30231738

ABSTRACT

BACKGROUND: The pharmaceutical industry has continued to experience a large number of mergers, often involving the very largest companies. Behind many of these mergers has been the desire to achieve scale efficiencies and improved performance in both commercial and research and development (R&D) activities. METHODS: This research draws upon ClinicalTrials.gov data about commercially sponsored phase 3 clinical trials started and completed between 2008 and 2013. The research uses the bidirectional stepwise Akaike information criterion for model selection, adding a second-order term to the model where second-order terms were significant. RESULTS: First, and least surprising, the study therapeutic area has a major impact on study completion times. Second, the protocol design itself, as well as the clinical study execution plan, can have important consequences on study completion times. Several study execution variables are also critical to understanding completion times. While the size of a clinical trial organization is not associated with more rapid completion times, the amount of experience that an organization has in a particular therapeutic area does have a demonstrable impact. The models are able to supply the specific number of incremental completion days associated with each significant variable. CONCLUSIONS: Recent years have witnessed increasingly larger pharmaceutical R&D organizations, as many companies have worked to achieve the scale benefits of organizational size for R&D as well as commercial activity. Larger pharmaceutical companies may still achieve scale benefits. Faster phase 3 study completion times are not among them.

15.
Ther Innov Regul Sci ; 49(6): 852-860, 2015 Nov.
Article in English | MEDLINE | ID: mdl-30222375

ABSTRACT

BACKGROUND: This study uses the data from many of the mandatory fields in ClinicalTrials.gov to examine changes, possibly leading to more complexity, in the design and execution of commercially sponsored phase 3 clinical trials. METHODS: In this analysis we compare baseline year 2008 data, when a broad number of the protocol/study design and execution variables became mandatory, with the data from the last full year of results, 2013. RESULTS: There has been relatively little change in the protocol and study design over the years covered in this study. The most pronounced change is associated with single-patient duration: there is a significant increase in the period of time a patient is treated in the study protocol. The study also highlights an important methodological issue: many of the claims in print about complexity have yet to be substantiated through the use of peer-reviewed data or in settings where others can interrogate the results. CONCLUSIONS: In general, there is limited evidence for significant increases in the complexity of the study and protocol design and execution of phase 3 clinical trials sponsored by pharmaceutical companies.

16.
Ther Innov Regul Sci ; 49(2): 218-224, 2015 Mar.
Article in English | MEDLINE | ID: mdl-30222415

ABSTRACT

Since 2007, the US federal government has required that organizations sponsoring clinical trials with at least one site in the United States submit information on these clinical trials to an existing database: ClinicalTrials.gov. Over time, the number of mandatory variables has grown and will probably continue to grow. The database now represents an important source of descriptive information about the landscape for clinical trials. In addition, it constitutes a rich pool of data to test hypotheses: for instance, what variables are associated with an organization's ability to correctly estimate study completion times or complete those studies in as short a time frame as possible. This paper concludes that for mandated variables that the authors have labeled study identification, protocol and study design, and study execution, the data set constitutes a potentially very valuable research resource. With the exception of some site-related information, incomplete data did not exceed 3%. The incomplete site data are concentrated in several companies, so it is not unreasonable to assume that those data will also become more complete.

18.
Am J Cardiol ; 97(8A): 32C-43C, 2006 Apr 17.
Article in English | MEDLINE | ID: mdl-16581327

ABSTRACT

The adverse event (AE) profiles of 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase inhibitor (statin) agents are of great interest, in particular the most recently approved statin, rosuvastatin. The forwarding of reports of AEs has been shown to be influenced by several reporting biases, including secular trend, the new drug reporting effect, product withdrawals, and publicity. Comparative assessments that use AE reporting rates are difficult to interpret under these circumstances, because such effects can themselves lead to marked increases in AE reporting. Consequently, many comparative reporting rate analyses are best carried out in conjunction with other metrics that put reporting burden into context, such as report proportion. All-AE reporting rates showed a temporal profile that resembled those of other statins when marketing cycle and secular trend were taken into account. A before-and-after cerivastatin withdrawal comparison showed a substantial increase in the reporting of AEs of interest for the statin class overall. Report proportion analyses indicated that the burden of rosuvastatin-associated AEs was similar to that for other statin agents. Analyses of monthly reporting rates showed that the reporting of rosuvastatin-associated rhabdomyolysis and renal failure has increased following AE-specific mass media publicity. Postrosuvastatin AE reporting patterns were comparable to those seen with other statins and did not resemble cerivastatin.


Subject(s)
Adverse Drug Reaction Reporting Systems/statistics & numerical data , Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects , Databases, Factual , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/administration & dosage , Liver Failure/chemically induced , Muscular Diseases/chemically induced , Peripheral Nervous System Diseases/chemically induced , Product Surveillance, Postmarketing , Renal Insufficiency/chemically induced , United States , United States Food and Drug Administration