Results 1 - 10 of 10
1.
Anal Chem ; 92(2): 1720-1729, 2020 01 21.
Article in English | MEDLINE | ID: mdl-31661259

ABSTRACT

Comprehensive and unambiguous identification of small molecules in complex samples will revolutionize our understanding of the role of metabolites in biological systems. Existing and emerging technologies have enabled measurement of chemical properties of molecules in complex mixtures and, in concert, are sensitive enough to resolve even stereoisomers. Despite these experimental advances, small molecule identification is inhibited by (i) chemical reference libraries (e.g., mass spectra, collision cross section, and other measurable property libraries) representing <1% of known molecules, limiting the number of possible identifications, and (ii) the lack of a method to generate candidate matches directly from experimental features (i.e., without a library). To this end, we developed a variational autoencoder (VAE) to learn a continuous numerical, or latent, representation of molecular structure to expand reference libraries for small molecule identification. We extended the VAE to include a chemical property decoder, trained as a multitask network, in order to shape the latent representation such that it assembles according to desired chemical properties. The approach is unique in its application to metabolomics and small molecule identification, with its focus on properties that can be obtained from experimental measurements (m/z, CCS) paired with its training paradigm, which involved a cascade of transfer learning iterations. First, molecular representation is learned from a large data set of structures with m/z labels. Next, in silico property values are used to continue training, as experimental property data is limited. Finally, the network is further refined by being trained with the experimental data. This allows the network to learn as much as possible at each stage, enabling success with progressively smaller data sets without overfitting. 
Once trained, the network can be used to predict chemical properties directly from structure, as well as generate candidate structures with desired chemical properties. Our approach is orders of magnitude faster than first-principles simulation for CCS property prediction. Additionally, the ability to generate novel molecules along manifolds, defined by chemical property analogues, positions DarkChem as highly useful in a number of application areas, including metabolomics and small molecule identification, drug discovery and design, chemical forensics, and beyond.


Subject(s)
Computer Simulation , Deep Learning , Small Molecule Libraries/analysis , Metabolomics , Molecular Structure , Small Molecule Libraries/metabolism
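Note on entry 1: the abstract describes a VAE extended with a chemical property decoder, trained as a multitask network and refined through a cascade of transfer-learning stages. A generic objective of that kind can be sketched as below. This is an illustration under stated assumptions, not the loss used in the paper: the squared-error property term, the weights $\beta$ and $\lambda$, and the symbols $q_\phi$ (encoder), $p_\theta$ (structure decoder), and $f_\psi$ (property decoder) are all hypothetical choices.

```latex
\mathcal{L}(\theta,\phi,\psi;\,x,y) =
  \mathbb{E}_{q_\phi(z \mid x)}\!\left[
      -\log p_\theta(x \mid z)
      + \lambda \,\bigl\| f_\psi(z) - y \bigr\|_2^2
  \right]
  + \beta \, D_{\mathrm{KL}}\!\bigl( q_\phi(z \mid x) \,\|\, p(z) \bigr)
```

Here $x$ is a molecular structure, $y$ its property labels (for example m/z and CCS), and $z$ the latent representation. Under this reading, the cascade in the abstract would minimise the same objective in turn on (1) the large set of structures with m/z labels, (2) in silico property values, and (3) experimental data, with each stage starting from the parameters learned in the previous one.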
2.
PLoS One ; 12(12): e0188941, 2017.
Article in English | MEDLINE | ID: mdl-29244814

ABSTRACT

This work is the first to take advantage of recurrent neural networks to predict influenza-like illness (ILI) dynamics from various linguistic signals extracted from social media data. Unlike other approaches that rely on time-series analysis of historical ILI data and state-of-the-art machine learning models, we build and evaluate the predictive power of neural network architectures based on Long Short-Term Memory (LSTM) units capable of nowcasting (predicting in "real-time") and forecasting (predicting the future) ILI dynamics in the 2011-2014 influenza seasons. To build our models we integrate information people post in social media, e.g., topics, embeddings, word n-grams, stylistic patterns, and communication behavior using hashtags and mentions. We then quantitatively evaluate the predictive power of different social media signals and contrast the performance of state-of-the-art regression models with neural networks using a diverse set of evaluation metrics. Finally, we combine ILI and social media signals to build a joint neural network model for ILI dynamics prediction. Unlike the majority of the existing work, we focus on developing models for local rather than national ILI surveillance, specifically for military rather than general populations in 26 U.S. and six international locations, and analyze how model performance depends on the amount of social media data available per location. Our approach demonstrates several advantages: (a) Neural network architectures that rely on LSTM units trained on social media data yield the best performance compared to previously used regression models. (b) Previously under-explored language and communication behavior features are more predictive of ILI dynamics than stylistic and topic signals expressed in social media. 
(c) Neural network models learned exclusively from social media signals yield comparable or better performance than models learned from ILI historical data; thus, signals from social media can potentially be used to accurately forecast ILI dynamics for regions where ILI historical data is not available. (d) Neural network models learned from combined ILI and social media signals significantly outperform models that rely solely on ILI historical data, which underscores the potential of alternative public sources for ILI dynamics prediction. (e) Location-specific models outperform previously used location-independent models, e.g., U.S. only. (f) Prediction results vary significantly across geolocations depending on the amount of social media data available and ILI activity patterns. (g) Model performance improves with more tweets available per geolocation, e.g., the error gets lower and the Pearson score gets higher for locations with more tweets.


Subject(s)
Influenza, Human/epidemiology , Military Personnel , Neural Networks, Computer , Social Media/statistics & numerical data , Epidemiological Monitoring , Forecasting , Humans , Influenza, Human/transmission , Influenza, Human/virology , Machine Learning , Regression Analysis , Time Factors , United States/epidemiology
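Note on entry 2: the abstract reports an error measure and the Pearson score when comparing predicted and observed ILI dynamics. A minimal sketch of how such measures might be computed for a single location follows. The function names, the choice of root-mean-square error as the error measure, and the sample values are assumptions made for illustration; they are not the authors' code or data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root-mean-square error between predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


# Hypothetical weekly ILI proportions for one location (illustrative only).
observed_ili = [0.010, 0.014, 0.021, 0.030, 0.026]
predicted_ili = [0.011, 0.013, 0.020, 0.027, 0.028]

print("Pearson:", pearson(predicted_ili, observed_ili))
print("RMSE:", rmse(predicted_ili, observed_ili))
```

Computing both measures per location, rather than on pooled data, matches the abstract's emphasis on location-specific models.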
3.
PLoS One ; 10(10): e0139701, 2015.
Article in English | MEDLINE | ID: mdl-26437454

ABSTRACT

OBJECTIVE: Research studies show that social media may be valuable tools in the disease surveillance toolkit used for improving public health professionals' ability to detect disease outbreaks faster than traditional methods and to enhance outbreak response. A social media work group, consisting of surveillance practitioners, academic researchers, and other subject matter experts convened by the International Society for Disease Surveillance, conducted a systematic primary literature review using the PRISMA framework to identify research, published through February 2013, answering either of the following questions: (1) Can social media be integrated into disease surveillance practice and outbreak management to support and improve public health? (2) Can social media be used to effectively target populations, specifically vulnerable populations, to test an intervention and interact with a community to improve health outcomes? Examples of social media included are Facebook, MySpace, microblogs (e.g., Twitter), blogs, and discussion forums. For Question 1, 33 manuscripts were identified, starting in 2009 with topics on Influenza-like Illnesses (n = 15), Infectious Diseases (n = 6), Non-infectious Diseases (n = 4), Medication and Vaccines (n = 3), and Other (n = 5). For Question 2, 32 manuscripts were identified, the first in 2000 with topics on Health Risk Behaviors (n = 10), Infectious Diseases (n = 3), Non-infectious Diseases (n = 9), and Other (n = 10). CONCLUSIONS: The literature on the use of social media to support public health practice has identified many gaps and biases in current knowledge. Despite the potential for success identified in exploratory studies, there are limited studies on interventions and little use of social media in practice. However, information gleaned from the articles demonstrates the effectiveness of social media in supporting and improving public health and in identifying target populations for intervention. 
A primary recommendation resulting from the review is to identify opportunities that enable public health professionals to integrate social media analytics into disease surveillance and outbreak management practice.


Subject(s)
Blogging , Communicable Diseases/epidemiology , Disease Outbreaks , Public Health , Social Media , Disease Management , Humans
4.
PLoS One ; 9(3): e91989, 2014.
Article in English | MEDLINE | ID: mdl-24647562

ABSTRACT

The objective of this manuscript is to present a systematic review of biosurveillance models that operate on select agents and can forecast the occurrence of a disease event. We define a disease event to be a biological event with focus on the One Health paradigm. These events are characterized by evidence of infection and/or disease condition. We reviewed models that attempted to predict a disease event, not merely its transmission dynamics, and we considered models involving pathogens of concern as determined by the US National Select Agent Registry (as of June 2011). We searched commercial and government databases and harvested Google search results for eligible models, using terms and phrases provided by public health analysts relating to biosurveillance, remote sensing, risk assessments, spatial epidemiology, and ecological niche modeling. After removal of duplications and extraneous material, a core collection of 6,524 items was established, and these publications along with their abstracts are presented in a semantic wiki at http://BioCat.pnnl.gov. As a result, we systematically reviewed 44 papers, and the results are presented in this analysis. We identified 44 models, classified as one or more of the following: event prediction (4), spatial (26), ecological niche (28), diagnostic or clinical (6), spread or response (9), and reviews (3). The model parameters (e.g., etiology, climatic, spatial, cultural) and data sources (e.g., remote sensing, non-governmental organizations, expert opinion, epidemiological) were recorded and reviewed. A component of this review is the identification of verification and validation (V&V) methods applied to each model, if any V&V method was reported. All models were classified as either having undergone Some Verification or Validation method, or No Verification or Validation. 
We close by outlining an initial set of operational readiness level guidelines for disease prediction models based upon established Technology Readiness Level definitions.


Subject(s)
Biosurveillance , Decision Support Techniques , Disease , Forecasting , Models, Biological , Disaster Planning , Humans , Reproducibility of Results , Statistics as Topic
5.
Biosecur Bioterror ; 10(1): 131-41, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22320664

ABSTRACT

This research follows the Updated Guidelines for Evaluating Public Health Surveillance Systems, Recommendations from the Guidelines Working Group, published by the Centers for Disease Control and Prevention nearly a decade ago. Since then, models have been developed and complex systems have evolved with a breadth of disparate data to detect or forecast chemical, biological, and radiological events that have a significant impact on the One Health landscape. How the attributes identified in 2001 relate to the new range of event-based biosurveillance technologies is unclear. This article frames the continuum of event-based biosurveillance systems (that fuse media reports from the internet), models (ie, computational that forecast disease occurrence), and constructs (ie, descriptive analytical reports) through an operational lens (ie, aspects and attributes associated with operational considerations in the development, testing, and validation of the event-based biosurveillance methods and models and their use in an operational environment). A workshop was held in 2010 to scientifically identify, develop, and vet a set of attributes for event-based biosurveillance. Subject matter experts were invited from 7 federal government agencies and 6 different academic institutions pursuing research in biosurveillance event detection. We describe 8 attribute families for the characterization of event-based biosurveillance: event, readiness, operational aspects, geographic coverage, population coverage, input data, output, and cost. Ultimately, the analyses provide a framework from which the broad scope, complexity, and relevant issues germane to event-based biosurveillance useful in an operational environment can be characterized.


Subject(s)
Biosurveillance/methods , Program Evaluation , Animals , Costs and Cost Analysis , Disaster Planning/methods , Disaster Planning/organization & administration , Disaster Planning/standards , Disease Outbreaks/economics , Disease Outbreaks/prevention & control , Humans , Interdisciplinary Communication , International Cooperation , Models, Theoretical , United States
6.
Adv Exp Med Biol ; 696: 181-90, 2011.
Article in English | MEDLINE | ID: mdl-21431558

ABSTRACT

Recently, human papilloma virus (HPV) has been implicated in several throat and oral cancers, and HPV is established as the cause of most cervical cancers. A human papilloma virus vaccine has proven successful in reducing infection incidence in FDA clinical trials, and it is currently available in the USA. Current intervention policy targets adolescent females for vaccination; however, the expansion of suggested guidelines may extend to other age groups and males as well. This research takes a first step toward automatically predicting personal beliefs, regarding health intervention, on the spread of disease. Using linguistic or statistical approaches, sentiment analysis determines a text's affective content. Self-reported HPV vaccination beliefs published in web and social media are analyzed for affect polarity and leveraged as knowledge inputs to epidemic models. With this in mind, we have developed a discrete-time model to facilitate predicting impact on the reduction of HPV prevalence due to arbitrary age- and gender-targeted vaccination schemes.


Subject(s)
Papillomavirus Infections/prevention & control , Adolescent , Adult , Computational Biology , Data Mining , Female , Humans , Male , Models, Statistical , Papillomavirus Infections/epidemiology , Papillomavirus Vaccines/pharmacology , Prevalence , Public Health , United States/epidemiology , Young Adult
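Note on entry 6: the abstract mentions a discrete-time model for predicting how vaccination schemes reduce HPV prevalence, but gives no equations. To make the idea of a discrete-time prevalence model concrete, the sketch below iterates a generic SIS-type difference equation with a vaccinated fraction. It is not the authors' model: it has no age or gender structure, and the update rule, the function name, and all parameter values are assumptions chosen purely for illustration.

```python
def simulate_prevalence(beta, gamma, coverage, initial_prevalence, steps):
    """Iterate a generic SIS-type update for the infected fraction I:

        S[t]   = 1 - coverage - I[t]
        I[t+1] = I[t] + beta * S[t] * I[t] - gamma * I[t]

    beta     : per-step transmission parameter (hypothetical), in [0, 1]
    gamma    : per-step clearance parameter (hypothetical), in [0, 1]
    coverage : fraction of the population vaccinated and assumed immune
    """
    if not (0.0 <= beta <= 1.0 and 0.0 <= gamma <= 1.0):
        raise ValueError("beta and gamma must lie in [0, 1]")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie in [0, 1]")
    if not 0.0 <= initial_prevalence <= 1.0 - coverage:
        raise ValueError("initial prevalence must fit in the unvaccinated fraction")

    prevalence = initial_prevalence
    for _ in range(steps):
        susceptible = 1.0 - coverage - prevalence
        prevalence += beta * susceptible * prevalence - gamma * prevalence
    return prevalence


# Compare long-run prevalence under three hypothetical coverage levels.
for coverage in (0.0, 0.5, 0.7):
    final = simulate_prevalence(0.5, 0.2, coverage, 0.01, 200)
    print(f"coverage={coverage:.1f} -> prevalence after 200 steps: {final:.4f}")
```

In this toy setting the endemic level is 1 - coverage - gamma/beta when that quantity is positive, so raising coverage lowers prevalence and eventually eliminates infection. A targeted scheme of the kind the abstract describes would need separate compartments per age and gender group.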
7.
J Vis Lang Comput ; 22(4): 268-278, 2011 Aug.
Article in English | MEDLINE | ID: mdl-32288454

ABSTRACT

The National Strategy for Pandemic Influenza outlines a plan for community response to a potential pandemic. In this outline, state and local communities are charged with enhancing their preparedness. In order to help public health officials better understand these charges, we have developed a visual analytics toolkit (PanViz) for analyzing the effect of decision measures implemented during a simulated pandemic influenza scenario. Spread vectors based on the point of origin and distance traveled over time are calculated and the factors of age distribution and population density are taken into account. Healthcare officials are able to explore the effects of the pandemic on the population through a geographical spatiotemporal view, moving forward and backward through time and inserting decision points at various days to determine the impact. Linked statistical displays are also shown, providing county level summaries of data in terms of the number of sick, hospitalized and dead as a result of the outbreak. Currently, this tool has been deployed in Indiana State Department of Health planning and preparedness exercises, and as an educational tool for demonstrating the impact of social distancing strategies during the recent H1N1 (swine flu) outbreak.

8.
Adv Exp Med Biol ; 680: 559-64, 2010.
Article in English | MEDLINE | ID: mdl-20865540

ABSTRACT

Analysis of Google influenza-like-illness (ILI) search queries has shown a strongly correlated pattern with Centers for Disease Control and Prevention (CDC) seasonal ILI reporting data. Web and social media provide another resource to detect increases in ILI. This paper evaluates trends in blog posts that discuss influenza. Our key finding is that from 5th October 2008 to 31st January 2009, a high correlation exists between the frequency of posts, containing influenza keywords, per week and CDC influenza-like-illness surveillance data.


Subject(s)
Blogging , Influenza, Human/epidemiology , Internet , Population Surveillance/methods , Centers for Disease Control and Prevention, U.S. , Computational Biology , Disease Outbreaks , Humans , Information Storage and Retrieval/methods , United States/epidemiology
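Note on entry 8: the abstract's key quantity is the weekly frequency of posts containing influenza keywords. A minimal sketch of how such a weekly series might be tallied is given below. The keyword list, the `(date, text)` post format, the grouping by ISO week, and the sample posts are all assumptions for illustration; the abstract does not specify the authors' keyword set or week boundaries.

```python
from collections import Counter
from datetime import date

# Hypothetical keyword list; the abstract does not give the actual one.
FLU_KEYWORDS = ("influenza", "flu")


def weekly_keyword_counts(posts, keywords=FLU_KEYWORDS):
    """Count posts per ISO (year, week) whose text mentions any keyword.

    posts is an iterable of (posted_on, text) pairs, where posted_on is a
    datetime.date. Matching is a crude case-insensitive substring test, so
    it will also match unrelated words such as "influence".
    """
    counts = Counter()
    for posted_on, text in posts:
        lowered = text.lower()
        if any(keyword in lowered for keyword in keywords):
            iso = posted_on.isocalendar()
            counts[(iso[0], iso[1])] += 1  # (ISO year, ISO week number)
    return dict(counts)


# Made-up sample posts, for illustration only.
sample_posts = [
    (date(2008, 10, 6), "I think I have the flu"),
    (date(2008, 10, 7), "Influenza season is starting"),
    (date(2008, 10, 14), "Lovely weather today"),
    (date(2008, 10, 15), "Got my flu shot"),
]

for week, count in sorted(weekly_keyword_counts(sample_posts).items()):
    print(week, count)
```

The resulting weekly series could then be aligned with weekly surveillance figures and compared using a correlation coefficient, as the abstract describes.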
9.
Int J Environ Res Public Health ; 7(2): 596-615, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20616993

ABSTRACT

Text and structural data mining of web and social media (WSM) provides a novel disease surveillance resource and can identify online communities for targeted public health communications (PHC) to assure wide dissemination of pertinent information. WSM that mention influenza are harvested over a 24-week period, 5 October 2008 to 21 March 2009. Link analysis reveals communities for targeted PHC. Text mining is shown to identify trends in flu posts that correlate to real-world influenza-like illness patient report data. We also bring to bear a graph-based data mining technique to detect anomalies among flu blogs connected by publisher type, links, and user-tags.


Subject(s)
Influenza, Human , Information Storage and Retrieval , Internet , Social Support , Humans , Population Surveillance
10.
Comput Methods Programs Biomed ; 100(1): 16-23, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20236725

ABSTRACT

This paper explores Technosocial Predictive Analytics (TPA) and related methods for Web "data mining" where users' posts and queries are garnered from Social Web ("Web 2.0") tools such as blogs, micro-blogging and social networking sites to form coherent representations of real-time health events. The paper includes a brief introduction to commonly used Social Web tools such as mashups and aggregators, and maps their exponential growth as an open architecture of participation for the masses and an emerging way to gain insight into the collective health status of whole populations. Several health-related tool examples are described and demonstrated as practical means through which health professionals might create clear, location-specific pictures of epidemiological data such as flu outbreaks.


Subject(s)
Blogging , Data Mining/methods , Internet , Population Surveillance/methods , Public Health , Security Measures/organization & administration , Humans , United Kingdom , User-Computer Interface