Results 1 - 9 of 9
1.
Front Public Health ; 8: 582205, 2020.
Article in English | MEDLINE | ID: mdl-33330323

ABSTRACT

Background: Given the worldwide spread of the 2019 Novel Coronavirus (COVID-19), there is an urgent need to identify risk and protective factors and expose areas of insufficient understanding. Emerging tools, such as the Rapid Evidence Map (rEM), are being developed to systematically characterize large collections of scientific literature. We sought to generate an rEM of risk and protective factors to comprehensively inform areas that impact COVID-19 outcomes for different sub-populations in order to better protect the public. Methods: We developed a protocol that includes a study goal, study questions, a PECO statement, and a process for screening literature by combining semi-automated machine learning with the expertise of our review team. We applied this protocol to reports within the COVID-19 Open Research Dataset (CORD-19) that were published in early 2020. SWIFT-Active Screener was used to prioritize records according to pre-defined inclusion criteria. Relevant studies were categorized by risk and protective status; susceptibility category (Behavioral, Physiological, Demographic, and Environmental); and affected sub-populations. Using tagged studies, we created an rEM for COVID-19 susceptibility that reveals: (1) current lines of evidence; (2) knowledge gaps; and (3) areas that may benefit from systematic review. Results: We imported 4,330 titles and abstracts from CORD-19. After screening 3,521 of these to achieve 99% estimated recall, 217 relevant studies were identified. Most included studies concerned the impact of underlying comorbidities (Physiological); age and gender (Demographic); and social factors (Environmental) on COVID-19 outcomes. Among the relevant studies, older males with comorbidities were commonly reported to have the poorest outcomes. 
We noted a paucity of COVID-19 studies among children and susceptible sub-groups, including pregnant women, racial minorities, refugees/migrants, and healthcare workers, with few studies examining protective factors. Conclusion: Using rEM analysis, we synthesized the recent body of evidence related to COVID-19 risk and protective factors. The results provide a comprehensive tool for rapidly elucidating COVID-19 susceptibility patterns and identifying resource-rich/resource-poor areas of research that may benefit from future investigation as the pandemic evolves.
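The tagging-and-mapping step described above reduces, at its core, to a frequency table over the tag axes (susceptibility category by risk/protective status). A minimal sketch in Python; the category names come from the abstract, but the study records and counts are invented for illustration:

```python
from collections import Counter

# Hypothetical tagged records: each included study carries a susceptibility
# category and a status (risk vs. protective), mirroring the rEM tagging step.
tagged_studies = [
    {"category": "Physiological", "status": "risk"},
    {"category": "Demographic",   "status": "risk"},
    {"category": "Physiological", "status": "risk"},
    {"category": "Environmental", "status": "protective"},
]

# An evidence map, at its simplest, is a frequency table over the tag axes;
# sparse cells point to knowledge gaps, dense cells to review-ready areas.
evidence_map = Counter((s["category"], s["status"]) for s in tagged_studies)

for (category, status), n in sorted(evidence_map.items()):
    print(f"{category:14s} {status:10s} {n}")
```

Cells with many studies suggest areas ripe for systematic review; empty or near-empty cells (e.g., protective factors in susceptible sub-groups) surface the gaps the abstract reports.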


Subject(s)
Biomedical Research/statistics & numerical data , COVID-19/epidemiology , Data Interpretation, Statistical , Pandemics/statistics & numerical data , Protective Factors , Research Report , Humans , Risk Factors
2.
Environ Int ; 138: 105623, 2020 05.
Article in English | MEDLINE | ID: mdl-32203803

ABSTRACT

BACKGROUND: In the screening phase of systematic review, researchers use detailed inclusion/exclusion criteria to decide whether each article in a set of candidate articles is relevant to the research question under consideration. A typical review may require screening thousands or tens of thousands of articles and can consume hundreds of person-hours of labor. METHODS: Here we introduce SWIFT-Active Screener, a web-based, collaborative systematic review software application, designed to reduce the overall screening burden required during this resource-intensive phase of the review process. To prioritize articles for review, SWIFT-Active Screener uses active learning, a type of machine learning that incorporates user feedback during screening. Meanwhile, a negative binomial model is employed to estimate the number of relevant articles remaining in the unscreened document list. Using a simulation involving 26 diverse systematic review datasets that were previously screened by reviewers, we evaluated both the document prioritization and recall estimation methods. RESULTS: On average, 95% of the relevant articles were identified after screening only 40% of the total reference list. In the 5 document sets with 5,000 or more references, 95% recall was achieved after screening only 34% of the available references, on average. Furthermore, the recall estimator we have proposed provides a useful, conservative estimate of the percentage of relevant documents identified during the screening process. CONCLUSION: SWIFT-Active Screener can result in significant time savings compared to traditional screening and the savings are increased for larger project sizes. Moreover, the integration of explicit recall estimation during screening solves an important challenge faced by all machine learning systems for document screening: when to stop screening a prioritized reference list.
The software is currently available in the form of a multi-user, collaborative, online web application.
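The screening loop described above (rank the pool, screen a batch, incorporate reviewer feedback, repeat until a target recall) can be sketched with a toy simulation. The `score` function below is only a stand-in for the real active-learning classifier, and all corpus sizes and prevalences are invented:

```python
import random

random.seed(0)
# Toy corpus: ~10% of records are truly relevant (unknown to the screener).
docs = [{"id": i, "relevant": random.random() < 0.10} for i in range(1000)]
total_relevant = sum(d["relevant"] for d in docs)

def score(doc):
    # Stand-in for a classifier retrained after each screened batch: here,
    # relevant records simply receive noisy-but-higher scores.
    return random.random() + (1.5 if doc["relevant"] else 0.0)

unscreened = docs[:]
screened = found = 0
while unscreened and found < 0.95 * total_relevant:
    # Re-rank the remaining pool, screen the top batch, record reviewer labels.
    unscreened.sort(key=score, reverse=True)
    batch, unscreened = unscreened[:50], unscreened[50:]
    screened += len(batch)
    found += sum(d["relevant"] for d in batch)

print(f"Screened {screened}/{len(docs)} records to reach "
      f"{found}/{total_relevant} relevant ({found / total_relevant:.0%} recall)")
```

Because the ranker concentrates relevant records at the top of the list, the loop stops long before the whole corpus is screened; in the real tool, a negative binomial model supplies the stopping estimate rather than the oracle recall used in this toy.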


Subject(s)
Machine Learning , Animals , Humans , Magnetic Resonance Imaging , Research , Software
3.
Environ Int ; 123: 451-458, 2019 02.
Article in English | MEDLINE | ID: mdl-30622070

ABSTRACT

BACKGROUND: "Evidence Mapping" is an emerging tool that is increasingly being used to systematically identify, review, organize, quantify, and summarize the literature. It can be used as an effective method for identifying well-studied topic areas relevant to a broad research question along with any important literature gaps. However, because the procedure can be significantly resource-intensive, approaches that can increase the speed and reproducibility of evidence mapping are in great demand. METHODS: We propose an alternative process called "rapid Evidence Mapping" (rEM) to map the scientific evidence in a time-efficient manner, while still utilizing rigorous, transparent and explicit methodological approaches. To illustrate its application, we have conducted a proof-of-concept case study on the topic of low-calorie sweeteners (LCS) with respect to human dietary exposures and health outcomes. During this process, we developed and made publicly available our study protocol, established a PECO (Participants, Exposure, Comparator, and Outcomes) statement, searched the literature, screened titles and abstracts to identify potentially relevant studies, and applied semi-automated machine learning approaches to tag and categorize the included articles. We created various visualizations including bubble plots and frequency tables to map the evidence and research gaps according to comparison type, population baseline health status, outcome group, and study sample size. We compared our results with a traditional evidence mapping of the same topic published in 2016 (Wang et al., 2016). RESULTS: We conducted an rEM of LCS, for which we identified 8122 records from a PubMed search (January 1, 1946-May 1, 2014) and then utilized machine learning (SWIFT-Active Screener) to prioritize relevant records. After screening 2267 (28%) of the total set of titles and abstracts to achieve 95% estimated recall, we ultimately included 297 relevant studies. 
Overall, our findings corroborated those of Wang et al. (2016) and identified that most studies were acute or short-term in healthy individuals, and studied the outcomes of appetite, energy sensing and body weight. We also identified a lack of studies assessing appetite and dietary intake related outcomes in people with diabetes. The rEM approach required approximately 100 person-hours conducted over 7 calendar months. CONCLUSION: Rapid Evidence Mapping is an expeditious approach based on rigorous methodology that can be used to quickly summarize the available body of evidence relevant to a research question, identify gaps in the literature to inform future research, and contextualize the design of a systematic review within the broader scientific literature, significantly reducing human effort while yielding results comparable to those from traditional methods. The potential time savings of this approach in comparison to the traditional evidence mapping process make it a potentially powerful tool for rapidly translating knowledge to inform science-based decision-making.


Subject(s)
Health Status , Review Literature as Topic , Sweetening Agents , Body Weight , Humans , Reproducibility of Results
4.
PLoS One ; 13(2): e0191105, 2018.
Article in English | MEDLINE | ID: mdl-29462216

ABSTRACT

Changes in gene expression can help reveal the mechanisms of disease processes and the mode of action for toxicities and adverse effects on cellular responses induced by exposures to chemicals, drugs and environmental agents. The U.S. Tox21 Federal collaboration, which currently quantifies the biological effects of nearly 10,000 chemicals via quantitative high-throughput screening (qHTS) in in vitro model systems, is now making an effort to incorporate gene expression profiling into the existing battery of assays. Whole transcriptome analyses performed on large numbers of samples using microarrays or RNA-Seq are currently cost-prohibitive. Accordingly, the Tox21 Program is pursuing a high-throughput transcriptomics (HTT) method that focuses on the targeted detection of gene expression for a carefully selected subset of the transcriptome, potentially reducing the cost by a factor of 10 and allowing for the analysis of larger numbers of samples. To identify the optimal transcriptome subset, genes were sought that are (1) representative of the highly diverse biological space, (2) capable of serving as a proxy for expression changes in unmeasured genes, and (3) sufficient to provide coverage of well described biological pathways. A hybrid method for gene selection is presented herein that combines data-driven and knowledge-driven concepts into one cohesive method. Our approach is modular, applicable to any species, and facilitates a robust, quantitative evaluation of performance. In particular, we were able to perform gene selection such that the resulting set of "sentinel genes" adequately represents all known canonical pathways from the Molecular Signature Database (MSigDB v4.0) and can be used to infer expression changes for the remainder of the transcriptome.
The resulting computational model allowed us to choose a purely data-driven subset of 1500 sentinel genes, referred to as the S1500 set, which was then augmented using a knowledge-driven selection of additional genes to create the final S1500+ gene set. Our results indicate that the sentinel genes selected can be used to accurately predict pathway perturbations and biological relationships for samples under study.
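The knowledge-driven requirement above, that the chosen genes cover all known pathways, resembles a greedy set-cover problem. A sketch under that assumption, with hypothetical genes and pathway memberships; this is an illustration of the coverage idea, not the actual S1500+ selection procedure:

```python
# Hypothetical pathway membership: gene -> set of pathways it participates in.
pathways_by_gene = {
    "TP53":  {"apoptosis", "cell_cycle", "dna_repair"},
    "MYC":   {"cell_cycle", "proliferation"},
    "BRCA1": {"dna_repair"},
    "IL6":   {"inflammation"},
    "ACTB":  {"cytoskeleton"},
}

def greedy_sentinels(pathways_by_gene, budget):
    """Greedy set cover: repeatedly pick the gene covering the most
    still-uncovered pathways, until the budget is spent or all are covered."""
    uncovered = set().union(*pathways_by_gene.values())
    chosen = []
    while uncovered and len(chosen) < budget:
        gene = max(pathways_by_gene,
                   key=lambda g: len(pathways_by_gene[g] & uncovered))
        if not pathways_by_gene[gene] & uncovered:
            break  # no remaining gene adds coverage
        chosen.append(gene)
        uncovered -= pathways_by_gene[gene]
    return chosen

print(greedy_sentinels(pathways_by_gene, budget=4))
```

A data-driven score (e.g., how well a gene predicts expression of unmeasured genes) could be folded into the selection key, which is the spirit of the hybrid method the abstract describes.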


Subject(s)
Gene Expression Profiling/methods , High-Throughput Screening Assays/methods , Computational Biology , Databases, Genetic , Genetic Variation , High-Throughput Nucleotide Sequencing , Humans , Oligonucleotide Array Sequence Analysis , Transcriptome
5.
Syst Rev ; 5: 87, 2016 May 23.
Article in English | MEDLINE | ID: mdl-27216467

ABSTRACT

BACKGROUND: There is growing interest in using machine learning approaches to priority rank studies and reduce human burden in screening literature when conducting systematic reviews. In addition, identifying addressable questions during the problem formulation phase of systematic review can be challenging, especially for topics having a large literature base. Here, we assess the performance of the SWIFT-Review priority ranking algorithm for identifying studies relevant to a given research question. We also explore the use of SWIFT-Review during problem formulation to identify, categorize, and visualize research areas that are data rich/data poor within a large literature corpus. METHODS: Twenty case studies, including 15 public data sets, representing a range of complexity and size, were used to assess the priority ranking performance of SWIFT-Review. For each study, seed sets of manually annotated included and excluded titles and abstracts were used for machine training. The remaining references were then ranked for relevance using an algorithm that considers term frequency and latent Dirichlet allocation (LDA) topic modeling. This ranking was evaluated with respect to (1) the number of studies screened in order to identify 95 % of known relevant studies and (2) the "Work Saved over Sampling" (WSS) performance metric. To assess SWIFT-Review for use in problem formulation, PubMed literature search results for 171 chemicals implicated as endocrine-disrupting chemicals (EDCs) were uploaded into SWIFT-Review (264,588 studies) and categorized based on evidence stream and health outcome. Patterns of search results were surveyed and visualized using a variety of interactive graphics. RESULTS: Compared with the reported performance of other tools using the same datasets, the SWIFT-Review ranking procedure obtained the highest scores on 11 out of 15 of the public datasets.
Overall, these results suggest that using machine learning to triage documents for screening has the potential to save, on average, more than 50 % of the screening effort ordinarily required when using un-ordered document lists. In addition, the tagging and annotation capabilities of SWIFT-Review can be useful during the activities of scoping and problem formulation. CONCLUSIONS: Text-mining and machine learning software such as SWIFT-Review can be valuable tools to reduce the human screening burden and assist in problem formulation.
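The "Work Saved over Sampling" metric used above is commonly defined (following Cohen et al., 2006) as the fraction of documents a reviewer did not have to screen, minus the recall shortfall tolerated. A small sketch:

```python
def wss(n_total, n_screened_to_recall, recall=0.95):
    """Work Saved over Sampling: fraction of screening effort saved relative
    to random (unordered) screening at the same recall level."""
    return (n_total - n_screened_to_recall) / n_total - (1.0 - recall)

# Example: 95% of relevant studies found after screening 40% of 10,000 records,
# matching the "more than 50% effort saved" figure quoted in the abstract.
print(f"WSS@95%: {wss(10_000, 4_000):.2f}")
```

A WSS@95% of 0.55 means the ranker saved 55 percentage points of screening effort over working through an unordered list to the same 95% recall.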


Subject(s)
Algorithms , Data Mining , Machine Learning , Software , Systematic Reviews as Topic , Databases, Factual , Information Storage and Retrieval , Linear Models
6.
PLoS One ; 8(10): e74183, 2013.
Article in English | MEDLINE | ID: mdl-24098335

ABSTRACT

We report the results of a genome-wide analysis of transcription in Arabidopsis thaliana after treatment with Pseudomonas syringae pathovar tomato. Our time course RNA-Seq experiment uses over 500 million read pairs to provide a detailed characterization of the response to infection in both susceptible and resistant hosts. The set of observed differentially expressed genes is consistent with previous studies, confirming and extending existing findings about genes likely to play an important role in the defense response to Pseudomonas syringae. The high coverage of the Arabidopsis transcriptome resulted in the discovery of a surprisingly large number of alternative splicing (AS) events: more than 44% of multi-exon genes showed evidence for novel AS in at least one of the probed conditions. This demonstrates that the Arabidopsis transcriptome annotation is still highly incomplete, and that AS events are more abundant than expected. To further refine our predictions, we identified genes with statistically significant changes in the ratios of alternative isoforms between treatments. This set includes several genes previously known to be alternatively spliced or expressed during the defense response, and it may serve as a pool of candidate genes for regulated alternative splicing with possible biological relevance for the defense response against invasive pathogens.


Subject(s)
Alternative Splicing/genetics , Arabidopsis/microbiology , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Pseudomonas syringae/genetics , Pseudomonas syringae/physiology , Sequence Analysis, RNA , Exons/genetics , Genomics , Introns/genetics , RNA Splice Sites/genetics , Transcription, Genetic/genetics
7.
BMC Bioinformatics ; 11 Suppl 3: S6, 2010 Apr 29.
Article in English | MEDLINE | ID: mdl-20438653

ABSTRACT

BACKGROUND: In eukaryotes, alternative splicing often generates multiple splice variants from a single gene. Here we explore the use of RNA sequencing (RNA-Seq) datasets to address the isoform quantification problem. Given a set of known splice variants, the goal is to estimate the relative abundance of the individual variants. METHODS: Our method employs a linear models framework to estimate the ratios of known isoforms in a sample. A key feature of our method is that it takes into account the non-uniformity of RNA-Seq read positions along the targeted transcripts. RESULTS: Preliminary tests indicate that the model performs well on both simulated and real data. In two publicly available RNA-Seq datasets, we identified several alternatively-spliced genes with switch-like, on/off expression properties, as well as a number of other genes that varied more subtly in isoform expression. In many cases, genes exhibiting differential expression of alternatively spliced transcripts were not differentially expressed at the gene level. CONCLUSIONS: Given that changes in isoform expression level frequently involve a continuum of isoform ratios, rather than all-or-nothing expression, and that they are often independent of general gene expression changes, we anticipate that our research will contribute to revealing a so far uninvestigated layer of the transcriptome. We believe that, in the future, researchers will prioritize genes for functional analysis based not only on observed changes in gene expression levels, but also on changes in alternative splicing.
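The linear-models idea above can be illustrated with a toy design matrix mapping isoform abundances to expected read densities per position bin. The matrix values here are invented, and an ordinary least-squares solve stands in for the paper's full model (which additionally accounts for the empirical non-uniformity of read starts):

```python
import numpy as np

# Toy design: rows are read-position bins, columns are known isoforms.
# A[i, j] is the expected read density isoform j contributes to bin i
# (a real model derives this from transcript structure and read biases).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Observed read counts per bin, simulated from true abundances (3, 1).
theta_true = np.array([3.0, 1.0])
y = A @ theta_true

# Least-squares estimate of isoform abundances, then normalized ratios.
theta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
ratios = theta_hat / theta_hat.sum()
print(np.round(ratios, 3))
```

With noiseless counts the solve recovers the true abundances exactly; with real data, the residual structure is what motivates modeling positional bias rather than assuming uniform coverage.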


Subject(s)
Alternative Splicing/genetics , Computational Biology/methods , Databases, Nucleic Acid , Protein Isoforms/analysis , Sequence Analysis, RNA/methods , Algorithms , Arabidopsis/genetics , Base Sequence , Chi-Square Distribution , Computer Simulation , Gene Expression Regulation, Plant , Genes, Plant , Linear Models , Models, Genetic , Molecular Sequence Data , Protein Isoforms/genetics , Stress, Physiological
8.
BMC Bioinformatics ; 10: 191, 2009 Jun 22.
Article in English | MEDLINE | ID: mdl-19545436

ABSTRACT

BACKGROUND: Quality assessment of microarray data is an important and often challenging aspect of gene expression analysis. This task frequently involves the examination of a variety of summary statistics and diagnostic plots. The interpretation of these diagnostics is often subjective, and generally requires careful expert scrutiny. RESULTS: We show how an unsupervised classification technique based on the Expectation-Maximization (EM) algorithm and the naïve Bayes model can be used to automate microarray quality assessment. The method is flexible and can be easily adapted to accommodate alternate quality statistics and platforms. We evaluate our approach using Affymetrix 3' gene expression and exon arrays and compare the performance of this method to a similar supervised approach. CONCLUSION: This research illustrates the efficacy of an unsupervised classification approach for the purpose of automated microarray data quality assessment. Since our approach requires only unannotated training data, it is easy to customize and to keep up-to-date as technology evolves. In contrast to other "black box" classification systems, this method also allows for intuitive explanations.
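A one-dimensional analogue of the unsupervised approach, an EM fit of a two-component Gaussian mixture to a single quality statistic, can be sketched as follows. The paper's model is naïve Bayes over several quality statistics; this simplification and all data below are illustrative only:

```python
import math
import random

random.seed(1)
# Simulated quality statistic (e.g., a normalized outlier score): most arrays
# are "good" (centered near 0), a few are "bad" (centered near 4).
data = ([random.gauss(0, 1) for _ in range(90)] +
        [random.gauss(4, 1) for _ in range(10)])

def pdf(x, mu_k, sigma_k):
    return math.exp(-((x - mu_k) ** 2) / (2 * sigma_k ** 2)) / (sigma_k * math.sqrt(2 * math.pi))

# EM for a two-component Gaussian mixture (unsupervised: no quality labels).
mu = [min(data), max(data)]     # initialize components at the data extremes
sigma = [1.0, 1.0]
pi = [0.5, 0.5]
for _ in range(50):
    # E-step: posterior responsibility of each component for each array.
    resp = []
    for x in data:
        w = [pi[k] * pdf(x, mu[k], sigma[k]) for k in (0, 1)]
        s = sum(w)
        resp.append([wk / s for wk in w])
    # M-step: re-estimate mixture weights, means, and standard deviations.
    for k in (0, 1):
        nk = sum(r[k] for r in resp)
        pi[k] = nk / len(data)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        sigma[k] = math.sqrt(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk)

print(f"component means: {mu[0]:.2f}, {mu[1]:.2f}")
```

After convergence, the posterior responsibilities classify each array as good or bad with no annotated training data, which is the property that makes the approach easy to keep up-to-date as platforms change.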


Subject(s)
Oligonucleotide Array Sequence Analysis/methods , Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Normal Distribution
9.
J Mass Spectrom ; 41(3): 281-8, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16538648

ABSTRACT

The number and wide dynamic range of components found in biological matrices present several challenges for global proteomics. In this perspective, we will examine the potential of zero-dimensional (0D), one-dimensional (1D), and two-dimensional (2D) separations coupled with Fourier-transform ion cyclotron resonance (FT-ICR) and time-of-flight (TOF) mass spectrometry (MS) for the analysis of complex mixtures. We describe and further develop previous reports on the space occupied by peptides, to calculate the theoretical peak capacity available to each separations-mass spectrometry method examined. Briefly, the peak capacity attainable by each of the mass analyzers was determined from the mass resolving power (RP) and the m/z space occupied by peptides considered from the mass distribution of tryptic peptides from National Center for Biotechnology Information's (NCBI's) nonredundant database. Our results indicate that reverse-phase-nanoHPLC (RP-nHPLC) separation coupled with FT-ICR MS offers an order of magnitude improvement in peak capacity over RP-nHPLC separation coupled with TOF MS. The addition of an orthogonal separation method, strong cation exchange (SCX), for 2D LC-MS demonstrates an additional 10-fold improvement in peak capacity over 1D LC-MS methods. Peak capacity calculations for 0D LC, two different 1D RP-HPLC methods, and 2D LC (with various numbers of SCX fractions) for both RP-HPLC methods coupled to FT-ICR and TOF MS are examined in detail. Peak capacity production rates, which take into account the total analysis time, are also considered for each of the methods. Furthermore, the significance of the space occupied by peptides is discussed.
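The peak-capacity arithmetic behind these comparisons can be sketched as follows. For a mass analyzer with approximately constant resolving power RP = m/Δm, the peak width grows linearly with m/z, so integrating across an m/z window gives roughly RP·ln(m_max/m_min) resolvable peaks; ideally orthogonal separation dimensions multiply. All numbers below are illustrative stand-ins, not the paper's values:

```python
import math

def ms_peak_capacity(resolving_power, mz_min, mz_max):
    """Approximate peaks resolvable across an m/z window at constant
    resolving power RP = m / delta_m (peak width scales with m/z)."""
    # Integral of dm / (m / RP) from mz_min to mz_max.
    return resolving_power * math.log(mz_max / mz_min)

# Illustrative resolving powers over a tryptic-peptide m/z window of 400-2000.
ft_icr = ms_peak_capacity(100_000, 400, 2000)
tof = ms_peak_capacity(10_000, 400, 2000)

# For coupled, orthogonal separations the total peak capacity is (ideally)
# the product of the individual dimensions.
lc_peaks = 100          # hypothetical 1D RP-HPLC peak capacity
scx_fractions = 10      # hypothetical SCX fraction count for the 2D setup
print(f"1D LC / FT-ICR: {lc_peaks * ft_icr:,.0f}")
print(f"2D LC / FT-ICR: {scx_fractions * lc_peaks * ft_icr:,.0f}")
print(f"FT-ICR vs TOF:  {ft_icr / tof:.0f}x")
```

With these assumed numbers, the 10x resolving-power advantage translates directly into a 10x peak-capacity advantage for FT-ICR over TOF, and adding SCX fractions multiplies the total again, mirroring the abstract's order-of-magnitude claims.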


Subject(s)
Electrophoresis, Gel, Two-Dimensional , Mass Spectrometry , Peptides/analysis , Proteomics/instrumentation , Proteomics/methods , Biotechnology/instrumentation , Biotechnology/methods , Fourier Analysis