Search | VHL Regional Portal

Novel statistical tools for management of public databases facilitate community-wide replicability and control of false discovery.

Rosset, Saharon; Aharoni, Ehud; Neuvirth, Hani.

Genet Epidemiol; 38(5): 477-81, 2014 Jul.

Article in English | MEDLINE | ID: mdl-24706571

ABSTRACT

Issues of publication bias, lack of replicability, and false discovery have long plagued the genetics community. Proper utilization of public and shared data resources presents an opportunity to ameliorate these problems. We present an approach to public database management that we term Quality Preserving Database (QPD). It enables perpetual use of the database for testing statistical hypotheses while controlling false discovery and avoiding publication bias on the one hand, and maintaining testing power on the other hand. We demonstrate it on a use case of a replication server for GWAS findings, underlining its practical utility. We argue that a shift to using QPD in managing current and future biological databases will significantly enhance the community's ability to make efficient and statistically sound use of the available data resources.

Subject(s)

Databases, Factual/standards; Information Management/methods; Public Sector; Databases, Factual/economics; Information Management/economics; Information Management/standards; Publication Bias; Quality Control; Reproducibility of Results

The quality preserving database: a computational framework for encouraging collaboration, enhancing power and controlling false discovery.

Aharoni, Ehud; Neuvirth, Hani; Rosset, Saharon.

IEEE/ACM Trans Comput Biol Bioinform; 8(5): 1431-7, 2011.

Article in English | MEDLINE | ID: mdl-21778529

ABSTRACT

The common scenario in computational biology in which a community of researchers conduct multiple statistical tests on one shared database gives rise to the multiple hypothesis testing problem. Conventional procedures for solving this problem control the probability of false discovery by sacrificing some of the power of the tests. We suggest a scheme for controlling false discovery without any power loss by adding new samples for each use of the database and charging the user with the expenses. The crux of the scheme is a carefully crafted pricing system that fairly prices different user requests based on their demands while keeping the probability of false discovery bounded. We demonstrate this idea in the context of HIV treatment research, where multiple researchers conduct tests on a repository of HIV samples.
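The power loss that the abstract attributes to conventional procedures can be illustrated with a generic alpha-spending rule. The following is a minimal sketch and not the authors' pricing system: the function name `spending_levels`, the geometric schedule, and the values `alpha=0.05` and `q=0.9` are assumptions chosen for illustration only.

```python
def spending_levels(alpha, q, n_tests):
    """Return per-test significance levels alpha * (1 - q) * q**i.

    The levels form a geometric series whose infinite sum is alpha, so by
    the union bound the chance of any false rejection stays below alpha
    no matter how many tests are eventually run on the shared database.
    """
    if not (0 < alpha < 1 and 0 < q < 1):
        raise ValueError("alpha and q must lie strictly between 0 and 1")
    return [alpha * (1 - q) * q**i for i in range(n_tests)]


# Hypothetical example: 50 successive requests against one database.
levels = spending_levels(alpha=0.05, q=0.9, n_tests=50)
total_spent = sum(levels)

print(f"first test level: {levels[0]:.5f}")
print(f"50th test level:  {levels[-1]:.7f}")
print(f"total level spent so far: {total_spent:.5f}")
```

Under such a schedule each later test faces a stricter threshold on the same data, which is the power loss in question. According to the abstract, the proposed scheme avoids that loss by adding new samples for each use of the database and charging the user for them.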

Subject(s)

Computational Biology/standards; Database Management Systems/standards; Biomedical Research; Data Interpretation, Statistical

Investigation of expert rule bases, logistic regression, and non-linear machine learning techniques for predicting response to antiretroviral treatment.

Prosperi, Mattia C F; Altmann, Andre; Rosen-Zvi, Michal; Aharoni, Ehud; Borgulya, Gabor; Bazso, Fulop; Sönnerborg, Anders; Schülter, Eugen; Struck, Daniel; Ulivi, Giovanni; Vandamme, Anne-Mieke; Vercauteren, Jurgen; Zazzi, Maurizio.

Antivir Ther; 14(3): 433-42, 2009.

Article in English | MEDLINE | ID: mdl-19474477

ABSTRACT

BACKGROUND: The extreme flexibility of the HIV type-1 (HIV-1) genome makes it challenging to build the ideal antiretroviral treatment regimen. Interpretation of HIV-1 genotypic drug resistance is evolving from rule-based systems guided by expert opinion to data-driven engines developed through machine learning methods. METHODS: The aim of the study was to investigate linear and non-linear statistical learning models for classifying short-term virological outcome of antiretroviral treatment. To optimize the model, different feature selection methods were considered. Robust extra-sample error estimation and different loss functions were used to assess model performance. The results were compared with widely used rule-based genotypic interpretation systems (Stanford HIVdb, Rega and ANRS). RESULTS: A set of 3,143 treatment change episodes were extracted from the EuResist database. The dataset included patient demographics, treatment history and viral genotypes. A logistic regression model using high order interaction variables performed better than rule-based genotypic interpretation systems (accuracy 75.63% versus 71.74-73.89%, area under the receiver operating characteristic curve [AUC] 0.76 versus 0.68-0.70) and was equivalent to a random forest model (accuracy 76.16%, AUC 0.77). However, when rule-based genotypic interpretation systems were coupled with additional patient attributes, and the combination was provided as input to the logistic regression model, the performance increased significantly, becoming comparable to the fully data-driven methods. CONCLUSIONS: Patient-derived supplementary features significantly improved the accuracy of the prediction of response to treatment, both with rule-based and data-driven interpretation systems. Fully data-driven models derived from large-scale data sources show promise as antiretroviral treatment decision support tools.
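The two figures of merit reported above, accuracy and AUC, can be computed from predicted scores as in the sketch below. The labels, scores, and the 0.5 decision threshold are made-up toy values, not data from the study, and the function names are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at the given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy data: 1 = treatment success, 0 = treatment failure (illustrative only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

toy_auc = auc(labels, scores)
toy_acc = accuracy(labels, scores)
print(f"AUC = {toy_auc:.3f}, accuracy = {toy_acc:.3f}")
```

AUC does not depend on a decision threshold while accuracy does, which is one reason the two measures are usually reported together.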

Subject(s)

Anti-HIV Agents/therapeutic use; Artificial Intelligence; HIV Infections/drug therapy; HIV-1/genetics; Models, Statistical; Adult; Databases, Factual; Female; HIV Infections/virology; Humans; Logistic Models; Male; Treatment Outcome; Viral Load

Comparison of classifier fusion methods for predicting response to anti HIV-1 therapy.

Altmann, André; Rosen-Zvi, Michal; Prosperi, Mattia; Aharoni, Ehud; Neuvirth, Hani; Schülter, Eugen; Büch, Joachim; Struck, Daniel; Peres, Yardena; Incardona, Francesca; Sönnerborg, Anders; Kaiser, Rolf; Zazzi, Maurizio; Lengauer, Thomas.

PLoS One; 3(10): e3470, 2008.

Article in English | MEDLINE | ID: mdl-18941628

ABSTRACT

BACKGROUND: Analysis of the viral genome for drug resistance mutations is state-of-the-art for guiding treatment selection for human immunodeficiency virus type 1 (HIV-1)-infected patients. These mutations alter the structure of viral target proteins and reduce or in the worst case completely inhibit the effect of antiretroviral compounds while maintaining the ability for effective replication. Modern anti-HIV-1 regimens comprise multiple drugs in order to prevent or at least delay the development of resistance mutations. However, commonly used HIV-1 genotype interpretation systems provide only classifications for single drugs. The EuResist initiative has collected data from about 18,500 patients to train three classifiers for predicting response to combination antiretroviral therapy, given the viral genotype and further information. In this work we compare different classifier fusion methods for combining the individual classifiers. PRINCIPAL FINDINGS: The individual classifiers yielded similar performance, and all the combination approaches considered performed equally well. The gain in performance due to combining methods did not reach statistical significance compared to the single best individual classifier on the complete training set. However, on smaller training set sizes (200 to 1,600 instances compared to 2,700) the combination significantly outperformed the individual classifiers (p<0.01; paired one-sided Wilcoxon test). Together with a consistent reduction of the standard deviation compared to the individual prediction engines this shows a more robust behavior of the combined system. Moreover, using the combined system we were able to identify a class of therapy courses that led to a consistent underestimation (about 0.05 AUC) of the system performance. Discovery of these therapy courses is a further hint for the robustness of the combined system. CONCLUSION: The combined EuResist prediction engine is freely available at http://engine.euresist.org.

Subject(s)

Anti-HIV Agents/pharmacology; Artificial Intelligence; Computational Biology/methods; Drug Resistance/genetics; Genome, Viral; Mutation; Diagnosis, Computer-Assisted; Genotype; Internet; Methods; Models, Statistical

Selecting anti-HIV therapies based on a variety of genomic and clinical factors.

Rosen-Zvi, Michal; Altmann, Andre; Prosperi, Mattia; Aharoni, Ehud; Neuvirth, Hani; Sönnerborg, Anders; Schülter, Eugen; Struck, Daniel; Peres, Yardena; Incardona, Francesca; Kaiser, Rolf; Zazzi, Maurizio; Lengauer, Thomas.

Bioinformatics; 24(13): i399-406, 2008 Jul 01.

Article in English | MEDLINE | ID: mdl-18586740

ABSTRACT

MOTIVATION: Optimizing HIV therapies is crucial since the virus rapidly develops mutations to evade drug pressure. Recent studies have shown that genotypic information might not be sufficient for the design of therapies and that other clinical and demographical factors may play a role in therapy failure. This study is designed to assess the improvement in prediction achieved when such information is taken into account. We use these factors to generate a prediction engine using a variety of machine learning methods and to determine which clinical conditions are most misleading in terms of predicting the outcome of a therapy. RESULTS: Three different machine learning techniques were used: generative-discriminative method, regression with derived evolutionary features, and regression with a mixture of effects. All three methods had similar performances with an area under the receiver operating characteristic curve (AUC) of 0.77. A set of three similar engines limited to genotypic information only achieved an AUC of 0.75. A straightforward combination of the three engines consistently improves the prediction, with significantly better prediction when the full set of features is employed. The combined engine improves on predictions obtained from an online state-of-the-art resistance interpretation system. Moreover, engines tend to disagree more on the outcome of failure therapies than regarding successful ones. Careful analysis of the differences between the engines revealed those mutations and drugs most closely associated with uncertainty of the therapy outcome. AVAILABILITY: The combined prediction engine will be available from July 2008, see http://engine.euresist.org.
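The abstract does not say which rule combines the three engines. The sketch below assumes the simplest candidate, an unweighted mean of the predicted success probabilities, and adds a per-therapy disagreement score to illustrate the remark that engines disagree more on some therapies than others. All names and numbers are hypothetical.

```python
from statistics import mean


def fuse_mean(engine_probs):
    """Average the success probabilities of several engines per therapy.

    engine_probs holds one list per engine, all covering the same
    therapies in the same order.
    """
    n = len(engine_probs[0])
    if any(len(p) != n for p in engine_probs):
        raise ValueError("all engines must score the same therapies")
    return [mean(column) for column in zip(*engine_probs)]


def disagreement(engine_probs):
    """Spread (max minus min) of the engines' predictions per therapy."""
    return [max(column) - min(column) for column in zip(*engine_probs)]


# Made-up predictions from three engines for three therapies.
engine_a = [0.9, 0.2, 0.6]
engine_b = [0.8, 0.4, 0.3]
engine_c = [0.7, 0.3, 0.9]

fused = fuse_mean([engine_a, engine_b, engine_c])
spread = disagreement([engine_a, engine_b, engine_c])

for i, (f, s) in enumerate(zip(fused, spread)):
    print(f"therapy {i}: fused probability {f:.2f}, disagreement {s:.2f}")
```

In this toy input the third therapy has the largest spread, so it is the case where a combined prediction carries the most uncertainty.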

Subject(s)

Anti-HIV Agents/therapeutic use; Chromosome Mapping/methods; Decision Support Systems, Clinical; Genetic Predisposition to Disease/genetics; HIV Infections/drug therapy; HIV Infections/genetics; Outcome Assessment, Health Care/methods; Pharmacogenetics/methods; Humans
