ABSTRACT
Integrating multiple databases that address similar tasks is a significant problem in biological data analysis. In this paper, we consider whether feature selection in a single database can benefit from incorporating similar databases. We report that using an adaptive multi-task elastic net for feature selection and Random Forest for prediction improves prediction performance on pharmacogenomics databases. We also present a simulation study to explain the robustness benefit of adaptive multi-task elastic net feature selection in the presence of noisy features.
Subject(s)
Algorithms; Pharmacogenetics; Databases, Factual
ABSTRACT
Recent years have seen the publication of a number of pharmacogenomics databases that enable testing of various predictive modeling techniques for personalized therapy applications. However, consistency between the databases is usually limited, despite a significant number of cell lines and drugs in common. In this article, we consider whether a model learned from a secondary database can be used to improve prediction for a target database. Using two pharmacogenomics databases, we illustrate that representing the databases with common basis vectors can improve prediction performance compared to naively applying a model trained on one database to the other. We also elucidate the robustness of PCA-based basis vectors in scenarios with weakly correlated input features.