Large-Scale Information Retrieval and Correction of Noisy Pharmacogenomic Datasets through Residual Thresholded Deep Matrix Factorization.

Hu, Zhiyue Tom; Yu, Yaodong; Chen, Ruoqiao; Yeh, Shan-Ju; Chen, Bin; Huang, Haiyan.

bioRxiv ; 2023 Dec 08.

Article in English | MEDLINE | ID: mdl-38106027

ABSTRACT

Pharmacogenomics studies are attracting increasing interest from researchers in precision medicine. Advances in high-throughput experiments and multiplexed approaches allow large-scale quantification of drug sensitivities in molecularly characterized cancer cell lines (CCLs), resulting in a number of open drug sensitivity datasets for drug biomarker discovery. However, substantial inconsistencies in drug sensitivity values among these datasets have been noted. Such inconsistency indicates the presence of considerable noise, which hinders downstream analyses. To address the noise in drug sensitivity data, we introduce a robust and scalable deep learning framework, Residual Thresholded Deep Matrix Factorization (RT-DMF). This method takes a single drug sensitivity data matrix as its sole input and outputs a corrected and imputed matrix. Deep Matrix Factorization (DMF) excels at uncovering subtle patterns because it relies on minimal assumptions about the structure of the data. This attribute significantly boosts DMF's ability to identify complex hidden patterns among nuisance effects in the data, thereby facilitating the detection of signals that are therapeutically relevant. Furthermore, RT-DMF incorporates an iterative residual thresholding (RT) procedure, which plays a crucial role in retaining signals more likely to hold therapeutic importance. Validation using simulated datasets and real pharmacogenomics datasets demonstrates the effectiveness of our approach in correcting noise and imputing missing data in drug sensitivity datasets (open source package available at https://github.com/tomwhoooo/rtdmf).
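
The residual thresholding idea can be illustrated with a much smaller sketch. It is not the rtdmf package or its API. A plain two-factor model M ≈ U V^T, fitted by stochastic gradient descent on observed entries, stands in for the deep factorization. The thresholding rule (exclude the single observed entry with the largest absolute residual while that residual exceeds a hypothetical cutoff tau, then refit) is a simplification chosen for clarity. The function names, tau, and the toy rank-1 data are all assumptions; None marks a missing measurement.

```python
import random

def fit_low_rank(M, mask, rank=1, epochs=1000, lr=0.01, seed=0):
    """Fit M ~ U V^T on entries where mask[i][j] is True, by SGD.

    Returns the full reconstruction, which also fills (imputes)
    every unobserved entry. Shallow stand-in for a deep factorization.
    """
    rng = random.Random(seed)
    n, m = len(M), len(M[0])
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(m)]
    observed = [(i, j) for i in range(n) for j in range(m) if mask[i][j]]
    for _ in range(epochs):
        for i, j in observed:
            err = sum(U[i][k] * V[j][k] for k in range(rank)) - M[i][j]
            for k in range(rank):
                u_ik, v_jk = U[i][k], V[j][k]
                U[i][k] -= lr * err * v_jk
                V[j][k] -= lr * err * u_ik
    return [[sum(U[i][k] * V[j][k] for k in range(rank)) for j in range(m)]
            for i in range(n)]


def residual_threshold_correct(M, tau=1.0, max_rounds=10, **fit_kwargs):
    """Hypothetical RT loop: refit, then exclude the worst-fitting observed
    entry while its absolute residual exceeds tau."""
    n, m = len(M), len(M[0])
    mask = [[M[i][j] is not None for j in range(m)] for i in range(n)]
    flagged = set()
    for _ in range(max_rounds):
        fitted = fit_low_rank(M, mask, **fit_kwargs)
        worst, worst_res = None, tau
        for i in range(n):
            for j in range(m):
                if mask[i][j]:
                    res = abs(M[i][j] - fitted[i][j])
                    if res > worst_res:
                        worst, worst_res = (i, j), res
        if worst is None:          # every residual is within tau: stop
            break
        mask[worst[0]][worst[1]] = False
        flagged.add(worst)
    return fitted, flagged


# Toy rank-1 "drug x cell line" matrix (made-up values).
a = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 1.1, 1.3]
b = [1.0, 1.5, 2.0, 1.2, 1.7, 1.4]
M = [[ai * bj for bj in b] for ai in a]
M[2][3] += 4.0   # inject one noisy measurement (true value 1.4 * 1.2)
M[5][1] = None   # one missing measurement (true value 2.0 * 1.5)

fitted, flagged = residual_threshold_correct(M, tau=1.0)
print(sorted(flagged))
print(round(fitted[2][3], 3), round(fitted[5][1], 3))
```

Excluding only the worst entry per round keeps one large error from pulling its row and column neighbours over the cutoff in the same pass; after the refit, both the excluded cell and the originally missing cell are filled in from the low-rank structure.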

AICM: A Genuine Framework for Correcting Inconsistency Between Large Pharmacogenomics Datasets.

Hu, Zhiyue Tom; Ye, Yuting; Newbury, Patrick A; Huang, Haiyan; Chen, Bin.

Pac Symp Biocomput ; 24: 248-259, 2019.

Article in English | MEDLINE | ID: mdl-30864327

ABSTRACT

The inconsistency of open pharmacogenomics datasets produced by different studies limits the use of such datasets in many tasks, such as biomarker discovery. Investigation of multiple pharmacogenomics datasets confirmed that the correlation of sensitivity data between studies for each drug, i.e., each row (drug-wise), is relatively low, while the correlation between studies for each cell line, i.e., each column (cell-wise), is considerably stronger. This observation, common across multiple pharmacogenomics datasets, suggests a subtle consistency among the different studies (i.e., strong cell-wise correlation). However, substantial noise is also evident (i.e., weak drug-wise correlation) and has prevented researchers from confidently using the data directly. Motivated by this observation, we propose a novel framework for addressing the inconsistency between large-scale pharmacogenomics datasets. Our method can significantly boost the drug-wise correlation and can be easily applied to re-summarized and normalized datasets proposed by others. We also evaluate our algorithm against many different criteria to demonstrate that the corrected datasets are not only consistent but also biologically meaningful. Finally, we propose to extend our main algorithm into a framework so that, as more datasets become publicly available, it can potentially offer "ground-truth" guidance for reference.
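
The drug-wise and cell-wise correlations contrasted above can be computed directly. This is a minimal sketch of the diagnostic only, not of the AICM correction. It assumes two studies' sensitivity matrices have already been aligned to the same drugs (rows) and cell lines (columns) with no missing values; the function names and toy matrices are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(p * q for p, q in zip(dx, dy))
    den = math.sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))
    return num / den

def drug_wise(A, B):
    """One correlation per drug (row): study A vs study B across cell lines."""
    return [pearson(row_a, row_b) for row_a, row_b in zip(A, B)]

def cell_wise(A, B):
    """One correlation per cell line (column): study A vs study B across drugs."""
    return [pearson(col_a, col_b) for col_a, col_b in zip(zip(*A), zip(*B))]

# Toy aligned matrices: 3 drugs x 3 cell lines (made-up values).
study_a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
study_b = [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(drug_wise(study_a, study_b))  # [-1.0, -1.0, -1.0]
print(cell_wise(study_a, study_b))  # [1.0, 1.0, 1.0]
```

In this exaggerated toy case the two studies agree on which drugs are broadly more potent, so every cell-wise correlation is 1, yet they rank cell lines in opposite order for each drug, so every drug-wise correlation is -1. A drug with constant values across cell lines would make the denominator zero and needs separate handling.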

Subject(s)

Algorithms , Databases, Genetic/statistics & numerical data , Pharmacogenetics/statistics & numerical data , Cell Line, Tumor , Computational Biology/methods , Databases, Pharmaceutical/statistics & numerical data , Drug Resistance, Neoplasm/genetics , Genetic Markers , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Pharmacogenomic Variants , Precision Medicine