Search | VHL Regional Portal

A feature selection strategy for gene expression time series experiments with hidden Markov models.

Cárdenas-Ovando, Roberto A; Fernández-Figueroa, Edith A; Rueda-Zárate, Héctor A; Noguez, Julieta; Rangel-Escareño, Claudia.

PLoS One ; 14(10): e0223183, 2019.

Article in English | MEDLINE | ID: mdl-31600242

ABSTRACT

Time series studies can be far more informative than those that capture only a single moment in time. However, in transcriptomic data, time points are sparse, creating a constant need for methods capable of extracting information from experiments of this kind. We propose a feature selection algorithm embedded in a hidden Markov model and applied to gene expression time course data under either a single or multiple biological conditions. For the latter, in a simple case-control study, features (genes) are selected under the assumption of no change over time in the control samples, while the case group must show at least one change. The proposed model reduces the feature space according to a two-state hidden Markov model, in which the two states define change/no-change in gene expression. Features are ranked according to three scores: the number of changes across time, the magnitude of those changes, and the quality of replicates, measured by how much they deviate from the mean. An important highlight is that this strategy overcomes the small-sample limitation common in transcriptome experiments through a process of data transformation and rearrangement. To evaluate the method, we applied it to three publicly available data sets. Results show that the feature domain is reduced by up to 90%, leaving only a few relevant features, with findings consistent with those previously reported. Moreover, the strategy proved robust and stable, and it works in studies where sample size would otherwise be an issue: even with two biological replicates and/or three time points, the method performs well.
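
The two-state change/no-change idea and the three ranking scores can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian emissions, the hand-set parameters (`means`, `sd`, `stay`), the use of absolute differences between consecutive time-point means as observations, and the function names `viterbi_two_state` and `score_gene` are all assumptions made for illustration, and the data transformation and rearrangement step mentioned in the abstract is not reproduced.

```python
import math
from statistics import mean


def viterbi_two_state(obs, means=(0.0, 1.0), sd=0.5, stay=0.7):
    """Most likely state path (0 = no change, 1 = change) for a sequence.

    Assumed model: Gaussian emissions with a shared standard deviation,
    a symmetric transition matrix, and a uniform initial distribution.
    """
    def log_gauss(x, mu):
        return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    score = [math.log(0.5) + log_gauss(obs[0], means[s]) for s in (0, 1)]
    back = []  # back[t][s] = best previous state when in state s at step t + 1
    for x in obs[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + log_gauss(x, means[s]))
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def score_gene(replicates):
    """Return (number of changes, total magnitude, replicate spread) for one gene.

    `replicates` is a list of replicate profiles, each a list of expression
    values over the same time points (hypothetical input layout).
    """
    profile = [mean(values) for values in zip(*replicates)]
    diffs = [abs(b - a) for a, b in zip(profile, profile[1:])]
    states = viterbi_two_state(diffs)
    n_changes = sum(states)
    magnitude = sum(d for d, s in zip(diffs, states) if s == 1)
    # Replicate quality: mean absolute deviation from the per-time-point mean.
    spread = mean(abs(v - m) for rep in replicates for v, m in zip(rep, profile))
    return n_changes, magnitude, spread


flat = score_gene([[1.0, 1.0, 1.0, 1.0], [1.1, 0.9, 1.0, 1.0]])
shifted = score_gene([[1.0, 1.0, 3.0, 3.0], [1.0, 1.2, 3.2, 3.0]])
```

Under these assumed parameters the flat profile decodes with no change states, while the shifted profile has one. In a case-control setting, a gene would be kept only if its control profile shows no change and its case profile shows at least one.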

Subject(s)

Gene Expression/genetics , Markov Chains , Models, Statistical , Algorithms , Case-Control Studies

A computational toxicogenomics approach identifies a list of highly hepatotoxic compounds from a large microarray database.

Rueda-Zárate, Héctor A; Imaz-Rosshandler, Iván; Cárdenas-Ovando, Roberto A; Castillo-Fernández, Juan E; Noguez-Monroy, Julieta; Rangel-Escareño, Claudia.

PLoS One ; 12(4): e0176284, 2017.

Article in English | MEDLINE | ID: mdl-28448553

ABSTRACT

The liver and the kidney are the most common targets of chemical toxicity because of their major metabolic and excretory functions. However, since the liver is directly involved in biotransformation, compounds in many commonly used drugs could affect it adversely. Most chemical compounds are already classified on the DILI-concern scale according to their FDA-approved labels. Drug-induced liver injury (DILI) refers to an adverse drug reaction. Many compounds do not exhibit hepatotoxicity at early stages of development, so it is important to detect anomalies at the gene expression level that could predict adverse reactions in later stages. In this study, a large collection of microarray data is used to investigate gene expression changes associated with hepatotoxicity. Using TG-GATEs, a large-scale toxicogenomics database, we present a computational strategy to classify compounds by toxicity level in human and animal models through patterns of gene expression. We combined machine learning algorithms with time series analysis to identify genes capable of classifying compounds by their FDA-approved DILI-concern labeling. The goal is to define gene expression profiles capable of distinguishing the different subtypes of hepatotoxicity. The study illustrates that expression profiling can be used to classify compounds according to different hepatotoxic levels; to label those currently labeled as undetermined; and to determine whether, at the molecular level, animal models are a good proxy for predicting hepatotoxicity in humans.

Subject(s)

Cytotoxins/toxicity , Databases, Genetic , Genomics/methods , Liver/drug effects , Liver/metabolism , Oligonucleotide Array Sequence Analysis , Toxicogenetics , Animals , Chemical and Drug Induced Liver Injury/genetics , Dose-Response Relationship, Drug , Drug Evaluation, Preclinical , Humans , Mice , Time Factors , Unsupervised Machine Learning
