Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin.
Kircher, Magdalena; Chludzinski, Elisa; Krepel, Jessica; Saremi, Babak; Beineke, Andreas; Jung, Klaus.
  • Kircher M; Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Buenteweg 17p, 30559 Hannover, Germany.
  • Chludzinski E; Department of Pathology, University of Veterinary Medicine Hannover, Buenteweg 17, 30559 Hannover, Germany.
  • Krepel J; Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Buenteweg 17p, 30559 Hannover, Germany.
  • Saremi B; Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Buenteweg 17p, 30559 Hannover, Germany.
  • Beineke A; Department of Pathology, University of Veterinary Medicine Hannover, Buenteweg 17, 30559 Hannover, Germany.
  • Jung K; Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Buenteweg 17p, 30559 Hannover, Germany.
Int J Mol Sci; 23(5), 2022 Feb 24.
Article in English | MEDLINE | ID: covidwho-1715406
ABSTRACT
To better understand the molecular basis of respiratory diseases of viral origin, high-throughput gene-expression data are frequently generated by means of DNA microarray or RNA-seq technology. Such data can also be used to classify infected individuals by molecular signatures in the form of machine-learning models with genes as predictor variables. Early diagnosis of patients by molecular signatures could also contribute to better treatments. An approach that has rarely been considered for machine-learning models in the context of transcriptomics is data augmentation. For other data types, it has been shown that augmentation can improve classification accuracy and prevent overfitting. Here, we compare three strategies for data augmentation of DNA microarray and RNA-seq data from two selected studies on respiratory diseases of viral origin. The first study involves samples from patients whose respiratory disease is of either viral or bacterial origin; the second study involves patients with either SARS-CoV-2 or another respiratory virus as the origin of disease. Specifically, we reanalyze these public datasets to study whether patient classification by transcriptomic signatures can be improved by adding artificial data to the training of the machine-learning models. Our comparison reveals that augmentation of transcriptomic data can improve classification accuracy and that fewer genes are necessary as explanatory variables in the final models. We also report genes from our signatures that overlap with signatures presented in the original publications of our example data. Due to strict selection criteria, the molecular role of these genes in the context of respiratory infectious diseases is underlined.

Full text: Available | Collection: International databases | Database: MEDLINE | Main subject: Neural Networks, Computer / Gene Expression Profiling / Transcriptome / Machine Learning / RNA-Seq / COVID-19 | Type of study: Diagnostic study / Prognostic study | Limits: Humans | Language: English | Year: 2022 | Document Type: Article | Affiliation country: Ijms23052481
