A Novel Frequency Based Feature Extraction Technique for Classification of Corona Virus Genome and Discovery of COVID-19 Repeat Pattern
Murugaiah, Muthulakshmi; Ganesan, Murugeswari.
  • Murugaiah, Muthulakshmi; Manonmaniam Sundaranar University. Department of Computer Science and Engineering. Tirunelveli. IN
  • Ganesan, Murugeswari; Manonmaniam Sundaranar University. Department of Computer Science and Engineering. Tirunelveli. IN
Braz. arch. biol. technol ; 64: e21210075, 2021. tab, graf
Article in English | LILACS-Express | LILACS | ID: biblio-1355812
ABSTRACT
Genome sequences govern the life of every living organism. Genetic diseases are associated with genomic disorders, so analyzing genome sequences makes early prediction of severe genetic diseases possible. Genomic disorders here refer to mutations, that is, rearrangements of bases in an organism's genome. Genome sequence analysis and mutation identification can help classify diseased genomes, a task that can be accomplished with machine learning techniques. Feature extraction plays a crucial role in classification because it converts genome sequences into a set of quantitative values. In this article, we propose a novel feature extraction technique, the Frequency based Feature Extraction Technique, which extracts 120 features from genome sequences for classification. COVID-19 is currently a pandemic disease, and the coronavirus is its cause. We therefore tested the proposed technique on 1000 genome sequence samples of the coronavirus taken from affected patients across the world. The extracted features were classified using both machine learning and deep learning techniques. The results show that the proposed technique performs well with a Convolutional Neural Network classifier, giving an accuracy of 97.96%. The proposed technique also helps find the most frequently repeated patterns in the genome sequences. The pattern "TTGTT" was found to be the most frequently repeated pattern in the COVID-19 genome.
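The abstract does not spell out how the 120 features are defined, so the following is only a minimal sketch of the general idea of frequency-based features: counting overlapping fixed-length patterns (k-mers) in a sequence and reporting the most repeated one. The function names, the choice of k = 5 (taken from the length of "TTGTT"), and the toy sequence are assumptions made for illustration, not the authors' method or data.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(sequence, k=5):
    """Count overlapping k-mers, skipping windows with ambiguous bases (e.g. N)."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


def most_repeated_pattern(sequence, k=5):
    """Return (pattern, count) for the most frequent k-mer, or None if there is none."""
    counts = kmer_counts(sequence, k)
    if not counts:
        return None
    return counts.most_common(1)[0]


def frequency_features(sequence, patterns, k=5):
    """Relative frequency of each chosen pattern (all assumed to have length k)."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in patterns]


# Toy sequence for illustration only; not real genome data.
toy = "TTGTTGTTGTTACG"
print(most_repeated_pattern(toy))
print(frequency_features(toy, ["TTGTT", "AAAAA"]))
```

A feature vector built this way, one value per chosen pattern, could then be passed to any classifier. Which patterns make up the 120 features in the article is not stated in this record.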


Full text: Available | Index: LILACS (Americas) | Language: English | Journal: Braz. arch. biol. technol | Journal subject: Biology | Year of publication: 2021 | Document type: Article | Country of affiliation: India | Institution/Country of affiliation: Manonmaniam Sundaranar University/IN
