A Novel Frequency Based Feature Extraction Technique for Classification of Corona Virus Genome and Discovery of COVID-19 Repeat Pattern
Murugaiah, Muthulakshmi; Ganesan, Murugeswari.
  • Murugaiah, Muthulakshmi; Manonmaniam Sundaranar University. Department of Computer Science and Engineering. Tirunelveli. IN
  • Ganesan, Murugeswari; Manonmaniam Sundaranar University. Department of Computer Science and Engineering. Tirunelveli. IN
Braz. arch. biol. technol ; 64: e21210075, 2021. tab, graf
Article in English | LILACS-Express | LILACS | ID: biblio-1355812
ABSTRACT
The genome sequence regulates the life of all living organisms on earth. Genetic diseases are associated with genomic disorders, so genome sequence analysis makes early prediction of severe genetic diseases possible. A genomic disorder here refers to a mutation, that is, a rearrangement of bases in the genome of an organism. Genome sequence analysis and mutation identification can help to classify a diseased genome, and this can be accomplished using Machine Learning techniques. Feature extraction plays a crucial role in classification because it converts genome sequences into a set of quantitative values. In this article, we propose a novel feature extraction technique, called the Frequency based Feature Extraction Technique, which extracts 120 features from genome sequences for classification. COVID-19 is the current pandemic disease, and the coronavirus is its source. In this research work, we therefore tested the proposed feature extraction technique on 1000 genome sequence samples from coronavirus-affected patients across the world. The extracted features were classified using both Machine Learning and Deep Learning techniques. The results show that the proposed feature extraction technique performs well with a Convolutional Neural Network classifier, giving an accuracy of 97.96%. The proposed technique also helps to find the most frequently repeated patterns in the genome sequences. The pattern "TTGTT" was found to be the most frequently repeated pattern in the COVID-19 genome.
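The abstract does not say how the 120 features are defined, so the following is only a minimal sketch of the general idea of frequency-based features, assuming plain overlapping k-mer counts over the bases A, C, G and T. The function names (`kmer_counts`, `most_repeated_pattern`, `frequency_features`), the choice of `k`, and the toy sequence are all hypothetical illustrations, not the authors' method or data.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_counts(sequence, k):
    """Count overlapping length-k substrings made only of A, C, G, T."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous symbols such as N.
        if all(base in BASES for base in kmer):
            counts[kmer] += 1
    return counts


def most_repeated_pattern(sequence, k=5):
    """Return (pattern, count) for the most frequent k-mer, or None."""
    counts = kmer_counts(sequence, k)
    if not counts:
        return None
    return counts.most_common(1)[0]


def frequency_features(sequence, k=2):
    """Relative frequency of every possible k-mer, in a fixed order.

    Assumption: this yields 4**k values per sequence; it is not the
    120-feature scheme described in the abstract.
    """
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product(BASES, repeat=k)]
    return [counts[m] / total if total else 0.0 for m in all_kmers]


# Toy sequence invented for illustration only (not real genome data).
toy = "TTGTTACGTTGTTNTTGTT"
print(most_repeated_pattern(toy, k=5))
print(len(frequency_features(toy, k=2)))
```

A fixed ordering of all possible k-mers gives every sequence a feature vector of the same length, which is what a downstream classifier needs as input.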


Full text: Available Index: LILACS (Americas) Language: English Journal: Braz. arch. biol. technol Journal subject: Biology Year: 2021 Type: Article Affiliation country: India Institution/Affiliation country: Manonmaniam Sundaranar University/IN
