Search | Global Index Medicus

A Novel Frequency Based Feature Extraction Technique for Classification of Corona Virus Genome and Discovery of COVID-19 Repeat Pattern

Murugaiah, Muthulakshmi; Ganesan, Murugeswari.

Braz. arch. biol. technol ; 64: e21210075, 2021. tab, graf

Article in English | LILACS-Express | LILACS | ID: biblio-1355812

ABSTRACT

Genome sequences regulate the life of all living organisms on Earth. Genetic diseases cause genomic disorders, so early prediction of severe genetic diseases is possible through genome sequence analysis. Genomic disorders refer to mutations, that is, rearrangements of bases in the genome of an organism. Genome sequence analysis and mutation identification, which can be accomplished using machine learning techniques, can help classify diseased genomes. Feature extraction plays a crucial role in classification because it converts genome sequences into a set of quantitative values. In this article, we propose a novel feature extraction technique, the Frequency based Feature Extraction Technique, which extracts 120 features from genome sequences for classification. COVID-19 is the current pandemic disease, and the coronavirus is its source. We therefore tested the proposed technique on 1000 genome sequence samples from coronavirus-affected patients across the world. The extracted features were classified using both machine learning and deep learning techniques. The results show that the proposed technique performs well with a Convolutional Neural Network classifier, giving an accuracy of 97.96%. The proposed technique also helps find the most frequently repeated patterns in genome sequences: the pattern "TTGTT" was found to be the most repeated pattern in the COVID-19 genome.
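The abstract does not say how the 120 features are defined, so the following is only a minimal sketch of the general idea of frequency-based features: counting overlapping k-mers in a sequence and reporting the most repeated one. The function names, the choice of k = 5, and the toy sequence are assumptions made for illustration; this is not the authors' implementation.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer that consists only of A, C, G, T."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous symbols such as N.
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


def kmer_frequencies(sequence: str, k: int) -> dict:
    """Turn raw k-mer counts into relative frequencies (a numeric feature map)."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def most_repeated(sequence: str, k: int):
    """Return (k-mer, count) for the most frequent k-mer, or None if there is none."""
    counts = kmer_counts(sequence, k)
    return counts.most_common(1)[0] if counts else None


# Hypothetical toy sequence, not real viral data.
toy = "ATTGTTACCTTGTTGGATTGTTN"
print(most_repeated(toy, 5))  # ('TTGTT', 3)
```

On a real genome, the same counting step would run over the full sequence, and the resulting frequency map (or a fixed-size selection from it) would be passed to a classifier.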

Research on nucleotide sequence of a newly emerged pandemic norovirus GⅡ.4 genotype / 国际检验医学杂志

Mingli ZHOU; Ailing CAI; Xuefeng WANG.

International Journal of Laboratory Medicine ; (12): 231-232, 2017.

Article in Chinese | WPRIM | ID: wpr-508197

ABSTRACT

Objective: To analyze the complete genome sequence of the newly emerged pandemic norovirus GⅡ.4 genotype and to understand its variation characteristics. Methods: Fecal specimens were collected from 264 patients with diarrhea. RNA was extracted from the 264 fecal specimens and cDNA was synthesized. Positive samples were amplified by PCR, and the amplified fragments were sequenced. The complete genome sequence of the norovirus was sequenced and analyzed. Results: A new norovirus variant strain, Jingzhou GⅡ.4, a pleiston of Sydney GⅡ.4, was isolated. Large variation was found in the new variant subtype, including mutations in the hypervariable P2 domain of the major capsid protein VP1. Conclusion: The results demonstrate that a variant strain of Sydney GⅡ.4 has spread to China. VP1 of norovirus GⅡ.4 is evolving rapidly. The spread and evolution of norovirus GⅡ.4 need to be closely monitored in China for the development of effective vaccines and therapeutic monoclonal antibodies.

BIND – An algorithm for loss-less compression of nucleotide sequence data.

Bose, Tungadri; Mohammed, Monzoorul Haque; Dutta, Anirban; Mande, Sharmila S.

J Biosci ; 2012 Sep; 37 (4): 785-789

Article in English | IMSEAR | ID: sea-161741

ABSTRACT

Recent advances in DNA sequencing technologies have enabled the current generation of life science researchers to probe deeper into the genomic blueprint. The amount of data generated by these technologies has been increasing exponentially over the last decade. Storage, archival and dissemination of such huge data sets require efficient solutions, from both the hardware and the software perspective. The present paper describes BIND – an algorithm specialized for compressing nucleotide sequence data. By adopting a unique ‘block-length’ encoding for representing binary data (as a key step), BIND achieves significant compression gains compared to the widely used general-purpose compression algorithms (gzip, bzip2 and lzma). Moreover, in contrast to implementations of existing specialized genomic compression approaches, the implementation of BIND can handle non-ATGC and lowercase characters. This makes BIND a loss-less compression approach that is suitable for practical use. More importantly, validation results of BIND (with real-world data sets) indicate that reasonable compression and decompression speeds can be achieved with minimal processor/memory usage. BIND is available for download at http://metagenomics.atc.tcs.com/compression/BIND. No license is required for academic or non-profit use.
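The abstract does not describe BIND's ‘block-length’ encoding in enough detail to reproduce it, so the sketch below does not implement BIND. It only illustrates, under assumed names and an assumed layout, why handling non-ATGC and lowercase characters matters for loss-less compression: bases are packed at 2 bits each, while exceptional characters and lowercase positions are kept in side tables so the original text can be restored exactly.

```python
BASES = "ACGT"
CODE = {base: index for index, base in enumerate(BASES)}


def pack(sequence: str):
    """Pack a sequence into 2 bits per symbol plus side tables (illustrative layout)."""
    exceptions = {}   # position -> original character that is not A/C/G/T
    lowercase = []    # positions of a/c/g/t that were lowercase in the input
    packed = bytearray((len(sequence) + 3) // 4)
    for i, ch in enumerate(sequence):
        upper = ch.upper()
        if upper in CODE:
            value = CODE[upper]
            if ch != upper:
                lowercase.append(i)
        else:
            exceptions[i] = ch  # stored verbatim, case included
            value = 0           # placeholder bits, overridden when unpacking
        packed[i // 4] |= value << (2 * (i % 4))
    return bytes(packed), len(sequence), exceptions, lowercase


def unpack(packed: bytes, length: int, exceptions: dict, lowercase: list) -> str:
    """Invert pack(): rebuild the exact original string."""
    chars = []
    for i in range(length):
        value = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        chars.append(exceptions.get(i, BASES[value]))
    for i in lowercase:
        chars[i] = chars[i].lower()
    return "".join(chars)


# Hypothetical input mixing uppercase, lowercase, and ambiguity codes.
original = "ACGTNacgtRYacgTTGCA-"
restored = unpack(*pack(original))
print(restored == original)  # True
```

A packer that dropped the side tables would shrink the data further but could not restore `N`, `R`, `Y`, `-`, or the original letter case, which is the sense in which such a scheme would no longer be loss-less.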
