
A novel alignment-free DNA sequence similarity analysis approach based on top-k n-gram match-up.

Delibas, Emre; Arslan, Ahmet; Seker, Abdulkadir; Diri, Banu.

J Mol Graph Model ; 100: 107693, 2020 11.

Article in English | MEDLINE | ID: mdl-32805559

ABSTRACT

DNA sequence similarity analysis is an essential task in computational biology and bioinformatics. Nearly all research on evolutionary relationships, gene function analysis, protein structure prediction and sequence retrieval requires similarity calculations. Alignment-based sequence comparison methods have a high computational cost, so alignment-free methods have emerged as an alternative; they calculate similarity by representing the sequences numerically in a different space. In this paper, we propose an alignment-free DNA sequence similarity analysis method based on top-k n-gram matches, on the hypothesis that common, frequently repeated DNA subsequences indicate high similarity between DNA sequences. In our method, DNA sequence similarity is determined by measuring the similarity among feature vectors built from top-k n-gram match-up scores, without the use of similarity functions. We applied the similarity calculation to three DNA data sets of different lengths. The phylogenetic trees produced by our method coincide almost completely with those produced by the MEGA software, which is based on sequence alignment. Our findings show that a certain number of frequently recurring common sequence patterns have the power to characterize DNA sequences.
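
The abstract does not spell out the exact scoring, so the following is only a minimal sketch of the top-k n-gram idea under stated assumptions, not the authors' implementation. It assumes overlapping n-grams, a match-up score equal to the number of top-k n-grams two sequences share, each sequence's feature vector being its row of match-up scores against the whole data set, and a distance of 1 minus the shared count normalized by the larger top-k set. The function names `top_k_ngrams`, `match_up_matrix` and `distance_matrix` are hypothetical.

```python
from collections import Counter


def top_k_ngrams(seq: str, n: int, k: int) -> set[str]:
    """Return the k most frequent overlapping n-grams of seq.

    Counter.most_common orders equal counts by first occurrence,
    so ties are broken in favor of n-grams seen earlier.
    """
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    return {gram for gram, _ in counts.most_common(k)}


def match_up_matrix(seqs: list[str], n: int, k: int) -> list[list[int]]:
    """Row i is a (hypothetical) feature vector: how many top-k n-grams
    sequence i shares with every sequence in the data set."""
    tops = [top_k_ngrams(s.upper(), n, k) for s in seqs]
    return [[len(a & b) for b in tops] for a in tops]


def distance_matrix(seqs: list[str], n: int = 3, k: int = 10) -> list[list[float]]:
    """Assumed conversion of match-up scores to distances in [0, 1].

    The diagonal score[i][i] is the size of sequence i's top-k set, so
    identical top-k sets give 0.0 and disjoint sets give 1.0.
    """
    scores = match_up_matrix(seqs, n, k)
    size = [scores[i][i] for i in range(len(seqs))]
    return [
        [1.0 - scores[i][j] / max(size[i], size[j], 1) for j in range(len(seqs))]
        for i in range(len(seqs))
    ]


if __name__ == "__main__":
    toy = ["ACGTACGTAC", "ACGTACGTTT", "GGGCCCGGGC"]  # toy sequences, not study data
    for row in distance_matrix(toy, n=2, k=3):
        print(row)
```

A distance matrix like this could then be handed to any distance-based tree-building tool; the choice of n and k here is arbitrary and would need tuning on real data.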

Subject(s)

Algorithms , Software , Base Sequence , Computational Biology , Phylogeny , Sequence Alignment , Sequence Analysis, DNA

DNA sequence similarity analysis using image texture analysis based on first-order statistics.

Delibas, Emre; Arslan, Ahmet.

J Mol Graph Model ; 99: 107603, 2020 09.

Article in English | MEDLINE | ID: mdl-32442904

ABSTRACT

Similarity calculation is one of the key processes of DNA sequence analysis in computational biology and bioinformatics. Nearly all research on evolutionary relationships, gene function analysis, protein structure prediction and sequence retrieval requires similarity calculations. One major task in alignment-free DNA sequence similarity calculation is developing novel mathematical descriptors for DNA sequences. In this paper, we present a novel approach to DNA sequence similarity analysis based on similarity calculations between texture images. Texture analysis methods, a subset of digital image processing methods, are used here on the assumption that they can be adapted to alignment-free DNA sequence similarity analysis. Gray-level textures were created from the values assigned to the nucleotides in the DNA sequences. Similarities between these textures were then calculated using histogram-based texture analysis based on first-order statistics. We obtained texture features for three DNA data sets of different lengths and calculated the similarity matrices. The phylogenetic trees produced by our method are similar to those produced by the MEGA software, which is based on sequence alignment. Our findings show that texture analysis metrics can be used to characterize DNA sequences.
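
As a rough illustration of histogram-based first-order texture features, here is a minimal sketch; it is not the authors' code. The gray-level values in the hypothetical `GRAY` table, the particular six statistics and the Euclidean distance between feature vectors are all assumptions. Because first-order statistics depend only on the gray-level histogram, the two-dimensional layout of the texture image does not change them, so the sketch computes the histogram directly from the sequence instead of building an image.

```python
import math
from collections import Counter

# Hypothetical gray levels in [0, 1]; the values used in the paper are not given here.
GRAY = {"A": 0.0, "C": 1 / 3, "G": 2 / 3, "T": 1.0}


def gray_histogram(seq: str) -> dict[float, float]:
    """Normalized histogram p(g) of gray levels; non-ACGT symbols (e.g. N) are skipped."""
    levels = [GRAY[b] for b in seq.upper() if b in GRAY]
    if not levels:
        raise ValueError("sequence contains no A/C/G/T")
    counts = Counter(levels)
    return {g: c / len(levels) for g, c in counts.items()}


def first_order_features(seq: str) -> list[float]:
    """Mean, variance, skewness, kurtosis, energy and entropy of the histogram."""
    p = gray_histogram(seq)
    mean = sum(g * pg for g, pg in p.items())
    var = sum((g - mean) ** 2 * pg for g, pg in p.items())
    std = math.sqrt(var)
    # Skewness and kurtosis are undefined for a flat texture; report 0.0 instead.
    skew = sum((g - mean) ** 3 * pg for g, pg in p.items()) / std ** 3 if std else 0.0
    kurt = sum((g - mean) ** 4 * pg for g, pg in p.items()) / var ** 2 if var else 0.0
    energy = sum(pg ** 2 for pg in p.values())
    entropy = -sum(pg * math.log2(pg) for pg in p.values())
    return [mean, var, skew, kurt, energy, entropy]


def feature_distance(a: str, b: str) -> float:
    """Assumed dissimilarity: Euclidean distance between feature vectors."""
    return math.dist(first_order_features(a), first_order_features(b))


if __name__ == "__main__":
    print(first_order_features("ACGTACGGTA"))  # toy sequence, not study data
    print(feature_distance("ACGTACGGTA", "AATTAATTGC"))
```

With only four gray levels these features reduce to summaries of nucleotide composition, which is one reason a real implementation would likely use a richer value assignment.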

Subject(s)

Algorithms , Software , Base Sequence , Image Processing, Computer-Assisted , Phylogeny , Sequence Analysis, DNA
