Search | VHL Regional Portal

Detection of Highly Divergent Tandem Repeats in the Rice Genome.

Korotkov, Eugene V; Kamionskya, Anastasiya M; Korotkova, Maria A.

Genes (Basel) ; 12(4)2021 03 25.

Article in English | MEDLINE | ID: mdl-33806152

ABSTRACT

Currently, there is a lack of bioinformatics approaches to identify highly divergent tandem repeats (TRs) in eukaryotic genomes. Here, we developed a new mathematical method to search for TRs, which uses a novel algorithm for constructing multiple alignments based on the generation of random position weight matrices (RPWMs), and applied it to detect TRs 2 to 50 nucleotides long in the rice genome. The RPWM method could find highly divergent TRs in the presence of insertions or deletions. Comparison of the RPWM algorithm with other methods of TR identification showed that RPWM could detect TRs in which the average number of base substitutions per nucleotide (x) was between 1.5 and 3.2, whereas the T-REKS and TRF methods could not detect divergent TRs with x > 1.5. Applied to the search for TRs in the rice genome, the RPWM method revealed that TRs occupied 5% of the genome and that most of them were 2 and 3 bases long. Using RPWM, we also revealed the correlation of TRs with dispersed repeats and transposons, suggesting that some transposons originated from TRs. Thus, the novel RPWM algorithm is an effective tool to search for highly divergent TRs in genomes.
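
As a rough illustration of what scoring a sequence against a position weight matrix for a tandem repeat involves, here is a minimal Python sketch. It is not the RPWM algorithm from the abstract: it handles no insertions or deletions, and the matrix is built from a given consensus unit rather than generated randomly and optimized. The function names, the match_prob value, and the uniform background are all assumptions made for this example.

```python
import math

BASES = "ACGT"

def consensus_pwm(consensus, match_prob=0.7):
    """Build a log-odds PWM with one column per position of the repeat unit.

    match_prob is an illustrative assumption: the consensus base gets this
    probability, the other three bases share the rest, and the background
    is taken as uniform (0.25 per base).
    """
    miss = (1.0 - match_prob) / 3.0
    return [
        {b: math.log2((match_prob if b == c else miss) / 0.25) for b in BASES}
        for c in consensus
    ]

def cyclic_score(seq, pwm, offset=0):
    """Sum log-odds scores, cycling through the PWM columns along seq."""
    n = len(pwm)
    return sum(pwm[(i + offset) % n][b] for i, b in enumerate(seq))

def best_cyclic_score(seq, pwm):
    """Try every cyclic offset and return (best score, offset)."""
    return max((cyclic_score(seq, pwm, o), o) for o in range(len(pwm)))
```

A repeat that starts out of phase, such as "CGACGA" scored against the consensus "ACG", reaches its best score at a non-zero offset, which is why best_cyclic_score tries all offsets.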

Subject(s)

Chromosome Mapping/methods , Chromosomes, Plant/genetics , Genome, Plant , Oryza/genetics , Tandem Repeat Sequences/genetics , Phylogeny

Multiple Alignment of Promoter Sequences from the Arabidopsis thaliana L. Genome.

Korotkov, Eugene V; Suvorova, Yulia M; Kostenko, Dmitrii O; Korotkova, Maria A.

Genes (Basel) ; 12(2)2021 01 21.

Article in English | MEDLINE | ID: mdl-33494278

ABSTRACT

In this study, we developed a new mathematical method for performing multiple alignment of highly divergent sequences (MAHDS), i.e., sequences that have on average more than 2.5 substitutions per position (x). We generated sets of artificial DNA sequences with x ranging from 0 to 4.4 and applied MAHDS as well as currently used multiple sequence alignment algorithms, including ClustalW, MAFFT, T-Coffee, Kalign, and Muscle to these sets. The results indicated that most of the existing methods could produce statistically significant alignments only for the sets with x < 2.5, whereas MAHDS could operate on sequences with x = 4.4. We also used MAHDS to analyze a set of promoter sequences from the Arabidopsis thaliana genome and discovered many conserved regions upstream of the transcription initiation site (from -499 to +1 bp); a part of the downstream region (from +1 to +70 bp) also significantly contributed to the obtained alignments. The possibilities of applying the newly developed method for the identification of promoter sequences in any genome are discussed. A server for multiple alignment of nucleotide sequences has been created.
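
To make the divergence parameter x concrete, the following Python sketch generates a set of artificial sequences from a common ancestor. It assumes that x can be modelled as the total number of substitution events divided by sequence length, with positions allowed to mutate more than once; the abstract does not spell out how its artificial sets were built, so this is an illustration only. The names mutate, make_family, and identity are hypothetical.

```python
import random

BASES = "ACGT"

def mutate(seq, x, rng):
    """Apply round(x * len(seq)) random base substitutions to seq.

    Each event picks a random position and replaces its base with a
    different one, so a position can be hit several times when x is large.
    """
    s = list(seq)
    for _ in range(round(x * len(s))):
        i = rng.randrange(len(s))
        s[i] = rng.choice([b for b in BASES if b != s[i]])
    return "".join(s)

def make_family(ancestor, x, n_copies, seed=0):
    """Return n_copies independently mutated copies of ancestor."""
    rng = random.Random(seed)
    return [mutate(ancestor, x, rng) for _ in range(n_copies)]

def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(p == q for p, q in zip(a, b)) / len(a)
```

Because repeated hits can restore the original base, the observed identity between copies levels off near the random expectation as x grows, which is one way to see why alignments of sets with x above 2.5 are hard to distinguish from noise.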

Subject(s)

Arabidopsis/genetics , Computational Biology , Genome, Plant , Genomics , Promoter Regions, Genetic , Sequence Analysis, DNA/methods , Algorithms , Computational Biology/methods , Genomics/methods

Database of Periodic DNA Regions in Major Genomes.

Frenkel, Felix E; Korotkova, Maria A; Korotkov, Eugene V.

Biomed Res Int ; 2017: 7949287, 2017.

Article in English | MEDLINE | ID: mdl-28182099

ABSTRACT

Summary. We analyzed several prokaryotic and eukaryotic genomes for the presence of periodic sequences using a new mathematical method. The method uses random position weight matrices and dynamic programming. Insertions and deletions were allowed inside periodicities, which is the novel aspect of the results we obtained. The periodicity length, one of the key periodicity features, varied from 2 to 50 nt. In total, over 60,000 periodic sequences were found in 15 genomes, including some chromosomes of the H. sapiens (partial), C. elegans, D. melanogaster, and A. thaliana genomes.

Subject(s)

Genome , INDEL Mutation/genetics , Sequence Analysis, DNA , Animals , Arabidopsis/genetics , Caenorhabditis elegans/genetics , Chromosomes/genetics , Drosophila melanogaster/genetics , Humans , Models, Theoretical , Prokaryotic Cells

Comparative analysis of periodicity search methods in DNA sequences.

Suvorova, Yulia M; Korotkova, Maria A; Korotkov, Eugene V.

Comput Biol Chem ; 53 Pt A: 43-8, 2014 Dec.

Article in English | MEDLINE | ID: mdl-25218218

ABSTRACT

To determine the periodicity of a DNA sequence, different spectral approaches are applied: discrete Fourier transform (DFT), autocorrelation (CORR), information decomposition (ID), hybrid method (HYB), concept of spectral envelope for spectral analysis (SE), normalized autocorrelation (CORR_N), and profile analysis (PA). In this work, we investigated whether these methods can find the true period length, depending on the average number of accumulated changes in DNA bases (point mutations, PM). The results show that for short periods (≤4 bp), it is possible to use the hybrid method (HYB), which combines properties of autocorrelation, Fourier transform, and information decomposition (ID). For larger period lengths (>4 bp) with PM values of 1.0 or more per nucleotide, it is preferable to use the information decomposition method (ID), as the other spectral approaches cannot correctly determine the period length present in the analyzed sequence.
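
One way to picture an information-based period search is to compute, for each candidate period n, the mutual information between a base and its position modulo n. The Python sketch below does this as a generic statistic in the spirit of information decomposition; it is an assumption-laden illustration, not the ID implementation evaluated in the paper, and the function names are hypothetical.

```python
import math
from collections import Counter

def period_information(seq, n):
    """Mutual information (bits) between base and position modulo n."""
    length = len(seq)
    joint = Counter((i % n, b) for i, b in enumerate(seq))
    col = Counter(i % n for i in range(length))
    base = Counter(seq)
    mi = 0.0
    for (c, b), k in joint.items():
        # p(c, b) * log2( p(c, b) / (p(c) * p(b)) )
        mi += (k / length) * math.log2(k * length / (col[c] * base[b]))
    return mi

def information_spectrum(seq, max_period):
    """Map each candidate period 2..max_period to its mutual information."""
    return {n: period_information(seq, n) for n in range(2, max_period + 1)}
```

For a perfect repeat of "ACG", the spectrum peaks at n = 3 and at its multiples (6, 9, ...), while periods such as 2, 4, and 5 score zero. Any practical use would need a significance estimate and a rule for choosing between a period and its multiples.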

Subject(s)

Caenorhabditis elegans/genetics , DNA, Helminth/genetics , Models, Statistical , Periodicity , Sequence Analysis, DNA/statistics & numerical data , Animals , Fourier Analysis , Nucleotides , Point Mutation

Study of the Paired Change Points in Bacterial Genes.

Suvorova, Yulia M; Korotkova, Maria A; Korotkov, Eugene V.

IEEE/ACM Trans Comput Biol Bioinform ; 11(5): 955-64, 2014.

Article in English | MEDLINE | ID: mdl-26356866

ABSTRACT

It is known that nucleotide sequences are not totally homogeneous and that this heterogeneity cannot be due to random fluctuations only. Such heterogeneity poses the problem of segmenting a sequence into a set of homogeneous parts divided by points called "change points". In this work we investigated a special case of change points: paired change points (PCP). We used a well-known property of coding sequences, triplet periodicity (TP). The sequences that we are especially interested in consist of three successive parts: the first and the last parts have similar TP, while the middle part has a different TP type. We aimed to find the genes with PCP and provide an explanation for this phenomenon. We developed a mathematical method for PCP detection based on a new measure of similarity between TP matrices. We investigated 66,936 bacterial genes from 17 bacterial genomes and revealed 2,700 genes with PCP and 6,459 genes with a single change point (SCP). We developed a mathematical approach to visualize the PCP cases. We suppose that PCP could be associated with double fusion or insertion events. The results of investigating the sequences with artificial insertions/fusions and the distribution of TP inside the genome support the idea that the real number of genes formed by insertion/fusion events could be 5-7 times greater than the number of genes revealed in the present work.

Subject(s)

Algorithms , Genes, Bacterial/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Gene Fusion/genetics , Mutagenesis, Insertional/genetics

An approach for searching insertions in bacterial genes leading to the phase shift of triplet periodicity.

Korotkova, Maria A; Kudryashov, Nikolay A; Korotkov, Eugene V.

Genomics Proteomics Bioinformatics ; 9(4-5): 158-70, 2011 Oct.

Article in English | MEDLINE | ID: mdl-22196359

ABSTRACT

The concept of the phase shift of triplet periodicity (TP) was used for searching potential DNA insertions in genes from 17 bacterial genomes. A mathematical algorithm for detection of these insertions has been developed. This approach can detect potential insertions and deletions with lengths that are not multiples of three bases, especially insertions of relatively large DNA fragments (>100 bases). A new similarity measure between triplet matrices was employed to improve the sensitivity for detecting the TP phase shift. Sequences of 17,220 bacterial genes, each consisting of more than 1,200 bases, were analyzed, and the presence of a TP phase shift has been shown in ~16% of analyzed genes (2,809 genes), which is about 4 times more than that detected in our previous work. We propose that shifts of the TP phase may indicate shifts of the reading frame in genes after insertions of DNA fragments with lengths that are not multiples of three bases. A relationship between the phase shifts of TP and the frame shifts in genes is discussed.
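
The idea of a TP phase shift can be sketched in Python as follows: build a 3x4 matrix of base frequencies by codon position for two parts of a gene, then find which rotation of codon positions in the second part best matches the first. The L1 distance used here is a stand-in chosen for simplicity, not the similarity measure developed in the paper, and the function names are hypothetical.

```python
BASES = "ACGT"

def tp_matrix(seq, offset=0):
    """3x4 matrix of base frequencies, rows indexed by (i + offset) % 3."""
    counts = [[0] * 4 for _ in range(3)]
    for i, b in enumerate(seq):
        if b in BASES:
            counts[(i + offset) % 3][BASES.index(b)] += 1
    matrix = []
    for row in counts:
        total = sum(row) or 1  # avoid division by zero for empty rows
        matrix.append([c / total for c in row])
    return matrix

def distance(m1, m2):
    """Sum of absolute differences between two TP matrices (assumed measure)."""
    return sum(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))

def phase_shift(left, right):
    """Offset (0, 1, or 2) at which right's TP matrix best matches left's."""
    ref = tp_matrix(left)
    return min(range(3), key=lambda s: distance(ref, tp_matrix(right, offset=s)))
```

In this toy setting, prefixing the second part with one or two extra bases moves the best-matching offset away from 0, mirroring how an insertion whose length is not a multiple of three shifts the TP phase.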

Subject(s)

Algorithms , Computational Biology/methods , DNA Transposable Elements/genetics , Genes, Bacterial/genetics , Base Sequence , Periodicity , Reading Frames/genetics , Sequence Homology, Amino Acid

Study of the triplet periodicity phase shifts in genes.

Korotkov, Eugene V; Korotkova, Maria A.

J Integr Bioinform ; 7(3)2010 Mar 25.

Article in English | MEDLINE | ID: mdl-20375465

ABSTRACT

The definition of a phase shift of triplet periodicity (TP) is introduced. A mathematical algorithm for detection of the TP phase shift in nucleotide sequences has been developed. Gene sequences from the Kegg-46 data bank were analyzed to search for genes with a phase shift of TP. The presence of a phase shift of triplet periodicity has been shown for 318,329 genes (approximately 10% of the number of genes in Kegg-46). We suppose that shifts of the TP phase may indicate shifts of the reading frame (RF) in genes. A relationship between the phase shifts of TP and the frame shifts in genes is discussed.

Subject(s)

Genes, Bacterial/genetics , Periodicity , Algorithms , Bacteria/genetics , Base Sequence , Databases, Genetic , Open Reading Frames/genetics , Sequence Homology, Amino Acid
