Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 10 de 10
Filter
Add more filters










Publication year range
1.
J Bioinform Comput Biol ; 21(2): 2350009, 2023 04.
Article in English | MEDLINE | ID: mdl-37104034

ABSTRACT

Genome rearrangement events are widely used to estimate a minimum-size sequence of mutations capable of transforming a genome into another. The length of this sequence is called distance, and determining it is the main goal in genome rearrangement distance problems. Problems in the genome rearrangement field differ regarding the set of rearrangement events allowed and the genome representation. In this work, we consider the scenario where the genomes share the same set of genes, gene orientation is known or unknown, and intergenic regions (structures between a pair of genes and at the extremities of the genome) are taken into account. We use two models, the first model allows only conservative events (reversals and moves), and the second model includes non-conservative events (insertions and deletions) in the intergenic regions. We show that both models result in NP-hard problems no matter if gene orientation is known or unknown. When the information regarding the orientation of genes is available, we present for both models an approximation algorithm with a factor of 2. For the scenario where this information is unavailable, we propose a 4-approximation algorithm for both models.


Subject(s)
Gene Rearrangement , Models, Genetic , DNA, Intergenic/genetics , Genome , Mutation , Algorithms
2.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 1641-1653, 2023.
Article in English | MEDLINE | ID: mdl-35385387

ABSTRACT

Most mathematical models for genome rearrangement problems have considered only gene order. In this way, the rearrangement distance considering some set of events, such as reversal and transposition events, is commonly defined as the minimum number of rearrangement events that transform the gene order from a genome G1 into the gene order from a genome G2. Recent works initiate incorporating more information such as the sizes of the intergenic regions (i.e., number of nucleotides between pairs of consecutive genes), which yields good results for estimated distances on real data. In these models, besides transforming the gene order, the sequence of rearrangement events must transform the list of intergenic regions sizes from G1 into the list of intergenic regions sizes from G2 (target list). We study a new variation where the target list is flexible, in the sense that each target intergenic region size is in a range of acceptable values. This allows us to model scenarios where the main objective is still to transform the order of genes from the source genome into the target genome, allowing flexibility in the sizes of the intergenic regions, since the nucleotides in these regions tend to undergo more changes when compared to genes. We investigate the rearrangement distance considering three sets of events, two with the exclusive use of reversals or transpositions, and the other allowing both rearrangement events. We present approximation algorithms for the problems and an NP-hardness proof. Our results rely on the Flexible Weighted Cycle Graph, adapted from the breakpoint graph to deal with flexible intergenic regions sizes.


Subject(s)
Gene Rearrangement , Genomics , Genomics/methods , Gene Rearrangement/genetics , Genome , Algorithms , Nucleotides , Models, Genetic
3.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 1628-1640, 2023.
Article in English | MEDLINE | ID: mdl-36260590

ABSTRACT

Recent works on genome rearrangements have shown that incorporating intergenic region information along with gene order in models provides better estimations for the rearrangement distance than using gene order alone. The reversal distance is one of the main problems in genome rearrangements. It has a polynomial time algorithm when only gene order is used to model genomes, assuming that repeated genes do not exist and that gene orientation is known, even when the genomes have distinct gene sets. The reversal distance is NP-hard and has a 2-approximation algorithm when incorporating intergenic regions. However, the problem has only been studied assuming genomes with the same set of genes. In this work, we consider the variation that incorporates intergenic regions and that allows genomes to have distinct sets of genes, a scenario that leads us to include indels operations (insertions and deletions). We present a 2.5-approximation algorithm using the labeled intergenic breakpoint graph, which is based on the well-known breakpoint graph structure. We also present an experimental analysis of the proposed algorithm using simulated data, which showed that the practical approximation factor is considerably less than 2.5. Furthermore, we used the algorithm in real genomes to construct a phylogenetic tree.


Subject(s)
Genome , Models, Genetic , Phylogeny , INDEL Mutation/genetics , Gene Rearrangement , Algorithms
4.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2094-2108, 2021.
Article in English | MEDLINE | ID: mdl-34232886

ABSTRACT

In comparative genomics, one goal is to find similarities between genomes of different organisms. Comparisons using genome features like genes, gene order, and regulatory sequences are carried out with this purpose in mind. Genome rearrangements are mutational events that affect large extensions of the genome. They are responsible for creating extant species with conserved genes in different positions across genomes. Close species - from an evolutionary point of view - tend to have the same set of genes or share most of them. When we consider gene order to compare two genomes, it is possible to use a parsimony criterion to estimate how close the species are. We are interested in the shortest sequence of genome rearrangements capable of transforming one genome into the other, which is named rearrangement distance. Reversal is one of the most studied genome rearrangements events. This event acts in a segment of the genome, inverting the position and the orientation of genes in it. Transposition is another widely studied event. This event swaps the position of two consecutive segments of the genome. When the genome has no gene repetition, a common approach is to map it as a permutation such that each element represents a conserved block. When genomes have replicated genes, this mapping is usually performed using strings. The number of replicas depends on the organisms being compared, but in many scenarios, it tends to be small. In this work, we study the rearrangement distance between genomes with replicated genes considering that the orientation of genes is unknown. We present four heuristics for the problem of genome rearrangement distance with replicated genes. We carry out experiments considering the exclusive use of the reversals or transpositions events, as well as the version in which both events are allowed. We developed a database of simulated genomes and compared our results with other algorithms from the literature. The experiments showed that our heuristics with more sophisticated rules presented a better performance than the known algorithms to estimate the evolutionary distance between genomes with replicated genes. In order to validate the application of our algorithms in real data, we construct a phylogenetic tree based on the distance provided by our algorithm and compare it with a know tree from the literature.


Subject(s)
Algorithms , Gene Rearrangement/genetics , Genomics/methods , Gene Dosage/genetics , Heuristics
5.
J Bioinform Comput Biol ; 19(4): 2150013, 2021 08.
Article in English | MEDLINE | ID: mdl-34162319

ABSTRACT

In the field of comparative genomics, one way of comparing two genomes is through the analysis of how they distinguish themselves based on a set of mutations called rearrangement events. When considering that genomes undergo different types of rearrangements, it can be assumed that some events are more common than others. To model this assumption, one can assign different weights to different events, where common events tend to cost less than others. However, this approach, called weighted, does not guarantee that the rearrangement assumed to be the most frequent will be also the most frequently returned by proposed algorithms. To overcome this issue, we investigate a new problem where we seek the shortest sequence of rearrangement events able to transform one genome into the other, with a restriction regarding the proportion between the events returned. Here, we consider two rearrangement events: reversal, that inverts the order and the orientation of the genes inside a segment of the genome, and transposition, that moves a segment of the genome to another position. We show the complexity of this problem for any desired proportion considering scenarios where the orientation of the genes is known or unknown. We also develop an approximation algorithm with a constant approximation factor for each scenario and, in particular, we describe an improved (asymptotic) approximation algorithm for the case where the gene orientation is known. At last, we present the experimental tests comparing the proposed algorithms with others from the literature for the version of the problem without the proportion restriction.


Subject(s)
Genome , Genomics , Algorithms , Gene Rearrangement , Models, Genetic , Mutation
6.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2080-2093, 2021.
Article in English | MEDLINE | ID: mdl-33945484

ABSTRACT

Genome Rearrangements are events that affect large stretches of genomes during evolution. Many mathematical models have been used to estimate the evolutionary distance between two genomes based on genome rearrangements. However, most of them focused on the (order of the) genes of a genome, disregarding other important elements in it. Recently, researchers have shown that considering regions between each pair of genes, called intergenic regions, can enhance distance estimation in realistic data. Two of the most studied genome rearrangements are the reversal, which inverts a sequence of genes, and the transposition, which occurs when two adjacent gene sequences swap their positions inside the genome. In this work, we study the transposition distance between two genomes, but we also consider intergenic regions, a problem we name Sorting by Intergenic Transpositions. We show that this problem is NP-hard and propose two approximation algorithms, with factors 3.5 and 2.5, considering two distinct definitions for the problem. We also investigate the signed reversal and transposition distance between two genomes considering their intergenic regions. This second problem is called Sorting by Signed Intergenic Reversals and Intergenic Transpositions. We show that this problem is NP-hard and develop two approximation algorithms, with factors 3 and 2.5. We check how these algorithms behave when assigning weights for genome rearrangements. Finally, we implemented all these algorithms and tested them on real and simulated data.


Subject(s)
Algorithms , Gene Rearrangement/genetics , Genome/genetics , Genomics/methods , DNA Transposable Elements/genetics , DNA, Intergenic/genetics , Sequence Analysis, DNA
7.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2870-2876, 2021.
Article in English | MEDLINE | ID: mdl-32396097

ABSTRACT

Genome rearrangements are mutations affecting large portions of a genome, and a reversal is one of the most studied genome rearrangements in the literature through the Sorting by Reversals (SbR) problem. SbR is solvable in polynomial time on signed permutations (i.e., the gene orientation is known), and it is NP-hard on unsigned permutations. This problem (and many others considering genome rearrangements) models genome as a list of its genes in the order they appear, ignoring all other information present in the genome. Recent works claimed that the incorporation of the size of intergenic regions, i.e., sequences of nucleotides between genes, may result in better estimators for the real distance between genomes. Here we introduce the Sorting Signed Permutations by Intergenic Reversals problem, that sorts a signed permutation using reversals both on gene order and intergenic sizes. We show that this problem is NP-hard by a reduction from the 3-partition problem. Then, we propose a 2-approximation algorithm for it. Finally, we also incorporate intergenic indels (i.e., insertions or deletions of intergenic regions) to overcome a limitation of sorting by conservative events (such as reversals) and propose two approximation algorithms.


Subject(s)
DNA, Intergenic/genetics , Gene Rearrangement/genetics , Genomics/legislation & jurisprudence , Algorithms , INDEL Mutation/genetics , Models, Genetic , Mutation/genetics
8.
J Comput Biol ; 27(2): 156-174, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31891533

ABSTRACT

During the evolutionary process, genomes are affected by various genome rearrangements, that is, events that modify large stretches of the genetic material. In the literature, a large number of models have been proposed to estimate the number of events that occurred during evolution; most of them represent a genome as an ordered sequence of genes, and, in particular, disregard the genetic material between consecutive genes. However, recent studies showed that taking into account the genetic material between consecutive genes can enhance evolutionary distance estimations. Reversal and transposition are genome rearrangements that have been widely studied in the literature. A reversal inverts a (contiguous) segment of the genome, while a transposition swaps the positions of two consecutive segments. Genomes also undergo nonconservative events (events that alter the amount of genetic material) such as insertions and deletions, in which genetic material from intergenic regions of the genome is inserted or deleted, respectively. In this article, we study a genome rearrangement model that considers both gene order and sizes of intergenic regions. We investigate the reversal distance, and also the reversal and transposition distance between two genomes in two scenarios: with and without nonconservative events. We show that these problems are NP-hard and we present constant ratio approximation algorithms for all of them. More precisely, we provide a 4-approximation algorithm for the reversal distance, both in the conservative and nonconservative versions. For the reversal and transposition distance, we provide a 4.5-approximation algorithm, both in the conservative and nonconservative versions. We also perform experimental tests to verify the behavior of our algorithms, as well as to compare the practical and theoretical results. We finally extend our study to scenarios in which events have different costs, and we present constant ratio approximation algorithms for each scenario.

9.
Article in English | MEDLINE | ID: mdl-31603793

ABSTRACT

We present three heuristics-Sliding Window, Look Ahead, and Iterative Sliding Window-to improve solutions for the Sorting Signed Permutations by Reversals and Transpositions Problem. We investigate the classical version of the problem as well as versions restricted to prefix and prefix or suffix operations. To assess the heuristics based on its improvement, we implemented algorithms described in the literature to provide initial solutions. Although we have a limited number of problems, these heuristics can be applied to many others within the area of genome rearrangement. When time is a crucial factor, Sliding Window is a better choice because it runs in linear time. If the quality of the solution is a priority, Look Ahead should be preferred. Iterative Sliding Window is the most flexible heuristic and allows us to find a trade-off for specific scenarios where running time and solution quality are relevant.


Subject(s)
Computational Biology/methods , Gene Rearrangement/genetics , Heuristics , Algorithms , Models, Genetic
10.
J Comput Biol ; 26(11): 1223-1229, 2019 11.
Article in English | MEDLINE | ID: mdl-31120331

ABSTRACT

In comparative genomics, rearrangements are mutations that affect a stretch of DNA sequences. Reversals and transpositions are well-known rearrangements, and each has a vast literature. The reversal and transposition distance, that is, the minimum number of reversals and transpositions needed to transform one genome into another is a relevant evolutionary distance. The problem of computing this distance when genomes are represented by permutations was proposed >20 years ago and received the name of sorting by reversals and transpositions problem. It has been the focus of a number of studies, but the computational complexity has remained open until now. We hereby solve this question and prove that it is NP-hard no matter whether genomes are represented by signed or unsigned permutations. In addition, we prove that a usual generalization of this problem, which assigns weights wρ for reversals and wτ for transpositions, is also NP-hard as long as wτ/wρ ≤ 1.5 for both signed and unsigned permutations.


Subject(s)
Base Sequence/genetics , Computational Biology/methods , Genomics/methods , Algorithms , Gene Rearrangement , Genome/genetics , Mutation/genetics
SELECTION OF CITATIONS
SEARCH DETAIL
...