Search | VHL Regional Portal

1.

A Barrier for Further Approximating Sorting by Transpositions.

Silva, Luiz A G; Kowada, Luis A B; Walter, Maria E M T.

J Comput Biol ; 30(12): 1277-1288, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37883640

ABSTRACT

The transposition distance problem is a classical problem in genome rearrangements, which seeks to determine the minimum number of transpositions needed to transform a linear chromosome into another represented by the permutations π and σ, respectively. This article focuses on the equivalent problem of sorting by transpositions (SBT), where σ is the identity permutation Î¹. Specifically, we investigate palisades, a family of permutations that are "hard" to sort, as they require numerous transpositions above the celebrated lower bound devised by Bafna and Pevzner. By determining the transposition distance of palisades, we were able to provide the exact transposition diameter for 3-permutations (TD3), a special subset of the symmetric group Sn, essential for the study of approximate solutions for SBT using the simplification technique. The exact value for TD3 has remained unknown since Elias and Hartman showed an upper bound for it. Another consequence of determining the transposition distance of palisades is that, using as lower bound the one by Bafna and Pevzner, it is impossible to guarantee approximation ratios lower than 1.375 when approximating SBT. This finding has significant implications for the study of SBT, as this problem has been the subject of intense research efforts for the past 25 years.

Subject(s)

Algorithms , Genome , Gene Rearrangement , Models, Genetic

2.

Rearrangement distance with reversals, indels, and moves in intergenic regions on signed and unsigned permutations.

Brito, Klairton Lima; Oliveira, Andre Rodrigues; Alexandrino, Alexsandro Oliveira; Dias, Ulisses; Dias, Zanoni.

J Bioinform Comput Biol ; 21(2): 2350009, 2023 04.

Article in English | MEDLINE | ID: mdl-37104034

ABSTRACT

Genome rearrangement events are widely used to estimate a minimum-size sequence of mutations capable of transforming a genome into another. The length of this sequence is called distance, and determining it is the main goal in genome rearrangement distance problems. Problems in the genome rearrangement field differ regarding the set of rearrangement events allowed and the genome representation. In this work, we consider the scenario where the genomes share the same set of genes, gene orientation is known or unknown, and intergenic regions (structures between a pair of genes and at the extremities of the genome) are taken into account. We use two models, the first model allows only conservative events (reversals and moves), and the second model includes non-conservative events (insertions and deletions) in the intergenic regions. We show that both models result in NP-hard problems no matter if gene orientation is known or unknown. When the information regarding the orientation of genes is available, we present for both models an approximation algorithm with a factor of 2. For the scenario where this information is unavailable, we propose a 4-approximation algorithm for both models.

Subject(s)

Gene Rearrangement , Models, Genetic , DNA, Intergenic/genetics , Genome , Mutation , Algorithms

3.

A new 1.375-approximation algorithm for sorting by transpositions.

Silva, Luiz Augusto G; Kowada, Luis Antonio B; Rocco, Noraí Romeu; Walter, Maria Emília M T.

Algorithms Mol Biol ; 17(1): 1, 2022 Jan 15.

Article in English | MEDLINE | ID: mdl-35033127

ABSTRACT

BACKGROUND: SORTING BY TRANSPOSITIONS (SBT) is a classical problem in genome rearrangements. In 2012, SBT was proven to be [Formula: see text]-hard and the best approximation algorithm with a 1.375 ratio was proposed in 2006 by Elias and Hartman (EH algorithm). Their algorithm employs simplification, a technique used to transform an input permutation [Formula: see text] into a simple permutation [Formula: see text], presumably easier to handle with. The permutation [Formula: see text] is obtained by inserting new symbols into [Formula: see text] in a way that the lower bound of the transposition distance of [Formula: see text] is kept on [Formula: see text]. The simplification is guaranteed to keep the lower bound, not the transposition distance. A sequence of operations sorting [Formula: see text] can be mimicked to sort [Formula: see text]. RESULTS AND CONCLUSIONS: First, using an algebraic approach, we propose a new upper bound for the transposition distance, which holds for all [Formula: see text]. Next, motivated by a problem identified in the EH algorithm, which causes it, in scenarios involving how the input permutation is simplified, to require one extra transposition above the 1.375-approximation ratio, we propose a new approximation algorithm to solve SBT ensuring the 1.375-approximation ratio for all [Formula: see text]. We implemented our algorithm and EH's. Regarding the implementation of the EH algorithm, two other issues were identified and needed to be fixed. We tested both algorithms against all permutations of size n, [Formula: see text]. The results show that the EH algorithm exceeds the approximation ratio of 1.375 for permutations with a size greater than 7. The percentage of computed distances that are equal to transposition distance, computed by the implemented algorithms are also compared with others available in the literature. Finally, we investigate the performance of both implementations on longer permutations of maximum length 500. From the experiments, we conclude that maximum and the average distances computed by our algorithm are a little better than the ones computed by the EH algorithm and the running times of both algorithms are similar, despite the time complexity of our algorithm being higher.

4.

Diversity in the composition of the accessory genome of Mexican Pseudomonas aeruginosa strains.

Espinosa-Camacho, Luis F; Delgado, Gabriela; Cravioto, Alejandro; Morales-Espinosa, Rosario.

Genes Genomics ; 44(1): 53-77, 2022 01.

Article in English | MEDLINE | ID: mdl-34410625

ABSTRACT

BACKGROUND: Pseudomonas aeruginosa is an important opportunistic pathogen especially in nosocomial infections due to its easy adaptation to different environments; this characteristic is due to the great genetic diversity that presents its genome. In addition, it is considered a pathogen of critical priority due to the high antimicrobial resistance. OBJECTIVES: The aim of this study was to characterize the mobile genetic elements present in the chromosome of six Mexican P. aeruginosa strains isolated from adults with pneumonia and children with bacteremia. METHODS: The genomic DNA of six P. aeruginosa strains were isolated and sequenced using PacBio RS-II platform. They were annotated using Prokaryotic Genome Annotation Pipeline and manually curated and analyzed for the presence of mobile genetic elements, antibiotic resistances genes, efflux pumps and virulence factors using several bioinformatics programs and databases. RESULTS: The global analysis of the strains chromosomes showed a novel chromosomal rearrangement in two strains, possibly mediated by subsequent recombination and inversion events. They have a high content of mobile genetic elements: 21 genomic islands, four new islets, four different integrative conjugative elements, 28 different prophages, one CRISPR-Cas arrangements, and one class 1 integron. The acquisition of antimicrobials resistance genes into these elements are in concordance with their phenotype of multi-drug resistance. CONCLUSION: The accessory genome increased the ability of the strains to adapt or survive to the hospital environment, promote genomic plasticity and chromosomal rearrangements, which may affect the expression or functionality of the gene and might influence the clinical outcome, having an impact on the treatment.

Subject(s)

Genetic Variation , Genome Size/genetics , Genome, Bacterial/genetics , Genomic Islands/genetics , Genomics/methods , Pseudomonas aeruginosa/genetics , Adult , Bacteremia/microbiology , Child , Computational Biology/methods , DNA Transposable Elements/genetics , Humans , Mexico , Phylogeny , Pneumonia, Bacterial/microbiology , Pseudomonas Infections/microbiology , Pseudomonas aeruginosa/classification , Pseudomonas aeruginosa/pathogenicity , Sequence Analysis, DNA/methods , Virulence/genetics

5.

Labeled Cycle Graph for Transposition and Indel Distance.

Alexandrino, Alexsandro Oliveira; Oliveira, Andre Rodrigues; Dias, Ulisses; Dias, Zanoni.

J Comput Biol ; 29(3): 243-256, 2022 03.

Article in English | MEDLINE | ID: mdl-34724796

ABSTRACT

In the comparative genomics field, one way to infer the evolutionary distance between two organisms of related species is by finding the minimum number of large-scale mutations, called genome rearrangements, that transform one genome into the other. This number is referred to as the rearrangement distance. Since problems in this area emerged in the mid-1990s, several genome rearrangements have been proposed. Rearrangements that do not alter the genome content are called conservative, and in this group we have the following: the reversal, which inverts a segment of the genome; the transposition, which exchanges two consecutive segments; and the double cut and join, which cuts two different pairs of adjacent blocks and joins them differently. Seminal works compared genomes sharing the same set of conserved blocks, but nowadays, researchers started looking at genomes with unequal gene content, by allowing the use of nonconservative rearrangements such as insertion and deletion (jointly called indel). The transposition distance and the transposition and indel distance are both NP-hard. We investigate the transposition and indel distance and present a structure called labeled cycle graph, representing an instance of rearrangement distance problems for genomes with unequal gene content. This structure is used to devise a lower bound and a 2-approximation algorithm for the transposition and indel distance.

Subject(s)

Genome , INDEL Mutation , Algorithms , Gene Rearrangement , Genomics , Models, Genetic

6.

Incorporating intergenic regions into reversal and transposition distances with indels.

Alexandrino, Alexsandro Oliveira; Oliveira, Andre Rodrigues; Dias, Ulisses; Dias, Zanoni.

J Bioinform Comput Biol ; 19(6): 2140011, 2021 12.

Article in English | MEDLINE | ID: mdl-34775923

ABSTRACT

Problems in the genome rearrangement field are often formulated in terms of pairwise genome comparison: given two genomes [Formula: see text] and [Formula: see text], find the minimum number of genome rearrangements that may have occurred during the evolutionary process. This broad definition lacks at least two important considerations: the first being which features are extracted from genomes to create a useful mathematical model, and the second being which types of genome rearrangement events should be represented. Regarding the first consideration, seminal works in the genome rearrangement field solely used gene order to represent genomes as permutations of integer numbers, neglecting many important aspects like gene duplication, intergenic regions, and complex interactions between genes. Regarding the second consideration, some rearrangement events are widely studied such as reversals and transpositions. In this paper, we shed light on the first consideration and created a model that takes into account gene order and the number of nucleotides in intergenic regions. In addition, we consider events of reversals, transpositions, and indels (insertions and deletions) of genomic material. We present a 4-approximation algorithm for reversals and indels, a [Formula: see text]-approximation algorithm for transpositions and indels, and a 6-approximation for reversals, transpositions, and indels.

Subject(s)

Genome , Models, Genetic , Algorithms , DNA, Intergenic/genetics , Gene Rearrangement , Genomics

7.

Genome Rearrangement Distance with Reversals, Transpositions, and Indels.

Alexandrino, Alexsandro Oliveira; Oliveira, Andre Rodrigues; Dias, Ulisses; Dias, Zanoni.

J Comput Biol ; 28(3): 235-247, 2021 03.

Article in English | MEDLINE | ID: mdl-33085536

ABSTRACT

The rearrangement distance is a well-known problem in the field of comparative genomics. Given two genomes, the rearrangement distance is the minimum number of rearrangements in a set of allowed rearrangements (rearrangement model), which transforms one genome into the other. In rearrangement distance problems, a genome is modeled as a string, where each element represents a conserved region within the two genomes. When the orientation of the genes is known, it is represented by (plus or minus) signs assigned to the elements of the string. Two of the most studied rearrangements are reversals, which invert a segment of the genome, and transpositions, which exchange the relative positions of two adjacent segments of the genome. The first works in genome rearrangements considered that the genomes being compared had the same genetic material and that rearrangement events were restricted to reversals, transpositions, or both. El-Mabrouk extended the reversal model on signed strings to include the operations of insertion and deletion of segments in the genome, which allowed the comparison of genomes with different genetic material. Other studies also addressed this problem and, recently, this problem was proved to be solvable in polynomial time by Willing et al. For unsigned strings, we still observe a lack of results. That said, in this study we prove that computing the rearrangement distance for the following models is NP-Hard: reversals and indels on unsigned strings; transpositions and indels on unsigned strings; and reversals, transpositions, and indels on signed and unsigned strings. Along with the NP-hardness proofs, we present a 2-approximation algorithm for reversals on unsigned strings and 3-approximation algorithms for the other models.

Subject(s)

Gene Rearrangement/genetics , Genome/genetics , INDEL Mutation/genetics , Algorithms , Genomics/methods , Models, Genetic

8.

Sorting permutations by fragmentation-weighted operations.

Alexandrino, Alexsandro Oliveira; Lintzmayer, Carla Negri; Dias, Zanoni.

J Bioinform Comput Biol ; 18(2): 2050006, 2020 04.

Article in English | MEDLINE | ID: mdl-32326802

ABSTRACT

One of the main problems in Computational Biology is to find the evolutionary distance among species. In most approaches, such distance only involves rearrangements, which are mutations that alter large pieces of the species' genome. When we represent genomes as permutations, the problem of transforming one genome into another is equivalent to the problem of Sorting Permutations by Rearrangement Operations. The traditional approach is to consider that any rearrangement has the same probability to happen, and so, the goal is to find a minimum sequence of operations which sorts the permutation. However, studies have shown that some rearrangements are more likely to happen than others, and so a weighted approach is more realistic. In a weighted approach, the goal is to find a sequence which sorts the permutations, such that the cost of that sequence is minimum. This work introduces a new type of cost function, which is related to the amount of fragmentation caused by a rearrangement. We present some results about the lower and upper bounds for the fragmentation-weighted problems and the relation between the unweighted and the fragmentation-weighted approach. Our main results are 2-approximation algorithms for five versions of this problem involving reversals and transpositions. We also give bounds for the diameters concerning these problems and provide an improved approximation factor for simple permutations considering transpositions.

Subject(s)

Algorithms , Computational Biology/methods , Genome , Genomics/methods , Gene Rearrangement , Mutation , Probability

9.

Sorting by Genome Rearrangements on Both Gene Order and Intergenic Sizes.

Brito, Klairton Lima; Jean, Géraldine; Fertin, Guillaume; Oliveira, Andre Rodrigues; Dias, Ulisses; Dias, Zanoni.

J Comput Biol ; 27(2): 156-174, 2020 Feb.

Article in English | MEDLINE | ID: mdl-31891533

ABSTRACT

During the evolutionary process, genomes are affected by various genome rearrangements, that is, events that modify large stretches of the genetic material. In the literature, a large number of models have been proposed to estimate the number of events that occurred during evolution; most of them represent a genome as an ordered sequence of genes, and, in particular, disregard the genetic material between consecutive genes. However, recent studies showed that taking into account the genetic material between consecutive genes can enhance evolutionary distance estimations. Reversal and transposition are genome rearrangements that have been widely studied in the literature. A reversal inverts a (contiguous) segment of the genome, while a transposition swaps the positions of two consecutive segments. Genomes also undergo nonconservative events (events that alter the amount of genetic material) such as insertions and deletions, in which genetic material from intergenic regions of the genome is inserted or deleted, respectively. In this article, we study a genome rearrangement model that considers both gene order and sizes of intergenic regions. We investigate the reversal distance, and also the reversal and transposition distance between two genomes in two scenarios: with and without nonconservative events. We show that these problems are NP-hard and we present constant ratio approximation algorithms for all of them. More precisely, we provide a 4-approximation algorithm for the reversal distance, both in the conservative and nonconservative versions. For the reversal and transposition distance, we provide a 4.5-approximation algorithm, both in the conservative and nonconservative versions. We also perform experimental tests to verify the behavior of our algorithms, as well as to compare the practical and theoretical results. We finally extend our study to scenarios in which events have different costs, and we present constant ratio approximation algorithms for each scenario.

10.

Super short operations on both gene order and intergenic sizes.

Oliveira, Andre R; Jean, Géraldine; Fertin, Guillaume; Dias, Ulisses; Dias, Zanoni.

Algorithms Mol Biol ; 14: 21, 2019.

Article in English | MEDLINE | ID: mdl-31709002

ABSTRACT

BACKGROUND: The evolutionary distance between two genomes can be estimated by computing a minimum length sequence of operations, called genome rearrangements, that transform one genome into another. Usually, a genome is modeled as an ordered sequence of genes, and most of the studies in the genome rearrangement literature consist in shaping biological scenarios into mathematical models. For instance, allowing different genome rearrangements operations at the same time, adding constraints to these rearrangements (e.g., each rearrangement can affect at most a given number of genes), considering that a rearrangement implies a cost depending on its length rather than a unit cost, etc. Most of the works, however, have overlooked some important features inside genomes, such as the presence of sequences of nucleotides between genes, called intergenic regions. RESULTS AND CONCLUSIONS: In this work, we investigate the problem of computing the distance between two genomes, taking into account both gene order and intergenic sizes. The genome rearrangement operations we consider here are constrained types of reversals and transpositions, called super short reversals (SSRs) and super short transpositions (SSTs), which affect up to two (consecutive) genes. We denote by super short operations (SSOs) any SSR or SST. We show 3-approximation algorithms when the orientation of the genes is not considered when we allow SSRs, SSTs, or SSOs, and 5-approximation algorithms when considering the orientation for either SSRs or SSOs. We also show that these algorithms improve their approximation factors when the input permutation has a higher number of inversions, where the approximation factor decreases from 3 to either 2 or 1.5, and from 5 to either 3 or 2.

11.

On the Complexity of Sorting by Reversals and Transpositions Problems.

Oliveira, Andre Rodrigues; Brito, Klairton Lima; Dias, Ulisses; Dias, Zanoni.

J Comput Biol ; 26(11): 1223-1229, 2019 11.

Article in English | MEDLINE | ID: mdl-31120331

ABSTRACT

In comparative genomics, rearrangements are mutations that affect a stretch of DNA sequences. Reversals and transpositions are well-known rearrangements, and each has a vast literature. The reversal and transposition distance, that is, the minimum number of reversals and transpositions needed to transform one genome into another is a relevant evolutionary distance. The problem of computing this distance when genomes are represented by permutations was proposed >20 years ago and received the name of sorting by reversals and transpositions problem. It has been the focus of a number of studies, but the computational complexity has remained open until now. We hereby solve this question and prove that it is NP-hard no matter whether genomes are represented by signed or unsigned permutations. In addition, we prove that a usual generalization of this problem, which assigns weights wρ for reversals and wτ for transpositions, is also NP-hard as long as wτ/wρ ≤ 1.5 for both signed and unsigned permutations.

Subject(s)

Base Sequence/genetics , Computational Biology/methods , Genomics/methods , Algorithms , Gene Rearrangement , Genome/genetics , Mutation/genetics

12.

Sorting signed circular permutations by super short operations.

Oliveira, Andre R; Fertin, Guillaume; Dias, Ulisses; Dias, Zanoni.

Algorithms Mol Biol ; 13: 13, 2018.

Article in English | MEDLINE | ID: mdl-30065782

ABSTRACT

BACKGROUND: One way to estimate the evolutionary distance between two given genomes is to determine the minimum number of large-scale mutations, or genome rearrangements, that are necessary to transform one into the other. In this context, genomes can be represented as ordered sequences of genes, each gene being represented by a signed integer. If no gene is repeated, genomes are thus modeled as signed permutations of the form π=(π1π2πn) , and in that case we can consider without loss of generality that one of them is the identity permutation Î¹n=(12n) , and that we just need to sort the other (i.e., transform it into Î¹n ). The most studied genome rearrangement events are reversals, where a segment of the genome is reversed and reincorporated at the same location; and transpositions, where two consecutive segments are exchanged. Many variants, e.g., combining different types of (possibly constrained) rearrangements, have been proposed in the literature. One of them considers that the number of genes involved, in a reversal or a transposition, is never greater than two, which is known as the problem of sorting by super short operations (or SSOs). RESULTS AND CONCLUSIONS: All problems considering SSOs in permutations have been shown to be in P , except for one, namely sorting signed circular permutations by super short reversals and super short transpositions. Here we fill this gap by introducing a new graph structure called cyclic permutation graph and providing a series of intermediate results, which allows us to design a polynomial algorithm for sorting signed circular permutations by super short reversals and super short transpositions.

13.

Organization of plastid genomes in the freshwater red algal order Batrachospermales (Rhodophyta).

Paiano, Monica Orlandi; Del Cortona, Andrea; Costa, Joana F; Liu, Shao-Lun; Verbruggen, Heroen; De Clerck, Olivier; Necchi, Orlando.

J Phycol ; 54(1): 25-33, 2018 02.

Article in English | MEDLINE | ID: mdl-29077982

ABSTRACT

Little is known about genome organization in members of the order Batrachospermales, and the infra-ordinal relationship remains unresolved. Plastid (cp) genomes of seven members of the freshwater red algal order Batrachospermales were sequenced, with the following aims: (i) to describe the characteristics of cp genomes and compare these with other red algal groups; (ii) to infer the phylogenetic relationships among these members to better understand the infra-ordinal classification. Cp genomes of Batrachospermales are large, with several cases of gene loss, they are gene-dense (high gene content for the genome size and short intergenic regions) and have highly conserved gene order. Phylogenetic analyses based on concatenated nucleotide genome data roughly supports the current taxonomic system for the order. Comparative analyses confirm data for members of the class Florideophyceae that cp genomes in Batrachospermales is highly conserved, with little variation in gene composition. However, relevant new features were revealed in our study: genome sizes in members of Batrachospermales are close to the lowest values reported for Florideophyceae; differences in cp genome size within the order are large in comparison with other orders (Ceramiales, Gelidiales, Gracilariales, Hildenbrandiales, and Nemaliales); and members of Batrachospermales have the lowest number of protein-coding genes among the Florideophyceae. In terms of gene loss, apcF, which encodes the allophycocyanin beta subunit, is absent in all sequenced taxa of Batrachospermales. We reinforce that the interordinal relationships between the freshwater orders Batrachospermales and Thoreales within the Nemaliophycidae is not well resolved due to limited taxon sampling.

Subject(s)

Genome, Chloroplast , Genome, Plant , Rhodophyta/genetics , Base Sequence , Phylogeny , Sequence Alignment , Whole Genome Sequencing

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL