Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 20 de 23
Filter
Add more filters










Publication year range
1.
Methods Mol Biol ; 2802: 57-72, 2024.
Article in English | MEDLINE | ID: mdl-38819556

ABSTRACT

The comparison of large-scale genome structures across distinct species offers valuable insights into the species' phylogeny, genome organization, and gene associations. In this chapter, we review the family-free genome comparison tool FFGC that, relying on built-in interfaces with a sequence comparison tool (either BLAST+ or DIAMOND) and with an ILP solver (either CPLEX or Gurobi), provides several methods for analyses that do not require prior classification of genes across the studied genomes. Taking annotated genome sequences as input, FFGC is a complete workflow for genome comparison allowing not only the computation of measures of similarity and dissimilarity but also the inference of gene families, simultaneously based on sequence similarities and large-scale genomic features.


Subject(s)
Genomics , Phylogeny , Software , Genomics/methods , Genome , Computational Biology/methods , Humans
2.
Algorithms Mol Biol ; 19(1): 1, 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38178195

ABSTRACT

BACKGROUND: Two genomes [Formula: see text] and [Formula: see text] over the same set of gene families form a canonical pair when each of them has exactly one gene from each family. Denote by [Formula: see text] the number of common families of [Formula: see text] and [Formula: see text]. Different distances of canonical genomes can be derived from a structure called breakpoint graph, which represents the relation between the two given genomes as a collection of cycles of even length and paths. Let [Formula: see text] and [Formula: see text] be respectively the numbers of cycles of length i and of paths of length j in the breakpoint graph of genomes [Formula: see text] and [Formula: see text]. Then, the breakpoint distance of [Formula: see text] and [Formula: see text] is equal to [Formula: see text]. Similarly, when the considered rearrangements are those modeled by the double-cut-and-join (DCJ) operation, the rearrangement distance of [Formula: see text] and [Formula: see text] is [Formula: see text], where c is the total number of cycles and [Formula: see text] is the total number of paths of even length. MOTIVATION: The distance formulation is a basic unit for several other combinatorial problems related to genome evolution and ancestral reconstruction, such as median or double distance. Interestingly, both median and double distance problems can be solved in polynomial time for the breakpoint distance, while they are NP-hard for the rearrangement distance. One way of exploring the complexity space between these two extremes is to consider a [Formula: see text] distance, defined to be [Formula: see text], and increasingly investigate the complexities of median and double distance for the [Formula: see text] distance, then the [Formula: see text] distance, and so on. RESULTS: While for the median much effort was done in our and in other research groups but no progress was obtained even for the [Formula: see text] distance, for solving the double distance under [Formula: see text] and [Formula: see text] distances we could devise linear time algorithms, which we present here.

3.
Algorithms Mol Biol ; 18(1): 14, 2023 Sep 28.
Article in English | MEDLINE | ID: mdl-37770945

ABSTRACT

BACKGROUND: Recently we developed a gene orthology inference tool based on genome rearrangements (Journal of Bioinformatics and Computational Biology 19:6, 2021). Given a set of genomes our method first computes all pairwise gene similarities. Then it runs pairwise ILP comparisons to compute optimal gene matchings, which minimize, by taking the similarities into account, the weighted rearrangement distance between the analyzed genomes (a problem that is NP-hard). The gene matchings are then integrated into gene families in the final step. The mentioned ILP includes an optimal capping that connects each end of a linear segment of one genome to an end of a linear segment in the other genome, producing an exponential increase of the search space. RESULTS: In this work, we design and implement a heuristic capping algorithm that replaces the optimal capping by clustering (based on their gene content intersections) the linear segments into [Formula: see text] subsets, whose ends are capped independently. Furthermore, in each subset, instead of allowing all possible connections, we let only the ends of content-related segments be connected. Although there is no guarantee that m is much bigger than one, and with the possible side effect of resulting in sub-optimal instead of optimal gene matchings, the heuristic works very well in practice, from both the speed performance and the quality of computed solutions. Our experiments on primate and fruit fly genomes show two positive results. First, for complete assemblies of five primates the version with heuristic capping reports orthologies that are very similar to the orthologies computed by the version of our tool with optimal capping. Second, we were able to efficiently analyze fruit fly genomes with incomplete assemblies distributed in hundreds or even thousands of contigs, obtaining gene families that are very similar to [Formula: see text] families. Indeed, our tool inferred a higher number of complete cliques, with a higher intersection with [Formula: see text], when compared to gene families computed by other inference tools. We added a post-processing for refining, with the aid of the [Formula: see text] algorithm, our ambiguous families (those with more than one gene per genome), improving even more the accuracy of our results. Our approach is implemented into a pipeline incorporating the pre-computation of gene similarities and the post-processing refinement of ambiguous families with [Formula: see text]. Both the original version with optimal capping and the new modified version with heuristic capping can be downloaded, together with their detailed documentations, at https://gitlab.ub.uni-bielefeld.de/gi/FFGC or as a Conda package at https://anaconda.org/bioconda/ffgc .

4.
J Bioinform Comput Biol ; 19(6): 2140014, 2021 12.
Article in English | MEDLINE | ID: mdl-34775922

ABSTRACT

Recently, we proposed an efficient ILP formulation [Rubert DP, Martinez FV, Braga MDV, Natural family-free genomic distance, Algorithms Mol Biol 16:4, 2021] for exactly computing the rearrangement distance of two genomes in a family-free setting. In such a setting, neither prior classification of genes into families, nor further restrictions on the genomes are imposed. Given two genomes, the mentioned ILP computes an optimal matching of the genes taking into account simultaneously local mutations, given by gene similarities, and large-scale genome rearrangements. Here, we explore the potential of using this ILP for inferring groups of orthologs across several species. More precisely, given a set of genomes, our method first computes all pairwise optimal gene matchings, which are then integrated into gene families in the second step. Our approach is implemented into a pipeline incorporating the pre-computation of gene similarities. It can be downloaded from gitlab.ub.uni-bielefeld.de/gi/FFGC. We obtained promising results with experiments on both simulated and real data.


Subject(s)
Genome , Models, Genetic , Algorithms , Gene Rearrangement , Genomics , Humans
5.
Algorithms Mol Biol ; 16(1): 4, 2021 May 10.
Article in English | MEDLINE | ID: mdl-33971908

ABSTRACT

BACKGROUND: A classical problem in comparative genomics is to compute the rearrangement distance, that is the minimum number of large-scale rearrangements required to transform a given genome into another given genome. The traditional approaches in this area are family-based, i.e., require the classification of DNA fragments of both genomes into families. Furthermore, the most elementary family-based models, which are able to compute distances in polynomial time, restrict the families to occur at most once in each genome. In contrast, the distance computation in models that allow multifamilies (i.e., families with multiple occurrences) is NP-hard. Very recently, Bohnenkämper et al. (J Comput Biol 28:410-431, 2021) proposed an ILP formulation for computing the genomic distance of genomes with multifamilies, allowing structural rearrangements, represented by the generic double cut and join (DCJ) operation, and content-modifying insertions and deletions of DNA segments. This ILP is very efficient, but must maximize a matching of the genes in each multifamily, in order to prevent the free lunch artifact that would otherwise let empty or almost empty matchings give smaller distances. RESULTS: In this paper, we adopt the alternative family-free setting that, instead of family classification, simply uses the pairwise similarities between DNA fragments of both genomes to compute their rearrangement distance. We adapted the ILP mentioned above and developed a model in which pairwise similarities are used to assign weights to both matched and unmatched genes, so that an optimal solution does not necessarily maximize the matching. Our model then results in a natural family-free genomic distance, that takes into consideration all given genes, without prior classification into families, and has a search space composed of matchings of any size. In spite of its bigger search space, our ILP seems to be boosted by a reduction of the number of co-optimal solutions due to the weights. Indeed, it converged faster than the original one by Bohnenkämper et al. for instances with the same number of multiple connections. We can handle not only bacterial genomes, but also fungi and insects, or sets of chromosomes of mammals and plants. In a comparison study of six fruit fly genomes, we obtained accurate results.

6.
J Comput Biol ; 28(4): 410-431, 2021 04.
Article in English | MEDLINE | ID: mdl-33393848

ABSTRACT

The computation of genomic distances has been a very active field of computational comparative genomics over the past 25 years. Substantial results include the polynomial-time computability of the inversion distance by Hannenhalli and Pevzner in 1995 and the introduction of the double cut and join distance by Yancopoulos et al. in 2005. Both results, however, rely on the assumption that the genomes under comparison contain the same set of unique markers (syntenic genomic regions, sometimes also referred to as genes). In 2015, Shao et al. relax this condition by allowing for duplicate markers in the analysis. This generalized version of the genomic distance problem is NP-hard, and they give an integer linear programming (ILP) solution that is efficient enough to be applied to real-world datasets. A restriction of their approach is that it can be applied only to balanced genomes that have equal numbers of duplicates of any marker. Therefore, it still needs a delicate preprocessing of the input data in which excessive copies of unbalanced markers have to be removed. In this article, we present an algorithm solving the genomic distance problem for natural genomes, in which any marker may occur an arbitrary number of times. Our method is based on a new graph data structure, the multi-relational diagram, that allows an elegant extension of the ILP by Shao et al. to count runs of markers that are under- or over-represented in one genome with respect to the other and need to be inserted or deleted, respectively. With this extension, previous restrictions on the genome configurations are lifted, for the first time enabling an uncompromising rearrangement analysis. Any marker sequence can directly be used for the distance calculation. The evaluation of our approach shows that it can be used to analyze genomes with up to a few 10,000 markers, which we demonstrate on simulated and real data.


Subject(s)
Computational Biology , Gene Rearrangement/genetics , Genome/genetics , Genomics , Algorithms , Models, Genetic , Programming, Linear
7.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2314-2326, 2021.
Article in English | MEDLINE | ID: mdl-32324562

ABSTRACT

The inversion distance, that is the distance between two unichromosomal genomes with the same content allowing only inversions of DNA segments, can be exactly computed thanks to a pioneering approach of Hannenhalli and Pevzner from 1995. In 2000, El-Mabrouk extended the inversion model to perform the comparison of unichromosomal genomes with unequal contents, combining inversions with insertions and deletions (indels) of DNA segments, giving rise to the inversion-indel distance. However, only a heuristic was provided for its computation. In 2005, Yancopoulos, Attie and Friedberg started a new branch of research by introducing the generic double cut and join (DCJ) operation, that can represent several genome rearrangements (including inversions). In 2006, Bergeron, Mixtacki and Stoye showed that the DCJ distance can be computed in linear time with a very simple procedure. As a consequence, in 2010 we gave a linear-time algorithm to compute the DCJ-indel distance. This result allowed the inversion-indel model to be revisited from another angle. In 2013, we could show that, when the diagram that represents the relation between the two compared genomes has no bad components, the inversion-indel distance is equal to the DCJ-indel distance. In the present work we complete the study of the inversion-indel distance by giving the first algorithm to compute it exactly even in the presence of bad components.


Subject(s)
Genomics/methods , INDEL Mutation/genetics , Algorithms , Gene Rearrangement/genetics
8.
BMC Bioinformatics ; 19(Suppl 6): 152, 2018 05 08.
Article in English | MEDLINE | ID: mdl-29745861

ABSTRACT

BACKGROUND: The genomic similarity is a large-scale measure for comparing two given genomes. In this work we study the (NP-hard) problem of computing the genomic similarity under the DCJ model in a setting that does not assume that the genes of the compared genomes are grouped into gene families. This problem is called family-free DCJ similarity. RESULTS: We propose an exact ILP algorithm to solve the family-free DCJ similarity problem, then we show its APX-hardness and present four combinatorial heuristics with computational experiments comparing their results to the ILP. CONCLUSIONS: We show that the family-free DCJ similarity can be computed in reasonable time, although for larger genomes it is necessary to resort to heuristics. This provides a basis for further studies on the applicability and model refinement of family-free whole genome similarity measures.


Subject(s)
Models, Genetic , Phylogeny , Algorithms , Animals , Computer Simulation , Databases, Genetic , Genome , Genomics , Heuristics , Humans , Mice , Rats
9.
Article in English | MEDLINE | ID: mdl-28113562

ABSTRACT

We determine complexity of computing the DCJ-indel distance, when DCJ and indel operations have distinct constant costs, by showing an exact formula that can be computed in linear time for any choice of (constant) costs for DCJ and indel operations. We additionally consider the problem of triangular inequality disruption and propose an algorithmically efficient correction on each member of the family of DCJ-indel.


Subject(s)
Algorithms , Chromosome Mapping/methods , Gene Rearrangement/genetics , Genome/genetics , INDEL Mutation/genetics
10.
Article in English | MEDLINE | ID: mdl-26357261

ABSTRACT

Rearrangements are mutations that can change the organization of a genome, but not its content. Examples are inversions of DNA segments, translocations of chromosome ends, fusions and fissions of chromosomes. All mentioned rearrangements can be represented by the generic Double Cut and Join (DCJ) operation. However, the DCJ operation also allows circular chromosomes to be created at intermediate steps, even if the compared genomes are linear. In this case it is more plausible to consider a restriction in which the reincorporation of a circular chromosome has to be done immediately after its creation. We call these two consecutive operations an ER composition. It has been shown that an ER composition mimics either an internal block interchange (when two segments in the same chromosome exchange their positions), or an internal transposition (the special case of a block interchange when the two segments are adjacent). The DCJ distance of two genomes is the same, regardless of this restriction, and can be computed in linear time. For comparing two genomes with unequal contents, in addition to rearrangements we have to allow insertions and deletions of DNA segments-named indels. It is already known that the distance in the model combining DCJ and indel operations can be exactly computed. Again, for linear genomes it would be more plausible to adopt a restricted version with ER compositions. This model was studied recently by da Silva et al. (BMC Bioinformatics 13, Suppl. 19, S14, 2012), but only an upper bound for the restricted DCJ-indel distance was provided. Here we first solve an open problem posed in that paper and present a very simple proof showing that the distance, which can be computed in linear time, is the same for both the unrestricted and the restricted DCJ-indel models. We then give a simpler algorithm for computing an optimal restricted DCJ-indel sorting scenario in O(n log n) time. We also relate the DCJ-indel distance to the restricted DCJ-substitution distance, which instead of indels considers a more powerful operation that allows the substitution of a DNA segment by another DNA segment. We show that the DCJ-indel distance is a 2-approximation for the restricted DCJ-substitution distance.


Subject(s)
Gene Rearrangement/genetics , Genome/genetics , Genomics/methods , INDEL Mutation/genetics , Algorithms , Models, Genetic
11.
BMC Bioinformatics ; 14 Suppl 15: S3, 2013.
Article in English | MEDLINE | ID: mdl-24564182

ABSTRACT

BACKGROUND: The inversion distance, that is the distance between two unichromosomal genomes with the same content allowing only inversions of DNA segments, can be computed thanks to a pioneering approach of Hannenhalli and Pevzner in 1995. In 2000, El-Mabrouk extended the inversion model to allow the comparison of unichromosomal genomes with unequal contents, thus insertions and deletions of DNA segments besides inversions. However, an exact algorithm was presented only for the case in which we have insertions alone and no deletion (or vice versa), while a heuristic was provided for the symmetric case, that allows both insertions and deletions and is called the inversion-indel distance. In 2005, Yancopoulos, Attie and Friedberg started a new branch of research by introducing the generic double cut and join (DCJ) operation, that can represent several genome rearrangements (including inversions). Among others, the DCJ model gave rise to two important results. First, it has been shown that the inversion distance can be computed in a simpler way with the help of the DCJ operation. Second, the DCJ operation originated the DCJ-indel distance, that allows the comparison of genomes with unequal contents, considering DCJ, insertions and deletions, and can be computed in linear time. RESULTS: In the present work we put these two results together to solve an open problem, showing that, when the graph that represents the relation between the two compared genomes has no bad components, the inversion-indel distance is equal to the DCJ-indel distance. We also give a lower and an upper bound for the inversion-indel distance in the presence of bad components.


Subject(s)
Models, Genetic , Algorithms , Chromosome Inversion , Genome , INDEL Mutation
12.
BMC Bioinformatics ; 13 Suppl 19: S14, 2012.
Article in English | MEDLINE | ID: mdl-23281630

ABSTRACT

BACKGROUND: The double-cut-and-join (DCJ) is a model that is able to efficiently sort a genome into another, generalizing the typical mutations (inversions, fusions, fissions, translocations) to which genomes are subject, but allowing the existence of circular chromosomes at the intermediate steps. In the general model many circular chromosomes can coexist in some intermediate step. However, when the compared genomes are linear, it is more plausible to use the so-called restricted DCJ model, in which we proceed the reincorporation of a circular chromosome immediately after its creation. These two consecutive DCJ operations, which create and reincorporate a circular chromosome, mimic a transposition or a block-interchange. When the compared genomes have the same content, it is known that the genomic distance for the restricted DCJ model is the same as the distance for the general model. If the genomes have unequal contents, in addition to DCJ it is necessary to consider indels, which are insertions and deletions of DNA segments. Linear time algorithms were proposed to compute the distance and to find a sorting scenario in a general, unrestricted DCJ-indel model that considers DCJ and indels. RESULTS: In the present work we consider the restricted DCJ-indel model for sorting linear genomes with unequal contents. We allow DCJ operations and indels with the following constraint: if a circular chromosome is created by a DCJ, it has to be reincorporated in the next step (no other DCJ or indel can be applied between the creation and the reincorporation of a circular chromosome). We then develop a sorting algorithm and give a tight upper bound for the restricted DCJ-indel distance. CONCLUSIONS: We have given a tight upper bound for the restricted DCJ-indel distance. The question whether this bound can be reduced so that both the general and the restricted DCJ-indel distances are equal remains open.


Subject(s)
DNA Breaks, Double-Stranded , Genome/genetics , INDEL Mutation , Models, Genetic , Algorithms , Chromosomes , Genomics , Mutation , Translocation, Genetic
13.
BMC Bioinformatics ; 12 Suppl 9: S8, 2011 Oct 05.
Article in English | MEDLINE | ID: mdl-22151231

ABSTRACT

BACKGROUND: The distance between two genomes is often computed by comparing only the common markers between them. Some approaches are also able to deal with non-common markers, allowing the insertion or the deletion of such markers. In these models, a deletion and a subsequent insertion that occur at the same position of the genome count for two sorting steps. RESULTS: Here we propose a new model that sorts non-common markers with substitutions, which are more powerful operations that comprehend insertions and deletions. A deletion and an insertion that occur at the same position of the genome can be modeled as a substitution, counting for a single sorting step. CONCLUSIONS: Comparing genomes with unequal content, but without duplicated markers, we give a linear time algorithm to compute the genomic distance considering substitutions and double-cut-and-join (DCJ) operations. This model provides a parsimonious genomic distance to handle genomes free of duplicated markers, that is in practice a lower bound to the real genomic distances. The method could also be used to refine orthology assignments, since in some cases a substitution could actually correspond to an unannotated orthology.


Subject(s)
Genomics/methods , INDEL Mutation , Models, Genetic , Algorithms , Genetic Markers , Genome , Humans
14.
BMC Bioinformatics ; 12 Suppl 9: S13, 2011 Oct 05.
Article in English | MEDLINE | ID: mdl-22151784

ABSTRACT

BACKGROUND: Classical approaches to compute the genomic distance are usually limited to genomes with the same content, without duplicated markers. However, differences in the gene content are frequently observed and can reflect important evolutionary aspects. A few polynomial time algorithms that include genome rearrangements, insertions and deletions (or substitutions) were already proposed. These methods often allow a block of contiguous markers to be inserted, deleted or substituted at once but result in distance functions that do not respect the triangular inequality and hence do not constitute metrics. RESULTS: In the present study we discuss the disruption of the triangular inequality in some of the available methods and give a framework to establish an efficient correction for two models recently proposed, one that includes insertions, deletions and double cut and join (DCJ) operations, and one that includes substitutions and DCJ operations. CONCLUSIONS: We show that the proposed framework establishes the triangular inequality in both distances, by summing a surcharge on indel operations and on substitutions that depends only on the number of markers affected by these operations. This correction can be applied a posteriori, without interfering with the already available formulas to compute these distances. We claim that this correction leads to distances that are biologically more plausible.


Subject(s)
Genomics/methods , INDEL Mutation , Algorithms , Genome , Models, Genetic
15.
J Comput Biol ; 18(9): 1167-84, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21899423

ABSTRACT

Many approaches to compute the genomic distance are still limited to genomes with the same content, without duplicated markers. However, differences in the gene content are frequently observed and can reflect important evolutionary aspects. While duplicated markers can hardly be handled by exact models, when duplicated markers are not allowed, a few polynomial time algorithms that include genome rearrangements, insertions and deletions were already proposed. In an attempt to improve these results, in the present work we give the first linear time algorithm to compute the distance between two multichromosomal genomes with unequal content, but without duplicated markers, considering insertions, deletions and double cut and join (DCJ) operations. We derive from this approach algorithms to sort one genome into another one also using DCJ operations, insertions and deletions. The optimal sorting scenarios can have different compositions and we compare two types of sorting scenarios: one that maximizes and one that minimizes the number of DCJ operations with respect to the number of insertions and deletions. We also show that, although the triangle inequality can be disrupted in the proposed genomic distance, it is possible to correct this problem adopting a surcharge on the number of non-common markers. We use our method to analyze six species of Rickettsia, a group of obligate intracellular parasites, and identify preliminary evidence of clusters of deletions.


Subject(s)
Genome, Bacterial , INDEL Mutation , Rickettsia/genetics , Algorithms , Computer Simulation , Evolution, Molecular , Models, Genetic , Phylogeny , Recombination, Genetic
16.
J Comput Biol ; 18(9): 1231-41, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21899428

ABSTRACT

We study three classical problems of genome rearrangement--sorting, halving, and the median problem--in a restricted double cut and join (DCJ) model. In the DCJ model, introduced by Yancopoulos et al., we can represent rearrangement events that happen in multichromosomal genomes, such as inversions, translocations, fusions, and fissions. Two DCJ operations can mimic transpositions or block interchanges by first extracting an appropriate segment of a chromosome, creating a temporary circular chromosome, and then reinserting it in its proper place. In the restricted model, we are concerned with multichromosomal linear genomes and we require that each circular excision is immediately followed by its reincorporation. Existing linear-time DCJ sorting and halving algorithms ignore this reincorporation constraint. In this article, we propose a new algorithm for the restricted sorting problem running in O(n log n) time, thus improving on the known quadratic time algorithm. We solve the restricted halving problem and give an algorithm that computes a multilinear halved genome in linear time. Finally, we show that the restricted median problem is NP-hard as conjectured.


Subject(s)
Chromosomes/genetics , Genome , Mutation , Algorithms , Computer Simulation , Evolution, Molecular , Gene Rearrangement , Models, Genetic , Ploidies
17.
J Comput Biol ; 17(9): 1145-65, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20874401

ABSTRACT

In genome rearrangements, the double cut and join (DCJ) operation, introduced by Yancopoulos et al. in 2005, allows one to represent most rearrangement events that could happen in multichromosomal genomes, such as inversions, translocations, fusions, and fissions. No restriction on the genome structure considering linear and circular chromosomes is imposed. An advantage of this general model is that it leads to considerable algorithmic simplifications compared to other genome rearrangement models. Recently, several works concerning the DCJ operation have been published, and in particular, an algorithm was proposed to find an optimal DCJ sequence for sorting one genome into another one. Here we study the solution space of this problem and give an easy-to-compute formula that corresponds to the exact number of optimal DCJ sorting sequences for a particular subset of instances of the problem. We also give an algorithm to count the number of optimal sorting sequences for any instance of the problem. Another interesting result is the demonstration of the possibility of obtaining one optimal sorting sequence by properly replacing any pair of consecutive operations in another optimal sequence. As a consequence, any optimal sorting sequence can be obtained from one other by applying such replacements successively, but the problem of finding the shortest number of replacements between two sorting sequences is still open.


Subject(s)
Algorithms , Base Sequence , Gene Rearrangement , Genome , Animals , Evolution, Molecular , Humans , Models, Genetic
18.
Bioinformatics ; 25(14): 1833-5, 2009 Jul 15.
Article in English | MEDLINE | ID: mdl-19401401

ABSTRACT

SUMMARY: Computing the reversal distance and searching for an optimal sequence of reversals to transform a unichromosomal genome into another are useful algorithmic tools to analyse real evolutionary scenarios. Currently, these problems can be solved by at least two available softwares, the prominent of which are GRAPPA and GRIMM. However, the number of different optimal sequences is usually huge and taking only the distance and/or one example is often insufficient to do a proper analysis. Here, we offer an alternative and present baobabLUNA, a framework that contains an algorithm to give a compact representation of the whole space of solutions for the sorting by reversals problem. AVAILABILITY AND IMPLEMENTATION: Compiled code implemented in Java is freely available for download at http://pbil.univ-lyon1.fr/software/luna/. Documentation with methodological background, technical aspects, download and setup instructions, interface description and tutorial are available at http://pbil.univ-lyon1.fr/software/luna/doc/luna-doc.pdf.


Subject(s)
Algorithms , Genomics , Software
19.
Algorithms Mol Biol ; 4: 16, 2009 Dec 30.
Article in English | MEDLINE | ID: mdl-20042101

ABSTRACT

BACKGROUND: The reversal distance and optimal sequences of reversals to transform a genome into another are useful tools to analyse evolutionary scenarios. However, the number of sequences is huge and some additional criteria should be used to obtain a more accurate analysis. One strategy is searching for sequences that respect constraints, such as the common intervals (clusters of co-localised genes). Another approach is to explore the whole space of sorting sequences, eventually grouping them into classes of equivalence. Recently both strategies started to be put together, to restrain the space to the sequences that respect constraints. In particular an algorithm has been proposed to list classes whose sorting sequences do not break the common intervals detected between the two initial genomes A and B. This approach may reduce the space of sequences and is symmetric (the result of the analysis sorting A into B can be obtained from the analysis sorting B into A). RESULTS: We propose an alternative approach to restrain the space of sorting sequences, using progressive instead of initial detection of common intervals (the list of common intervals is updated after applying each reversal). This may reduce the space of sequences even more, but is shown to be asymmetric. CONCLUSIONS: We suggest that our method may be more realistic when the relation ancestor-descendant between the analysed genomes is clear and we apply it to do a better characterisation of the evolutionary scenario of the bacterium Rickettsia felis with respect to one of its ancestors.

20.
Genome Biol Evol ; 1: 56-66, 2009 Apr 30.
Article in English | MEDLINE | ID: mdl-20333177

ABSTRACT

The human sex chromosomes have stopped recombining gradually, which has left five evolutionary strata on the X chromosome. Y inversions are thought to have suppressed X-Y recombination but clear evidence is missing. Here, we looked for such evidence by focusing on a region--the X-added region (XAR)--that includes the pseudoautosomal region and the most recent strata 3 to 5. We estimated and analyzed the whole set of parsimonious scenarios of Y inversions given the gene order in XAR and its Y homolog. Comparing these to scenarios for simulated sequences suggests that the strata 4 and 5 were formed by Y inversions. By comparing the X and Y DNA sequences, we found clear evidence of two Y inversions associated with duplications that coincide with the boundaries of strata 4 and 5. Divergence between duplicates is in agreement with the timing of strata 4 and 5 formation. These duplicates show a complex pattern of gene conversion that resembles the pattern previously found for AMELXY, a stratum 3 locus. This suggests that this locus--despite AMELY being unbroken--was possibly involved in a Y inversion that formed stratum 3. However, no clear evidence supporting the formation of stratum 3 by a Y inversion was found, probably because this stratum is too old for such an inversion to be detectable. Our results strongly support the view that the most recent human strata have arisen by Y inversions and suggest that inversions have played a major role in the differentiation of our sex chromosomes.

SELECTION OF CITATIONS
SEARCH DETAIL
...