Results 1 - 17 of 17
1.
Nat Genet ; 56(4): 721-731, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38622339

ABSTRACT

Coffea arabica, an allotetraploid hybrid of Coffea eugenioides and Coffea canephora, is the source of approximately 60% of coffee products worldwide, and its cultivated accessions have undergone several population bottlenecks. We present chromosome-level assemblies of a di-haploid C. arabica accession and modern representatives of its diploid progenitors, C. eugenioides and C. canephora. The three species exhibit largely conserved genome structures between diploid parents and descendant subgenomes, with no obvious global subgenome dominance. We find evidence for a founding polyploidy event 350,000-610,000 years ago, followed by several pre-domestication bottlenecks, resulting in narrow genetic variation. A split between wild accessions and cultivar progenitors occurred ~30.5 thousand years ago, followed by a period of migration between the two populations. Analysis of modern varieties, including lines historically introgressed with C. canephora, highlights their breeding histories and loci that may contribute to pathogen resistance, laying the groundwork for future genomics-based breeding of C. arabica.


Subject(s)
Coffea , Coffea/genetics , Coffee , Genome, Plant/genetics , Metagenomics , Plant Breeding
2.
EJHaem ; 4(3): 770-774, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37601854

ABSTRACT

Assessment of minimal residual disease in acute lymphoblastic leukemia by immune repertoire NGS requires spiking CDR3 sequences at known quantities into the patient's sample. Recently, the EuroClonality-NGS group released one of the most comprehensive protocols for this purpose. ARResT/Interrogate is a closed-source software for processing these NGS libraries, developed by this same group. Vidjil, an open-source alternative, currently cannot handle libraries prepared using this protocol. Here, we present a Vidjil add-on to solve this issue. EuroClonality-NGS prepared samples analyzed with Vidjil and ARResT/Interrogate were highly concordant (r = 0.998) and presented low error (root-mean-square error, RMSE = 0.112).
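The two agreement figures quoted above, Pearson's r and the root-mean-square error, are standard statistics. The sketch below shows how they are computed. It assumes that paired per-sample quantifications from the two tools are available as plain lists. The abstract does not say on what scale (for example raw or log-transformed) the values were compared. The function names are hypothetical and are not part of Vidjil or ARResT/Interrogate.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two paired series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired series with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_x * ss_y)


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    """Root-mean-square error between paired measurements."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty paired series")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the study.
    tool_a = [0.10, 0.52, 1.31, 2.05]
    tool_b = [0.12, 0.49, 1.35, 2.01]
    print(f"r = {pearson_r(tool_a, tool_b):.3f}, RMSE = {rmse(tool_a, tool_b):.3f}")
```

A high r alone does not rule out a systematic offset between the tools. That is why an absolute error measure such as RMSE is usually reported alongside it.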

3.
Article in English | MEDLINE | ID: mdl-37200133

ABSTRACT

An important problem in genome comparison is the genome sorting problem, that is, the problem of finding a sequence of basic operations that transforms one genome into another whose length (possibly weighted) equals the distance between them. These sequences are called optimal sorting scenarios. However, there is usually a large number of such scenarios, and a naïve algorithm is very likely to be biased towards a specific type of scenario, impairing its usefulness in real-world applications. One way to go beyond the traditional sorting algorithms is to explore all possible solutions, looking at all the optimal sorting scenarios instead of just an arbitrary one. Another related approach is to analyze all the intermediate genomes, that is, all the genomes that can occur in an optimal sorting scenario. In this paper, we show how to enumerate the optimal sorting scenarios and the intermediate genomes between any two given genomes, under the rank distance.

4.
Bioinformatics ; 39(3)2023 03 01.
Article in English | MEDLINE | ID: mdl-36790056

ABSTRACT

MOTIVATION: The rank distance model represents genome rearrangements in multi-chromosomal genomes as matrix operations, which allows the reconstruction of parsimonious histories of evolution by rearrangements. We seek to generalize this model by allowing for genomes with different gene content, to accommodate a broader range of biological contexts. We approach this generalization by using a matrix representation of genomes. This leads to simple distance formulas and sorting algorithms for genomes with different gene contents, but without duplications. RESULTS: We generalize the rank distance to genomes with different gene content in two different ways. The first approach adds insertions, deletions and the substitution of a single extremity to the basic operations. We show how to efficiently compute this distance. To avoid genomes with incomplete markers, our alternative distance, the rank-indel distance, only uses insertions and deletions of entire chromosomes. We construct phylogenetic trees with our distances and the DCJ-Indel distance for simulated data and real prokaryotic genomes, and compare them against reference trees. For simulated data, our distances outperform the DCJ-Indel distance using the Quartet metric as baseline. This suggests that rank distances are more robust for comparing distantly related species. For real prokaryotic genomes, all rearrangement-based distances yield phylogenetic trees that are topologically distant from the reference (65% similarity with Quartet metric), but are able to cluster related species within their respective clades and distinguish the Shigella strains as the farthest relative of the Escherichia coli strains, a feature not seen in the reference tree. AVAILABILITY AND IMPLEMENTATION: Code and instructions are available at https://github.com/meidanis-lab/rank-indel. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics , Models, Genetic , Phylogeny , Genome , INDEL Mutation , Algorithms
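The matrix representation of genomes can be illustrated for the simpler case that this paper generalizes: genomes with the same gene content and no duplications. The sketch below makes the following assumptions:

- Gene extremities are numbered 0..n-1.
- An adjacency {a, b} is encoded as M[a][b] = M[b][a] = 1.
- A telomere x is encoded as M[x][x] = 1.
- The rank distance is the rank of A - B.

The helper names are hypothetical. This is not code from the rank-indel repository, and it does not implement the insertion/deletion extensions.

```python
from fractions import Fraction
from typing import Iterable, List, Tuple

Matrix = List[List[Fraction]]


def genome_matrix(n: int, adjacencies: Iterable[Tuple[int, int]]) -> Matrix:
    """Genome matrix over extremities 0..n-1 (hypothetical helper).

    Each adjacency {a, b} sets M[a][b] = M[b][a] = 1; every extremity that is
    not in an adjacency is a telomere and gets M[x][x] = 1.
    """
    m = [[Fraction(0)] * n for _ in range(n)]
    used = set()
    for a, b in adjacencies:
        if not (0 <= a < n and 0 <= b < n) or a == b or a in used or b in used:
            raise ValueError(f"invalid adjacency ({a}, {b})")
        m[a][b] = m[b][a] = Fraction(1)
        used.update((a, b))
    for x in range(n):
        if x not in used:
            m[x][x] = Fraction(1)
    return m


def rank(mat: Matrix) -> int:
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [row[:] for row in mat]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def rank_distance(a: Matrix, b: Matrix) -> int:
    """Rank distance d(A, B) = rank(A - B)."""
    return rank([[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)])


if __name__ == "__main__":
    n = 4  # extremities 0..3
    all_telomeres = genome_matrix(n, [])
    one_adj = genome_matrix(n, [(1, 2)])
    print(rank_distance(one_adj, all_telomeres))  # 1: a single join
    g1 = genome_matrix(n, [(0, 1), (2, 3)])
    g2 = genome_matrix(n, [(0, 2), (1, 3)])
    print(rank_distance(g1, g2))  # 2: one double cut-and-join
```

Exact rational arithmetic avoids the floating-point tolerance issues a numerical rank would raise. The cost is cubic time, which is fine for small examples.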
6.
Algorithms Mol Biol ; 14: 16, 2019.
Article in English | MEDLINE | ID: mdl-31832081

ABSTRACT

BACKGROUND: The area of genome rearrangements has given rise to a number of interesting biological, mathematical and algorithmic problems. Among these, one of the most intractable ones has been that of finding the median of three genomes, a special case of the ancestral reconstruction problem. In this work we re-examine our recently proposed way of measuring genome rearrangement distance, namely, the rank distance between the matrix representations of the corresponding genomes, and show that the median of three genomes can be computed exactly in polynomial time O(n^ω), where ω ≤ 3, with respect to this distance, when the median is allowed to be an arbitrary orthogonal matrix. RESULTS: We define the five fundamental subspaces depending on three input genomes, and use their properties to show that a particular action on each of these subspaces produces a median. In the process we introduce the notion of M-stable subspaces. We also show that the median found by our algorithm is always orthogonal, symmetric, and conserves any adjacencies or telomeres present in at least 2 out of 3 input genomes. CONCLUSIONS: We test our method on both simulated and real data. We find that the majority of the realistic inputs result in genomic outputs, and for those that do not, our two heuristics perform well in terms of reconstructing a genomic matrix attaining a score close to the lower bound, while running in a reasonable amount of time. We conclude that the rank distance is not only theoretically intriguing, but also practically useful for median-finding, and potentially ancestral genome reconstruction.

7.
Article in English | MEDLINE | ID: mdl-30072336

ABSTRACT

We outline an integrated approach to speciation and whole genome duplication (WGD) to resolve the occurrence of these events in phylogenetic analysis. We propose a more principled way of estimating the parameters of gene divergence and fractionation than the standard mixture of normals analysis. We formulate an algorithm for resolving data on local peaks in the distributions of duplicate gene similarities for a number of related genomes. We illustrate with a comprehensive analysis of WGD-origin duplicate gene data from the family Brassicaceae.

8.
BMC Bioinformatics ; 19(Suppl 6): 142, 2018 05 08.
Article in English | MEDLINE | ID: mdl-29745865

ABSTRACT

BACKGROUND: Recently, Pereira Zanetti, Biller and Meidanis have proposed a new definition of a rearrangement distance between genomes. In this formulation, each genome is represented as a matrix, and the distance d is the rank distance between these matrices. Although defined in terms of matrices, the rank distance is equal to the minimum total weight of a series of weighted operations that leads from one genome to the other, including inversions, translocations, transpositions, and others. The computational complexity of the median-of-three problem according to this distance is currently unknown. The genome matrices are a special kind of permutation matrices, which we study in this paper. In their paper, the authors provide an [Formula: see text] algorithm for determining three candidate medians, prove the tight approximation ratio [Formula: see text], and provide a sufficient condition for their candidates to be true medians. They also conduct some experiments that suggest that their method is accurate on simulated and real data. RESULTS: In this paper, we extend their results and provide the following: three invariants characterizing the problem of finding the median of 3 matrices; a sufficient condition for uniqueness of medians that can be checked in O(n); a faster, [Formula: see text] algorithm for determining the median under this condition; a new heuristic algorithm for this problem based on compressed sensing; and a [Formula: see text] algorithm that exactly solves the problem when the inputs are orthogonal matrices, a class that includes both permutations and genomes as special cases. CONCLUSIONS: Our work provides the first proof that, with respect to the rank distance, the problem of finding the median of 3 genomes, as well as the median of 3 permutations, is exactly solvable in polynomial time, a result which should be contrasted with its NP-hardness for the DCJ (double cut-and-join) distance and most other families of genome rearrangement operations.
This result, backed by our experimental tests, indicates that the rank distance is a viable alternative to the DCJ distance widely used in genome comparisons.


Subject(s)
Models, Genetic , Algorithms , Computer Simulation , Databases, Genetic , Gene Rearrangement , Genome , Genomics/methods , Mutation/genetics
9.
Bull Math Biol ; 78(4): 786-814, 2016 04.
Article in English | MEDLINE | ID: mdl-27072561

ABSTRACT

The genome median problem is an important problem in phylogenetic reconstruction under rearrangement models. It can be stated as follows: Given three genomes, find a fourth that minimizes the sum of the pairwise rearrangement distances between it and the three input genomes. In this paper, we model genomes as matrices and study the matrix median problem using the rank distance. It is known that, for any metric distance, at least one of the corners is a [Formula: see text]-approximation of the median. Our results allow us to compute up to three additional matrix median candidates, all of them with approximation ratios at least as good as the best corner, when the input matrices come from genomes. We also show a class of instances where our candidates are optimal. From the application point of view, it is usually more interesting to locate medians farther from the corners, and therefore, these new candidates are potentially more useful. In addition to the approximation algorithm, we suggest a heuristic to get a genome from an arbitrary square matrix. This is useful to translate the results of our median approximation algorithm back to genomes, and it has good results in our tests. To assess the relevance of our approach in the biological context, we ran simulated evolution tests and compared our solutions to those of an exact DCJ median solver. The results show that our method is capable of producing very good candidates.


Subject(s)
Genome , Models, Genetic , Algorithms , Computer Simulation , Evolution, Molecular , Mathematical Concepts , Models, Statistical , Phylogeny
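The "corners" idea can be made concrete. A corner is one of the three input genomes taken as the median candidate, and its score is the sum of its distances to the three inputs. The sketch below assumes the matrix encoding of the rank-distance model: extremities numbered 0..n-1, adjacencies as symmetric off-diagonal 1s, and telomeres on the diagonal. It picks the best corner and also reports the lower bound (d(A,B) + d(B,C) + d(A,C)) / 2. That bound follows from the triangle inequality in any metric. The sketch does not compute the paper's additional candidates, and all names are hypothetical.

```python
from fractions import Fraction
from typing import Iterable, List, Sequence, Tuple

Matrix = List[List[Fraction]]


def genome_matrix(n: int, adjacencies: Iterable[Tuple[int, int]]) -> Matrix:
    """M[a][b] = M[b][a] = 1 per adjacency, M[x][x] = 1 per telomere (inputs assumed valid)."""
    m = [[Fraction(0)] * n for _ in range(n)]
    paired = set()
    for a, b in adjacencies:
        m[a][b] = m[b][a] = Fraction(1)
        paired.update((a, b))
    for x in range(n):
        if x not in paired:
            m[x][x] = Fraction(1)
    return m


def rank(mat: Matrix) -> int:
    """Exact rank via Gaussian elimination over the rationals."""
    rows = [row[:] for row in mat]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def rank_distance(a: Matrix, b: Matrix) -> int:
    return rank([[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)])


def best_corner(genomes: Sequence[Matrix]) -> Tuple[int, int]:
    """Index and score of the input genome minimizing the sum of distances to all inputs."""
    scores = [sum(rank_distance(g, h) for h in genomes) for g in genomes]
    best = min(range(len(genomes)), key=scores.__getitem__)
    return best, scores[best]


def median_lower_bound(a: Matrix, b: Matrix, c: Matrix) -> Fraction:
    """Half the perimeter: by the triangle inequality no median can score lower."""
    return Fraction(rank_distance(a, b) + rank_distance(b, c) + rank_distance(a, c), 2)


if __name__ == "__main__":
    n = 4
    g_a = genome_matrix(n, [])
    g_b = genome_matrix(n, [(0, 1)])
    g_c = genome_matrix(n, [(0, 1), (2, 3)])
    print(best_corner([g_a, g_b, g_c]))  # (1, 2)
    print(median_lower_bound(g_a, g_b, g_c))  # 2
```

In this toy example the best corner meets the lower bound, so it is already a median. In general the gap between the best corner and the bound is what the paper's extra candidates try to close.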
10.
Article in English | MEDLINE | ID: mdl-24334378

ABSTRACT

Algebraic rearrangement theory, as introduced by Meidanis and Dias, focuses on representing the order in which genes appear in chromosomes, and applies to circular chromosomes only. By shifting our attention to genome adjacencies, we introduce the adjacency algebraic theory, which extends the original algebraic theory to linear chromosomes in a very natural way and also allows the original algebraic distance formula to be applied to the general multichromosomal case, with both linear and circular chromosomes. The resulting distance, which we call algebraic distance here, is very similar to, but not quite the same as, the double-cut-and-join distance. We present linear-time algorithms to compute it and to sort genomes. We show how to compute the rearrangement distance from the adjacency graph, for an easier comparison with other rearrangement distances. A thorough discussion of the relationship between the chromosomal and adjacency representations is also given, and we show how all classic rearrangement operations can be modeled using the algebraic theory.


Subject(s)
Algorithms , Gene Rearrangement/genetics , Genomics/methods , Models, Genetic , Genome , Linear Models , Telomere
11.
Article in English | MEDLINE | ID: mdl-23702549

ABSTRACT

Recently, the Single-Cut-or-Join (SCJ) operation was proposed as a basis for a new rearrangement distance between multichromosomal genomes, leading to very fast algorithms, both in theory and in practice. However, it was not clear how well this new distance fares when it is used to solve relevant problems, such as the reconstruction of evolutionary history. In this paper, we advance current knowledge by testing SCJ's ability regarding evolutionary reconstruction in two aspects: 1) How well does SCJ reconstruct evolutionary topologies? and 2) How well does SCJ reconstruct ancestral genomes? In the process of answering these questions, we implemented SCJ-based methods and made them available to the community. We ran experiments using as many as 200 genomes, with as many as 3,000 genes. For the first question, we found that SCJ typically recovers between 60 percent and more than 95 percent of the topology, as measured through the Robinson-Foulds distance (a.k.a. split distance) between trees. In other words, 60 percent to more than 95 percent of the original splits are also present in the reconstructed tree. For the second question, given a topology, SCJ's ability to reconstruct ancestral genomes depends on how far the ancestral node is from the leaves. For nodes close to the leaves, about 85 percent of the gene adjacencies can be recovered. This percentage decreases as we move up the tree, but, even at the root, about 50 percent of the adjacencies are recovered, for as many as 64 leaves. Our findings corroborate the fact that SCJ leads to very conservative genome reconstructions, yielding very few false-positive gene adjacencies in the ancestral genomes, at the expense of a relatively larger number of false negatives. In addition, experiments with real data from the Campanulaceae and Protostomes groups show that SCJ reconstructs topologies of quality comparable to the accepted trees of the species involved.
As far as time is concerned, the methods we implemented can find a topology for 64 genomes with 2,000 genes each in about 10.7 minutes, and reconstruct the ancestral genomes in a 64-leaf tree in about 3 seconds, both on a typical desktop computer. It should be noted that our code is written in Java and we made no significant effort to optimize it.


Subject(s)
Gene Rearrangement , Genomics/methods , Models, Genetic , Phylogeny , Animals , Campanulaceae , Computer Simulation , Evolution, Molecular , Genome , Software
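The split-based evaluation described above can be illustrated as follows. The sketch assumes that each unrooted tree is given as a list of its bipartitions (each written as one side of the split) over a known leaf set. It counts the Robinson-Foulds distance as the number of non-trivial splits found in exactly one tree. That is one common convention; some authors halve it. The sketch also reports the fraction of the true tree's splits recovered. Names and the example trees are hypothetical.

```python
from typing import FrozenSet, Iterable, Set

Split = FrozenSet[str]


def canonical(side: Iterable[str], leaves: FrozenSet[str]) -> Split:
    """Represent a bipartition by the side that does NOT contain a fixed reference leaf."""
    s = frozenset(side)
    if not s <= leaves:
        raise ValueError("split mentions unknown leaves")
    ref = min(leaves)
    return leaves - s if ref in s else s


def nontrivial_splits(tree: Iterable[Iterable[str]], leaves: FrozenSet[str]) -> Set[Split]:
    """Canonical splits with at least two leaves on each side (trivial ones are in every tree)."""
    out = set()
    for side in tree:
        s = canonical(side, leaves)
        if 2 <= len(s) <= len(leaves) - 2:
            out.add(s)
    return out


def robinson_foulds(t1, t2, leaves: FrozenSet[str]) -> int:
    """Number of non-trivial splits present in exactly one of the two trees."""
    return len(nontrivial_splits(t1, leaves) ^ nontrivial_splits(t2, leaves))


def recovered_fraction(true_tree, rebuilt_tree, leaves: FrozenSet[str]) -> float:
    """Fraction of the true tree's splits that also appear in the reconstructed tree."""
    truth = nontrivial_splits(true_tree, leaves)
    if not truth:
        return 1.0
    return len(truth & nontrivial_splits(rebuilt_tree, leaves)) / len(truth)


if __name__ == "__main__":
    leaves = frozenset("ABCDEF")
    true_tree = [{"A", "B"}, {"A", "B", "C"}, {"E", "F"}]
    rebuilt = [{"A", "B"}, {"A", "B", "D"}, {"E", "F"}]
    print(robinson_foulds(true_tree, rebuilt, leaves))     # 2
    print(recovered_fraction(true_tree, rebuilt, leaves))  # 0.666...
```

Canonicalizing each split against a reference leaf makes "AB|CDEF" and "CDEF|AB" compare equal. That lets the comparison reduce to plain set operations.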
12.
J Hered ; 103(3): 342-8, 2012.
Article in English | MEDLINE | ID: mdl-22315242

ABSTRACT

Cattle are divided into 2 groups, referred to as taurine and indicine, both of which have been under strong artificial selection due to their importance for human nutrition. A side effect of this domestication is a loss of genetic diversity within each specialized breed. Recently, the first taurine genome was sequenced and assembled, allowing for a better understanding of this ruminant species. However, genetic information from indicine breeds has been limited. Here, we present the first genome sequence of an indicine breed (Nellore), generated at 52X coverage on the SOLiD sequencing platform. As expected, both genomes share high similarity at the nucleotide level for all autosomes and the X chromosome. For the Y chromosome, homology was considerably lower, most likely due to the incomplete assembly of the taurine Y chromosome. We were also able to cover 97% of the annotated taurine protein-coding genes.


Subject(s)
Cattle/genetics , Genome , Animals , Chromosomes, Mammalian/genetics , Codon/genetics , Contig Mapping , Male , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid
13.
Article in English | MEDLINE | ID: mdl-21339538

ABSTRACT

The breakpoint distance is one of the most straightforward genome comparison measures. Surprisingly, when it comes to defining it precisely for multichromosomal genomes with both linear and circular chromosomes, there is more than one way to go about it. Pevzner and Tesler gave a definition in a 2003 paper, Tannier et al. defined it differently in 2008, and in this paper we provide yet another alternative, calling it SCJ for single-cut-or-join, in analogy to the popular double cut and join (DCJ) measure. We show that several genome rearrangement problems, such as median and halving, become easy for SCJ, and provide linear and higher polynomial time algorithms for them. For the multichromosomal linear genome median problem, this is the first polynomial time algorithm described, since for other distances this problem is NP-hard. In addition, we show that small parsimony under SCJ is also easy, and can be solved by a variant of Fitch's algorithm. In contrast, big parsimony is NP-hard under SCJ. This new distance measure may be of value as a speedily computable, first approximation to distances based on more realistic rearrangement models.


Subject(s)
Algorithms , Gene Rearrangement/genetics , Genomics/methods , Models, Genetic , Phylogeny
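To illustrate why the median becomes easy under SCJ, the sketch below treats a genome as a set of adjacencies, each an unordered pair of gene extremities. The SCJ distance is then the number of adjacencies to cut plus the number to join, which is the size of the symmetric difference.

For three genomes, each adjacency contributes independently to a candidate's total score. Including an adjacency costs 3 minus the number of genomes containing it; excluding it costs that number. So keeping exactly the adjacencies present in at least two genomes minimizes the score. A pigeonhole argument shows that the result is a valid genome.

This is a sketch under these assumptions, not necessarily the algorithm as stated in the paper. The extremity labels such as "1h" (head of gene 1) are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Set

Adjacency = FrozenSet[str]
Genome = Set[Adjacency]


def adj(x: str, y: str) -> Adjacency:
    """An adjacency is an unordered pair of distinct gene extremities."""
    if x == y:
        raise ValueError("an adjacency joins two distinct extremities")
    return frozenset((x, y))


def is_valid(genome: Genome) -> bool:
    """Each extremity may take part in at most one adjacency."""
    seen = Counter(e for a in genome for e in a)
    return all(count == 1 for count in seen.values())


def scj_distance(g1: Genome, g2: Genome) -> int:
    """Adjacencies to cut (only in g1) plus adjacencies to join (only in g2)."""
    return len(g1 - g2) + len(g2 - g1)


def majority_median(genomes: Iterable[Genome]) -> Genome:
    """Keep every adjacency found in at least two of the three genomes.

    If two kept adjacencies shared an extremity, each would occur in >= 2 of
    the 3 genomes, so some input would contain both (pigeonhole), which is
    impossible for a valid genome; hence the result is valid.
    """
    gs = list(genomes)
    if len(gs) != 3:
        raise ValueError("expects exactly three genomes")
    counts = Counter(a for g in gs for a in g)
    return {a for a, c in counts.items() if c >= 2}


if __name__ == "__main__":
    g1 = {adj("1h", "2t"), adj("2h", "3t")}
    g2 = {adj("1h", "2t")}
    g3 = {adj("1h", "2t"), adj("3h", "1t")}
    m = majority_median([g1, g2, g3])
    print(sorted(sorted(a) for a in m))                    # [['1h', '2t']]
    print(sum(scj_distance(m, g) for g in (g1, g2, g3)))   # 2
```

Because the score splits into independent per-adjacency terms, the median costs a single counting pass over the three genomes. That is in line with the abstract's point that SCJ makes this problem easy.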
14.
Genet Mol Res ; 5(1): 269-83, 2006 Mar 31.
Article in English | MEDLINE | ID: mdl-16755517

ABSTRACT

Nowadays, there are many phylogeny reconstruction methods, each with advantages and disadvantages. We explored the advantages of each method by putting together the common parts of trees constructed by several methods, by means of a consensus computation. A number of phylogenetic consensus methods are already known. Unfortunately, there is also a taboo concerning consensus methods, because most biologists see them mainly as comparators and not as phylogenetic tree constructors. We challenged this taboo by defining a consensus method that builds a fully resolved phylogenetic tree based on the most common parts of fully resolved trees in a given collection. We also generated results showing that this consensus is, in a sense, a "median" of the input trees; as such, it can be closer to the correct tree in many situations.


Subject(s)
Algorithms , Consensus Sequence/genetics , Evolution, Molecular , Models, Genetic , Phylogeny , Animals , Cluster Analysis , Humans , Software
15.
J Comput Biol ; 9(5): 743-5, 2002.
Article in English | MEDLINE | ID: mdl-12487761

ABSTRACT

One possible model to study genome evolution is to represent genomes as permutations of genes and compute distances based on the minimum number of certain operations (rearrangements) needed to transform one permutation into another. Under this model, the shorter the distance, the closer the genomes are. Two operations that have been extensively studied are the reversal and the transposition. A reversal is an operation that reverses the order of the genes on a certain portion of the permutation. A transposition is an operation that "cuts" a certain portion of the permutation and "pastes" it elsewhere in the same permutation. In this note, we show that the reversal and transposition distance of the signed permutation π_n = (-1 -2 ... -(n-1) -n) with respect to the identity is ⌊n/2⌋ + 2 for all n ≥ 3. We conjecture that this value is the diameter of the permutation group under these operations.


Subject(s)
Evolution, Molecular , Genome , Models, Genetic , Computational Biology/methods , Genes , Mathematics
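The two operations defined above can be sketched on signed permutations stored as Python lists. A reversal reverses a block and flips the sign of every gene in it. A transposition moves a block elsewhere without changing signs; equivalently, it exchanges two adjacent blocks. The 0-based, end-exclusive indexing is an assumption of this sketch. The example sorts π_3 in three operations. That is consistent with the stated distance ⌊3/2⌋ + 2 = 3, although the sketch itself does not prove minimality.

```python
from typing import List

Perm = List[int]


def pi_n(n: int) -> Perm:
    """The signed permutation (-1 -2 ... -n)."""
    if n < 1:
        raise ValueError("n must be positive")
    return [-i for i in range(1, n + 1)]


def reversal(p: Perm, i: int, j: int) -> Perm:
    """Reverse the block p[i:j] (0-based, end-exclusive) and flip the sign of each gene in it."""
    if not 0 <= i < j <= len(p):
        raise ValueError("need 0 <= i < j <= len(p)")
    return p[:i] + [-g for g in reversed(p[i:j])] + p[j:]


def transposition(p: Perm, i: int, j: int, k: int) -> Perm:
    """Cut the block p[i:j] and paste it right after p[j:k]; signs are unchanged."""
    if not 0 <= i < j < k <= len(p):
        raise ValueError("need 0 <= i < j < k <= len(p)")
    return p[:i] + p[j:k] + p[i:j] + p[k:]


if __name__ == "__main__":
    p = pi_n(3)  # [-1, -2, -3]
    p = reversal(p, 0, 3)  # [3, 2, 1]
    p = transposition(p, 0, 1, 3)  # [2, 1, 3]
    p = transposition(p, 0, 1, 2)  # [1, 2, 3]
    print(p)
```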
16.
Microbiol Mol Biol Rev ; 66(2): 272-99, 2002 Jun.
Article in English | MEDLINE | ID: mdl-12040127

ABSTRACT

The transport systems of the first completely sequenced genome of a plant parasite, Xylella fastidiosa, were analyzed. In all, 209 proteins were classified here as constitutive members of transport families; thus, we have identified 69 new transporters in addition to the 140 previously annotated. The analysis led to several hints on potential ways of controlling the disease it causes in citrus trees. An ADP:ATP translocator, previously found only in intracellular parasites, was found in X. fastidiosa. A P-type ATPase is missing: among the 24 eubacteria completely sequenced to date, only three (including X. fastidiosa) lack a P-type ATPase, and all of them are parasites transmitted by insect vectors. An incomplete phosphotransferase system (PTS) was found, without the permease subunits; we conjecture either that they are among the hypothetical proteins or that the PTS plays a solely metabolic regulatory role. We propose that the Ttg2 ABC system might be an import system, possibly involved in glutamate import, rather than a toluene exporter, as previously annotated. X. fastidiosa has fewer proteins with ≥4 alpha-helical transmembrane spanners than any other completely sequenced prokaryote to date. Only 2.7% of X. fastidiosa open reading frames are identifiable as major transporters, making it the eubacterium with the lowest percentage of open reading frames involved in transport, closer to that of two archaea, Methanococcus jannaschii (2.4%) and Methanobacterium thermoautotrophicum (2.4%).


Subject(s)
Bacterial Proteins/genetics , Carrier Proteins/genetics , Gammaproteobacteria/genetics , Gammaproteobacteria/metabolism , Bacterial Outer Membrane Proteins/genetics , Bacterial Outer Membrane Proteins/metabolism , Bacterial Proteins/metabolism , Biological Transport, Active , Carrier Proteins/metabolism , Gammaproteobacteria/pathogenicity , Genome, Bacterial , Plants/microbiology
17.
Genet. mol. biol ; 24(1/4): 9-15, 2001. illus
Article in English | LILACS | ID: lil-313867

ABSTRACT

The SUCEST project (Sugarcane EST Project) produced 291,904 sugarcane ESTs. In this project, the Bioinformatics Laboratory created the website that served as the "meeting point" for the 74 sequencing and data-mining laboratories that made up the project consortium. The Bioinformatics Laboratory (LBI) received, processed, and analyzed the data and made tools available for exploring them. This article describes the data, services, and programs implemented by the LBI for the project, including the clustering procedure that generated 43,141 clusters.


Subject(s)
Computational Biology , Expressed Sequence Tags , Cluster Analysis , Gene Library , Plants , Software