1.
Methods Mol Biol ; 2231: 89-97, 2021.
Article in English | MEDLINE | ID: mdl-33289888

ABSTRACT

Many fields of biology rely on the inference of accurate multiple sequence alignments (MSAs) of biological sequences. Unfortunately, the problem of assembling an MSA is NP-complete, which limits computation to approximate, heuristic solutions. The progressive algorithm is one of the most popular frameworks for the computation of MSAs. It involves pre-clustering the sequences and aligning them starting with the most similar ones. The scalability of this framework is limited, especially with respect to accuracy. We present here an alternative approach named the regressive algorithm. In this framework, sequences are first clustered and then aligned starting with the most distantly related ones. This approach has been shown to greatly improve accuracy during scale-up, especially on datasets featuring 10,000 sequences or more. Another benefit is the possibility of integrating third-party clustering methods and third-party MSA aligners. The regressive algorithm has been tested on up to 1.5 million sequences, and its implementation is available in the T-Coffee package.


Subjects
Computational Biology/methods; Sequence Alignment/methods; Software; Algorithms; Cluster Analysis; Computational Biology/instrumentation; Sequence Alignment/instrumentation
2.
Nat Biotechnol ; 37(12): 1466-1470, 2019 12.
Article in English | MEDLINE | ID: mdl-31792410

ABSTRACT

Multiple sequence alignments (MSAs) are used for structural [1,2] and evolutionary predictions [1,2], but the complexity of aligning large datasets requires the use of approximate solutions [3], including the progressive algorithm [4]. Progressive MSA methods start by aligning the most similar sequences and subsequently incorporate the remaining sequences, from leaf to root, based on a guide tree. Their accuracy declines substantially as the number of sequences is scaled up [5]. We introduce a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard workstation and substantially improves accuracy on datasets larger than 10,000 sequences. Our regressive algorithm works in the opposite direction to the progressive algorithm and begins by aligning the most dissimilar sequences. It uses an efficient divide-and-conquer strategy to run third-party alignment methods in linear time, regardless of their original complexity. Our approach will enable analyses of extremely large genomic datasets such as the recently announced Earth BioGenome Project, which comprises 1.5 million eukaryotic genomes [6].


Subjects
Algorithms; Sequence Alignment/methods; Databases, Genetic; Eukaryota/genetics; Genomics/methods; Regression Analysis
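
Note on record 2: the root-first, divide-and-conquer ordering described in the abstract can be illustrated with a toy sketch. This is not the T-Coffee implementation. The nested-tuple guide tree, the function names, and the rule that the longest sequence represents its subtree are all assumptions made for illustration, and no alignment is performed: the code only lists which groups of sequences would be handed to an aligner, and in which order.

```python
from collections import deque


def leaves(node):
    """Return the sequences under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [seq for child in node for seq in leaves(child)]


def representative(node):
    """Assumption: the longest sequence stands in for its whole subtree."""
    return max(leaves(node), key=len)


def regressive_batches(tree, n):
    """List the groups of representatives to align, root first.

    `tree` is a binary guide tree written as nested 2-tuples with
    sequences (str) at the leaves; `n` caps the size of each group.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    batches = []
    pending = deque([tree])
    while pending:
        frontier = deque([pending.popleft()])
        # Open internal nodes breadth-first until n subtrees are in hand
        # or nothing but leaves is left.
        while len(frontier) < n and any(not isinstance(x, str) for x in frontier):
            node = frontier.popleft()
            if isinstance(node, str):
                frontier.append(node)  # a leaf cannot be opened further
            else:
                frontier.extend(node)
        batches.append([representative(x) for x in frontier])
        # Each unopened subtree becomes a smaller sub-problem of its own.
        pending.extend(x for x in frontier if not isinstance(x, str))
    return batches


# Hypothetical eight-sequence guide tree.
tree = ((("AAAA", "AAA"), ("CCCCC", "CC")), (("GGGGGG", "G"), ("TT", "TTTTTTT")))
for batch in regressive_batches(tree, 2):
    print(batch)
```

The first group printed contains one representative from each side of the root, that is, the most distantly related sequences. Every later group shares one sequence with an earlier group, which is what would allow the separately computed sub-alignments to be stitched together.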
3.
Biol Aujourdhui ; 211(3): 233-237, 2017.
Article in French | MEDLINE | ID: mdl-29412134

ABSTRACT

Reproducing routine bioinformatics analyses is challenging owing to a combination of factors that are hard to control. Nextflow is a flow management framework that uses container technology to ensure efficient deployment and reproducibility of computational analysis pipelines. Third-party pipelines can be ported into Nextflow with minimal re-coding. We used RNA-Seq quantification, genome annotation and phylogeny reconstruction examples to show how two seemingly irreproducible analyses can be made stable across platforms when ported into Nextflow.


Subjects
Algorithms; Computational Biology/methods; Computational Biology/organization & administration; Genomics; Workflow; Animals; Genomics/methods; Genomics/organization & administration; Genomics/standards; Humans; Reproducibility of Results; User-Computer Interface
4.
Nucleic Acids Res ; 44(W1): W339-43, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27106060

ABSTRACT

The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency-based alignment approach. Homology extension is performed with Position-Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is that it allows databases of reduced complexity to be used for rapid homology extension. The server also makes it possible to use transmembrane protein (TMP) reference databases for even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs a topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown that this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee.


Subjects
Algorithms; Membrane Proteins/chemistry; Sequence Analysis, Protein/statistics & numerical data; User-Computer Interface; Amino Acid Sequence; Computer Graphics; Databases, Protein; Information Storage and Retrieval; Internet; Membrane Proteins/genetics; Protein Domains; Protein Structure, Secondary; Sequence Alignment; Sequence Homology, Amino Acid
5.
Brief Bioinform ; 17(6): 1009-1023, 2016 11.
Article in English | MEDLINE | ID: mdl-26615024

ABSTRACT

This review provides an overview of the development of multiple sequence alignment (MSA) methods and their main applications. It focuses on progress made over the past decade. The first three sections review recent algorithmic developments for protein, RNA/DNA and genomic alignments. The fourth section deals with benchmarks and explores the relationship between empirical and simulated data, along with the impact on method development. The last part of the review gives an overview of available MSA local reliability estimators and their dependence on various algorithmic properties of available methods.


Subjects
Sequence Alignment; Algorithms; DNA; Genomics; Proteins; Reproducibility of Results
6.
Methods Mol Biol ; 1079: 117-29, 2014.
Article in English | MEDLINE | ID: mdl-24170398

ABSTRACT

T-Coffee, for Tree-based consistency objective function for alignment evaluation, is a versatile multiple sequence alignment (MSA) method suitable for aligning virtually any type of biological sequence. T-Coffee is more than a simple sequence aligner; rather, it is a framework in which alternative alignment methods and/or extra information (i.e., structural, evolutionary, or experimental information) can be combined to reach more accurate and more meaningful MSAs. T-Coffee can be used either by running input data via the Web server (http://tcoffee.crg.cat/apps/tcoffee/index.html) or by downloading the T-Coffee package. Here, we present how the package can be used in its command line mode to carry out the most common tasks and multiply align protein, DNA, and RNA sequences. This chapter particularly emphasizes the description of T-Coffee's special flavors, also called "modes," designed to address particular biological problems.


Subjects
Computational Biology/methods; Sequence Alignment/methods; Amino Acid Sequence; DNA/genetics; Internet; Molecular Sequence Data; Proteins/chemistry; RNA/genetics
7.
Nucleic Acids Res ; 41(Web Server issue): W358-62, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23716642

ABSTRACT

This article introduces the T-RMSD web server (tree based on root mean square deviation), a service allowing the online computation of structure-based protein classification. It has been developed to address the relation between structural and functional similarity in proteins, and it allows a fine-grained structural clustering of a given protein family or group of structurally related proteins using distance RMSD (dRMSD) variations. These distances are computed between all pairs of equivalent residues, as defined by the ungapped columns within a given multiple sequence alignment. Using these generated distance matrices (one per equivalent position), T-RMSD produces a structural tree with support values for each cluster node, reminiscent of bootstrap values. These values, associated with the tree topology, allow a quantitative estimate of structural distances between proteins or groups of proteins defined by the tree topology. The clusters thus defined have been shown to be structurally and functionally informative. The T-RMSD web server is a free website open to all users and available at http://tcoffee.crg.cat/apps/tcoffee/do:trmsd.


Subjects
Protein Conformation; Proteins/classification; Software; Algorithms; Cluster Analysis; Internet
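
Note on record 7: the abstract defines equivalent residues as those falling in ungapped columns of a multiple sequence alignment. A minimal sketch of that selection step follows, assuming the alignment is given as a list of equal-length strings with "-" as the gap symbol. The function names are hypothetical and this is not the T-RMSD code.

```python
def ungapped_columns(msa, gap="-"):
    """Return indices of alignment columns in which no sequence has a gap."""
    if not msa:
        return []
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [c for c in range(length) if all(row[c] != gap for row in msa)]


def residue_indices(row, columns, gap="-"):
    """Map alignment columns to 0-based residue positions in one sequence.

    Every column in `columns` must be ungapped in `row`.
    """
    wanted = set(columns)
    mapping = {}
    position = -1
    for c, symbol in enumerate(row):
        if symbol != gap:
            position += 1  # count residues only, skipping gaps
            if c in wanted:
                mapping[c] = position
    return [mapping[c] for c in columns]


# Hypothetical three-sequence alignment.
msa = ["MK-LV", "MKALV", "M-ALV"]
columns = ungapped_columns(msa)
for row in msa:
    print(row, residue_indices(row, columns))
```

The residue positions returned for each sequence are the ones whose coordinates would be looked up in the corresponding structure before any inter-residue distances are compared.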
8.
Trends Biochem Sci ; 37(9): 353-63, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22789664

ABSTRACT

Tumor Necrosis Factor Ligand (TNFL)-Tumor Necrosis Factor Receptor (TNFR) interactions control key cellular processes; however, the molecular basis of the specificity of these interactions remains poorly understood. Using T-RMSD (tree based on root mean square deviation), a newly developed structure-based sequence clustering method, we have re-analyzed the available structural data to re-interpret the interactions between TNFLs and TNFRs. This improves the classification of both TNFLs and TNFRs, such that the new groups defined here are in much stronger agreement with structural and functional features than existing schemes. Our clustering approach also identifies traces of a convergent evolutionary process for TNFLs and TNFRs, leading us to propose the co-evolution of TNFLs and the third cysteine-rich domain (CRD) of large TNFRs.


Subjects
Receptors, Tumor Necrosis Factor/chemistry; Tumor Necrosis Factors/chemistry; Animals; Humans; Ligands; Phylogeny; Protein Interaction Domains and Motifs; Receptors, Tumor Necrosis Factor/genetics; Receptors, Tumor Necrosis Factor/metabolism; Tumor Necrosis Factors/metabolism
9.
Nat Protoc ; 6(11): 1669-82, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21979275

ABSTRACT

T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biological sequences. The main strength of T-Coffee is its ability to combine third-party aligners and to integrate structural (or homology) information when building MSAs. The series of protocols presented here shows how the package can be used to multiply align protein, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols include T-Coffee, the default mode; M-Coffee, a meta version able to combine several third-party aligners into one; PSI (position-specific iterated)-Coffee, the homology-extended mode suitable for remote homologs; and Expresso, the structure-based multiple aligner. We then also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific to promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is an open-source freeware available from http://www.tcoffee.org/.


Subjects
DNA/chemistry; Nucleic Acid Conformation; Proteins/chemistry; RNA/chemistry; Sequence Alignment/methods; Algorithms; Amino Acid Sequence; Base Sequence; Models, Molecular; Molecular Sequence Data; Software
10.
J Mol Biol ; 400(3): 605-17, 2010 Jul 16.
Article in English | MEDLINE | ID: mdl-20471393

ABSTRACT

This study addresses the relation between structural and functional similarity in proteins. We introduce a novel method named tree based on root mean square deviation (T-RMSD), which uses distance RMSD (dRMSD) variations to build fine-grained structure-based classifications of proteins. The main improvement of T-RMSD over similar methods, such as Dali, is its capacity to produce the equivalent of a bootstrap value for each cluster node. We validated our approach on two domain families studied extensively for their role in many biological and pathological pathways: the small GTPase RAS superfamily and the cysteine-rich domains (CRDs) associated with the tumor necrosis factor receptor (TNFR) family. Our analysis showed that T-RMSD is able to automatically recover and refine existing classifications. In the case of the small GTPase ARF subfamily, T-RMSD can distinguish GTP- from GDP-bound states, while in the case of CRDs it can identify two new subgroups associated with well-defined functional features (ligand binding and formation of ligand pre-assembly complex). We show how hidden Markov models (HMMs) can be built on these new groups and propose a methodology to use these models simultaneously in order to perform fine-grained functional genomic annotation without known 3D structures. T-RMSD, an open-source freeware incorporated in the T-Coffee package, is available online.


Subjects
Computational Biology/methods; Receptors, Tumor Necrosis Factor/chemistry; Receptors, Tumor Necrosis Factor/classification; Cluster Analysis; Monomeric GTP-Binding Proteins/chemistry; Monomeric GTP-Binding Proteins/classification; Monomeric GTP-Binding Proteins/metabolism; Protein Structure, Tertiary; Receptors, Tumor Necrosis Factor/immunology
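
Note on record 10: the distance RMSD (dRMSD) mentioned in the abstract compares two structures through their internal inter-residue distances, so no superposition is needed. The sketch below computes the plain, whole-structure form of that measure from two lists of (x, y, z) coordinates for equivalent residues. It is a generic illustration under that assumption, not the per-position variant or the tree-building procedure used by T-RMSD, and the function name is hypothetical.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance RMSD between two equally long lists of (x, y, z) points.

    For every residue pair (i, j), the distance inside structure A is
    compared with the distance inside structure B; the result is the
    root mean square of those differences.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures need the same number of residues")
    if len(coords_a) < 2:
        raise ValueError("at least two residues are required")
    squared = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in combinations(range(len(coords_a)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical coordinates: b is a rotated and shifted copy of a,
# while c stretches one inter-residue distance.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(5.0, 5.0, 5.0), (5.0, 6.0, 5.0), (4.0, 5.0, 5.0)]
c = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(drmsd(a, b))
print(drmsd(a, c))
```

Because only internal distances enter the formula, a rigid rotation or translation of one structure leaves the value unchanged, which is why the first comparison gives zero (up to rounding) and the second does not.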