Search | BVS Regional Portal

DNA Multiple Sequence Alignment Guided by Protein Domains: The MSA-PAD 2.0 Method.

Balech, Bachir; Monaco, Alfonso; Perniola, Michele; Santamaria, Monica; Donvito, Giacinto; Vicario, Saverio; Maggi, Giorgio; Pesole, Graziano.

Methods Mol Biol ; 1746: 173-180, 2018.

Article in English | MEDLINE | ID: mdl-29492894

ABSTRACT

Multiple sequence alignment (MSA) is a fundamental component of many DNA sequence analyses, including metagenomics studies and phylogeny inference. When guided by protein profiles, DNA multiple alignments achieve higher precision and robustness. Here we present details of the use of the upgraded version of MSA-PAD (2.0), a DNA multiple sequence alignment framework able to align DNA sequences coding for single or multiple protein domains, guided by PFAM or user-defined annotations. MSA-PAD has two alignment strategies, called "Gene" and "Genome," which account for the order of coding domains and for genomic rearrangements, respectively. Novel options were added to the present version, in which the MSA can be guided by protein profiles provided by the user. This allows MSA-PAD 2.0 to run faster and to use custom protein profiles of interest to the user that may not be present in the PFAM database. MSA-PAD 2.0 is currently freely available as a Web application at https://recasgateway.cloud.ba.infn.it/ .
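As a rough illustration of what "guided by protein" means for a DNA alignment, the sketch below projects an already aligned protein row back onto its coding DNA, codon by codon. It is a generic back-threading technique and not MSA-PAD's own code; the function name `thread_codons` and the toy sequences are assumptions made for this example, and the protein alignment is taken as given.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Project one gapped protein alignment row back onto its coding DNA.

    Each residue consumes the next codon of the CDS; each protein gap
    becomes a three-character DNA gap, so the reading frame is preserved.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) != 3 * residues:
        raise ValueError("CDS length must be three times the residue count")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(gap * 3 if aa == gap else next(codons)
                   for aa in aligned_protein)


# Toy input (hypothetical): a two-row protein alignment and the matching CDSs.
protein_rows = {"seq1": "MK-L", "seq2": "MKAL"}
cds = {"seq1": "ATGAAACTG", "seq2": "ATGAAAGCTCTG"}

dna_rows = {name: thread_codons(row, cds[name])
            for name, row in protein_rows.items()}
```

Because gaps are only ever inserted in multiples of three, every column of the resulting DNA alignment stays in frame, which is the main reason protein guidance tends to give cleaner alignments of coding sequences.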

Subjects

Databases, Factual , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Software , Algorithms , Animals , Genome , Humans , Protein Domains , Proteins/genetics

The Bologna annotation resource: a non hierarchical method for the functional and structural annotation of protein sequences relying on a comparative large-scale genome analysis.

Bartoli, Lisa; Montanucci, Ludovica; Fronza, Raffaele; Martelli, Pier Luigi; Fariselli, Piero; Carota, Luciana; Donvito, Giacinto; Maggi, Giorgio P; Casadio, Rita.

J Proteome Res ; 8(9): 4362-71, 2009 Sep.

Article in English | MEDLINE | ID: mdl-19552451

ABSTRACT

Protein sequence annotation is a major challenge in the postgenomic era. Thanks to the availability of complete genomes and proteomes, protein annotation has recently benefited greatly from cross-genome comparisons. In this work, we describe a new non-hierarchical clustering procedure characterized by a stringent metric, which ensures a reliable transfer of function between related proteins even in the case of multidomain and distantly related proteins. The method takes advantage of the comparative analysis of 599 completely sequenced genomes, from both prokaryotes and eukaryotes, and of a GO and PDB/SCOP mapping over the clusters. A statistical validation of our method demonstrates that our clustering technique captures the essential information shared between homologous and distantly related protein sequences. In this way, uncharacterized proteins can be safely annotated by inheriting the annotation of the cluster. We validate our method by blindly annotating 201 additional genomes, and finally we develop BAR (the Bologna Annotation Resource), a prediction server for protein functional annotation based on a total of 800 genomes (publicly available at http://microserf.biocomp.unibo.it/bar/).

Subjects

Computational Biology/methods , Genomics/methods , Proteins/analysis , Sequence Analysis, Protein/methods , Animals , Cluster Analysis , Databases, Genetic , Pongo pygmaeus/genetics , Protein Interaction Mapping , Proteins/genetics , Reproducibility of Results , Sequence Alignment , Terminology as Topic

The GENIUS Grid Portal and robot certificates: a new tool for e-Science.

Barbera, Roberto; Donvito, Giacinto; Falzone, Alberto; La Rocca, Giuseppe; Milanesi, Luciano; Maggi, Giorgio Pietro; Vicario, Saverio.

BMC Bioinformatics ; 10 Suppl 6: S21, 2009 Jun 16.

Article in English | MEDLINE | ID: mdl-19534747

ABSTRACT

BACKGROUND: Grid technology is the computing model which allows users to share a wide plethora of distributed computational resources regardless of their geographical location. Up to now, the strict security policies required to access distributed computing resources have been a rather big limiting factor when trying to broaden the usage of Grids into a wide community of users. Grid security is indeed based on the Public Key Infrastructure (PKI) of X.509 certificates, and the procedure to get and manage those certificates is unfortunately not straightforward. A first step to make Grids more appealing for new users has recently been achieved with the adoption of robot certificates. METHODS: Robot certificates have recently been introduced to perform automated tasks on Grids on behalf of users. They are extremely useful, for instance, to automate grid service monitoring, production data processing, and distributed data collection systems. Basically, these certificates can be used to identify a person responsible for an unattended service or process acting as client and/or server. Robot certificates can be installed on a smart card and used behind a portal by everyone interested in running the related applications in a Grid environment using a user-friendly graphic interface. In this work, the GENIUS Grid Portal, powered by EnginFrame, has been extended in order to support the new authentication based on the adoption of these robot certificates. RESULTS: The work carried out and reported in this manuscript is particularly relevant for all users who are not familiar with personal digital certificates and the technical aspects of the Grid Security Infrastructure (GSI). The valuable benefits introduced by robot certificates in e-Science can thus be extended to users belonging to several scientific domains, providing an asset in raising Grid awareness among a large number of potential users.
CONCLUSION: The adoption of Grid portals extended with robot certificates can really contribute to creating transparent access to the computational resources of Grid Infrastructures, enhancing the spread of this new paradigm in researchers' working life to address new global scientific challenges. The evaluated solution can of course be extended to other portals, applications and scientific communities.

Subjects

Computer Security , Information Storage and Retrieval/methods , Software , Computer Communication Networks , Internet

Genome-wide identification of coding and non-coding conserved sequence tags in human and mouse genomes.

Mignone, Flavio; Anselmo, Anna; Donvito, Giacinto; Maggi, Giorgio P; Grillo, Giorgio; Pesole, Graziano.

BMC Genomics ; 9: 277, 2008 Jun 11.

Article in English | MEDLINE | ID: mdl-18547402

ABSTRACT

BACKGROUND: The accurate detection of genes and the identification of functional regions is still an open issue in the annotation of genomic sequences. This problem affects new genomes but also those of very well studied organisms such as human and mouse where, despite great efforts, the inventory of genes and regulatory regions is far from complete. Comparative genomics is an effective approach to address this problem. Unfortunately, it is limited by the computational requirements of genome-wide comparisons and by the problem of discriminating between conserved coding and non-coding sequences. This discrimination is often based (and thus dependent) on the availability of annotated proteins. RESULTS: In this paper we present the results of a comprehensive comparison of the human and mouse genomes performed with a new high-throughput grid-based system which allows the rapid detection of conserved sequences and accurate assessment of their coding potential. By detecting clusters of coding conserved sequences, the system is also suitable for accurately identifying potential gene loci. Following this analysis, we created a collection of human-mouse conserved sequence tags and carefully compared our results to reliable annotations in order to benchmark the reliability of our classifications. Strikingly, we were able to detect several potential gene loci supported by EST sequences but not corresponding to as yet annotated genes. CONCLUSION: Here we present a new system which allows comprehensive comparison of genomes to detect conserved coding and non-coding sequences and to identify potential gene loci. Our system does not require the availability of any annotated sequence and is thus suitable for the analysis of new or poorly annotated genomes.

Subjects

Conserved Sequence , Genome, Human , Algorithms , Animals , Expressed Sequence Tags , Genome , Humans , Mice , Multigene Family , Open Reading Frames , RNA, Messenger/genetics , RNA, Untranslated/genetics , Species Specificity

Gene analogue finder: a GRID solution for finding functionally analogous gene products.

Tulipano, Angelica; Donvito, Giacinto; Licciulli, Flavio; Maggi, Giorgio; Gisel, Andreas.

BMC Bioinformatics ; 8: 329, 2007 Sep 03.

Article in English | MEDLINE | ID: mdl-17767718

ABSTRACT

BACKGROUND: To date, more than 2.1 million gene products from more than 100,000 different species have been described, specifying their function, the processes they are involved in and their cellular localization using a very well defined and structured vocabulary, the gene ontology (GO). Such vast, well defined knowledge opens the possibility of comparing gene products at the level of functionality, finding gene products which have a similar function or are involved in similar biological processes without relying on the conventional sequence similarity approach. Comparisons within such a large space of knowledge are highly data and computing intensive. For this reason, this project was based upon the use of the computational GRID, a technology offering large computing and storage resources. RESULTS: We have developed a tool, GENe AnaloGue FINdEr (ENGINE), that parallelizes the search process and distributes the calculation and data over the computational GRID, splitting the process into many sub-processes and joining the calculation and the data on the same machine, thereby completing the whole search in about 3 days instead of occupying one single machine for more than 5 CPU years. The results of the functional comparison contain potential functional analogues for more than 79,000 gene products from the most important species. 46% of the analyzed gene products are described well enough for such an analysis to identify functional analogues, such as well-known members of the same gene family, or gene products with similar functions which would never have been associated by standard methods. CONCLUSION: ENGINE has produced a list of potential functionally analogous relations between gene products within and between species using, in place of the sequence, the gene description of the GO, thus demonstrating the potential of the GO.
However, the current limiting factor is the quality of the associations of many gene products from non-model organisms, which often have only electronic associations since experimental information is missing. With future improvements of the GO, this limit will be reduced. ENGINE will manifest its power when it is applied to the whole GODB of more than 2.1 million gene products from more than 100,000 organisms. The data produced by this search are planned to be available as a supplement to the GO database as soon as we are able to provide regular updates.
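The idea of comparing gene products by their GO annotations rather than by sequence, and of splitting the all-against-all search into independent sub-processes, can be sketched as follows. The abstract does not state which similarity measure ENGINE uses, so the Jaccard index here is an assumption, as are the function names, the threshold, and the gene product names `geneA`, `geneB` and `geneC`.

```python
from itertools import combinations, islice


def jaccard(a: frozenset, b: frozenset) -> float:
    """Fraction of GO terms shared by two annotation sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_chunks(names, size):
    """Split all unordered pairs into chunks that can be scored independently."""
    pairs = combinations(sorted(names), 2)
    while chunk := list(islice(pairs, size)):
        yield chunk


def score_chunk(chunk, annotations, threshold):
    """Score one chunk; each chunk could be sent to a separate worker node."""
    return [(x, y, s) for x, y in chunk
            if (s := jaccard(annotations[x], annotations[y])) >= threshold]


# Toy annotations: hypothetical gene products mapped to sets of GO term IDs.
annotations = {
    "geneA": frozenset({"GO:0005524", "GO:0004672", "GO:0006468"}),
    "geneB": frozenset({"GO:0005524", "GO:0004672"}),
    "geneC": frozenset({"GO:0003677"}),
}

hits = [hit
        for chunk in pair_chunks(annotations, 2)
        for hit in score_chunk(chunk, annotations, 0.5)]
```

Each chunk depends only on the annotations of the gene products it mentions, so the data and the calculation for a chunk can sit on the same machine, which is the property the abstract relies on to spread the search across a grid.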

Subjects

Computational Biology/methods , Databases, Genetic , Software , Algorithms , Animals , Computational Biology/standards , Database Management Systems/standards , Databases, Genetic/standards , Humans , Software/standards
