Search | BVS Regional Portal

The InterPro protein families and domains database: 20 years on.

Blum, Matthias; Chang, Hsin-Yu; Chuguransky, Sara; Grego, Tiago; Kandasaamy, Swaathi; Mitchell, Alex; Nuka, Gift; Paysan-Lafosse, Typhaine; Qureshi, Matloob; Raj, Shriya; Richardson, Lorna; Salazar, Gustavo A; Williams, Lowri; Bork, Peer; Bridge, Alan; Gough, Julian; Haft, Daniel H; Letunic, Ivica; Marchler-Bauer, Aron; Mi, Huaiyu; Natale, Darren A; Necci, Marco; Orengo, Christine A; Pandurangan, Arun P; Rivoire, Catherine; Sigrist, Christian J A; Sillitoe, Ian; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Wu, Cathy H; Bateman, Alex; Finn, Robert D.

Nucleic Acids Res; 49(D1): D344-D354, 2021 Jan 08.

Article in English | MEDLINE | ID: mdl-33156333

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.

Subjects

Databases, Protein; Proteins/chemistry; Amino Acid Sequence; COVID-19/metabolism; Internet; Molecular Sequence Annotation; Protein Domains; Protein Interaction Maps; SARS-CoV-2/metabolism; Sequence Alignment

Pfam: The protein families database in 2021.

Mistry, Jaina; Chuguransky, Sara; Williams, Lowri; Qureshi, Matloob; Salazar, Gustavo A; Sonnhammer, Erik L L; Tosatto, Silvio C E; Paladin, Lisanna; Raj, Shriya; Richardson, Lorna J; Finn, Robert D; Bateman, Alex.

Nucleic Acids Res; 49(D1): D412-D419, 2021 Jan 08.

Article in English | MEDLINE | ID: mdl-33125078

ABSTRACT

The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.

Subjects

Computational Biology/statistics & numerical data; Databases, Protein; Proteins/metabolism; Proteome/metabolism; Animals; COVID-19/epidemiology; COVID-19/prevention & control; COVID-19/virology; Computational Biology/methods; Epidemics; Humans; Internet; Models, Molecular; Protein Structure, Tertiary; Proteins/chemistry; Proteins/genetics; Proteome/classification; Proteome/genetics; Repetitive Sequences, Amino Acid/genetics; SARS-CoV-2/genetics; SARS-CoV-2/physiology; Sequence Analysis, Protein/methods

InterPro in 2019: improving coverage, classification and access to protein sequence annotations.

Mitchell, Alex L; Attwood, Teresa K; Babbitt, Patricia C; Blum, Matthias; Bork, Peer; Bridge, Alan; Brown, Shoshana D; Chang, Hsin-Yu; El-Gebali, Sara; Fraser, Matthew I; Gough, Julian; Haft, David R; Huang, Hongzhan; Letunic, Ivica; Lopez, Rodrigo; Luciani, Aurélien; Madeira, Fabio; Marchler-Bauer, Aron; Mi, Huaiyu; Natale, Darren A; Necci, Marco; Nuka, Gift; Orengo, Christine; Pandurangan, Arun P; Paysan-Lafosse, Typhaine; Pesseat, Sebastien; Potter, Simon C; Qureshi, Matloob A; Rawlings, Neil D; Redaschi, Nicole; Richardson, Lorna J; Rivoire, Catherine; Salazar, Gustavo A; Sangrador-Vegas, Amaia; Sigrist, Christian J A; Sillitoe, Ian; Sutton, Granger G; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Yong, Siew-Yit; Finn, Robert D.

Nucleic Acids Res; 47(D1): D351-D360, 2019 Jan 08.

Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.

Subjects

Databases, Protein; Molecular Sequence Annotation; Animals; Databases, Genetic; Gene Ontology; Humans; Internet; Multigene Family; Protein Domains/genetics; Sequence Homology, Amino Acid; Software; User-Computer Interface
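The InterPro 2019 abstract above mentions an evaluation of residue coverage but does not define how it is computed. A minimal Python sketch, assuming residue coverage means the fraction of a sequence's residues that fall inside at least one signature match (our assumption, not InterPro's published procedure), could look like this. The function names and the 1-based, inclusive `(start, end)` match representation are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: one signature match is a 1-based, inclusive
# (start, end) residue range on a single protein sequence.
Match = Tuple[int, int]


def merge_matches(matches: Iterable[Match]) -> List[Match]:
    """Merge overlapping or adjacent ranges so no residue is counted twice."""
    merged: List[Match] = []
    for start, end in sorted(matches):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend that range.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def residue_coverage(seq_length: int, matches: Iterable[Match]) -> float:
    """Fraction of residues covered by at least one match (assumed definition)."""
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    covered = 0
    for start, end in merge_matches(matches):
        # Clip each merged range to the sequence so coverage never exceeds 1.0.
        start, end = max(start, 1), min(end, seq_length)
        if start <= end:
            covered += end - start + 1
    return covered / seq_length
```

For a 100-residue sequence with matches at 1-30, 20-50 and 71-80, the overlapping first two ranges merge into 1-50, so 60 residues are covered and the function returns 0.6. Merging before counting is what separates residue coverage from a naive sum of match lengths, which would double-count overlapping domains.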

The Pfam protein families database in 2019.

El-Gebali, Sara; Mistry, Jaina; Bateman, Alex; Eddy, Sean R; Luciani, Aurélien; Potter, Simon C; Qureshi, Matloob; Richardson, Lorna J; Salazar, Gustavo A; Smart, Alfredo; Sonnhammer, Erik L L; Hirsh, Layla; Paladin, Lisanna; Piovesan, Damiano; Tosatto, Silvio C E; Finn, Robert D.

Nucleic Acids Res; 47(D1): D427-D432, 2019 Jan 08.

Article in English | MEDLINE | ID: mdl-30357350

ABSTRACT

The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison to the structural classification database, namely the Evolutionary Classification of Protein Domains (ECOD) that led to the creation of 825 new families based on their set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking between authorship of all Pfam entries with the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link them to their ORCID record.

Subjects

Databases, Protein; Proteins/classification; Molecular Sequence Annotation; Protein Domains; Proteins/chemistry; Repetitive Sequences, Amino Acid

EBI Metagenomics in 2017: enriching the analysis of microbial communities, from sequence reads to assemblies.

Mitchell, Alex L; Scheremetjew, Maxim; Denise, Hubert; Potter, Simon; Tarkowska, Aleksandra; Qureshi, Matloob; Salazar, Gustavo A; Pesseat, Sebastien; Boland, Miguel A; Hunter, Fiona M I; Ten Hoopen, Petra; Alako, Blaise; Amid, Clara; Wilkinson, Darren J; Curtis, Thomas P; Cochrane, Guy; Finn, Robert D.

Nucleic Acids Res; 46(D1): D726-D735, 2018 Jan 04.

Article in English | MEDLINE | ID: mdl-29069476

ABSTRACT

EBI metagenomics (http://www.ebi.ac.uk/metagenomics) provides a free-to-use platform for the analysis and archiving of sequence data derived from the microbial populations found in a particular environment. Over the past two years, EBI metagenomics has increased the number of datasets analysed 10-fold. In addition to increased throughput, the underlying analysis pipeline has been overhauled to include new or updated tools and reference databases. Of particular note is a new workflow for taxonomic assignments that has been extended to include assignments based on both the large and small subunit RNA marker genes and to encompass all cellular micro-organisms. We also describe the addition of metagenomic assembly as a new analysis service. Our pilot studies have produced over 2400 assemblies from datasets in the public domain. From these assemblies, we have produced a searchable, non-redundant protein database of over 50 million sequences. To provide improved access to the data stored within the resource, we have developed a programmatic interface that provides access to the analysis results and associated sample metadata. Finally, we have integrated the results of a series of statistical analyses that provide estimations of diversity and sample comparisons.

Subjects

Databases, Genetic; Metagenomics; Microbiota; Algorithms; Base Sequence; Classification/methods; Datasets as Topic; Metagenomics/methods; RNA, Archaeal/genetics; RNA, Bacterial/genetics; RNA, Viral/genetics; Ribotyping; Software; Transcriptome; User-Computer Interface; Web Browser; Workflow
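The EBI Metagenomics abstract above mentions statistical analyses that provide estimations of diversity, but it does not name the metrics used. As an illustration only, assuming two common alpha-diversity measures (the Shannon index and the Gini-Simpson index) computed from per-read taxonomic labels, a Python sketch might look like the following. Neither the metric choice nor the function names come from the paper; they are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_index(taxon_assignments: Iterable[str]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over the observed taxa."""
    counts = Counter(taxon_assignments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one taxonomic assignment")
    # p_i is the relative abundance of taxon i among all assigned reads.
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def gini_simpson_index(taxon_assignments: Iterable[str]) -> float:
    """Gini-Simpson diversity 1 - sum(p_i^2).

    Equals the probability that two reads drawn at random (with replacement)
    are assigned to different taxa.
    """
    counts = Counter(taxon_assignments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one taxonomic assignment")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())
```

For four reads assigned to four different taxa, `gini_simpson_index` returns 0.75, while a sample in which every read maps to one taxon scores 0.0 on both indices. Counting with `collections.Counter` keeps each calculation to a single pass over the labels; a production pipeline would more likely start from a precomputed abundance table.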

The Pfam protein families database: towards a more sustainable future.

Finn, Robert D; Coggill, Penelope; Eberhardt, Ruth Y; Eddy, Sean R; Mistry, Jaina; Mitchell, Alex L; Potter, Simon C; Punta, Marco; Qureshi, Matloob; Sangrador-Vegas, Amaia; Salazar, Gustavo A; Tate, John; Bateman, Alex.

Nucleic Acids Res; 44(D1): D279-85, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26673716

ABSTRACT

In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.

Subjects

Databases, Protein; Proteins/classification; Proteome/chemistry; Sequence Alignment; Sequence Analysis, Protein; Molecular Sequence Annotation

A software framework for microarray and gene expression object model (MAGE-OM) array design annotation.

Qureshi, Matloob; Ivens, Alasdair.

BMC Genomics; 9: 133, 2008 Mar 20.

Article in English | MEDLINE | ID: mdl-18366695

ABSTRACT

BACKGROUND: The MIAME and MAGE-OM standards defined by the MGED society provide a specification and implementation of a software infrastructure to facilitate the submission and sharing of data from microarray studies via public repositories. However, although the MAGE object model is flexible enough to support different annotation strategies, the annotation of array descriptions can be complex. RESULTS: We have developed a graphical Java-based application (Adamant) to assist with the submission of microarray designs to public repositories. Output of the application is fully compliant with the standards prescribed by the various public data repositories. CONCLUSION: Adamant will allow researchers to annotate and submit their own array designs to public repositories without requiring programming expertise, knowledge of the MAGE-OM or XML. The application has been used to submit a number of ArrayDesigns to the ArrayExpress database.

Subjects

Models, Genetic; Oligonucleotide Array Sequence Analysis/methods; Software; Gene Library

Microarray-based comparative genomic analyses of the human malaria parasite Plasmodium falciparum using Affymetrix arrays.

Carret, Céline Karine; Horrocks, Paul; Konfortov, Bernard; Winzeler, Elizabeth; Qureshi, Matloob; Newbold, Chris; Ivens, Alasdair.

Mol Biochem Parasitol; 144(2): 177-86, 2005 Dec.

Article in English | MEDLINE | ID: mdl-16174539

ABSTRACT

Microarray-based comparative genomic hybridization (CGH) provides a powerful tool for whole genome analyses and the rapid detection of genomic variation that underlies virulence and disease. In the field of Plasmodium research, many of the parasite genomes that one might wish to study in a high throughput manner are not laboratory clones, but clinical isolates. One of the key limitations to the use of clinical samples in CGH, however, is the minuscule amounts of genomic DNA available. Here we describe the successful application of multiple displacement amplification (MDA), a non-PCR-based amplification method that exhibits clear advantages over all other currently available methods. Using MDA, CGH was performed on a panel of NF54 and IT/FCR3 clones, identifying previously published deletions on chromosomes 2 and 9 as well as polymorphism in genes associated with disease pathology.

Subjects

Genome, Protozoan/genetics; Nucleic Acid Amplification Techniques/methods; Plasmodium falciparum/genetics; Animals; Chromosomes, Human, Pair 2/genetics; Chromosomes, Human, Pair 9/genetics; Gene Deletion; Humans; Malaria, Falciparum/parasitology; Microarray Analysis; Polymerase Chain Reaction; Polymorphism, Genetic
