Seqrutinator: scrutiny of large protein superfamily sequence datasets for the identification and elimination of non-functional homologues.
Amalfitano, Agustín; Stocchi, Nicolás; Atencio, Hugo Marcelo; Villarreal, Fernando; Ten Have, Arjen.
Affiliation
  • Amalfitano A; Laboratorio de Procesamiento de Imágenes, ICyTE-CONICET-UNMdP, Mar del Plata, Argentina.
  • Stocchi N; Computational Biology and Comparative Genomics, IIB-CONICET-UNMdP, Mar del Plata, Argentina.
  • Atencio HM; Banco Activo de Germoplasma de Papa Andina, EEA-Balcarce INTA, Balcarce, Argentina.
  • Villarreal F; Computational Biology and Comparative Genomics, IIB-CONICET-UNMdP, Mar del Plata, Argentina. fernandovillarreal@mdp.edu.ar.
  • Ten Have A; Computational Biology and Comparative Genomics, IIB-CONICET-UNMdP, Mar del Plata, Argentina.
Genome Biol; 25(1): 230, 2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39187866
ABSTRACT
Seqrutinator is an objective, flexible pipeline that removes sequences with sequencing and/or gene model errors, as well as sequences from pseudogenes, from complex eukaryotic protein superfamilies. When tested on the major superfamilies BAHD, CYP, and UGT, Seqrutinator removes only 1.94% of SwissProt entries and 14% of entries from the model plant Arabidopsis thaliana, but 80% of entries from the recent complete proteome of Pinus taeda. Applying Seqrutinator to crude BAHDomes, CYPomes, and UGTomes obtained from 16 plant proteomes shows convergence of the numbers of paralogues. MSAs, phylogenies, and particularly functional clustering improve drastically after Seqrutinator is applied, indicating good performance.

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Plant Proteins | Language: En | Journal: Genome Biol | Journal subject: MOLECULAR BIOLOGY / GENETICS | Year of publication: 2024 | Document type: Article | Country of affiliation: Argentina | Country of publication: United Kingdom
