Results 1 - 20 of 24
1.
Nucleic Acids Res ; 52(W1): W287-W293, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38747351

ABSTRACT

The PSIPRED Workbench is a long-established and popular bioinformatics web service offering a wide range of machine-learning-based analyses for characterizing protein structure and function. In this paper we provide an update on the recent additions and developments to the webserver, with a focus on new deep-learning-based methods. We briefly discuss some trends in server usage since the publication of AlphaFold2 and give an overview of some upcoming developments for the service. The PSIPRED Workbench is available at http://bioinf.cs.ucl.ac.uk/psipred.


Subjects
Deep Learning , Proteins , Software , Proteins/chemistry , Proteins/genetics , Internet , Protein Conformation , Computational Biology/methods , Protein Sequence Analysis/methods
2.
Brief Funct Genomics ; 23(4): 441-451, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-38242863

ABSTRACT

Cell type identification is an important task for single-cell RNA-sequencing (scRNA-seq) data analysis. Many prediction methods have recently been proposed, but the predictive accuracy of difficult cell type identification tasks is still low. In this work, we proposed a novel Gaussian noise augmentation-based scRNA-seq contrastive learning method (GsRCL) to learn a type of discriminative feature representations for cell type identification tasks. A large-scale computational evaluation suggests that GsRCL successfully outperformed other state-of-the-art predictive methods on difficult cell type identification tasks, while the conventional random genes masking augmentation-based contrastive learning method also improved the accuracy of easy cell type identification tasks in general.
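The two augmentation strategies named in this abstract can be illustrated with a minimal sketch. It is not the GsRCL implementation: the function names, the noise level `sigma`, the `mask_fraction` value and the toy expression vector are all assumptions chosen for illustration. Each augmentation produces two perturbed views of the same cell, which a contrastive objective would then treat as a positive pair.

```python
import random

def gaussian_noise_view(expression, sigma, rng):
    """Return a copy of one cell's expression vector with Gaussian noise added."""
    return [value + rng.gauss(0.0, sigma) for value in expression]

def random_gene_mask_view(expression, mask_fraction, rng):
    """Return a copy with a random subset of genes set to zero."""
    n_masked = int(len(expression) * mask_fraction)
    masked = set(rng.sample(range(len(expression)), n_masked))
    return [0.0 if i in masked else value for i, value in enumerate(expression)]

def positive_pair(expression, augment, seed=0, **kwargs):
    """Build two independently augmented views of the same cell."""
    rng = random.Random(seed)
    first = augment(expression, rng=rng, **kwargs)
    second = augment(expression, rng=rng, **kwargs)
    return first, second

# Hypothetical normalised expression values for one cell (five genes).
cell = [0.5, 1.2, 3.4, 2.0, 0.7]
noisy_a, noisy_b = positive_pair(cell, gaussian_noise_view, sigma=0.1)
masked_a, masked_b = positive_pair(cell, random_gene_mask_view, mask_fraction=0.4)
```

Noise perturbs every gene slightly while masking removes a few genes entirely, so the two strategies give the encoder different kinds of invariance to learn.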


Subjects
RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , RNA-Seq/methods , Normal Distribution , Machine Learning , Humans , RNA Sequence Analysis/methods , Computational Biology/methods , Algorithms , Single-Cell Gene Expression Analysis
3.
Orthopedics ; 44(4): 229-234, 2021.
Article in English | MEDLINE | ID: mdl-34292808

ABSTRACT

Unstable pelvic ring disruption is most commonly treated with closed reduction and percutaneous screw fixation. Traditional methods involve screw placement under fluoroscopic imaging, but with recent technologic advances, intraoperative 3D navigation can now be used to help with the insertion of sacroiliac screws. Various cadaver studies have shown that placement of sacroiliac screws under 3D navigation is more accurate than placement under traditional fluoroscopic guidance. This retrospective review of 134 patients evaluated the clinical use of 3D navigation vs traditional fluoroscopy for sacroiliac screw insertion at an urban level I trauma center. Analysis of surgical data showed a significantly longer imaging time with the conventional method compared with the more experimental 3D navigation (204.06 seconds vs 66.90 seconds, P<.01). Further, a significantly larger radiation dose to both the patient and the staff was seen with traditional fluoroscopy (80.1 mGy for each) compared with that of 3D navigation (39.0 mGy and 25.1 mGy, respectively). No statistically significant difference was seen for outcome or follow-up variables between the 2 extrapolated groups. These variables included length of hospital stay, infection, nerve injury, and hardware breakage. The authors advocate that 3D navigated sacroiliac screws are safe and effective for pelvic ring stabilization; this method may be especially applicable in certain difficult imaging situations, such as morbid obesity, bowel gas interference, and overlapping pelvic structures that make the sacral corridor difficult to discern with traditional 2D fluoroscopy. Safe placement of transiliac-transsacral screws (P<.01) occurred with 3D navigation, and there was a statistically significant increase in adequate screw placement in multiple sacral segments compared with single-level stabilization (P<.01). [Orthopedics. 2021;44(4):229-234.].


Subjects
Bone Fractures , Computer-Assisted Surgery , Bone Screws , Fluoroscopy , Internal Fracture Fixation , Bone Fractures/diagnostic imaging , Bone Fractures/surgery , Humans , Three-Dimensional Imaging , Retrospective Studies
4.
Methods Mol Biol ; 2165: 27-67, 2020.
Article in English | MEDLINE | ID: mdl-32621218

ABSTRACT

The Genome3D consortium is a collaborative project involving protein structure prediction and annotation resources developed by six world-leading structural bioinformatics groups based in the United Kingdom (namely Blundell, Murzin, Gough, Sternberg, Orengo, and Jones). The main objective of Genome3D is to serve as a common portal providing both predicted models and annotations of proteins in model organisms, using several resources developed by these labs, such as CATH-Gene3D, DOMSERF, pDomTHREADER, PHYRE, SUPERFAMILY, FUGUE/TOCATTA, and VIVACE. These resources primarily use SCOP- and/or CATH-based protein domain assignments. Another objective of Genome3D is to compare structural classifications of protein domains in the CATH and SCOP databases and to provide a consensus mapping of CATH and SCOP protein superfamilies. CATH/SCOP mapping analyses led to the identification of a total of 1429 consensus superfamilies. Currently, Genome3D provides structural annotations for ten model organisms, including Homo sapiens, Arabidopsis thaliana, Mus musculus, Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Plasmodium falciparum, Staphylococcus aureus, and Schizosaccharomyces pombe. Thus, Genome3D serves as a common gateway to each structure prediction/annotation resource and allows users to perform comparative assessment of the predictions. It thereby helps researchers broaden their perspective on structure/function predictions for their query protein of interest in selected model organisms.


Subjects
Genomics/organization & administration , Knowledge Bases , Molecular Sequence Annotation/methods , Proteome/chemistry , Animals , Arabidopsis , Genome , Genomics/methods , Humans , Information Dissemination , Sequence Alignment/methods , United Kingdom , Yeasts
5.
Nucleic Acids Res ; 48(D1): D314-D319, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31733063

ABSTRACT

Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits, including instant validation of data and no requirement to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements facilitate Genome3D being opened up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes of particular interest to the wider scientific community have also been added: cow, pig, wheat and Mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, have ensured that the number of annotations in Genome3D has nearly doubled since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.


Subjects
Proteins/chemistry , Protein Databases , Proteins/classification , Proteins/genetics , User-Computer Interface
6.
Proteins ; 88(4): 616-624, 2020 04.
Article in English | MEDLINE | ID: mdl-31703152

ABSTRACT

In this paper, using Word2vec, a widely used natural language processing method, we demonstrate that protein domains may have a learnable implicit semantic "meaning" in the context of their functional contributions to the multi-domain proteins in which they are found. Word2vec is a group of models which can be used to produce semantically meaningful embeddings of words or tokens in a fixed-dimension vector space. In this work, we treat multi-domain proteins as "sentences" where domain identifiers are tokens which may be considered as "words." Using all InterPro (Finn et al. 2017) Pfam domain assignments, we observe that the embedding could be used to suggest putative GO assignments for Pfam (Finn et al. 2016) domains of unknown function.
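The "proteins as sentences" idea can be sketched as follows. This is only an illustration of how skip-gram style (target, context) training pairs could be derived from domain architectures; it does not train an embedding, the window size is an assumption, and the protein names and `DOM…` identifiers are placeholders rather than real Pfam accessions.

```python
def skipgram_pairs(sentence, window=2):
    """List (target, context) pairs for every token within `window` positions."""
    pairs = []
    for i, target in enumerate(sentence):
        start = max(0, i - window)
        stop = min(len(sentence), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((target, sentence[j]))
    return pairs

# Hypothetical multi-domain proteins: each "sentence" is the ordered list
# of domain identifiers found along one protein sequence.
domain_sentences = {
    "protein_1": ["DOM1", "DOM2", "DOM3"],
    "protein_2": ["DOM2", "DOM4"],
}

training_pairs = []
for architecture in domain_sentences.values():
    training_pairs.extend(skipgram_pairs(architecture, window=1))
```

Pairs such as `("DOM2", "DOM4")` are what a Word2vec-style model would consume, so domains that share neighbours end up with similar vectors.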


Subjects
Molecular Sequence Annotation/methods , Natural Language Processing , Proteins/chemistry , Semantics , Protein Databases , Datasets as Topic , Gene Ontology , Humans , Protein Domains , Proteins/physiology
7.
Nucleic Acids Res ; 47(W1): W402-W407, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31251384

ABSTRACT

The PSIPRED Workbench is a web server that has offered a range of predictive methods to the bioscience community for 20 years. Here, we present the work we have completed to update the PSIPRED Protein Analysis Workbench and make it ready for the next 20 years. The main focus of our recent website upgrade work has been the acceleration of analyses in the face of increasing protein sequence database size. We additionally discuss new software, the new hardware infrastructure, our web services and website. Lastly, we survey updates to some of the key predictive algorithms available through our website.


Subjects
Gene Ontology/trends , Molecular Sequence Annotation/methods , Proteins/chemistry , Software/history , Amino Acid Sequence , Binding Sites , Gene Ontology/history , 21st Century History , Internet , Molecular Models , Molecular Sequence Annotation/history , Protein Binding , alpha-Helical Protein Conformation , beta-Strand Protein Conformation , Protein Interaction Domains and Motifs , Proteins/history , Sequence Alignment , Amino Acid Sequence Homology
8.
JBJS Case Connect ; 9(1): e6, 2019.
Article in English | MEDLINE | ID: mdl-30676344

ABSTRACT

CASE: Comminuted fractures of the capitate, in the absence of associated carpal injuries, are exceedingly rare. Treatment of this complex injury is not well-documented in the literature. We describe the case of a comminuted capitate fracture that was successfully managed with Kirschner wire fixation. CONCLUSION: Based on this case and a review of the literature, management of a comminuted capitate fracture with Kirschner wire fixation can lead to successful treatment and positive patient outcomes.


Subjects
Capitate Bone , Internal Fracture Fixation , Comminuted Fractures , Traffic Accidents , Adult , Bone Wires , Capitate Bone/diagnostic imaging , Capitate Bone/injuries , Capitate Bone/surgery , Female , Internal Fracture Fixation/instrumentation , Internal Fracture Fixation/methods , Comminuted Fractures/diagnostic imaging , Comminuted Fractures/surgery , Humans , Young Adult
9.
Proteins ; 86 Suppl 1: 78-83, 2018 03.
Article in English | MEDLINE | ID: mdl-28901583

ABSTRACT

In this paper, we present the results for the MetaPSICOV2 contact prediction server in the CASP12 community experiment (http://predictioncenter.org). Over the 35 assessed Free Modelling target domains the MetaPSICOV2 server achieved a mean precision of 43.27%, a substantial increase relative to the server's performance in the CASP11 experiment. In the following paper, we discuss improvements to the MetaPSICOV2 server, covering both changes to the neural network and attempts to integrate contact predictions on a domain basis into the prediction pipeline. We also discuss some limitations in the CASP12 assessment which may have overestimated the performance of our method.


Subjects
Computational Biology/methods , Internet , Machine Learning , Molecular Models , Computer Neural Networks , Protein Conformation , Proteins/chemistry , Algorithms , X-Ray Crystallography , Humans , Protein Interaction Domains and Motifs , Software
10.
Sci Rep ; 7(1): 6999, 2017 08 01.
Article in English | MEDLINE | ID: mdl-28765603

ABSTRACT

Intrinsically disordered proteins (IDPs) are a prevalent phenomenon, with over 30% of human proteins estimated to have long disordered regions. Computational methods are widely used to study IDPs; however, nearly all treat disorder in a binary fashion, not accounting for the structural heterogeneity present in disordered regions. Here, we present a new de novo method, FRAGFOLD-IDP, which addresses this problem. Using 200 protein structural ensembles derived from NMR, we show that FRAGFOLD-IDP achieves superior results compared to methods which can predict related data (NMR order parameter, or crystallographic B-factor). FRAGFOLD-IDP produces very good predictions for 33.5% of cases and helps to get a better insight into the dynamics of the disordered ensembles. The results also show it is not necessary to predict the correct fold of the protein to reliably predict per-residue fluctuations. It implies that disorder is a local property and it does not depend on the fold. Our results are orthogonal to DynaMine, the only other method significantly better than the naïve prediction. We therefore combine these two using a neural network. FRAGFOLD-IDP enables better insight into backbone dynamics in IDPs and opens exciting possibilities for the design of disordered ensembles, disorder-to-order transitions, or design for protein dynamics.


Subjects
Computational Biology/methods , Intrinsically Disordered Proteins/chemistry , Molecular Biology/methods , X-Ray Crystallography , Magnetic Resonance Spectroscopy , Molecular Models , Computer Neural Networks
11.
Bioinformatics ; 33(17): 2684-2690, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28419258

ABSTRACT

MOTIVATION: Protein fold recognition when appropriate, evolutionarily-related, structural templates can be identified is often trivial and may even be viewed as a solved problem. However, in cases where no homologous structural templates can be detected, fold recognition is a notoriously difficult problem (Moult et al., 2014). Here we present EigenTHREADER, a novel fold recognition method capable of identifying folds where no homologous structures can be identified. EigenTHREADER takes a query amino acid sequence, generates a map of intra-residue contacts, and then searches a library of contact maps of known structures. To allow the contact maps to be compared, we use eigenvector decomposition to resolve the principal eigenvectors; these can then be aligned using standard dynamic programming algorithms. The approach is similar to the Al-Eigen approach of Di Lena et al. (2010), but with improvements made both to speed and accuracy. With this search strategy, EigenTHREADER does not depend directly on sequence homology between the target protein and entries in the fold library to generate models. This in turn enables EigenTHREADER to correctly identify analogous folds where little or no sequence homology information is available. RESULTS: EigenTHREADER outperforms well-established fold recognition methods such as pGenTHREADER and HHSearch in terms of True Positive Rate in the difficult task of analogous fold recognition. This should allow template-based modelling to be extended to many new protein families that were previously intractable to homology based fold recognition methods. AVAILABILITY AND IMPLEMENTATION: All code used to generate these results and the computational protocol can be downloaded from https://github.com/DanBuchan/eigen_scripts. EigenTHREADER, the benchmark code and the data this paper is based on can be downloaded from: http://bioinfadmin.cs.ucl.ac.uk/downloads/eigenTHREADER/. CONTACT: d.t.jones@ucl.ac.uk.
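The general idea of comparing contact maps through their principal eigenvectors can be sketched as below. This is a simplified illustration, not the EigenTHREADER code: it uses only the first eigenvector (found by power iteration), a plain Needleman-Wunsch global alignment, and a similarity score and gap penalty that are assumptions made for this example. The toy contact maps are also hypothetical; self-contacts are placed on the diagonal so that the power iteration converges.

```python
import math

def principal_eigenvector(matrix, iterations=200):
    """Approximate the dominant eigenvector of a symmetric matrix by power iteration."""
    n = len(matrix)
    vec = [1.0] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return vec
        vec = [x / norm for x in nxt]
    return vec

def align_score(a, b, gap=-0.5):
    """Global alignment score of two eigenvectors (Needleman-Wunsch).

    Aligning components a[i] and b[j] scores 1 - |a[i] - b[j]| (an assumed
    similarity measure); each gap costs `gap`.
    """
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            match = dp[i - 1][j - 1] + (1.0 - abs(a[i - 1] - b[j - 1]))
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]

def toy_contact_map(n, extra_contacts):
    """Symmetric 0/1 map: self and chain-neighbour contacts plus extra pairs."""
    m = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]
    for i, j in extra_contacts:
        m[i][j] = m[j][i] = 1.0
    return m

# Hypothetical query map and a two-entry "fold library".
query = principal_eigenvector(toy_contact_map(6, [(0, 5)]))
library = {
    "template_same_fold": principal_eigenvector(toy_contact_map(6, [(0, 5)])),
    "template_other_fold": principal_eigenvector(toy_contact_map(6, [(1, 3)])),
}
ranking = sorted(library, key=lambda name: align_score(query, library[name]), reverse=True)
```

Because the comparison works on contact patterns rather than residue identities, a template can rank highly without sharing any sequence similarity with the query.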


Subjects
Computational Biology/methods , Molecular Models , Protein Folding , Protein Sequence Analysis/methods , Software , Algorithms
12.
Nucleic Acids Res ; 43(Database issue): D382-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348407

ABSTRACT

Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.


Subjects
Protein Databases , Molecular Sequence Annotation , Tertiary Protein Structure , Algorithms , Genomics , Internet , Molecular Models , Tertiary Protein Structure/genetics , Protein Sequence Analysis
13.
Nucleic Acids Res ; 41(Web Server issue): W349-57, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23748958

ABSTRACT

Here, we present the new UCL Bioinformatics Group's PSIPRED Protein Analysis Workbench. The Workbench unites all of our previously available analysis methods into a single web-based framework. The new web portal provides a greatly streamlined user interface with a number of new features to allow users to better explore their results. We offer a number of additional services to enable computationally scalable execution of our prediction methods; these include SOAP and XML-RPC web server access and new HADOOP packages. All software and services are available via the UCL Bioinformatics Group website at http://bioinf.cs.ucl.ac.uk/.


Subjects
Protein Conformation , Software , Animals , Internet , Membrane Proteins/chemistry , Mice , Proteins/chemistry , Protein Sequence Analysis , Protein Structural Homology
14.
BMC Bioinformatics ; 14 Suppl 3: S1, 2013.
Article in English | MEDLINE | ID: mdl-23514099

ABSTRACT

BACKGROUND: Accurate protein function annotation is a severe bottleneck when utilizing the deluge of high-throughput, next generation sequencing data. Keeping database annotations up-to-date has become a major scientific challenge that requires the development of reliable automatic predictors of protein function. The CAFA experiment provided a unique opportunity to undertake comprehensive 'blind testing' of many diverse approaches for automated function prediction. We report on the methodology we used for this challenge and on the lessons we learnt. METHODS: Our method integrates into a single framework a wide variety of biological information sources, encompassing sequence, gene expression and protein-protein interaction data, as well as annotations in UniProt entries. The methodology transfers functional categories based on the results from complementary homology-based and feature-based analyses. We generated the final molecular function and biological process assignments by combining the initial predictions in a probabilistic manner, which takes into account the Gene Ontology hierarchical structure. RESULTS: We propose a novel scoring function called COmbined Graph-Information Content similarity (COGIC) score for the comparison of predicted functional categories and benchmark data. We demonstrate that our integrative approach provides increased scope and accuracy over both the component methods and the naïve predictors. In line with previous studies, we find that molecular function predictions are more accurate than biological process assignments. CONCLUSIONS: Overall, the results indicate that there is considerable room for improvement in the field. It still remains for the community to invest a great deal of effort to make automated function prediction a useful and routine component in the toolbox of life scientists. 
As already witnessed in other areas, community-wide blind testing experiments will be pivotal in establishing standards for the evaluation of prediction accuracy, in fostering advancements and new ideas, and ultimately in recording progress.


Subjects
Proteins/physiology , Computational Biology/methods , Protein Databases , Molecular Evolution , Gene Expression , Molecular Sequence Annotation , Protein Interaction Mapping , Proteins/chemistry , Proteins/genetics , Sequence Analysis
15.
Nat Methods ; 10(3): 221-7, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23353650

ABSTRACT

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.


Subjects
Computational Biology/methods , Molecular Biology/methods , Molecular Sequence Annotation , Proteins/physiology , Algorithms , Animals , Protein Databases , Exoribonucleases/classification , Exoribonucleases/genetics , Exoribonucleases/physiology , Forecasting , Humans , Proteins/chemistry , Proteins/classification , Proteins/genetics , Species Specificity
16.
Nucleic Acids Res ; 41(Database issue): D499-507, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203986

ABSTRACT

Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).


Subjects
Protein Databases , Tertiary Protein Structure , Genomics , Humans , Internet , Molecular Sequence Annotation , Proteins/chemistry , Proteins/classification , Proteins/genetics , Software
17.
Bioinformatics ; 28(2): 184-90, 2012 Jan 15.
Article in English | MEDLINE | ID: mdl-22101153

ABSTRACT

MOTIVATION: The accurate prediction of residue-residue contacts, critical for maintaining the native fold of a protein, remains an open problem in the field of structural bioinformatics. Interest in this long-standing problem has increased recently with algorithmic improvements and the rapid growth in the sizes of sequence families. Progress could have major impacts in both structure and function prediction to name but two benefits. Sequence-based contact predictions are usually made by identifying correlated mutations within multiple sequence alignments (MSAs), most commonly through the information-theoretic approach of calculating mutual information between pairs of sites in proteins. These predictions are often inaccurate because the true covariation signal in the MSA is often masked by biases from many ancillary indirect-coupling or phylogenetic effects. Here we present a novel method, PSICOV, which introduces the use of sparse inverse covariance estimation to the problem of protein contact prediction. Our method builds on work which had previously demonstrated corrections for phylogenetic and entropic correlation noise and allows accurate discrimination of direct from indirectly coupled mutation correlations in the MSA. RESULTS: PSICOV displays a mean precision substantially better than the best performing normalized mutual information approach and Bayesian networks. For 118 out of 150 targets, the L/5 (i.e. top-L/5 predictions for a protein of length L) precision for long-range contacts (sequence separation >23) was ≥ 0.5, which represents an improvement sufficient to be of significant benefit in protein structure prediction or model quality assessment. AVAILABILITY: The PSICOV source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/PSICOV.
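For context, the mutual-information baseline that this abstract contrasts PSICOV with can be sketched in a few lines. This is the conventional information-theoretic calculation only, not PSICOV's sparse inverse covariance estimation, and it applies none of the phylogenetic or entropic corrections the abstract mentions. The four-sequence alignment is a hypothetical toy example.

```python
import math
from collections import Counter

def mutual_information(column_i, column_j):
    """Mutual information (in nats) between two alignment columns."""
    n = len(column_i)
    freq_i = Counter(column_i)
    freq_j = Counter(column_j)
    freq_ij = Counter(zip(column_i, column_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi

# Hypothetical MSA: columns 0 and 1 covary perfectly, column 2 is conserved.
msa = ["AKL", "AKL", "GEL", "GEL"]
columns = ["".join(seq[k] for seq in msa) for k in range(len(msa[0]))]

mi_covarying = mutual_information(columns[0], columns[1])
mi_conserved = mutual_information(columns[0], columns[2])
```

A high score between two columns does not by itself show direct contact, since indirect coupling chains and shared ancestry can produce the same signal; that limitation is what motivates the method described in the abstract.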


Subjects
Algorithms , Proteins/chemistry , Sequence Alignment/methods , Bayes Theorem , Mutation , Phylogeny , Proteins/genetics
18.
PLoS One ; 6(2): e16774, 2011 Feb 28.
Article in English | MEDLINE | ID: mdl-21386962

ABSTRACT

Protein-protein interactions are critically dependent on just a few 'hot spot' residues at the interface. Hot spots make a dominant contribution to the free energy of binding and they can disrupt the interaction if mutated to alanine. Here, we present HSPred, a support vector machine (SVM)-based method to predict hot spot residues, given the structure of a complex. HSPred represents an improvement over a previously described approach (Lise et al., BMC Bioinformatics 2009, 10:365). It achieves higher accuracy by treating separately predictions involving either an arginine or a glutamic acid residue. These are the amino acid types on which the original model did not perform well. We have therefore developed two additional SVM classifiers, specifically optimised for these cases. HSPred reaches an overall precision and recall of 61% and 69%, respectively, which roughly corresponds to a 10% improvement. An implementation of the described method is available as a web server at http://bioinf.cs.ucl.ac.uk/hspred. It is free to non-commercial users.
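The precision and recall figures quoted above follow the standard definitions, which can be sketched as below. The residue labels are hypothetical placeholders, not data from the HSPred study, and the function name is an assumption for this example.

```python
def precision_recall(predicted, actual):
    """Precision and recall of a predicted residue set against a reference set."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical interface residues: predictions vs. experimentally confirmed hot spots.
predicted_hot_spots = {"R12", "E45", "W88", "Y90"}
experimental_hot_spots = {"R12", "W88", "D101"}

precision, recall = precision_recall(predicted_hot_spots, experimental_hot_spots)
```

Precision measures how many predicted hot spots are real, while recall measures how many real hot spots were found, so the two can move in opposite directions as a classifier's threshold changes.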


Subjects
Amino Acid Motifs/physiology , Protein Databases , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Protein Sequence Analysis/methods , Software , Computational Biology/methods , Forecasting , Humans , Interleukin-4/chemistry , Interleukin-4/metabolism , Biological Models , Molecular Models , Protein Binding , Protein Interaction Domains and Motifs/physiology , Protein Interaction Mapping/instrumentation , Interleukin-4 Receptors/chemistry , Interleukin-4 Receptors/metabolism , Protein Sequence Analysis/instrumentation
19.
J Mol Biol ; 336(4): 871-87, 2004 Feb 27.
Article in English | MEDLINE | ID: mdl-15095866

ABSTRACT

We present the structural annotation of 56 different bacterial species based on the assignment of genes to 816 evolutionary superfamilies in the CATH domain structure database. These assignments have enabled us to analyse the recurrence of specific superfamilies within and across the genomes. We have selected the superfamilies that have a very broad representation and therefore appear to be universally distributed in a significant number of bacterial lineages. Occurrence profiles of these universally distributed superfamilies are compared with genome size in order to estimate the correlation between superfamily duplication and the increase in proteome size. This distinguishes between those size-dependent superfamilies where frequency of occurrence is highly correlated with increase in genome size, and size-independent superfamilies where no correlation is observed. Consideration of the size correlation and the ratio between the mean and the standard deviations for all the superfamily profiles allows more detailed subdivisions and classification of superfamilies. For example, within the size-independent superfamilies, we distinguished a group that are distributed evenly amongst all the genomes. Within the size-dependent superfamilies we differentiated two groups: linearly distributed and non-linearly distributed. Functional annotation using the COG database was performed for all superfamilies in each of these groups, and this revealed significant differences amongst the three sets of superfamilies. Evenly distributed, size-independent domains are shown to be involved primarily in protein translation and biosynthesis. For the size-dependent superfamilies, linearly distributed superfamilies are involved mainly in metabolism, and non-linearly distributed superfamily domains are involved principally in gene regulation.
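The two statistics this abstract relies on, the correlation of a superfamily's occurrence profile with genome size and the ratio of the profile's mean to its standard deviation, can be sketched as follows. The genome sizes and occurrence counts are hypothetical, and no classification thresholds are implied, since the abstract does not state any.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def mean_to_sd_ratio(values):
    """Mean divided by population standard deviation (infinite for a flat profile)."""
    m = mean(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return float("inf") if sd == 0.0 else m / sd

# Hypothetical data: gene counts for four genomes and two superfamily profiles.
genome_sizes = [1000, 2000, 3000, 4000]
size_dependent_profile = [5, 10, 15, 20]    # copies grow with genome size
size_independent_profile = [3, 3, 3, 3]     # evenly distributed

r_dependent = pearson(genome_sizes, size_dependent_profile)
r_independent = pearson(genome_sizes, size_independent_profile)
```

A profile with a correlation near 1 would fall into the size-dependent class, while a flat profile with a high mean-to-standard-deviation ratio corresponds to the evenly distributed, size-independent group.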


Subjects
Molecular Evolution , Bacterial Genome , Proteins/classification , Proteins/genetics , Protein Databases , Open Reading Frames , Protein Conformation , Proteins/chemistry , Statistics as Topic
20.
Curr Opin Struct Biol ; 13(3): 359-69, 2003 Jun.
Article in English | MEDLINE | ID: mdl-12831888

ABSTRACT

Protein translations of over 100 complete genomes are now available. About half of these sequences can be provided with structural annotation, thereby enabling some profound insights into protein and pathway evolution. Whereas the major domain structure families are common to all kingdoms of life, these are combined in different ways in multidomain proteins to give various domain architectures that are specific to kingdoms or individual genomes, and contribute to the diverse phenotypes observed. These data argue for more targets in structural genomics initiatives and particularly for the selection of different domain architectures to gain better insights into protein functions.


Subjects
Molecular Evolution , Genome , Protein Conformation , Tertiary Protein Structure , Phylogeny , Species Specificity