Results 1 - 14 of 14
1.
Structure ; 25(3): 546-558, 2017 03 07.
Article in English | MEDLINE | ID: mdl-28190781

ABSTRACT

The related concepts of protein dynamics, conformational ensembles and allostery are often difficult to study with molecular dynamics (MD) due to the timescales involved. We present ExProSE (Exploration of Protein Structural Ensembles), a distance geometry-based method that generates an ensemble of protein structures from two input structures. ExProSE provides a unified framework for the exploration of protein structure and dynamics in a fast and accessible way. Using a dataset of apo/holo pairs it is shown that existing coarse-grained methods often cannot span large conformational changes. For T4-lysozyme, ExProSE is able to generate ensembles that are more native-like than tCONCOORD and NMSim, and comparable with targeted MD. By adding additional constraints representing potential modulators, ExProSE can predict allosteric sites. ExProSE ranks an allosteric pocket first or second for 27 out of 58 allosteric proteins, which is similar and complementary to existing methods. The ExProSE source code is freely available.


Subjects
Computational Biology/methods , Proteins/chemistry , Allosteric Regulation , Binding Sites , Models, Molecular , Molecular Dynamics Simulation , Protein Conformation
2.
Plant J ; 88(4): 633-647, 2016 11.
Article in English | MEDLINE | ID: mdl-27472661

ABSTRACT

Cucurbits are well-studied models for phloem biology but unusually possess both fascicular phloem (FP) within vascular bundles and additional extrafascicular phloem (EFP). Although the functional differences between the two systems are not yet clear, sugar analysis and limited protein profiling have established that FP and EFP have divergent compositions. Here we report a detailed comparative proteomics study of FP and EFP in two cucurbits, pumpkin and cucumber. We re-examined the sites of exudation by video microscopy, and confirmed that in both species, the spontaneous exudate following tissue cutting derives almost exclusively from EFP. Comparative gel electrophoresis and mass spectrometry-based proteomics of exudates, sieve element contents and microdissected stem tissues established that EFP and FP profiles are highly dissimilar, and that there are also species differences. Searches against cucurbit databases enabled identification of more than 300 FP proteins from each species. Few of the detected proteins (about 10%) were shared between the sieve element contents of FP and EFP, and enriched Gene Ontology categories also differed. To explore quantitative differences in the proteomes, we developed multiple reaction monitoring methods for cucumber proteins that are representative markers for FP or EFP and assessed exudate composition at different times after tissue cutting. Based on failure to detect FP markers in exudate samples, we conclude that FP is blocked very rapidly and therefore makes a minimal contribution to the exudates. Overall, the highly divergent contents of FP and EFP indicate that they are substantially independent vascular compartments.


Subjects
Cucurbita/metabolism , Phloem/metabolism , Proteomics/methods , Cucumis sativus/metabolism , Plant Proteins/metabolism
3.
Genome Med ; 7: 95, 2015 Sep 01.
Article in English | MEDLINE | ID: mdl-26330083

ABSTRACT

BACKGROUND: Each cell type found within the human body performs a diverse and unique set of functions, the disruption of which can lead to disease. However, there currently exists no systematic mapping between cell types and the diseases they can cause. METHODS: In this study, we integrate protein-protein interaction data with high-quality cell-type-specific gene expression data from the FANTOM5 project to build the largest collection of cell-type-specific interactomes created to date. We develop a novel method, called gene set compactness (GSC), that contrasts the relative positions of disease-associated genes across 73 cell-type-specific interactomes to map genes associated with 196 diseases to the cell types they affect. We conduct text-mining of the PubMed database to produce an independent resource of disease-associated cell types, which we use to validate our method. RESULTS: The GSC method successfully identifies known disease-cell-type associations, as well as highlighting associations that warrant further study. This includes mast cells and multiple sclerosis, a cell population currently being targeted in a multiple sclerosis phase 2 clinical trial. Furthermore, we build a cell-type-based diseasome using the cell types identified as manifesting each disease, offering insight into diseases linked through etiology. CONCLUSIONS: The data set produced in this study represents the first large-scale mapping of diseases to the cell types in which they are manifested and will therefore be useful in the study of disease systems. Overall, we demonstrate that our approach links disease-associated genes to the phenotypes they produce, a key goal within systems medicine.


Subjects
Genetic Predisposition to Disease , Cells , Data Mining , Databases, Genetic , Disease/genetics , Gene Expression , Humans , Phenotype , Protein Interaction Maps
4.
Nucleic Acids Res ; 43(Database issue): D382-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348407

ABSTRACT

Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.


Subjects
Databases, Protein , Molecular Sequence Annotation , Protein Structure, Tertiary , Algorithms , Genomics , Internet , Models, Molecular , Protein Structure, Tertiary/genetics , Sequence Analysis, Protein
5.
J Mol Biol ; 426(14): 2692-701, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-24810707

ABSTRACT

Whole-genome and exome sequencing studies reveal many genetic variants between individuals, some of which are linked to disease. Many of these variants lead to single amino acid variants (SAVs), and accurate prediction of their phenotypic impact is important. Incorporating sequence conservation and network-level features, we have developed a method, SuSPect (Disease-Susceptibility-based SAV Phenotype Prediction), for predicting how likely SAVs are to be associated with disease. SuSPect performs significantly better than other available batch methods on the VariBench benchmarking dataset, with a balanced accuracy of 82%. SuSPect is available at www.sbg.bio.ic.ac.uk/suspect. The Web site has been implemented in Perl and SQLite and is compatible with modern browsers. An SQLite database of possible missense variants in the human proteome is available to download at www.sbg.bio.ic.ac.uk/suspect/download.html.


Subjects
Amino Acid Substitution , Disease Susceptibility , Proteins/chemistry , Software , Child , Child Abuse , Computational Biology/methods , Humans , Models, Molecular , Mutation, Missense , Phenotype , Protein Conformation , Proteins/genetics , Proteins/metabolism
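
The abstract of entry 5 reports performance as a balanced accuracy of 82%. As a reminder of what that metric measures, here is a minimal sketch: balanced accuracy is the mean of sensitivity and specificity, which keeps a predictor from scoring well simply by favouring the larger class. The confusion-matrix counts below are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the VariBench evaluation.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity for a binary classifier."""
    sensitivity = tp / (tp + fn)  # fraction of disease-associated variants recovered
    specificity = tn / (tn + fp)  # fraction of neutral variants recovered
    return (sensitivity + specificity) / 2


# Hypothetical counts (assumption, for illustration only).
score = balanced_accuracy(tp=80, fn=20, tn=84, fp=16)
print(f"balanced accuracy = {score:.2f}")
```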
6.
Plant J ; 75(6): 1039-49, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23725542

ABSTRACT

Plant organs are made from multiple cell types, and defining the expression level of a gene in any one cell or group of cells from a complex mixture is difficult. Dicotyledonous plants normally have three distinct layers of cells, L1, L2 and L3. Layer L1 is the single layer of cells making up the epidermis, layer L2 the single cell sub-epidermal layer and layer L3 constitutes the rest of the internal cells. Here we show how it is possible to harvest an organ and characterise the level of layer-specific expression by using a periclinal chimera that has its L1 layer from Solanum pennellii and its L2 and L3 layers from Solanum lycopersicum. This is possible by measuring the level of the frequency of species-specific transcripts. RNA-seq analysis enabled the genome-wide assessment of whether a gene is expressed in the L1 or L2/L3 layers. From 13 277 genes that are expressed in both the chimera and the parental lines and with at least one polymorphism between the parental alleles, we identified 382 genes that are preferentially expressed in L1 in contrast to 1159 genes in L2/L3. Gene ontology analysis shows that many genes preferentially expressed in L1 are involved in cutin and wax biosynthesis, whereas numerous genes that are preferentially expressed in L2/L3 tissue are associated with chloroplastic processes. These data indicate the use of such chimeras and provide detailed information on the level of layer-specific expression of genes.


Subjects
Solanum lycopersicum/genetics , Solanum lycopersicum/metabolism , Chimera , Gene Expression Regulation, Plant , Genome, Plant , Solanum lycopersicum/cytology , Molecular Sequence Annotation , Plant Epidermis/genetics , Plant Epidermis/metabolism , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, RNA
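
The idea in the abstract of entry 6 is that, in the chimera, reads carrying the Solanum pennellii allele come from layer L1 and reads carrying the Solanum lycopersicum allele come from L2/L3. The sketch below illustrates that reasoning only. The function names, read counts and the 0.2/0.8 thresholds are assumptions made for this example; the abstract does not state the statistical criteria the authors actually used.

```python
def l1_read_fraction(pennellii_reads: int, lycopersicum_reads: int) -> float:
    """Fraction of allele-specific reads that carry the S. pennellii (L1) allele."""
    total = pennellii_reads + lycopersicum_reads
    if total == 0:
        raise ValueError("no allele-specific reads for this gene")
    return pennellii_reads / total


def assign_layer(fraction: float, low: float = 0.2, high: float = 0.8) -> str:
    """Label a gene by layer preference; thresholds are illustrative assumptions."""
    if fraction >= high:
        return "L1"
    if fraction <= low:
        return "L2/L3"
    return "both"


# Hypothetical read counts per gene: (S. pennellii reads, S. lycopersicum reads).
counts = {"geneA": (95, 5), "geneB": (3, 97), "geneC": (40, 60)}
calls = {gene: assign_layer(l1_read_fraction(p, l)) for gene, (p, l) in counts.items()}
print(calls)
```

A real analysis would also need to account for sequencing depth and for expression differences between the parental lines, which this sketch ignores.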
7.
Philos Trans A Math Phys Eng Sci ; 371(1983): 20120073, 2013 Jan 28.
Article in English | MEDLINE | ID: mdl-23230157

ABSTRACT

Cloud computing infrastructure is now widely used in many domains, but one area where there has been more limited adoption is research computing, in particular for running scientific high-performance computing (HPC) software. The Robust Application Porting for HPC in the Cloud (RAPPORT) project took advantage of existing links between computing researchers and application scientists in the fields of bioinformatics, high-energy physics (HEP) and digital humanities, to investigate running a set of scientific HPC applications from these domains on cloud infrastructure. In this paper, we focus on the bioinformatics and HEP domains, describing the applications and target cloud platforms. We conclude that, while there are many factors that need consideration, there is no fundamental impediment to the use of cloud infrastructure for running many types of HPC applications and, in some cases, there is potential for researchers to benefit significantly from the flexibility offered by cloud platforms.


Subjects
Algorithms , Computing Methodologies , Internet , Science/methods , Software
8.
Nucleic Acids Res ; 41(Database issue): D499-507, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203986

ABSTRACT

Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).


Subjects
Databases, Protein , Protein Structure, Tertiary , Genomics , Humans , Internet , Molecular Sequence Annotation , Proteins/chemistry , Proteins/classification , Proteins/genetics , Software
9.
BMC Bioinformatics ; 11: 283, 2010 May 27.
Article in English | MEDLINE | ID: mdl-20507547

ABSTRACT

BACKGROUND: Contact maps have been extensively used as a simplified representation of protein structures. They capture most important features of a protein's fold, being preferred by a number of researchers for the description and study of protein structures. Inspired by the model's simplicity many groups have dedicated a considerable amount of effort towards contact prediction as a proxy for protein structure prediction. However a contact map's biological interest is subject to the availability of reliable methods for the 3-dimensional reconstruction of the structure. RESULTS: We use an implementation of the well-known distance geometry protocol to build realistic protein 3-dimensional models from contact maps, performing an extensive exploration of many of the parameters involved in the reconstruction process. We try to address the questions: a) to what accuracy does a contact map represent its corresponding 3D structure, b) what is the best contact map representation with regard to reconstructability and c) what is the effect of partial or inaccurate contact information on the 3D structure recovery. Our results suggest that contact maps derived from the application of a distance cutoff of 9 to 11 Å around the Cβ atoms constitute the most accurate representation of the 3D structure. The reconstruction process does not provide a single solution to the problem but rather an ensemble of conformations that are within 2 Å RMSD of the crystal structure and with lower values for the pairwise average ensemble RMSD. Interestingly it is still possible to recover a structure with partial contact information, although wrong contacts can lead to dramatic loss in reconstruction fidelity. CONCLUSIONS: Thus contact maps represent a valid approximation to the structures with an accuracy comparable to that of experimental methods. The optimal contact definitions constitute key guidelines for methods based on contact maps such as structure prediction through contacts and structural alignments based on maximum contact map overlap.


Subjects
Computational Biology/methods , Proteins/chemistry , Models, Molecular , Protein Conformation , Protein Folding
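
The contact-map representation discussed in entry 9 can be illustrated with a short sketch: two residues are in contact when the distance between their representative atoms is within a cutoff. The abstract reports that cutoffs of 9 to 11 Å on Cβ atoms worked best; the function name and the toy coordinates below are assumptions made for this example, and the sketch covers only the forward step (structure to contact map), not the distance geometry reconstruction.

```python
import math


def contact_map(coords, cutoff=9.0):
    """Return a symmetric boolean matrix; True where two atoms lie within `cutoff` angstroms."""
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap


# Toy coordinates in angstroms (hypothetical, not taken from a real structure).
toy_atoms = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (12.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(toy_atoms, cutoff=9.0)
for row in cmap:
    print("".join("X" if in_contact else "." for in_contact in row))
```

In practice the coordinates would be read from a structure file and trivial contacts between sequence neighbours are often excluded; both steps are left out here.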
10.
PLoS Comput Biol ; 5(12): e1000584, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19997489

ABSTRACT

The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this "structural essence" has remained elusive so far: no algorithmic strategy has been devised to date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts; such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed "cone-peeling", that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as little as 8% of the native contacts (Cα-Cα and Cβ-Cβ). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This "structural essence" opens new avenues in the fields of structure prediction, empirical potentials and docking.


Subjects
Computational Biology/methods , Models, Molecular , Protein Conformation , Proteins/chemistry , Algorithms , Databases, Protein , Protein Interaction Domains and Motifs , Protein Interaction Mapping
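
The reference sets described in entry 10 (random subsets holding 10% to 90% of the native contacts) are easy to sketch. The code below shows only that random baseline, not the "cone-peeling" selection itself, whose details the abstract does not give. The function name, the fixed seed and the toy contact list are assumptions for illustration.

```python
import random


def random_contact_subsets(contacts, percentages, seed=0):
    """Draw one random subset of the contact list for each requested percentage."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    subsets = {}
    for pct in percentages:
        k = len(contacts) * pct // 100  # number of contacts to keep
        subsets[pct] = rng.sample(contacts, k)
    return subsets


# Hypothetical native contacts as (residue i, residue j) index pairs.
native_contacts = [(i, i + 4) for i in range(50)]
subsets = random_contact_subsets(native_contacts, range(10, 100, 10))
for pct, subset in subsets.items():
    print(pct, len(subset))
```

Each subset would then be passed to a reconstruction step and scored by RMSD against the native structure; repeating the draw with different seeds gives the spread of the random baseline.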
11.
Curr Opin Biotechnol ; 20(4): 437-46, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19713097

ABSTRACT

Novel high-throughput technologies for directed evolution enable experimental coverage of an impressive number of sequences. Nevertheless, the success of such experiments hinges on the initial sequence libraries. Here we consider the computational design of smart focused libraries and review insights from experimental strategies and theoretic advances in modelling their energy landscapes. In library design as in structure prediction, the applied energy function is the key. Current knowledge-based potentials have proven more successful than purely physics-based ones. Here we summarize novel approaches that extend the classical pairwise treatment of residue contacts towards adaptive knowledge-based multi-body potentials. We suggest that minimal sets of probabilistic constraints will lead to much more efficient sampling of permissible conformations and sequence space.


Subjects
Combinatorial Chemistry Techniques , Directed Molecular Evolution , Drug Evaluation, Preclinical , Models, Molecular , Probability , Thermodynamics
12.
PLoS One ; 4(6): e5967, 2009 Jun 26.
Article in English | MEDLINE | ID: mdl-19557139

ABSTRACT

Much attention has recently been given to the statistical significance of topological features observed in biological networks. Here, we consider residue interaction graphs (RIGs) as network representations of protein structures with residues as nodes and inter-residue interactions as edges. Degree-preserving randomized models have been widely used for this purpose in biomolecular networks. However, such a single summary statistic of a network may not be detailed enough to capture the complex topological characteristics of protein structures and their network counterparts. Here, we investigate a variety of topological properties of RIGs to find a well fitting network null model for them. The RIGs are derived from a structurally diverse protein data set at various distance cut-offs and for different groups of interacting atoms. We compare the network structure of RIGs to several random graph models. We show that 3-dimensional geometric random graphs, that model spatial relationships between objects, provide the best fit to RIGs. We investigate the relationship between the strength of the fit and various protein structural features. We show that the fit depends on protein size, structural class, and thermostability, but not on quaternary structure. We apply our model to the identification of significantly over-represented structural building blocks, i.e., network motifs, in protein structure networks. As expected, choosing geometric graphs as a null model results in the most specific identification of motifs. Our geometric random graph model may facilitate further graph-based studies of protein conformation space and have important implications for protein structure comparison and prediction. The choice of a well-fitting null model is crucial for finding structural motifs that play an important role in protein folding, stability and function. To our knowledge, this is the first study that addresses the challenge of finding an optimized null model for RIGs, by comparing various RIG definitions against a series of network models.


Subjects
Proteins/chemistry , Proteomics/methods , Algorithms , Amino Acid Motifs , Animals , Cluster Analysis , Computational Biology/methods , Computer Simulation , Databases, Protein , Humans , Models, Theoretical , Protein Folding , Proteome
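
The null model favoured in entry 12, a 3-dimensional geometric random graph, can be sketched in a few lines: nodes are points placed at random in space and an edge joins any two points closer than a chosen radius. The version below places points uniformly in a unit cube; the function name, the cube, the radius and the seed are assumptions for illustration, and the abstract does not say how the authors matched graph size or density to each protein.

```python
import math
import random


def geometric_random_graph(n, radius, seed=0):
    """Place n points uniformly in the unit cube; connect pairs within `radius`."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
    edges = {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(points[i], points[j]) <= radius
    }
    return points, edges


points, edges = geometric_random_graph(n=100, radius=0.25)
mean_degree = 2 * len(edges) / len(points)
print(f"{len(edges)} edges, mean degree {mean_degree:.1f}")
```

Topological statistics of such graphs (degree distribution, clustering, small subgraph counts) can then be compared with those of a residue interaction graph built from a real structure.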
13.
BMC Bioinformatics ; 9: 517, 2008 Dec 04.
Article in English | MEDLINE | ID: mdl-19055796

ABSTRACT

BACKGROUND: Identifying the active site of an enzyme is a crucial step in functional studies. While protein sequences and structures can be experimentally characterized, determining which residues build up an active site is not a straightforward process. In the present study a new method for the detection of protein active sites is introduced. This method uses local network descriptors derived from protein three-dimensional structures to determine whether a residue is part of an active site. It thus does not involve any sequence alignment or structure similarity to other proteins. A scoring function is elaborated over a set of more than 220 proteins having different structures and functions, in order to detect protein catalytic sites with a high precision, i.e. with a minimal rate of false positives. RESULTS: The scoring function was based on the counts of first-neighbours on side-chain contacts, third-neighbours and residue type. Precision of the detection using this function was 28.1%, which represents a more than three-fold increase compared to combining closeness centrality with residue surface accessibility, a function which was proposed in recent years. The performance of the scoring function was also analysed in detail over a smaller set of eight proteins. For the detection of 'functional' residues, which were involved either directly in catalytic activity or in the binding of substrates, precision reached a value of 72.7% on this second set. These results suggested that our scoring function was effective at detecting not only catalytic residues, but also any residue that is part of the functional site of a protein. CONCLUSION: Having been validated on the majority of known structural families, this method should prove useful for the detection of active sites in any protein with unknown function, and for direct application to the design of site-directed mutagenesis experiments.


Subjects
Computational Biology/methods , Algorithms , Animals , Catalysis , Catalytic Domain , Humans , Models, Biological , Models, Statistical , Molecular Conformation , Protein Conformation , Protein Folding , Reproducibility of Results , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Ubiquitin/chemistry
14.
BMC Struct Biol ; 8: 53, 2008 Dec 08.
Article in English | MEDLINE | ID: mdl-19063740

ABSTRACT

BACKGROUND: For over 30 years potentials of mean force have been used to evaluate the relative energy of protein structures. The most commonly used potentials define the energy of residue-residue interactions and are derived from the empirical analysis of the known protein structures. However, single-body residue 'environment' potentials, although widely used in protein structure analysis, have not been rigorously compared to these classical two-body residue-residue interaction potentials. Here we do not try to combine the two different types of residue interaction potential, but rather to assess their independent contribution to scoring protein structures. RESULTS: A data set of nearly three thousand monomers was used to compare pairwise residue-residue 'contact-type' propensities to single-body residue 'contact-count' propensities. Using a large and standard set of protein decoys we performed an in-depth comparison of these two types of residue interaction propensities. The scores derived from the contact-type and contact-count propensities were assessed using two different performance metrics and were compared using 90 different definitions of residue-residue contact. Our findings show that both types of score perform equally well on the task of discriminating between near-native protein decoys. However, in a statistical sense, the contact-count based scores were found to carry more information than the contact-type based scores. CONCLUSION: Our analysis has shown that the performance of either type of score is very similar on a range of different decoys. This similarity suggests a common underlying biophysical principle for both types of residue interaction propensity. However, several features of the contact-count based propensity suggest that it should be used in preference to the contact-type based propensity. Specifically, it has been shown that contact-counts can be predicted from sequence information alone. In addition, the use of a single-body term allows for efficient alignment strategies using dynamic programming, which is useful for fold recognition, for example. These facts, combined with the relative simplicity of the contact-count propensity, suggest that contact-counts should be studied in more detail in the future.


Subjects
Amino Acids/chemistry , Proteins/chemistry , Databases, Protein , Thermodynamics