1.
Proc Natl Acad Sci U S A ; 116(22): 10734-10743, 2019 05 28.
Article in English | MEDLINE | ID: mdl-30992374

ABSTRACT

While studying spontaneous mutations at the maize bronze (bz) locus, we made the unexpected discovery that specific low-copy number retrotransposons are mobile in the pollen of some maize lines, but not of others. We conducted large-scale genetic experiments to isolate new bz mutations from several Bz stocks and recovered spontaneous stable mutations only in the pollen parent in reciprocal crosses. Most of the new stable bz mutations resulted from either insertions of low-copy number long terminal repeat (LTR) retrotransposons or deletions, the same two classes of mutations that predominated in a collection of spontaneous wx mutations [Wessler S (1997) The Mutants of Maize, pp 385-386]. Similar mutations were recovered at the closely linked sh locus. These events occurred with a frequency of 2-4 × 10⁻⁵ in two lines derived from W22 and in 4Co63, but not at all in B73 or Mo17, two inbreds widely represented in Corn Belt hybrids. Surprisingly, the mutagenic LTR retrotransposons differed in the active lines, suggesting differences in the autonomous element make-up of the lines studied. Some active retrotransposons, like Hopscotch, Magellan, and Bs2, a Bs1 variant, were described previously; others, like Foto and Focou in 4Co63, were not. By high-throughput sequencing of retrotransposon junctions, we established that retrotransposition of Hopscotch, Magellan, and Bs2 occurs genome-wide in the pollen of active lines, but not in the female germline or in somatic tissues. We discuss here the implications of these results, which shed light on the source, frequency, and nature of spontaneous mutations in maize.


Subjects
Mutation/genetics; Pollen/genetics; Retroelements/genetics; Sequence Deletion/genetics; Zea mays/genetics; DNA, Plant/genetics; High-Throughput Nucleotide Sequencing
2.
Plant J ; 92(6): 1143-1156, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29072883

ABSTRACT

The complex interactions between transcription factors (TFs) and their target genes in a spatially and temporally specific manner are crucial to all cellular processes. Reconstruction of gene regulatory networks (GRNs) from gene expression profiles can help to decipher TF-gene regulations in a variety of contexts; however, the inevitable prediction errors of GRNs hinder optimal data mining of RNA-Seq transcriptome profiles. Here we perform an integrative study of Zea mays (maize) seed development in order to identify key genes in a complex developmental process. First, we reverse engineered a GRN from 78 maize seed transcriptome profiles. Then, we studied collective gene interaction patterns and uncovered highly interwoven network communities as the building blocks of the GRN. One community, composed of mostly unknown genes interacting with opaque2, brittle endosperm1 and shrunken2, contributes to seed phenotypes. Another community, composed mostly of genes expressed in the basal endosperm transfer layer, is responsible for nutrient transport. We further integrated our inferred GRN with gene expression patterns in different seed compartments and at various developmental stages and pathways. The integration facilitated a biological interpretation of the GRN. Our yeast one-hybrid assays verified six out of eight TF-promoter bindings in the reconstructed GRN. This study identified topologically important genes in interwoven network communities that may be crucial to maize seed development.


Subjects
Gene Regulatory Networks/genetics; Zea mays/genetics; Endosperm/genetics; Endosperm/growth & development; Plant Proteins/genetics; Plant Proteins/metabolism; Promoter Regions, Genetic; Seeds/genetics; Seeds/growth & development; Transcription Factors/genetics; Transcription Factors/metabolism; Transcriptome; Zea mays/growth & development
3.
Int J Mol Sci ; 18(9)2017 Aug 24.
Article in English | MEDLINE | ID: mdl-28837076

ABSTRACT

Grain weight is one of the most important yield components, and the maize (Zea mays L.) grain is a developmentally complex structure comprising two major compartments (endosperm and pericarp); however, very little is known concerning the coordinated accumulation of the numerous proteins involved. Herein, we used an isobaric tags for relative and absolute quantitation (iTRAQ)-based comparative proteomic method to analyze the dynamic proteomes of endosperm and pericarp during grain development. In total, 9539 proteins were identified for both components at four development stages, among which 1401 proteins were non-redundant, 232 proteins were specific to pericarp and 153 proteins were specific to endosperm. A functional annotation of the identified proteins revealed the importance of metabolic and cellular processes, and binding and catalytic activities for the tissue development. Three and 76 proteins involved in 49 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were integrated for the specific endosperm and pericarp proteins, respectively, reflecting their complex metabolic interactions. In addition, four proteins with important functions and different expression levels were chosen for gene cloning and expression analysis. As in previous research, the concordance between mRNA level and protein abundance varied across different proteins, stages, and tissues. These results could provide useful information for understanding the developmental mechanisms of grain development in maize.


Subjects
Proteome; Proteomics; Zea mays/metabolism; Cluster Analysis; Computational Biology/methods; Edible Grain/cytology; Edible Grain/growth & development; Edible Grain/metabolism; Endosperm/metabolism; Plant Proteins/metabolism; Protein Interaction Mapping; Proteomics/methods; Seeds/growth & development; Seeds/metabolism; Zea mays/cytology; Zea mays/growth & development
4.
Sci Rep ; 7(1): 6769, 2017 07 28.
Article in English | MEDLINE | ID: mdl-28754917

ABSTRACT

Salinization is one of the major factors that threaten the existence of plants worldwide. Populus euphratica has been deemed to be a promising candidate for stress response research because of its high capacity to tolerate extreme salt stress. We carried out a genome-wide transcriptome analysis to identify the differentially expressed genes (DEGs) responding to salt shock and to elucidate the early salt tolerance mechanisms in P. euphratica. Both hierarchical clustering and DEG analysis demonstrated that, within 24 hours of salt shock, variation was driven predominantly by the time course rather than by NaCl intensity. Among the 1,678 salt-responsive DEGs identified, 74.1% (1,244) have not been reported before. We further created an integrated regulatory gene network of the salt response in P. euphratica by combining DEGs, transcription factors (TFs), Helitrons, miRNAs and their targets. The prominent pathways in this network are plant hormone transduction, starch and sucrose metabolism, RNA transport, protein processing in endoplasmic reticulum, etc. In addition, the network indicates that calcium-related genes play key roles in the P. euphratica response to salt shock. These results provide an overview of the systematic molecular response in P. euphratica under different intensities of salt shock and reveal the complex regulatory mechanism.


Subjects
Gene Regulatory Networks; Populus/genetics; Populus/physiology; Salt Tolerance/genetics; Down-Regulation/drug effects; Down-Regulation/genetics; Gene Expression Profiling; Gene Expression Regulation, Plant/drug effects; Gene Ontology; Gene Regulatory Networks/drug effects; Genes, Plant; Populus/drug effects; Reproducibility of Results; Salt Tolerance/drug effects; Sodium Chloride/pharmacology; Stress, Physiological/drug effects; Stress, Physiological/genetics; Transcription, Genetic/drug effects
5.
6.
Plant J ; 88(6): 1038-1045, 2016 12.
Article in English | MEDLINE | ID: mdl-27553634

ABSTRACT

The unusual eukaryotic Helitron transposons can readily capture host sequences and are, thus, evolutionarily important. They are presumed to amplify by rolling-circle replication (RCR) because some elements encode predicted proteins homologous to RCR prokaryotic transposases. In support of this replication mechanism, it was recently shown that transposition of a bat Helitron generates covalently closed circular intermediates. Another strong prediction is that RCR should generate tandem Helitron concatemers, yet almost all Helitrons identified to date occur as solo elements in the genome. To investigate alternative modes of Helitron organization in present-day genomes, we have applied the novel computational tool HelitronScanner to 27 plant genomes and have uncovered numerous tandem arrays of partially decayed, truncated Helitrons in all of them. Strikingly, most of these Helitron tandem arrays are interspersed with other repeats in centromeres. Many of these arrays have multiple Helitron 5' ends, but a single 3' end. The number of repeats in any one array can range from a handful to several hundreds. We propose here an RCR model that conforms to the present Helitron landscape of plant genomes. Our study provides strong evidence that plant Helitrons amplify by RCR and that the tandemly arrayed replication products accumulate mostly in centromeres.


Subjects
Arabidopsis/metabolism; Centromere/metabolism; DNA Transposable Elements/genetics; Genome, Plant/genetics; Arabidopsis/genetics; Centromere/genetics; Tandem Repeat Sequences/genetics
7.
PLoS One ; 10(11): e0143181, 2015.
Article in English | MEDLINE | ID: mdl-26587848

ABSTRACT

The formation and development of the maize kernel is a complex dynamic physiological and biochemical process that involves the temporal and spatial expression of many proteins and the regulation of metabolic pathways. In this study, the protein profiles of the endosperm and pericarp at three important developmental stages were analyzed by isobaric tags for relative and absolute quantification (iTRAQ) labeling coupled with LC-MS/MS in popcorn inbred N04. Comparative quantitative proteomic analyses among developmental stages and between tissues were performed, and the protein networks were integrated. A total of 6,876 proteins were identified, of which 1,396 were nonredundant. Specific proteins and different expression patterns were observed across developmental stages and tissues. The functional annotation of the identified proteins revealed the importance of metabolic and cellular processes, and binding and catalytic activities for the development of the tissues. The whole, endosperm-specific and pericarp-specific protein networks integrated 125, 9 and 77 proteins, respectively, which were involved in 54 KEGG pathways and reflected their complex metabolic interactions. Confirmation of the iTRAQ endosperm proteins by two-dimensional gel electrophoresis showed that 44.44% of the proteins were found by both methods. However, the concordance between mRNA level and protein abundance varied across different proteins, stages, tissues and inbred lines, according to the gene cloning and expression analyses of four relevant proteins with important functions and different expression levels. Western blot results nevertheless showed the same expression trend for the four proteins as iTRAQ. These results could provide new insights into the developmental mechanisms of endosperm and pericarp, and grain formation in maize.


Subjects
Gene Expression Regulation, Plant; Plant Proteins/metabolism; Proteome; Seeds/growth & development; Zea mays/metabolism; Catalysis; Chromatography, Liquid; Cluster Analysis; Electrophoresis, Gel, Two-Dimensional; Endosperm/metabolism; Expressed Sequence Tags; Gene Expression Profiling; Microscopy, Electron, Scanning; Proteomics; Tandem Mass Spectrometry
8.
Proc Natl Acad Sci U S A ; 111(28): 10263-8, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-24982153

ABSTRACT

Transposons make up the bulk of eukaryotic genomes, but are difficult to annotate because they evolve rapidly. Most of the unannotated portion of sequenced genomes is probably made up of various divergent transposons that have yet to be categorized. Helitrons are unusual rolling circle eukaryotic transposons that often capture gene sequences, making them of considerable evolutionary importance. Unlike other DNA transposons, Helitrons do not end in inverted repeats or create target site duplications, so they are particularly challenging to identify. Here we present HelitronScanner, a two-layered local combinational variable (LCV) tool for generalized Helitron identification that represents a major improvement over previous identification programs based on DNA sequence or structure. HelitronScanner identified 64,654 Helitrons from a wide range of plant genomes in a highly automated way. We tested HelitronScanner's predictive ability in maize, a species with highly heterogeneous Helitron elements. LCV scores for the 5' and 3' termini of the predicted Helitrons provide a primary confidence level and element copy number provides a secondary one. Newly identified Helitrons were validated by PCR assays or by in silico comparative analysis of insertion site polymorphism among multiple accessions. Many new Helitrons were identified in model species, such as maize, rice, and Arabidopsis, and in a variety of organisms where Helitrons had not been reported previously to our knowledge, leading to a major upward reassessment of their abundance in plant genomes. HelitronScanner promises to be a valuable tool in future comparative and evolutionary studies of this major transposon superfamily.


Subjects
DNA Transposable Elements/physiology; Evolution, Molecular; Genome, Plant/physiology; Plants/genetics; Polymerase Chain Reaction/methods; Sequence Analysis, DNA/methods
9.
Mob Genet Elements ; 4(5): 1-5, 2014 Oct.
Article in English | MEDLINE | ID: mdl-26442169

ABSTRACT

As a major driving force of genome evolution, transposons have been deviating from their original connotation as "junk" DNA ever since their important roles were revealed. The recently discovered Helitron transposons have been investigated in diverse eukaryotic genomes because of their remarkable gene-capture ability and other features that are crucial to our current understanding of genome dynamics. Helitrons are not canonical transposons in that they do not end in inverted repeats or create target site duplications, which makes them difficult to identify. Previous methods mainly rely on sequence alignment of conserved Helitron termini or manual curation. The abundance of Helitrons in genomes is still underestimated. We developed an automated and generalized tool, HelitronScanner, that identified a plethora of divergent Helitrons in many plant genomes. A local combinational variable approach as the key component of HelitronScanner offers a more granular representation of conserved nucleotide combinations and therefore is more sensitive in finding divergent Helitrons. This commentary provides an in-depth view of the local combinational variable approach and its association with Helitron sequence patterns. Analysis of Helitron terminal sequences shows that the local combinational variable approach is an efficacious representation of nucleotide patterns imperceptible at a full-sequence level.

10.
BMC Genomics ; 14: 679, 2013 Oct 04.
Article in English | MEDLINE | ID: mdl-24090499

ABSTRACT

BACKGROUND: The advent of next-generation high-throughput technologies has revolutionized whole genome sequencing, yet some experiments require sequencing only of targeted regions of the genome from a very large number of samples. These regions can be amplified by PCR and sequenced by next-generation methods using a multidimensional pooling strategy. However, there is at present no available generalized tool for the computational analysis of target-enriched NGS data from multidimensional pools. RESULTS: Here we present InsertionMapper, a pipeline tool for the identification of targeted sequences from multidimensional high throughput sequencing data. InsertionMapper consists of four independently working modules: Data Preprocessing, Database Modeling, Dimension Deconvolution and Element Mapping. We illustrate InsertionMapper with an example from our project 'New reverse genetics resources for maize', which aims to sequence-index a collection of 15,000 independent insertion sites of the transposon Ds in maize. Identified sequences are validated by PCR assays. This pipeline tool is applicable to similar scenarios requiring analysis of the tremendous output of short reads produced in NGS sequencing experiments of targeted genome sequences. CONCLUSIONS: InsertionMapper is proven efficacious for the identification of target-enriched sequences from multidimensional high throughput sequencing data. With adjustable parameters and experiment configurations, this tool can save great computational effort to biologists interested in identifying their sequences of interest within the huge output of modern DNA sequencers. InsertionMapper is freely accessible at https://sourceforge.net/p/insertionmapper and http://bo.csam.montclair.edu/du/insertionmapper.


Subjects
High-Throughput Nucleotide Sequencing/methods; Software; Zea mays/genetics; Base Sequence; Computational Biology/methods; DNA Transposable Elements/genetics; Genome, Plant/genetics; Polymerase Chain Reaction; Reproducibility of Results
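
The idea behind deconvolving a multidimensional pool, mentioned in the InsertionMapper abstract above, can be illustrated with a small two-dimensional sketch. This is not InsertionMapper's code or algorithm; the function name `deconvolve`, the `("row", i)` / `("col", j)` pool labels, and the rule of flagging anything other than a unique row/column hit as ambiguous are all assumptions made for illustration.

```python
from collections import defaultdict


def deconvolve(pool_hits):
    """Assign sequences to samples laid out on a 2D pooling grid.

    pool_hits maps a pool label such as ("row", 2) or ("col", 5) to the set
    of sequence IDs detected in that pool (hypothetical input format).
    A sequence seen in exactly one row pool and exactly one column pool is
    assigned to the sample at that intersection; every other sequence is
    reported as ambiguous.
    """
    rows = defaultdict(set)
    cols = defaultdict(set)
    for (axis, index), seqs in pool_hits.items():
        target = rows if axis == "row" else cols
        for seq in seqs:
            target[seq].add(index)

    assigned, ambiguous = {}, set()
    for seq in set(rows) | set(cols):
        r = rows.get(seq, set())
        c = cols.get(seq, set())
        if len(r) == 1 and len(c) == 1:
            assigned[seq] = (next(iter(r)), next(iter(c)))
        else:
            ambiguous.add(seq)
    return assigned, ambiguous
```

A real pipeline would probably also need read-count thresholds and more than two pooling dimensions; the sketch only shows why intersecting pool memberships recovers the source sample.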
11.
Nucleic Acids Res ; 41(Web Server issue): W441-7, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23729470

ABSTRACT

Knowledge of subcellular localizations (SCLs) of plant proteins relates to their functions and aids in understanding the regulation of biological processes at the cellular level. We present PlantLoc, a highly accurate and fast webserver for predicting the multi-label SCLs of plant proteins. The PlantLoc server has two innovative features: building localization motif libraries by a recursive method without alignment or Gene Ontology information; and establishing a simple architecture for rapidly and accurately identifying plant protein SCLs without a machine learning algorithm. PlantLoc provides predicted SCL results, confidence estimates, and the identity of the substantiality motif together with its location on the sequence. Benchmarked on a new test dataset against other plant SCL prediction webservers, PlantLoc achieved the highest accuracy (overall accuracy of 80.8%) in identifying plant protein SCLs. The ability of PlantLoc to predict multiple sites was also significantly higher than for any other webserver. The predicted substantiality motifs of queries also have great potential for analysis of relationships with protein functional regions. The PlantLoc server is available at http://cal.tongji.edu.cn/PlantLoc/.


Subjects
Plant Proteins/chemistry; Protein Sorting Signals; Software; Amino Acid Motifs; Internet; Plant Proteins/analysis; Sequence Analysis, Protein
12.
Genome Biol ; 14(5): R41, 2013 May 10.
Article in English | MEDLINE | ID: mdl-23663246

ABSTRACT

BACKGROUND: Sacred lotus is a basal eudicot with agricultural, medicinal, cultural and religious importance. It was domesticated in Asia about 7,000 years ago, and cultivated for its rhizomes and seeds as a food crop. It is particularly noted for its 1,300-year seed longevity and exceptional water repellency, known as the lotus effect. The latter property is due to the nanoscopic closely packed protuberances of its self-cleaning leaf surface, which have been adapted for the manufacture of a self-cleaning industrial paint, Lotusan. RESULTS: The genome of the China Antique variety of the sacred lotus was sequenced with Illumina and 454 technologies, at respective depths of 101× and 5.2×. The final assembly has a contig N50 of 38.8 kbp and a scaffold N50 of 3.4 Mbp, and covers 86.5% of the estimated 929 Mbp total genome size. The genome notably lacks the paleo-triplication observed in other eudicots, but reveals a lineage-specific duplication. The genome has evidence of slow evolution, with a 30% slower nucleotide mutation rate than observed in grape. Comparisons of the available sequenced genomes suggest a minimum gene set for vascular plants of 4,223 genes. Strikingly, the sacred lotus has 16 COG2132 multi-copper oxidase family proteins with root-specific expression; these are involved in root meristem phosphate starvation, reflecting adaptation to limited nutrient availability in an aquatic environment. CONCLUSIONS: The slow nucleotide substitution rate makes the sacred lotus a better resource than the current standard, grape, for reconstructing the pan-eudicot genome, and should therefore accelerate comparative analysis between eudicots and monocots.


Subjects
Genome, Plant; Nelumbo/genetics; Adaptation, Biological; Amino Acid Substitution; Evolution, Molecular; Molecular Sequence Data; Mutation Rate; Nelumbo/classification; Nelumbo/physiology; Phylogeny; Vitis/genetics
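
The contig and scaffold N50 values quoted in the sacred lotus abstract above follow a standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative assumptions, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running total reaches half of the
        # assembly size is the N50.
        if running >= half:
            return length
```

For the toy lengths 80, 70, 50, 40, 30 and 20 (total 290), the two longest sequences already sum to 150, which is at least 145, so the N50 is 70.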
13.
Biochimie ; 95(2): 354-8, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23116714

ABSTRACT

Protein-DNA interactions are involved in many biological processes essential for gene expression and regulation. To understand the molecular mechanisms of protein-DNA recognition, it is crucial to analyze and identify DNA-binding residues of protein-DNA complexes. Here, we proposed a novel descriptor shape string and another two related features shape string PSSM and shape string pair composition to characterize DNA-binding residues. We employed the new features and the position-specific scoring matrix (PSSM) for modeling and prediction. The results of a benchmark dataset showed that our approach significantly improved the accuracy of the predictor. The overall accuracy of our approach reached 85.86% with 85.02% sensitivity and 86.02% specificity. The results also demonstrated that shape string is a powerful descriptor for the prediction of DNA-binding residues. The additional two related features enhanced the predictive value.


Subjects
Algorithms; DNA/chemistry; Position-Specific Scoring Matrices; Proteins/chemistry; Software; Amino Acid Sequence; Binding Sites; Databases, Protein; Models, Molecular; Molecular Sequence Data; Protein Binding; Protein Interaction Domains and Motifs; Sensitivity and Specificity
14.
J Theor Biol ; 308: 135-40, 2012 Sep 07.
Article in English | MEDLINE | ID: mdl-22683368

ABSTRACT

The subcellular localization of proteins is closely related to their functions. In this work, we propose a novel approach based on localization motifs to improve the accuracy of predicting subcellular localization of Gram-positive bacterial proteins. Our approach performed well on a five-fold cross validation with an overall success rate of 89.5%. Besides, the overall success rate of an independent testing dataset was 97.7%. Moreover, our approach was tested using a new experimentally-determined set of Gram-positive bacteria proteins and achieved an overall success rate of 96.3%.


Subjects
Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Gram-Positive Bacteria/metabolism; Amino Acid Motifs; Amino Acid Sequence; Databases, Protein; Models, Biological; Molecular Sequence Data; Protein Transport; Reproducibility of Results; Subcellular Fractions/metabolism
15.
Nucleic Acids Res ; 40(Web Server issue): W298-302, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22553364

ABSTRACT

Many studies have demonstrated that shape string is an extremely important structure representation, since it is more complete than the classical secondary structure. The shape string provides detailed information also in the regions denoted random coil. But few services are provided for systematic analysis of protein shape string. To fill this gap, we have developed an accurate shape string predictor based on two innovative technologies: a knowledge-driven sequence alignment and a sequence shape string profile method. The performance on blind test data demonstrates that the proposed method can be used for accurate prediction of protein shape string. The DSP server provides both predicted shape string and sequence shape string profile for each query sequence. Using this information, the users can compare protein structure or display protein evolution in shape string space. The DSP server is available at both http://cheminfo.tongji.edu.cn/dsp/ and its main mirror http://chemcenter.tongji.edu.cn/dsp/.


Subjects
Protein Conformation; Software; Internet; Sequence Alignment; Sequence Analysis, Protein
16.
Mol Cell Proteomics ; 11(7): M111.016808, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22415040

ABSTRACT

Identification of protein structural neighbors to a query is fundamental in structure and function prediction. Here we present BS-align, a systematic method to retrieve backbone string neighbors from primary sequences as templates for protein modeling. The backbone conformation of a protein is represented by the backbone string, as defined in Ramachandran space. The backbone string of a query can be accurately predicted by two innovative technologies: a knowledge-driven sequence alignment and encoding of a backbone string element profile. Then, the predicted backbone string is employed to align against a backbone string database and retrieve a set of backbone string neighbors. The backbone string neighbors were shown to be close to native structures of query proteins. BS-align was successfully employed to predict models of 10 membrane proteins with lengths ranging between 229 and 595 residues, and whose high-resolution structural determinations were difficult to elucidate both by experiment and prediction. The obtained TM-scores and root mean square deviations of the models confirmed that the models based on the backbone string neighbors retrieved by the BS-align were very close to the native membrane structures although the query and the neighbor shared a very low sequence identity. The backbone string system represents a new road for the prediction of protein structure from sequence, and suggests that the similarity of the backbone string would be more informative than describing a protein as belonging to a fold.


Subjects
Algorithms; Computational Biology/methods; Membrane Proteins/chemistry; Amino Acid Sequence; Databases, Protein; Humans; Models, Molecular; Molecular Sequence Data; Protein Conformation; Proteus mirabilis; Sequence Alignment; Sequence Analysis, Protein; Sequence Homology, Amino Acid; Structural Homology, Protein
17.
Biochimie ; 94(3): 847-53, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22182488

ABSTRACT

Mycobacterium, the most common disease-causing genus, infects billions of people and is notoriously difficult to treat. Understanding the subcellular localization of mycobacterial proteins can provide essential clues for protein function and drug discovery. In this article, we present a novel approach that focuses on local sequence information to identify localization motifs that are generated by a merging algorithm and are selected based on a binomially distributed model. These localization motifs are employed as features for identifying the subcellular localization of mycobacterial proteins. Our approach provides more accurate results than previous methods and was tested on an independent dataset recently obtained from an experimental study to provide a first and reasonably accurate prediction of subcellular localization. Our approach can also be used for large-scale prediction of new protein entries in the UniProtKB database and of protein sequences obtained experimentally. In addition, our approach identified many local motifs involved with the subcellular localization that also interact with the environment. Thus, our method may have widespread applications both in the study of the functions of mycobacterial proteins and in the search for a potential vaccine target for designing drugs.


Subjects
Bacterial Proteins/metabolism; Computational Biology/methods; Mycobacterium/metabolism; Algorithms
18.
Amino Acids ; 42(5): 1749-55, 2012 May.
Article in English | MEDLINE | ID: mdl-21424809

ABSTRACT

Numerous methods for predicting γ-turns in proteins have been developed. However, the results they generally provided are not very good, with a Matthews correlation coefficient (MCC)≤0.18. Here, an attempt has been made to develop a method to improve the accuracy of γ-turn prediction. First, we employ the geometric mean metric as optimal criterion to evaluate the performance of support vector machine for the highly imbalanced γ-turn dataset. This metric tries to maximize both the sensitivity and the specificity while keeping them balanced. Second, a predictor to generate protein shape string by structure alignment against the protein structure database has been designed and the predicted shape string is introduced as new variable for γ-turn prediction. Based on this perception, we have developed a new method for γ-turn prediction. After training and testing the benchmark dataset of 320 non-homologous protein chains using a fivefold cross-validation technique, the present method achieves excellent performance. The overall prediction accuracy Qtotal can achieve 92.2% and the MCC is 0.38, which outperform the existing γ-turn prediction methods. Our results indicate that the protein shape string is useful for predicting protein tight turns and it is reasonable to use the dihedral angle information as a variable for machine learning to predict protein folding. The dataset used in this work and the software to generate predicted shape string from structure database can be obtained from anonymous ftp site ftp://cheminfo.tongji.edu.cn/GammaTurnPrediction/ freely.


Subjects
Databases, Protein; Protein Conformation; Protein Structure, Secondary; Proteins/chemistry; Algorithms; Amino Acid Sequence; Molecular Sequence Data; Neural Networks, Computer; Protein Folding; Sequence Alignment; Software; Support Vector Machine
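
The γ-turn abstract above uses the geometric mean of sensitivity and specificity as its selection criterion and reports a Matthews correlation coefficient (MCC). Both can be computed from a confusion matrix, as in the sketch below. The function names are illustrative, the counts are invented, and returning 0.0 when a denominator is zero is a convention assumed here, not something the abstract specifies.

```python
import math


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

On an imbalanced toy set with 5 turns and 95 non-turns, a classifier that never predicts a turn has 95% accuracy but a geometric mean and MCC of 0, which is why such metrics are preferred over raw accuracy for rare classes.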
19.
Bioinformatics ; 28(1): 32-9, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-22065541

ABSTRACT

MOTIVATION: The precise prediction of protein secondary structure is of key importance for the prediction of 3D structure and biological function. Although the development of many excellent methods over the last few decades has allowed the achievement of prediction accuracies of up to 80%, progress seems to have reached a bottleneck, and further improvements in accuracy have proven difficult. RESULTS: We propose for the first time a structural position-specific scoring matrix (SPSSM), and establish an unprecedented database of 9 million sequences and their SPSSMs. This database, when combined with a purpose-designed BLAST tool, provides a novel prediction tool: SPSSMPred. When the SPSSMPred was validated on a large dataset (10,814 entries), the Q3 accuracy of the protein secondary structure prediction was 93.4%. Our approach was tested on the two latest EVA sets; accuracies of 82.7 and 82.0% were achieved, far higher than can be achieved using other predictors. For further evaluation, we tested our approach on newly determined sequences (141 entries), and obtained an accuracy of 89.6%. For a set of low-homology proteins (40 entries), the SPSSMPred still achieved a Q3 value of 84.6%. AVAILABILITY: The SPSSMPred server is available at http://cal.tongji.edu.cn/SPSSMPred/ CONTACT: lith@tongji.edu.cn


Subjects
Position-Specific Scoring Matrices; Protein Structure, Secondary; Proteins/chemistry; Animals; Humans; Sequence Homology, Amino Acid
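
The Q3 accuracy reported in the SPSSMPred abstract above is conventionally the percentage of residues whose three-state secondary structure (helix H, strand E, coil C) is predicted correctly. A minimal sketch under that assumption follows; the function name `q3` and the example strings are invented for illustration and are not taken from the SPSSMPred server.

```python
def q3(predicted, observed):
    """Percentage of residues with a correctly predicted H/E/C state."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

For example, a prediction of `HHHEECCC` against an observed `HHHEEECC` differs at one of eight positions, giving a Q3 of 87.5%.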
20.
BMC Bioinformatics ; 12: 283, 2011 Jul 13.
Article in English | MEDLINE | ID: mdl-21749732

ABSTRACT

BACKGROUND: The β-turn is a secondary protein structure type that plays an important role in protein configuration and function. Development of accurate prediction methods to identify β-turns in protein sequences is valuable. Several methods for β-turn prediction have been developed; however, the prediction quality is still a challenge and there is substantial room for improvement. Innovations of the proposed method focus on discovering effective features, and constructing a new architectural model. RESULTS: We utilized predicted secondary structures, predicted shape strings and the position-specific scoring matrix (PSSM) as input features, and proposed a novel two-layer model to enhance the prediction. We achieved the highest values according to four evaluation measures, i.e. Q(total) = 87.2%, MCC = 0.66, Q(observed) = 75.9%, and Q(predicted) = 73.8% on the BT426 dataset. The results show that our proposed two-layer model discriminates better between β-turns and non-β-turns than the single model due to obtaining higher Q(predicted). Moreover, the predicted shape strings based on the structural alignment approach greatly improve the performance, and the same improvements were observed on BT547 and BT823 datasets as well. CONCLUSION: In this article, we present a comprehensive method for the prediction of β-turns. Experiments show that the proposed method constitutes a great improvement over the competing prediction methods.


Subjects
Position-Specific Scoring Matrices; Protein Structure, Secondary; Proteins/chemistry; Algorithms; Humans; Sequence Analysis, Protein
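
The evaluation measures named in the β-turn abstract above are usually defined per residue from a confusion matrix: Q(total) is overall accuracy, Q(predicted) is the share of predicted turn residues that are truly turns, and Q(observed) is the share of observed turn residues that were predicted. The sketch below assumes those conventional definitions, which the abstract itself does not spell out; the function name and the counts are invented.

```python
def turn_scores(tp, fp, tn, fn):
    """Return Q(total), Q(predicted) and Q(observed) as percentages."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fp) == 0 or (tp + fn) == 0:
        raise ValueError("counts must include predicted and observed turns")
    return {
        "Q_total": 100.0 * (tp + tn) / total,
        "Q_predicted": 100.0 * tp / (tp + fp),
        "Q_observed": 100.0 * tp / (tp + fn),
    }
```

With 30 true positives, 10 false positives, 50 true negatives and 10 false negatives, the sketch gives Q(total) = 80%, Q(predicted) = 75% and Q(observed) = 75%.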