1.
Int J Mol Sci ; 17(11), 2016 Oct 26.
Article in English | MEDLINE | ID: mdl-27792167

ABSTRACT

Information about the interface sites of protein-protein interactions (PPIs) is useful for many areas of biological research. However, despite advances in experimental techniques, identifying PPI sites remains a challenging task. Using a statistical learning technique, we propose a computational tool for predicting PPI sites. As an alternative to similar approaches that require structural information, the proposed method takes all of its input from protein sequences. In addition to typical sequence features, our method takes into consideration that interaction sites are not randomly distributed over the protein sequence. We characterized this positional preference using protein complexes with known structures, proposed a numerical index to estimate the propensity and then incorporated the index into a learning system. The resulting predictor, without using structural information, yields an area under the ROC curve (AUC) of 0.675, recall of 0.597, precision of 0.311 and accuracy of 0.583 in a ten-fold cross-validation experiment. This performance is comparable to that of a previous approach in which structural information was used. Upon introducing B-factor data to our predictor, we demonstrated that the AUC can be further improved to 0.750. The tool is accessible at http://bsaltools.ym.edu.tw/predppis.


Subjects
Machine Learning, Protein Interaction Mapping/methods, Protein Interaction Maps, Proteins/metabolism, Amino Acids/chemistry, Amino Acids/metabolism, Animals, Protein Databases, Humans, Intrinsically Disordered Proteins/chemistry, Intrinsically Disordered Proteins/metabolism, Biological Models, Protein Interaction Domains and Motifs, Proteins/chemistry
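
The recall, precision and accuracy figures quoted in the abstract above are all derived from the same confusion-matrix counts. A minimal sketch of those definitions follows; the function name and the counts are hypothetical and chosen only for illustration, not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return recall, precision and accuracy from confusion-matrix counts."""
    recall = tp / (tp + fn)      # fraction of true interface residues recovered
    precision = tp / (tp + fp)   # fraction of predicted interface residues that are correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"recall": recall, "precision": precision, "accuracy": accuracy}


# Hypothetical counts (assumption): not the paper's data.
example = classification_metrics(tp=60, fp=130, tn=170, fn=40)
```

With imbalanced classes such as interface versus non-interface residues, precision can be low even when recall is moderate, which is why the abstract reports all three values alongside the AUC.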
2.
J Theor Biol ; 318: 1-12, 2013 Feb 07.
Article in English | MEDLINE | ID: mdl-23137835

ABSTRACT

The type of an unannotated membrane protein provides an important hint about its biological function. Experimental determination of membrane protein types, although more accurate and reliable, is not always feasible because of costly laboratory procedures, creating a need for bioinformatics methods. This article describes a novel computational classifier for predicting membrane protein types from protein sequences. The classifier, comprising a collection of one-versus-one support vector machines, makes use of the following sequence attributes: (1) the cationic patch sizes, the orientation, and the topology of transmembrane segments; (2) the amino acid physicochemical properties; (3) the presence of signal peptides or anchors; and (4) the specific protein motifs. A new voting scheme was implemented to cope with the multi-class prediction. Both the training and the testing sequences were collected from SwissProt. Homologous proteins were removed so that no pair of sequences left in the datasets has a sequence identity higher than 40%. The performance of the classifier was evaluated by jackknife cross-validation and an independent testing experiment. Results show that the proposed classifier outperforms earlier predictors in prediction accuracy for seven of the eight membrane protein types. The overall accuracy increased from 78.3% to 88.2%. Unlike earlier approaches, which largely depend on position-specific substitution matrices and amino acid compositions, most of the sequence attributes implemented in the proposed classifier are supported by evidence in the literature. The classifier has been deployed as a web server and can be accessed at http://bsaltools.ym.edu.tw/predmpt.


Subjects
Amino Acids/chemistry, Membrane Proteins/chemistry, Protein Sorting Signals, Tertiary Protein Structure, Animals, Physical Chemistry, Computational Biology/methods, Hydrophobic and Hydrophilic Interactions, Protein Sequence Analysis/methods, Support Vector Machine
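
A one-versus-one ensemble such as the one described in the abstract above trains one binary classifier per pair of classes and then combines their decisions. The sketch below uses plain majority voting as a stand-in; the abstract does not specify its "new voting scheme", so this is not the authors' method. The class names, scores and the `toy_winner` function are hypothetical placeholders for trained SVMs.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_vote(classes, pairwise_winner):
    """Predict a class by majority vote over all pairwise classifiers.

    pairwise_winner(a, b) must return either a or b; here it stands in
    for a trained binary SVM deciding between classes a and b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(a, b)] += 1
    # Ties are broken by the order of `classes` (an arbitrary choice for this sketch).
    return max(classes, key=lambda c: votes[c])


# Hypothetical stand-in for trained classifiers: a fixed score per class.
toy_scores = {"type I": 0.2, "type II": 0.9, "multipass": 0.5}


def toy_winner(a, b):
    return a if toy_scores[a] >= toy_scores[b] else b


predicted = one_vs_one_vote(list(toy_scores), toy_winner)
```

For eight classes, as in the abstract, this design needs 8 × 7 / 2 = 28 pairwise classifiers.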
3.
BMC Genomics ; 10: 218, 2009 May 12.
Article in English | MEDLINE | ID: mdl-19435500

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are small non-coding RNAs that affect the expression of target genes via translational repression or mRNA degradation. With the increasing availability of mRNA and miRNA expression data, it might be possible to assess functional targets using the fact that a miRNA might down-regulate its target mRNAs. In this work we computed the correlation of expression profiles between miRNAs and target mRNAs using the NCI-60 expression data. The aim is to investigate whether the correlations between miRNA and mRNA expression profiles, either positive or negative, can be used to assist the identification of functional miRNA-mRNA relationships. RESULTS: Predicted miRNA-mRNA interactions were taken from TargetScan 4.1 and miRBase release 5. Pearson correlation coefficients between the miRNA and the mRNA expression profiles were computed using NCI-60 data. The correlation coefficients were then subjected to the Benjamini and Hochberg correction. Our results show that the percentage of TargetScan-predicted miRNA-mRNA interactions having negative correlation in expression profiles is higher than that of miRBase-predicted pairs. Using the experimentally validated miRNA targets listed in TarBase, genes involved in mRNA degradation show more negative correlations between miRNA and mRNA expression profiles than genes involved in translational repression. Furthermore, correlation analysis for miRNAs and mRNAs transcribed from the same genes shows that correlations of expression profiles between intronic miRNAs and host genes tend to be positive. Finally, we found that a target gene might be down-regulated by more than one miRNA sharing the same seed region. CONCLUSION: Our results suggest that expression profiles can be used in the computational identification of functional miRNA-target associations. One can expect a higher chance of finding negatively correlated expression profiles for TargetScan-predicted interactions than for miRBase-predicted ones. With limited experimentally validated miRNA-target interactions, expression profiles can serve only a supplementary role in finding interactions between miRNAs and mRNAs.


Subjects
Computational Biology/methods, Gene Expression Profiling, MicroRNAs/metabolism, Messenger RNA/metabolism, Tumor Cell Line, Genetic Databases, Humans, MicroRNAs/genetics, Oligonucleotide Array Sequence Analysis, Messenger RNA/genetics
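
The two statistical steps named in the abstract above, Pearson correlation and the Benjamini and Hochberg correction, can be sketched as follows. This is an illustrative sketch only: the profiles and p-values are made-up numbers, not NCI-60 data, and the step that turns each correlation coefficient into a p-value is omitted because the abstract does not describe it.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy profiles and p-values (hypothetical values, assumption).
mirna = [1.0, 2.0, 3.0, 4.0]
mrna = [8.0, 6.0, 4.0, 2.0]
r = pearson(mirna, mrna)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A strongly negative `r` is the pattern the abstract associates with a miRNA down-regulating its target; the adjusted p-values control the false discovery rate when many miRNA-mRNA pairs are tested at once.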
4.
BMC Bioinformatics ; 9 Suppl 12: S4, 2008 Dec 12.
Article in English | MEDLINE | ID: mdl-19091027

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are a set of small non-coding RNAs serving as important negative gene regulators. In animals, miRNAs turn down protein translation by binding to the 3' UTR regions of target genes with imperfect complementary pairing. The identification of microRNA targets has become one of the major challenges of miRNA research. Bioinformatics investigations of miRNA targets have resulted in a number of target prediction tools. Although these tools are capable of predicting hundreds of targets for a given miRNA, many of them suffer from high false positive rates, indicating the need for a post-processing filter for the predicted targets. Once trained with experimentally validated true and false targets, machine learning methods appear to be ideal approaches to distinguish the true targets from the false ones. RESULTS: We present a miRNA target filtering system named MiRTif (miRNA:target interaction filter). The system is a support vector machine (SVM) classifier trained with 195 positive and 38 negative miRNA:target interaction pairs, all experimentally validated. Each miRNA:target interaction pair is divided into a seed and a non-seed region. The encoded feature vector contains various k-gram frequencies in the seed, the non-seed and the entire regions. Informative features are selected based on their discriminating abilities. Prediction accuracies are assessed using 10-fold cross-validation experiments. Our system achieves an AUC (area under the ROC curve) of 0.86, sensitivity of 83.59%, and specificity of 73.68%. More importantly, the system correctly predicts the majority of the false positive miRNA:target interactions (28 out of 38). The possibility of over-fitting due to the relatively small negative sample set has also been investigated using a set of non-validated and randomly selected targets (from miRBase). CONCLUSION: MiRTif is designed as a post-processing filter that takes miRNA:target interactions predicted by other target prediction software such as TargetScanS, PicTar and miRanda as inputs, and determines how likely a given interaction is to be real rather than spurious. MiRTif can be accessed from http://bsal.ym.edu.tw/mirtif.


Subjects
MicroRNAs/genetics, 3' Untranslated Regions, Algorithms, Area Under Curve, Artificial Intelligence, Computational Biology/methods, Computer Simulation, Computers, False Positive Reactions, Humans, MicroRNAs/metabolism, Automated Pattern Recognition/methods, Reproducibility of Results, Sensitivity and Specificity, RNA Sequence Analysis/methods, Software
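
The abstract above describes encoding each miRNA:target pair as k-gram frequencies computed over the seed region, the non-seed region and the whole pair. A rough sketch of that kind of encoding is shown below. The function names, the feature-key format, the choice of k and the example strings are all assumptions; the abstract does not give the alphabet or the exact encoding MiRTif uses.

```python
from collections import Counter


def kgram_frequencies(seq, k):
    """Relative frequency of each length-k substring in seq."""
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {gram: n / total for gram, n in counts.items()}


def encode_pair(seed, non_seed, ks=(1, 2)):
    """Collect k-gram frequencies for the seed, non-seed and whole regions."""
    features = {}
    regions = (("seed", seed), ("nonseed", non_seed), ("all", seed + non_seed))
    for name, region in regions:
        for k in ks:
            for gram, freq in kgram_frequencies(region, k).items():
                features[f"{name}:{k}:{gram}"] = freq
    return features


# Hypothetical strings standing in for an encoded interaction (assumption).
features = encode_pair(seed="ACGA", non_seed="UUGC")
```

In a pipeline like the one described, a feature-selection step would then keep only the most discriminative of these entries before training the SVM.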
5.
Amino Acids ; 35(3): 615-26, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18415037

ABSTRACT

Determining if missense mutations are deleterious is critical for the analysis of genes implicated in disease. However, the mutational effects of many missense mutations in databases like the Breast Cancer Information Core are unclassified. Several approaches have emerged recently to determine such mutational effects but none have utilized amino acid property indices. We modified a previously described phylogenetic approach by first classifying benign substitutions based on the assumption that missense mutations that are maintained in orthologs are unlikely to affect function. A consensus conservation score based on 16 amino acid properties was used to characterize the remaining substitutions. This approach was evaluated with experimentally verified T4 lysozyme missense mutations and is shown to be able to sieve out putative biochemical and structurally important residues. The use of amino acid properties can enhance the prediction of biochemical and structurally important residues and thus also predict the significance of missense mutations.


Subjects
Amino Acids/chemistry, BRCA1 Protein/chemistry, BRCA1 Protein/genetics, Missense Mutation, Amino Acid Substitution, Animals, Bacteriophage T4/enzymology, Humans, Molecular Models, Muramidase/chemistry, Muramidase/genetics
6.
J Biol Chem ; 283(19): 13205-15, 2008 May 09.
Article in English | MEDLINE | ID: mdl-18319255

ABSTRACT

Like other cancers, aberrant gene regulation features significantly in hepatocellular carcinoma (HCC). MicroRNAs (miRNAs) were recently found to regulate gene expression at the post-transcriptional/translational levels. The expression profiles of 157 miRNAs were examined in 19 HCC patients, and 19 up-regulated and 3 down-regulated miRNAs were found to be associated with HCC. Putative gene targets of these 22 miRNAs were predicted in silico and were significantly enriched in 34 biological pathways, most of which are frequently dysregulated during carcinogenesis. Further characterization of microRNA-224 (miR-224), the most significantly up-regulated miRNA in HCC patients, revealed that miR-224 increases apoptotic cell death as well as proliferation and targets apoptosis inhibitor-5 (API-5) to inhibit API-5 transcript expression. Significantly, miR-224 expression was found to be inversely correlated with API-5 expression in HCC patients (p < 0.05). Hence, our findings define a true in vivo target of miR-224 and reaffirm the important role of miRNAs in the dysregulation of cellular processes that may ultimately lead to tumorigenesis.


Subjects
Apoptosis Regulatory Proteins/genetics, Hepatocellular Carcinoma/genetics, Neoplastic Gene Expression Regulation/genetics, MicroRNAs/genetics, Nuclear Proteins/genetics, Up-Regulation/genetics, Apoptosis, Base Sequence, Hepatocellular Carcinoma/pathology, Neoplastic Cell Transformation/genetics, Gene Expression Profiling, Humans, Molecular Sequence Data, Substrate Specificity, Genetic Transcription/genetics, Cultured Tumor Cells
7.
J Theor Biol ; 252(1): 145-54, 2008 May 07.
Article in English | MEDLINE | ID: mdl-18342336

ABSTRACT

Remote homology detection refers to the detection of structural homology in evolutionarily related proteins with low sequence similarity. Supervised learning algorithms such as support vector machines (SVMs) are currently the most accurate methods. In most of these SVM-based methods, efforts have been dedicated to developing new kernels to make better use of pairwise alignment scores or sequence profiles. Moreover, amino acids' physicochemical properties are not generally used in the feature representation of protein sequences. In this article, we present a remote homology detection method that incorporates two novel features: (1) a protein's primary sequence is represented using the physicochemical properties of its amino acids and (2) the similarity between two proteins is measured using recurrence quantification analysis (RQA). An optimization scheme was developed to select the amino acid indices (up to 10 for a protein family) that best characterize the given protein family. The selected amino acid indices may allow a better biological explanation of the protein family classification problem than alignment-based methods. An SVM-based classifier then works on the space described by the RQA metrics. The classification scheme is named SVM-RQA. Experiments at the superfamily level of the SCOP1.53 dataset show that, without using alignment or sequence profile information, the features generated from amino acid indices are able to produce results that are comparable to those obtained by the published state-of-the-art SVM kernels. In the future, better prediction accuracies can be expected by combining the alignment-based features with our amino acid property-based features. Supplementary information including the raw dataset, the best-performing amino acid indices for each protein family and the computed RQA metrics for all protein sequences can be downloaded from http://ym151113.ym.edu.tw/svm-rqa.


Subjects
Amino Acids/chemistry, Amino Acid Sequence Homology, Amino Acid Sequence, Chemical Phenomena, Physical Chemistry, Protein Databases, Automated Pattern Recognition/methods, Protein Sequence Analysis/methods
8.
Amino Acids ; 35(2): 345-53, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18163182

ABSTRACT

Identifying a protein's subcellular localization is an important step toward understanding its function. However, the experimental work involved is usually laborious, time-consuming and costly. Computational prediction is therefore valuable for reducing this burden. Here we provide a method to predict protein subcellular localization using amino acid composition and physicochemical properties. The method concatenates the information extracted from a protein's N-terminal region, middle region and full sequence. Each part is represented by amino acid composition, weighted amino acid composition, five-level grouping composition and five-level dipeptide composition. We divided our dataset into a training set and a testing set. The training set is used to determine the best-performing amino acid index by five-fold cross-validation, whereas the testing set acts as the independent dataset to evaluate the performance of our model. With the novel representation method, we achieve an accuracy of approximately 75% on the independent dataset. We conclude that this new representation performs well and is able to extract the protein sequence information. We have developed a web server for predicting protein subcellular localization. The web server is available at http://aaindexloc.bii.a-star.edu.sg .


Subjects
Algorithms, Artificial Intelligence, Computational Biology/methods, Intracellular Space/chemistry, Automated Pattern Recognition/methods, Proteins/chemistry, Amino Acid Sequence, Protein Databases, Predictive Value of Tests
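
The simplest of the representations listed in the abstract above is plain amino acid composition, computed separately for the N-terminal part, the middle part and the full sequence and then concatenated. A minimal sketch follows. The segment boundary (`n_term_len`), the definition of "middle" as everything after the N-terminal part, and the example sequence are assumptions made for illustration; the abstract does not state how the segments are cut.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fraction of each standard amino acid in seq (20 values)."""
    n = len(seq)
    return [seq.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def concatenated_features(seq, n_term_len=20):
    """Composition of the N-terminal part, the middle part and the full sequence.

    n_term_len and the definition of "middle" are assumptions for this sketch.
    """
    n_term = seq[:n_term_len]
    middle = seq[n_term_len:]
    return composition(n_term) + composition(middle) + composition(seq)


# Hypothetical sequence (assumption), 33 residues long.
vec = concatenated_features("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", n_term_len=10)
```

The resulting vector has 3 × 20 = 60 entries; the weighted, grouped and dipeptide compositions mentioned in the abstract would be appended in the same way.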
9.
In Silico Biol ; 7(1): 61-75, 2007.
Article in English | MEDLINE | ID: mdl-17688428

ABSTRACT

P53 is probably the most important tumor suppressor known. Over the years, information about this gene has increased dramatically. We have built a comprehensive knowledgebase of p53, which aims to help wet-lab biologists formulate their experiments, newcomers learn what they need about the gene, and bioinformaticians make new discoveries through data analysis. Using the curated information, including mutation information, transcription factors, transcriptional targets, and single nucleotide polymorphisms, we have performed extensive bioinformatics analysis and made several new discoveries about p53. We have identified point missense mutations that are over-represented in cancers but lack functional studies. By assessing the capability of tag SNPs selected from HapMap for six p53 transcriptional targets to capture SNPs obtained from the National Institute of Environmental Health Sciences (NIEHS) Environmental Genome Project and vice versa, we conclude that NIEHS data are a better source for tag SNP selection for these genes in future association studies. Analysis of microRNA regulation in the transcriptional network of the p53 gene reveals potentially important regulatory relationships between oncogenic microRNAs and transcription factors of p53. By mapping transcription factors of p53 to pathways involved in cell cycle and apoptosis, we have identified distinctive transcriptional controls of p53 in these two physiological states.


Subjects
p53 Genes, MicroRNAs/genetics, Missense Mutation, Point Mutation, Genetic Polymorphism, Tumor Suppressor Protein p53/metabolism, Apoptosis, Codon, Computational Biology/methods, Gene Expression Profiling, Neoplastic Gene Expression Regulation, Humans, MicroRNAs/metabolism, Oligonucleotide Array Sequence Analysis, Single Nucleotide Polymorphism, Genetic Transcription
10.
BMC Bioinformatics ; 7: 525, 2006 Dec 01.
Article in English | MEDLINE | ID: mdl-17137522

ABSTRACT

BACKGROUND: The advent of genotype data from large-scale efforts that catalog the genetic variants of different populations has given rise to new avenues for multifactorial disease association studies. Recent work shows that genotype data from the International HapMap Project have a high degree of transferability to the wider population. This implies that the design of genotyping studies on local populations may be facilitated through inferences drawn from information contained in HapMap populations. RESULTS: To facilitate analysis of HapMap data for characterizing the haplotype structure of genes or any chromosomal regions, we have developed an integrated web-based resource, iHAP. In addition to incorporating genotype and haplotype data from the International HapMap Project and gene information from the UCSC Genome Browser Database, iHAP also provides capabilities for inferring haplotype blocks and selecting tag SNPs that are representative of haplotype patterns. These include block partitioning algorithms, block definitions, tag SNP definitions, as well as SNPs to be "force included" as tags. Based on the parameters defined at the input stage, iHAP performs on-the-fly analysis and displays the result graphically as a webpage. To facilitate analysis, intermediate and final result files can be downloaded. CONCLUSION: The iHAP resource, available at http://ihap.bii.a-star.edu.sg, provides a convenient yet flexible approach for the user community to analyze HapMap data and identify candidate targets for genotyping studies.


Subjects
Algorithms, Chromosome Mapping/methods, Genetic Databases, Haplotypes/genetics, Information Storage and Retrieval/methods, DNA Sequence Analysis/methods, Software, Base Sequence, Genetic Variation/genetics, Molecular Sequence Data
11.
BMC Genomics ; 7: 238, 2006 Sep 19.
Article in English | MEDLINE | ID: mdl-16982009

ABSTRACT

BACKGROUND: Recent advances in human genome sequencing and genotyping have revealed millions of single nucleotide polymorphisms (SNPs), which determine the variation among human beings. One particularly important project is the International HapMap Project, which provides a catalogue of human genetic variation for disease association studies. In this paper, we analyzed the genotype data in the HapMap project using National Institute of Environmental Health Sciences Environmental Genome Project (NIEHS EGP) SNPs. We first determine whether the HapMap data are transferable to the NIEHS data. Then, we study how well the HapMap SNPs capture the untyped SNPs in the region. Finally, we provide general guidelines for determining whether the SNPs chosen from HapMap may be able to capture most of the untyped SNPs. RESULTS: Our analysis shows that HapMap data are not robust enough to capture the untyped variants for most human genes. The performance of SNPs for European and Asian samples is marginal in capturing the untyped variants, i.e. approximately 55%. As expected, the SNPs from the HapMap YRI panel can capture only approximately 30% of the variants. Although the overall performance is low, the SNPs for some genes perform very well and are able to capture most of the variants along the gene. This is observed in the European and Asian panels, but not in the African panel. We conclude that, for a well-covered SNP reference panel, the SNP density and the association among reference SNPs are important for estimating the robustness of the chosen SNPs. CONCLUSION: We have analyzed the coverage of HapMap SNPs using NIEHS EGP data. The results show that HapMap SNPs are transferable to the NIEHS SNPs. However, HapMap SNPs cannot capture some of the untyped SNPs, and therefore resequencing may be needed to uncover more SNPs in the missing regions.


Subjects
Single Nucleotide Polymorphism/genetics, Asian People/genetics, Black People/genetics, Chromosome Mapping/methods, Genetic Variation, Human Genome, Humans, Genetic Models, Reproducibility of Results, White People/genetics
12.
J Bioinform Comput Biol ; 4(6): 1245-67, 2006 Dec.
Article in English | MEDLINE | ID: mdl-17245813

ABSTRACT

Physicochemical properties of amino acids are important factors in determining protein structure and function. Most approaches make use of averaged properties over entire domains or even proteins to analyze their structure or function. This level of coarseness tends to hide the richness of the variability in the different properties across functional domains. This paper studies the conservation of physicochemical properties in a functionally similar family of proteins using a novel wavelet-based technique known as multiresolution analysis. Such an analysis can help uncover characteristics that can otherwise remain hidden. We have studied the protein kinase family of sequences and our findings are as follows: (a) a number of different properties are conserved over the functional catalytic domain irrespective of the sequence identities; (b) conservation of properties can be observed at different frequency levels and they agree well with the known structural/functional properties of the subdomains for the protein kinase family; (c) structural differences between the different kinase family members are reflected in the waveforms; and (d) functionally important mutations show distortions in the waveforms of conserved properties. The potential usefulness of the above findings in identifying functionally similar sequences in the twilight and midnight zones is demonstrated through a simple prediction model for the protein kinase family which achieved a recall of 93.7% and a precision of 96.75% in cross-validation tests.


Subjects
Algorithms, Conserved Sequence, Protein Kinases/chemistry, Sequence Alignment/methods, Protein Sequence Analysis/methods, Amino Acid Sequence, Catalysis, Molecular Sequence Data, Protein Kinases/classification, Structure-Activity Relationship
13.
In Silico Biol ; 5(4): 367-77, 2005.
Article in English | MEDLINE | ID: mdl-16268781

ABSTRACT

In humans an estimated 35-60% of genes are alternatively spliced. A large number of genes also show alternative initiation or termination. Regulation of these processes is still poorly understood. For alternative splicing it is believed that the relative concentration of certain proteins and the presence of certain regulatory elements are the key factors determining alterations in splicing pattern. However, there is evidence that antisense RNA might be part of the regulatory processes. Antisense RNA molecules could bind to the target pre-mRNA in a sequence-specific fashion, sterically blocking targeted splice sites and redirecting the spliceosome to available and unhindered splice sites. Here we describe an in silico investigation to identify human sense/antisense pairs with alternative initiation or termination in the sense gene and where only one of the isoforms overlaps the antisense transcript. Alternatively spliced genes with antisense transcripts covering the alternatively used splice site are also identified. Our analyses are based on the ASAP splicing annotation database from UCLA, the antisense transcripts data from Yelin et al., 2003, and the H-invitational full-length cDNA database from JBIRC, Japan. These data give new insight into the complexity of genomic organization and provide candidate loci for experimentalists to study antisense-mediated regulation of alternative initiation, splicing and termination. Our results contain 468 clusters with this characteristic genomic organization and can be found at http://aistar.bii.a-star.edu.sg/.


Subjects
Alternative Splicing, Gene Expression Regulation, Antisense RNA/metabolism, Genetic Transcription, Genetic Databases, Humans, Molecular Sequence Data, Antisense RNA/genetics, RNA Sequence Analysis
14.
In Silico Biol ; 5(4): 415-8, 2005.
Article in English | MEDLINE | ID: mdl-16268786

ABSTRACT

CMDWave (Conserved Motif Detection using WAVElets) is a web server that predicts conserved motifs in protein sequences. A set of query protein sequences are first aligned using ClustalW to obtain equal sized sequences. CMDWave then converts the sequences into a numerical representation using electron-ion interaction potential (EIIP). This is followed by a wavelet decomposition and reconstruction. A new similarity metric along with thresholding is then used to identify conserved motifs across all the query sequences. Users need not specify the number of motifs to be identified. For larger groups of sequences, results can be emailed to the users.


Subjects
Amino Acid Motifs, Protein Sequence Analysis, Software, Protein Databases, Internet, Proteins/chemistry, Proteins/genetics
15.
BMC Bioinformatics ; 6: 174, 2005 Jul 13.
Article in English | MEDLINE | ID: mdl-16011808

ABSTRACT

BACKGROUND: Predicting the subcellular localization of proteins is important for determining the function of proteins. Previous work on predicting protein localization in Gram-negative bacteria obtained good results. However, these methods had relatively low accuracies for the localization of extracellular proteins. This paper studies ways to improve the accuracy of predicting extracellular localization in Gram-negative bacteria. RESULTS: We have developed a system for predicting the subcellular localization of proteins for Gram-negative bacteria based on amino acid subalphabets and a combination of multiple support vector machines. The recall of the extracellular site and overall recall of our predictor reach 86.0% and 89.8%, respectively, in 5-fold cross-validation. To the best of our knowledge, these are the most accurate results for predicting subcellular localization in Gram-negative bacteria. CONCLUSION: Clustering the 20 amino acids into a few groups with the proposed greedy algorithm provides a new way to extract features from protein sequences that cover more adjacent amino acids and hence reduce the dimensionality of the input vector of protein features. It was observed that a good amino acid grouping leads to an increase in prediction performance. Furthermore, a proper choice of a subset of complementary support vector machines constructed from different protein features maximizes the prediction accuracy.


Subjects
Gram-Negative Bacteria/chemistry, Gram-Negative Bacteria/genetics, Protein Sequence Analysis/methods, Algorithms, Cluster Analysis, Statistical Models, Predictive Value of Tests, Statistics as Topic
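
The dimensionality argument in the abstract above can be made concrete: counting adjacent residue pairs over the full 20-letter alphabet gives 20² = 400 features, whereas a five-group subalphabet gives 5² = 25. The sketch below illustrates the idea. The grouping shown is a hypothetical one based on broad residue classes; the paper derives its own groups with a greedy algorithm that is not reproduced here.

```python
from itertools import product

# Hypothetical five-group subalphabet (assumption); not the paper's grouping.
GROUPS = {
    "h": "AVLIMC",  # hydrophobic
    "a": "FWY",     # aromatic
    "p": "STNQGP",  # polar or small
    "+": "KRH",     # positively charged
    "-": "DE",      # negatively charged
}
TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a protein sequence over the reduced alphabet."""
    return "".join(TO_GROUP[aa] for aa in seq)


def pair_counts(reduced):
    """Count adjacent group pairs; the result has len(GROUPS)**2 entries."""
    keys = ["".join(p) for p in product(GROUPS, repeat=2)]
    counts = dict.fromkeys(keys, 0)
    for i in range(len(reduced) - 1):
        counts[reduced[i:i + 2]] += 1
    return counts


reduced = reduce_sequence("MKDEFW")
counts = pair_counts(reduced)
```

Because the reduced vector is so much smaller, longer windows (triples, gapped pairs) stay tractable, which is the sense in which the abstract says the features "cover more adjacent amino acids".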
16.
BMC Bioinformatics ; 6: 152, 2005 Jun 17.
Article in English | MEDLINE | ID: mdl-15963230

ABSTRACT

BACKGROUND: Protein subcellular localization is an important determinant of protein function and hence, reliable methods for prediction of localization are needed. A number of prediction algorithms have been developed based on amino acid compositions or on the N-terminal characteristics (signal peptides) of proteins. However, such approaches lead to a loss of contextual information. Moreover, where information about the physicochemical properties of amino acids has been used, the methods employed to exploit that information are less than optimal and could use the information more effectively. RESULTS: In this paper, we propose a new algorithm called pSLIP which uses Support Vector Machines (SVMs) in conjunction with multiple physicochemical properties of amino acids to predict protein subcellular localization in eukaryotes across six different locations, namely, chloroplast, cytoplasmic, extracellular, mitochondrial, nuclear and plasma membrane. The algorithm was applied to the dataset provided by Park and Kanehisa and we obtained prediction accuracies for the different classes ranging from 87.7%-97.0% with an overall accuracy of 93.1%. CONCLUSION: This study presents a physicochemical property based protein localization prediction algorithm. Unlike other algorithms, contextual information is preserved by dividing the protein sequences into clusters. The prediction accuracy shows an improvement over other algorithms based on various types of amino acid composition (single, pair and gapped pair). We have also implemented a web server to predict protein localization across the six classes (available at http://pslip.bii.a-star.edu.sg/).


Subjects
Algorithms, Computational Biology/methods, Proteins/classification, Proteins/metabolism, Protein Sequence Analysis/methods, Subcellular Fractions/metabolism, Protein Databases, Gene Expression Profiling/methods, Proteins/chemistry, Software Validation, Subcellular Fractions/chemistry
17.
J Bioinform Comput Biol ; 3(2): 243-55, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15852503

ABSTRACT

We describe an exhaustive and greedy algorithm for improving the accuracy of multiple sequence alignment. A simple progressive alignment approach is employed to provide initial alignments. The initial alignment is then iteratively optimized against an objective function. For any working alignment, the optimization involves three operations: insertions, deletions and shuffles of gaps. The optimization is exhaustive since the algorithm applies the above operations to all eligible positions of an alignment. It is also greedy since only the operation that gives the best improvement in the objective score is accepted. The algorithms have been implemented in the EGMA (Exhaustive and Greedy Multiple Alignment) package using the Java programming language, and have been evaluated using the BAliBASE benchmark alignment database. Although EGMA is not guaranteed to produce a globally optimal alignment, the tests indicate that EGMA consistently builds high-quality alignments compared with other commonly used iterative and non-iterative alignment programs. It is also useful for refining multiple alignments obtained by other methods.


Subjects
Algorithms , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Conserved Sequence , Pattern Recognition, Automated/methods , Sequence Analysis, Protein
18.
Bioinformatics ; 21(10): 2570-1, 2005 May 15.
Article in English | MEDLINE | ID: mdl-15746289

ABSTRACT

WebAllergen is a web server that predicts the potential allergenicity of proteins. The query protein will be compared against a set of prebuilt allergenic motifs that have been obtained from 664 known allergen proteins. The query will also be compared with known allergens that do not have detectable allergenic motifs. Moreover, users are allowed to upload their own allergens as alternative training sequences on which a new set of allergenic motifs will be built. The query sequences can also be compared with these motifs. AVAILABILITY: http://weballergen.bii.a-star.edu.sg/


Subjects
Algorithms , Allergens/chemistry , Internet , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , User-Computer Interface , Allergens/analysis , Allergens/classification , Amino Acid Motifs , Artificial Intelligence , Pattern Recognition, Automated/methods , Proteins/analysis , Proteins/classification , Structure-Activity Relationship
19.
J Bioinform Comput Biol ; 3(1): 145-56, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15751117

ABSTRACT

Tabu search is a meta-heuristic approach that has proven useful in solving combinatorial optimization problems. We implement the adaptive memory features of tabu search to refine a multiple sequence alignment. Adaptive memory helps the search process to avoid local optima and to explore the solution space economically and effectively without getting trapped in cycles. The algorithm is further enhanced by introducing extended tabu search features such as intensification and diversification. The neighborhoods of a solution are generated stochastically, and a consistency-based objective function is employed to measure its quality. The algorithm is tested with the datasets from the BAliBASE benchmarking database. We have observed through experiments that tabu search is able to improve the quality of multiple alignments generated by other software such as ClustalW and T-Coffee. The source code of our algorithm is available at http://www.bii.a-star.edu.sg/~tariq/tabu/.
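
The core of a tabu search over alignments (stochastic neighborhood, short-term memory, aspiration) can be sketched as follows. This is a generic illustration, not the authors' code: a toy sum-of-pairs score stands in for the consistency-based objective, the only move is swapping a gap with an adjacent residue, intensification and diversification are left out, and all names and parameter values (`tabu_refine`, `tenure`, `k`, `seed`) are assumptions made for the example.

```python
import random
from collections import deque
from itertools import combinations

GAP = "-"


def sp_score(rows, match=1, mismatch=-1, gap=-2):
    """Toy sum-of-pairs score; a stand-in for a consistency-based objective."""
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == GAP and b == GAP:
                continue
            if a == GAP or b == GAP:
                total += gap
            else:
                total += match if a == b else mismatch
    return total


def random_neighbors(rows, rng, k):
    """Stochastic neighborhood: sample up to k gap/residue swaps.

    A move is identified by (row, position); applying the same move twice
    restores the original row, so forbidding it blocks immediate reversal.
    """
    moves = [(r, i) for r, row in enumerate(rows)
             for i in range(len(row) - 1)
             if (row[i] == GAP) != (row[i + 1] == GAP)]
    rng.shuffle(moves)
    for r, i in moves[:k]:
        row = rows[r]
        new_row = row[:i] + row[i + 1] + row[i] + row[i + 2:]
        yield (r, i), rows[:r] + [new_row] + rows[r + 1:]


def tabu_refine(rows, iterations=50, tenure=5, k=8, seed=0):
    rng = random.Random(seed)
    current = list(rows)
    best, best_score = current, sp_score(current)
    tabu = deque(maxlen=tenure)  # short-term memory of recent moves
    for _ in range(iterations):
        scored = []
        for move, cand in random_neighbors(current, rng, k):
            s = sp_score(cand)
            # Aspiration: a tabu move is allowed only if it beats the best so far.
            if move in tabu and s <= best_score:
                continue
            scored.append((s, move, cand))
        if not scored:
            continue
        # Take the best admissible neighbor even if it is worse than the
        # current alignment; this is what lets the search leave local optima.
        s, move, current = max(scored, key=lambda t: t[0])
        tabu.append(move)
        if s > best_score:
            best, best_score = current, s
    return best, best_score


if __name__ == "__main__":
    print(tabu_refine(["AC-GT", "ACG-T"]))
```

Unlike a purely greedy refinement, the search keeps moving after reaching a local optimum and returns the best alignment seen, with the tabu list preventing it from immediately undoing its last few moves.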


Subjects
Algorithms , Artificial Intelligence , Sequence Alignment/methods , Sequence Analysis/methods , Conserved Sequence , Sequence Homology
20.
Int J Oncol ; 26(3): 607-13, 2005 Mar.
Article in English | MEDLINE | ID: mdl-15703814

ABSTRACT

The Y-box binding protein 1 (YB-1) regulates gene expression through transcription and translation. YB-1 has been shown to be associated with up-regulation of P-glycoprotein (Pgp), an ATP-binding transporter involved in multi-drug resistance. In this study, we determined the prognostic significance of YB-1 and its relationship with Pgp in patients with breast cancer. YB-1 and Pgp expression were evaluated by immunohistochemistry in resected specimens of infiltrative ductal breast cancers from 99 patients and 57 patients, respectively, and correlated with clinicopathological parameters and adjuvant chemotherapy regimes. The antibody for the YB-1 protein was prepared by injecting a rabbit with a purified recombinant chicken YB-1 protein. The relationship between YB-1 and Pgp was also evaluated by a computational approach using the Resonant Recognition Model (RRM). We found that breast tumors which were both estrogen receptor-negative and lymph node-positive were associated with high YB-1 expression (P=0.017). In patients who did not receive adjuvant chemotherapy, recurrence risk was reduced in breast cancers having lower YB-1 expression (P=0.034), suggesting that high levels of YB-1 expression in breast cancer are associated with tumor aggressiveness. We were able to demonstrate a direct interaction between YB-1 and Pgp using the computer-based RRM. Interestingly, we found that patients who were on a chemotherapy regime which contained an anthracycline (a Pgp substrate) and subsequently developed recurrence had a higher YB-1 score compared to patients on the Cyclophosphamide/Methotrexate/5-Fluorouracil regime (P=0.024). YB-1 expression in breast cancer may be a potential marker of chemoresistance and could possibly aid in selection of the appropriate adjuvant chemotherapy regime for breast cancers.


Subjects
Biomarkers, Tumor/blood , Breast Neoplasms/drug therapy , Breast Neoplasms/pathology , Carcinoma, Ductal, Breast/drug therapy , Carcinoma, Ductal, Breast/pathology , DNA-Binding Proteins/biosynthesis , DNA-Binding Proteins/blood , ATP Binding Cassette Transporter, Subfamily B, Member 1/biosynthesis , Adult , Aged , Aged, 80 and over , Chemotherapy, Adjuvant , Disease Progression , Drug Resistance, Neoplasm , Female , Humans , Immunohistochemistry , Middle Aged , Nuclear Proteins , Prognosis , Receptors, Estrogen , Treatment Outcome , Up-Regulation , Y-Box-Binding Protein 1
...