1.
PLoS One ; 8(7): e68857, 2013.
Article in English | MEDLINE | ID: mdl-23874789

ABSTRACT

BACKGROUND: Initiation of transcription is essential for most of the cellular responses to environmental conditions and for cell and tissue specificity. This process is regulated through numerous proteins, their ligands and mutual interactions, as well as interactions with DNA. The key such regulatory proteins are transcription factors (TFs) and transcription co-factors (TcoFs). TcoFs are important since they modulate the transcription initiation process through interaction with TFs. In eukaryotes, transcription requires that TFs form different protein complexes with various nuclear proteins. To better understand transcription regulation, it is important to know the functional class of proteins interacting with TFs during transcription initiation. Such information is not fully available, since not all proteins that act as TFs or TcoFs are yet annotated as such, due to generally partial functional annotation of proteins. In this study we have developed a method to predict, using only sequence composition of the interacting proteins, the functional class of human TF binding partners to be (i) TF, (ii) TcoF, or (iii) other nuclear protein. This allows for complementing the annotation of the currently known pool of nuclear proteins. Since only the knowledge of protein sequences is required in addition to protein interaction, the method should be easily applicable to many species. RESULTS: Based on experimentally validated interactions between human TFs with different TFs, TcoFs and other nuclear proteins, our two classification systems (implemented as a web-based application) achieve high accuracies in distinguishing TFs and TcoFs from other nuclear proteins, and TFs from TcoFs respectively. 
CONCLUSION: As demonstrated, given the fact that two proteins are capable of forming direct physical interactions and using only information about their sequence composition, we have developed a completely new method for predicting a functional class of TF interacting protein partners with high precision and accuracy.
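
As an illustration of what "sequence composition" features for an interacting pair might look like, here is a minimal sketch. It assumes the simplest possible encoding (the fraction of each of the 20 standard amino acids, computed per protein and concatenated); the abstract does not specify the features or the classifiers actually used, so the function names `composition` and `pair_features` are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq: str) -> list[float]:
    """Return the fraction of each standard amino acid in seq.

    Non-standard symbols (e.g. X, B, Z) are ignored. This encoding is an
    assumption made for illustration, not the feature set of the paper.
    """
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(tf_seq: str, partner_seq: str) -> list[float]:
    """Concatenate the composition vectors of a TF and its binding partner."""
    return composition(tf_seq) + composition(partner_seq)
```

The resulting 40-dimensional vector could be passed to any standard classifier to separate the three partner classes (TF, TcoF, other nuclear protein).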


Subject(s)
Computational Biology/methods , Multiprotein Complexes/metabolism , Transcription Factors/metabolism , Databases, Protein , Humans , Protein Binding
2.
Int J Data Min Bioinform ; 7(4): 450-62, 2013.
Article in English | MEDLINE | ID: mdl-23798227

ABSTRACT

Information on Protein Interactions (PIs) is valuable for biomedical research, but often lies buried in the scientific literature and cannot be readily retrieved. While much progress has been made over the years in extracting PIs from the literature using computational methods, there is a lack of free, public, user-friendly tools for the discovery of PIs. We developed an online tool for the extraction of PI relationships from PubMed-abstracts, which we name PIMiner. Protein pairs and the words that describe their interactions are reported by PIMiner so that new interactions can be easily detected within text. The interaction likelihood levels are reported too. The option to extract only specific types of interactions is also provided. The PIMiner server can be accessed through a web browser or remotely through a client's command line. PIMiner can process 50,000 PubMed abstracts in approximately 7 min and thus appears suitable for large-scale processing of biological/biomedical literature.


Subject(s)
Protein Interaction Mapping/methods , Proteins/chemistry , Software , Binding Sites , Information Storage and Retrieval , Internet , Proteins/metabolism , PubMed
4.
Bioinformatics ; 29(1): 117-8, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-23110968

ABSTRACT

SUMMARY: In higher eukaryotes, the identification of translation initiation sites (TISs) has been focused on finding these signals in cDNA or mRNA sequences. Using Arabidopsis thaliana (A.t.) information, we developed a prediction tool for signals within genomic sequences of plants that correspond to TISs. Our tool requires only genome sequence, not expressed sequences. Its sensitivity/specificity is 90.75%/92.2% for A.t., 66.8%/94.4% for Vitis vinifera and 81.6%/94.4% for Populus trichocarpa, which suggests that our tool can be used in annotation of different plant genomes. We provide a list of features used in our model. Further study of these features may improve our understanding of the mechanisms of translation initiation. AVAILABILITY AND IMPLEMENTATION: Our tool is implemented as an artificial neural network. It is available as a web-based tool and, together with the source code, the list of features, and data used for model development, is accessible at http://cbrc.kaust.edu.sa/dts.


Subject(s)
Arabidopsis/genetics , Peptide Chain Initiation, Translational , Software , Genome, Plant , Genomics , Internet , Neural Networks, Computer , Nucleotide Motifs , Sensitivity and Specificity , Sequence Analysis, DNA
5.
PLoS One ; 7(4): e34480, 2012.
Article in English | MEDLINE | ID: mdl-22493694

ABSTRACT

BACKGROUND: Protein interaction networks (PINs) specific within a particular context contain crucial information regarding many cellular biological processes. For example, PINs may include information on the type and directionality of interaction (e.g. phosphorylation), location of interaction (i.e. tissues, cells), and related diseases. Currently, very few tools are capable of deriving context-specific PINs for conducting exploratory analysis. RESULTS: We developed a literature-based online system, Context-specific Protein Network Miner (CPNM), which derives context-specific PINs in real-time from the PubMed database based on a set of user-input keywords and enhanced PubMed query system. CPNM reports enriched information on protein interactions (with type and directionality), their network topology with summary statistics (e.g. most densely connected proteins in the network; most densely connected protein-pairs; and proteins connected by most inbound/outbound links) that can be explored via a user-friendly interface. Some of the novel features of the CPNM system include PIN generation, ontology-based PubMed query enhancement, real-time, user-queried, up-to-date PubMed document processing, and prediction of PIN directionality. CONCLUSIONS: CPNM provides a tool for biologists to explore PINs. It is freely accessible at http://www.biotextminer.com/CPNM/.


Subject(s)
Data Mining/methods , Protein Interaction Mapping/methods , Protein Interaction Maps , Proteins/metabolism , Software , Algorithms , Databases, Genetic , Humans , Internet , Proteins/genetics , PubMed , User-Computer Interface
6.
Am J Respir Cell Mol Biol ; 47(1): 112-9, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22383585

ABSTRACT

Many genes have been implicated in the pathogenesis of common respiratory and related diseases (RRDs), yet the underlying mechanisms are largely unknown. Differential gene expression patterns in diseased and healthy individuals suggest that RRDs affect or are affected by modified transcription regulation programs. It is thus crucial to characterize implicated genes in terms of transcriptional regulation. For this purpose, we conducted a promoter analysis of genes associated with 11 common RRDs including allergic rhinitis, asthma, bronchiectasis, bronchiolitis, bronchitis, chronic obstructive pulmonary disease, cystic fibrosis, emphysema, eczema, psoriasis, and urticaria, many of which are thought to be genetically related. The objective of the present study was to obtain deeper insight into the transcriptional regulation of these disease-associated genes by annotating their promoter regions with transcription factors (TFs) and TF binding sites (TFBSs). We discovered many TFs that are significantly enriched in the target disease groups including associations that have been documented in the literature. We also identified a number of putative TFs/TFBSs that appear to be novel. The results of our analysis are provided in an online database that is freely accessible to researchers at http://www.respiratorygenomics.com. Promoter-associated TFBS information and related genomic features, such as histone modification sites, microsatellites, CpG islands, and SNPs, are graphically summarized in the database. Users can compare and contrast underlying mechanisms of specific RRDs relative to candidate genes, TFs, gene ontology terms, micro-RNAs, and biological pathways for the conduct of metaanalyses. This database represents a novel, useful resource for RRD researchers.


Subject(s)
Databases, Genetic , Molecular Sequence Annotation , Promoter Regions, Genetic , Respiratory Tract Diseases/genetics , Gene Expression Regulation , Genomics/methods , Humans , Transcription Factors/genetics
7.
Bioinformatics ; 28(5): 747-9, 2012 Mar 01.
Article in English | MEDLINE | ID: mdl-22238258

ABSTRACT

MOTIVATION: Molecular interaction information, such as protein-protein interactions and protein-small molecule interactions, is indispensable for understanding the mechanism of biological processes and discovering treatments for diseases. Many databases have been built by manual annotation of literature to organize such information into structured form. However, most databases focus on only one type of interactions, which are often not well annotated and integrated with related functional information. RESULTS: In this study, we integrate molecular interaction information from literature by automatic information extraction and from manually annotated databases. We further integrate the relationships between protein/gene and other bio-entity terms including gene ontology terms, pathways, species and diseases to build an integrated molecular interaction database (IMID). Interactions can be selected by their associated probabilities. IMID allows complex and versatile queries for context-specific molecular interactions, which are not available currently in other molecular interaction databases. AVAILABILITY: The database is located at www.integrativebiology.org.


Subject(s)
Databases, Protein , Proteins/metabolism , Humans , Internet , Protein Binding , Protein Interaction Maps , Vocabulary, Controlled
8.
Bioinformatics ; 28(1): 127-9, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-22088842

ABSTRACT

MOTIVATION: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of easily recognizable polyadenylic acid tail. However, the task of identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of the gene regulation mechanisms. In this work, we present one such poly(A) motif prediction method based on properties of human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy for 12 poly(A) motif variants. CONTACT: vladimir.bajic@kaust.edu.sa SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Neural Networks, Computer , Poly A/analysis , Genome, Human , Humans , Internet , Poly A/genetics , Sensitivity and Specificity , Software
9.
Article in English | MEDLINE | ID: mdl-23367189

ABSTRACT

Electronic Health Records (EHR) contain large amounts of useful information that could potentially be used for building models for predicting onset of diseases. In this study, we have investigated the use of free-text and coded data in Marshfield Clinic's EHR, individually and in combination, for building machine learning based models to predict the first ever episode of atrial fibrillation and/or atrial flutter (AFF). We trained and evaluated our AFF models on the EHR data across different time intervals (1, 3, 5 and all years) prior to first documented onset of AFF. We applied several machine learning methods, including naïve Bayes, support vector machines (SVM), logistic regression and random forests, for building AFF prediction models and evaluated these using a 10-fold cross-validation approach. On text-based datasets, the best model achieved an F-measure of 60.1%, when applied exclusively to coded data. The combination of textual and coded data achieved comparable performance. The study results attest to the relative merit of utilizing textual data to complement the use of coded data for disease onset prediction modeling.


Subject(s)
Atrial Fibrillation/diagnosis , Atrial Flutter/diagnosis , Electronic Health Records , Humans
10.
PLoS One ; 6(6): e21474, 2011.
Article in English | MEDLINE | ID: mdl-21738677

ABSTRACT

A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places. Most of such information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large scale integration of bio-entity relationship information from both databases containing manually annotated, structured information and automatic information extraction of unstructured text in scientific literature. The relationship information we integrated in this study includes protein-protein interactions, protein/gene regulations, protein-small molecule interactions, protein-GO relationships, protein-pathway relationships, and pathway-disease relationships. The relationship information is organized in a graph data structure, named integrated bio-entity network (IBN), where the vertices are the bio-entities and edges represent their relationships. Under this framework, graph theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses--the indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs.
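
The idea of a most probable path between two bio-entities can be sketched with a standard technique: if each edge carries a probability and edges are treated as independent, maximizing the product of probabilities is equivalent to minimizing the sum of negative log-probabilities, which Dijkstra's algorithm handles. The abstract does not describe how MPP is actually implemented, so the code below is a generic sketch under those assumptions; the function name, the adjacency-list format, and the example nodes are hypothetical.

```python
import heapq
import math


def most_probable_path(graph, source, target):
    """Find the path from source to target with the highest product of
    edge probabilities.

    graph: dict mapping node -> list of (neighbor, probability), 0 < p <= 1.
    Returns (probability, path), or (0.0, []) if target is unreachable.
    Assumes edge probabilities are independent (an illustrative assumption).
    """
    best = {source: 0.0}      # lowest known cost, where cost = -log(prob)
    prev = {}                 # back-pointers for path reconstruction
    heap = [(0.0, source)]
    visited = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, p in graph.get(node, []):
            if p <= 0.0:
                continue      # a zero-probability edge can never be used
            new_cost = cost - math.log(p)
            if new_cost < best.get(nbr, math.inf):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if target not in best:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-best[target]), path


# Toy network with made-up probabilities (not data from the study).
toy = {
    "proteinA": [("proteinB", 0.9), ("pathwayX", 0.5)],
    "proteinB": [("diseaseD", 0.8)],
    "pathwayX": [("diseaseD", 0.99)],
    "diseaseD": [],
}
prob, path = most_probable_path(toy, "proteinA", "diseaseD")
```

On this toy network the route through `proteinB` (0.9 × 0.8, about 0.72) beats the route through `pathwayX` (0.5 × 0.99, about 0.495). An indirect relationship found this way is the kind of candidate hypothesis the abstract refers to.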


Subject(s)
Computational Biology/methods , Algorithms , Databases, Factual , Proteins
11.
BMC Syst Biol ; 4 Suppl 1: S4, 2010 May 28.
Article in English | MEDLINE | ID: mdl-20522254

ABSTRACT

BACKGROUND: The purpose of this study is to: i) develop a computational model of promoters of human histone-encoding genes (histone genes, for short), an important class of genes that participate in various critical cellular processes, ii) use the model so developed to identify regions across the human genome that have similar structure as promoters of histone genes; such regions could represent potential genomic regulatory regions, e.g. promoters, of genes that may be coregulated with histone genes, and iii) identify in this way genes that have high likelihood of being coregulated with the histone genes. RESULTS: We successfully developed a histone promoter model using a comprehensive collection of histone genes. Based on leave-one-out cross-validation test, the model produced good prediction accuracy (94.1% sensitivity, 92.6% specificity, and 92.8% positive predictive value). We used this model to predict across the genome a number of genes that shared similar promoter structures with the histone gene promoters. We thus hypothesize that these predicted genes could be coregulated with histone genes. This hypothesis matches well with the available gene expression, gene ontology, and pathways data. Jointly with promoters of the above-mentioned genes, we found a large number of intergenic regions with similar structure as histone promoters. CONCLUSIONS: This study represents one of the most comprehensive computational analyses conducted thus far on a genome-wide scale of promoters of human histone genes. Our analysis suggests a number of other human genes that share a high similarity of promoter structure with the histone genes and thus are highly likely to be coregulated, and consequently coexpressed, with the histone genes. We also found that there are a large number of intergenic regions across the genome with their structures similar to promoters of histone genes. 
These regions may be promoters of yet unidentified genes, or may represent remote control regions that participate in regulation of histone and histone-coregulated gene transcription initiation. While these hypotheses still remain to be verified, we believe that these form a useful resource for researchers to further explore regulation of human histone genes and human genome. It is worthwhile to note that the regulatory regions of the human genome remain largely un-annotated even today and this study is an attempt to supplement our understanding of histone regulatory regions.
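
For readers unfamiliar with the three accuracy measures quoted in this abstract, the sketch below shows how sensitivity, specificity and positive predictive value follow from confusion-matrix counts. The counts in the example are invented for illustration and are not taken from the study; the function name is hypothetical.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute sensitivity, specificity and positive predictive value (PPV).

    tp/fp/tn/fn are true/false positive/negative counts. A ratio with a
    zero denominator is reported as 0.0 rather than raising an error.
    """
    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of real positives found
        "specificity": ratio(tn, tn + fp),  # fraction of real negatives rejected
        "ppv": ratio(tp, tp + fp),          # fraction of predictions that are right
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
```

With these made-up counts, sensitivity is 90/100, specificity is 80/100 and PPV is 90/110.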


Subject(s)
Genome, Human/genetics , Genomics , Histones/genetics , Promoter Regions, Genetic/genetics , Bayes Theorem , Humans
12.
Bioinformatics ; 25(12): 1536-42, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19369495

ABSTRACT

MOTIVATION: Protein-protein interaction (PPI) extraction from published biological articles has attracted much attention because of the importance of protein interactions in biological processes. Despite significant progress, mining PPIs from the literature still relies heavily on time- and resource-consuming manual annotations. RESULTS: In this study, we developed a novel methodology based on Bayesian networks (BNs) for extracting PPI triplets (a PPI triplet consists of two protein names and the corresponding interaction word) from unstructured text. The method achieved an overall accuracy of 87% on a cross-validation test using a manually annotated dataset. We also showed, through extracting PPI triplets from a large number of PubMed abstracts, that our method was able to complement human annotations to extract a large number of new PPIs from the literature. AVAILABILITY: Programs/scripts we developed/used in the study are available at http://stat.fsu.edu/~jinfeng/datasets/Bio-SI-programs-Bayesian-chowdhary-zhang-liu.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
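
To make the notion of a PPI triplet concrete, the following sketch enumerates candidate triplets in a sentence from two small dictionaries. It covers only the candidate-generation step under simplifying assumptions (exact token matching, hypothetical dictionaries); the Bayesian network that would score each candidate as a true or false interaction is not shown, and nothing here reproduces the authors' programs.

```python
import re
from itertools import combinations


def candidate_triplets(sentence, protein_names, interaction_words):
    """List every (protein1, interaction_word, protein2) candidate in a sentence.

    protein_names: set of protein names, matched case-sensitively.
    interaction_words: set of lower-case interaction words.
    Every pair of distinct proteins is combined with every interaction
    word found; a downstream classifier would decide which are real.
    """
    tokens = re.findall(r"[A-Za-z0-9\-]+", sentence)
    proteins = [t for t in tokens if t in protein_names]
    words = [t.lower() for t in tokens if t.lower() in interaction_words]
    triplets = []
    for p1, p2 in combinations(proteins, 2):
        if p1 == p2:
            continue  # the same protein mentioned twice is not a pair
        for w in words:
            triplets.append((p1, w, p2))
    return triplets


# Hypothetical dictionaries and sentence, for illustration only.
found = candidate_triplets(
    "RAD51 binds BRCA2 in vitro.",
    protein_names={"RAD51", "BRCA2"},
    interaction_words={"binds", "interacts"},
)
```

Here `found` holds the single candidate `("RAD51", "binds", "BRCA2")`. In longer sentences the number of candidates grows quickly, which is why a scoring model is needed to separate true interactions from coincidental co-occurrences.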


Subject(s)
Bayes Theorem , Computational Biology/methods , Information Storage and Retrieval/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Binding Sites , Databases, Protein , Proteins/metabolism
13.
Bioinformatics ; 22(18): 2310-2, 2006 Sep 15.
Article in English | MEDLINE | ID: mdl-16613910

ABSTRACT

UNLABELLED: Dragon Promoter Mapper (DPM) is a tool to model promoter structure of co-regulated genes using methodology of Bayesian networks. DPM exploits an exhaustive set of motif features (such as motif, its strand, the order of motif occurrence and mutual distance between the adjacent motifs) and generates models from the target promoter sequences, which may be used to (1) detect regions in a genomic sequence which are similar to the target promoters or (2) to classify other promoters as similar or not to the target promoter group. DPM can also be used for modelling of enhancers and silencers. AVAILABILITY: http://defiant.i2r.a-star.edu.sg/projects/BayesPromoter/ CONTACT: vlad@sanbi.ac.za SUPPLEMENTARY INFORMATION: Manual for using DPM web server is provided at http://defiant.i2r.a-star.edu.sg/projects/BayesPromoter/html/manual/manual.htm.


Subject(s)
Algorithms , Chromosome Mapping/methods , Models, Genetic , Promoter Regions, Genetic/genetics , Sequence Analysis, DNA/methods , Software , User-Computer Interface , Bayes Theorem , Computer Simulation , Models, Statistical , Pattern Recognition, Automated/methods
14.
Int J Bioinform Res Appl ; 2(3): 282-8, 2006.
Article in English | MEDLINE | ID: mdl-18048166

ABSTRACT

The standard practice in the analysis of promoters is to select promoter regions of convenient length. This may lead to false results when searching for Transcription Factor Binding Sites (TFBSs), since the sequences may contain coding segments. In such cases, motif detection may single out motifs from the coding regions. The mapping of TFBSs to promoters may result in a misleading picture of 'promoter' content. We illustrate these issues using the example of histones H2A and H2B and show how such analysis could be misleading if care is not exercised to eliminate coding regions from the presumed promoter sequences.


Subject(s)
Computational Biology/methods , Promoter Regions, Genetic , Amino Acid Motifs , Binding Sites , Histones/chemistry , Humans , Models, Genetic , Protein Binding , Protein Structure, Tertiary , Software , Transcription Factors/metabolism , Transcription Initiation Site , Transcription, Genetic
15.
BMC Bioinformatics ; 7 Suppl 5: S8, 2006 Dec 18.
Article in English | MEDLINE | ID: mdl-17254313

ABSTRACT

BACKGROUND: Mammalian antimicrobial peptides (AMPs) are effectors of the innate immune response. A multitude of signals coming from pathways of mammalian pathogen/pattern recognition receptors and other proteins affect the expression of AMP-coding genes (AMPcgs). For many AMPcgs the promoter elements and transcription factors that control their tissue cell-specific expression have yet to be fully identified and characterized. RESULTS: Based upon the RIKEN full-length cDNA and public sequence data derived from human, mouse and rat, we identified 178 candidate AMP transcripts derived from 61 genes belonging to 29 AMP families. However, only for 31 mouse genes belonging to 22 AMP families we were able to determine true orthologous relationships with 30 human and 15 rat sequences. We screened the promoter regions of AMPcgs in the three species for motifs by an ab initio motif finding method and analyzed the derived promoter characteristics. Promoter models were developed for alpha-defensins, penk and zap AMP families. The results suggest a core set of transcription factors (TFs) that regulate the transcription of AMPcg families in mouse, rat and human. The three most frequent core TFs groups include liver-, nervous system-specific and nuclear hormone receptors (NHRs). Out of 440 motifs analyzed, we found that three represent potentially novel TF-binding motifs enriched in promoters of AMPcgs, while the other four motifs appear to be species-specific. CONCLUSION: Our large-scale computational analysis of promoters of 22 families of AMPcgs across three mammalian species suggests that their key transcriptional regulators are likely to be TFs of the liver-, nervous system-specific and NHR groups. The computationally inferred promoter elements and potential TF binding motifs provide a rich resource for targeted experimental validation of TF binding and signaling studies that aim at the regulation of mouse, rat or human AMPcgs.


Subject(s)
Antimicrobial Cationic Peptides/genetics , Computational Biology/methods , Promoter Regions, Genetic , Sequence Analysis, DNA/methods , Animals , Binding Sites , Carrier Proteins/genetics , Enkephalins/genetics , Humans , Mice , Multigene Family/genetics , Protein Precursors/genetics , RNA-Binding Proteins , Rats , Transcription Factors/metabolism , alpha-Defensins/genetics
16.
Bioinformatics ; 21(11): 2623-8, 2005 Jun 01.
Article in English | MEDLINE | ID: mdl-15769833

ABSTRACT

MOTIVATION: Histone proteins play important roles in chromosomal functions. They are significantly evolutionarily conserved across species, which suggests similarity in their transcription regulation. The abundance of experimental data on histone promoters provides an excellent background for the evaluation of computational methods. Our study addresses the issue of how well computational analysis can contribute to unveiling the biologically relevant content of promoter regions for a large number of mammalian histone genes taken across several species, and suggests the consensus promoter models of different histone groups. RESULTS: This is the first study to unveil the detailed promoter structures of all five mammalian histone groups and their subgroups. This is also the most comprehensive computational analysis of histone promoters performed to date. The most exciting fact is that the results correlate very well with the biologically known facts and experimental data. Our analysis convincingly demonstrates that computational approach can significantly contribute to elucidation of promoter content (identification of biologically relevant signals) complementing tedious wet-lab experiments. We believe that this type of analysis can be easily applied to other functional gene classes, thus providing a general framework for modelling promoter groups. These results also provide the basis to hunt for genes co-regulated with histone genes across mammalian genomes.


Subject(s)
Algorithms , Evolution, Molecular , Histones/genetics , Models, Genetic , Promoter Regions, Genetic/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Animals , Conserved Sequence , Humans , Mice , Phylogeny , Rats , Sequence Homology, Nucleic Acid , Species Specificity