1.
Neural Netw ; 93: 219-229, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28668660

ABSTRACT

Stochastic Gradient Descent (SGD) updates a Convolutional Neural Network (CNN) with a noisy gradient computed from a random batch, and each batch updates the network once per epoch. This scheme applies the same training effort to each batch, but it overlooks the fact that the gradient variance, induced by Sampling Bias and Intrinsic Image Difference, produces different training dynamics on different batches. In this paper, we develop a new training strategy for SGD, referred to as Inconsistent Stochastic Gradient Descent (ISGD), to address this problem. The core concept of ISGD is inconsistent training, which dynamically adjusts the training effort with respect to the loss. ISGD models training as a stochastic process that gradually reduces the mean batch loss, and it uses a dynamic upper control limit to identify a large-loss batch on the fly. ISGD stays on the identified batch to accelerate training with additional gradient updates, and it also has a constraint that penalizes drastic parameter changes. ISGD is straightforward and computationally efficient, and it requires no auxiliary memory. A series of empirical evaluations on real-world datasets and networks demonstrates the promising performance of inconsistent training.
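
The control-limit idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-D linear model, the function names, the multiplier `k`, and the cap `max_extra` are all assumptions made for illustration, the running statistics use Welford's method as a convenient choice, and the constraint on drastic parameter changes mentioned in the abstract is left out.

```python
import math
import random

def batch_loss_and_grad(w, batch):
    """Mean squared error and its gradient for the toy model y = w * x."""
    n = len(batch)
    loss = sum((w * x - y) ** 2 for x, y in batch) / n
    grad = sum(2 * (w * x - y) * x for x, y in batch) / n
    return loss, grad

def isgd_like_train(batches, w=0.0, lr=0.01, k=3.0, max_extra=5, epochs=20):
    """Plain SGD plus extra updates on batches whose loss exceeds mean + k * std.

    Hypothetical sketch: k and max_extra are illustrative knobs, not values
    taken from the paper.
    """
    count, mean, m2 = 0, 0.0, 0.0  # running statistics of observed batch losses
    extra_updates = 0
    for _ in range(epochs):
        for batch in batches:
            loss, grad = batch_loss_and_grad(w, batch)
            w -= lr * grad  # the regular SGD step that every batch receives
            # Welford update of the running mean and variance of the loss.
            count += 1
            delta = loss - mean
            mean += delta / count
            m2 += delta * (loss - mean)
            limit = mean + k * math.sqrt(m2 / count)  # dynamic upper control limit
            # Stay on a large-loss batch for a bounded number of extra steps.
            steps = 0
            while count > 1 and steps < max_extra:
                loss, grad = batch_loss_and_grad(w, batch)
                if loss <= limit:
                    break
                w -= lr * grad
                steps += 1
            extra_updates += steps
    return w, extra_updates
```

Capping the extra steps with `max_extra` is a simple stand-in for the safeguard against over-training a single batch; the published method may handle this differently.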


Subject(s)
Neural Networks, Computer , Stochastic Processes
2.
Nature ; 489(7414): 91-100, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955619

ABSTRACT

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.


Subject(s)
DNA/genetics , Encyclopedias as Topic , Gene Regulatory Networks/genetics , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Alleles , Cell Line , GATA1 Transcription Factor/metabolism , Gene Expression Profiling , Genomics , Humans , K562 Cells , Organ Specificity , Phosphorylation/genetics , Polymorphism, Single Nucleotide/genetics , Protein Interaction Maps , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Selection, Genetic/genetics , Transcription Initiation Site
3.
Genome Res ; 22(9): 1658-67, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955978

ABSTRACT

Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.


Subject(s)
Gene Expression Regulation , Genomics , Transcription Factors/metabolism , Transcription, Genetic , Base Composition , Binding Sites/genetics , Cell Line , Chromatin/genetics , Chromatin/metabolism , Computational Biology/methods , Histones/genetics , Humans , Models, Biological , Promoter Regions, Genetic , Protein Binding/genetics , Transcription Initiation Site
4.
Nucleic Acids Res ; 40(Database issue): D687-94, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22009677

ABSTRACT

About one-fifth of the genes in the budding yeast are essential for haploid viability and cannot be functionally assessed using standard genetic approaches such as gene deletion. To facilitate genetic analysis of essential genes, we and others have assembled collections of yeast strains expressing temperature-sensitive (ts) alleles of essential genes. To explore the phenotypes caused by essential gene mutation we used a panel of genetically engineered fluorescent markers to explore the morphology of cells in the ts strain collection using high-throughput microscopy. Here, we describe the design and implementation of an online database, PhenoM (Phenomics of yeast Mutants), for storing, retrieving, visualizing and data mining the quantitative single-cell measurements extracted from micrographs of the ts mutant cells. PhenoM allows users to rapidly search and retrieve raw images and their quantified morphological data for genes of interest. The database also provides several data-mining tools, including a PhenoBlast module for phenotypic comparison between mutant strains and a Gene Ontology module for functional enrichment analysis of gene sets showing similar morphological alterations. The current PhenoM version 1.0 contains 78,194 morphological images and 1,909,914 cells covering six subcellular compartments or structures for 775 ts alleles spanning 491 essential genes. PhenoM is freely available at http://phenom.ccbr.utoronto.ca/.


Subject(s)
Databases, Genetic , Genes, Essential , Genes, Fungal , Mutation , Phenotype , Saccharomyces cerevisiae/genetics , Data Mining , Saccharomyces cerevisiae/cytology
5.
Bioinformatics ; 27(23): 3221-7, 2011 Dec 01.
Article in English | MEDLINE | ID: mdl-22039215

ABSTRACT

MOTIVATION: ChIP-seq and ChIP-chip experiments have been widely used to identify transcription factor (TF) binding sites and target genes. Conventionally, a fairly 'simple' approach is employed for target gene identification e.g. finding genes with binding sites within 2 kb of a transcription start site (TSS). However, this does not take into account the number of sites upstream of the TSS, their exact positioning or the fact that different TFs appear to act at different characteristic distances from the TSS. RESULTS: Here we propose a probabilistic model called target identification from profiles (TIP) that quantitatively measures the regulatory relationships between TFs and target genes. For each TF, our model builds a characteristic, averaged profile of binding around the TSS and then uses this to weight the sites associated with a given gene, providing a continuous-valued 'regulatory' score relating each TF and potential target. Moreover, the score can readily be turned into a ranked list of target genes and an estimate of significance, which is useful for case-dependent downstream analysis. CONCLUSION: We show the advantages of TIP by comparing it to the 'simple' approach on several representative datasets, using motif occurrence and relationship to knock-out experiments as metrics of validation. Moreover, we show that the probabilistic model is not as sensitive to various experimental parameters (including sequencing depth and peak-calling method) as the simple approach; in fact, the lesser dependence on sequencing depth potentially utilizes the result of a ChIP-seq experiment in a more 'cost-effective' manner. CONTACT: mark.gerstein@yale.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
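
The profile-weighting step described under RESULTS can be sketched roughly as follows. This is an illustration under stated assumptions, not the TIP software: the input layout (one equal-length list of binding signal per gene, indexed by position around the TSS), the function names, and the normalisation are hypothetical, and the significance estimate mentioned in the abstract is omitted.

```python
def average_profile(signals):
    """Average the per-position binding signal over all genes, scaled to sum to 1."""
    n_genes = len(signals)
    width = len(next(iter(signals.values())))
    totals = [0.0] * width
    for track in signals.values():
        for i, value in enumerate(track):
            totals[i] += value
    mean = [t / n_genes for t in totals]
    norm = sum(mean)
    if norm == 0:
        raise ValueError("no binding signal in any gene")
    return [m / norm for m in mean]

def regulatory_scores(signals):
    """Weight each gene's signal by the averaged profile to get one score per gene."""
    profile = average_profile(signals)
    return {
        gene: sum(w * s for w, s in zip(profile, track))
        for gene, track in signals.items()
    }

def ranked_targets(signals):
    """Genes sorted from highest to lowest regulatory score."""
    scores = regulatory_scores(signals)
    return sorted(scores, key=scores.get, reverse=True)
```

Because the weights come from the factor's own averaged profile, signal at positions where the factor typically binds counts for more than signal elsewhere in the window, which is the contrast with the fixed-distance cutoff the abstract calls the 'simple' approach.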


Subject(s)
Models, Statistical , Transcription Factors/metabolism , Amino Acid Motifs , Animals , Binding Sites , Chromatin Immunoprecipitation , Estrogen Receptor alpha/metabolism , Gene Expression Regulation , Mice , Oligonucleotide Array Sequence Analysis , Protein Binding , STAT4 Transcription Factor/metabolism , Sequence Analysis, DNA , Transcription Factors/chemistry , Transcription Factors/genetics , Transcription Initiation Site
6.
Nat Biotechnol ; 29(4): 361-7, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21441928

ABSTRACT

Conditional temperature-sensitive (ts) mutations are valuable reagents for studying essential genes in the yeast Saccharomyces cerevisiae. We constructed 787 ts strains, covering 497 (∼45%) of the 1,101 essential yeast genes, with ∼30% of the genes represented by multiple alleles. All of the alleles are integrated into their native genomic locus in the S288C common reference strain and are linked to a kanMX selectable marker, allowing further genetic manipulation by synthetic genetic array (SGA)-based, high-throughput methods. We show two such manipulations: barcoding of 440 strains, which enables chemical-genetic suppression analysis, and the construction of arrays of strains carrying different fluorescent markers of subcellular structure, which enables quantitative analysis of phenotypes using high-content screening. Quantitative analysis of a GFP-tubulin marker identified roles for cohesin and condensin genes in spindle disassembly. This mutant collection should facilitate a wide range of systematic studies aimed at understanding the functions of essential genes.


Subject(s)
Genes, Essential , Genome, Fungal , Saccharomyces cerevisiae/genetics , Temperature , Alleles , Databases, Genetic , Genes, Fungal , Genes, Lethal , Genetic Engineering/methods , Genetic Loci , Mass Spectrometry/methods , Microarray Analysis/methods , Microscopy, Confocal , Mutation , Phenotype , Plasmids , RNA, Messenger , Saccharomyces cerevisiae/growth & development , Single-Cell Analysis , Tubulin/analysis
7.
PLoS Comput Biol ; 6(8), 2010 Aug 26.
Article in English | MEDLINE | ID: mdl-20865155

ABSTRACT

Variations in gene expression level might lead to phenotypic diversity across individuals or populations. Although many human genes are found to have differential mRNA levels between populations, the extent of gene expression that could vary within and between populations largely remains elusive. To investigate the dynamic range of gene expression, we analyzed the expression variability of ∼18,000 human genes across individuals within HapMap populations. Although ∼20% of human genes show differentiated mRNA levels between populations, our results show that expression variability of most human genes in one population is not significantly deviant from another population, except for a small fraction that do show substantially higher expression variability in a particular population. By associating expression variability with sequence polymorphism, intriguingly, we found SNPs in the untranslated regions (5' and 3'UTRs) of these variable genes show consistently elevated population heterozygosity. We performed differential expression analysis on a genome-wide scale, and found substantially reduced expression variability for a large number of genes, prohibiting them from being differentially expressed between populations. Functional analysis revealed that genes with the greatest within-population expression variability are significantly enriched for chemokine signaling in HIV-1 infection, and for HIV-interacting proteins that control viral entry, replication, and propagation. This observation, combined with the finding that known human HIV host factors show substantially elevated expression variability, suggests that gene expression variability might explain differential HIV susceptibility across individuals.


Subject(s)
Gene Expression Profiling , Genetic Predisposition to Disease , Genetic Variation , HIV Infections/genetics , HIV-1 , Models, Genetic , Chemokines/genetics , Female , Heterozygote , Humans , Male , Polymorphism, Single Nucleotide , Population/genetics , Untranslated Regions/genetics , Virus Internalization , Virus Replication/genetics
8.
Proc Natl Acad Sci U S A ; 107(23): 10472-7, 2010 Jun 08.
Article in English | MEDLINE | ID: mdl-20489180

ABSTRACT

Gene regulation is a process with many steps allowing for stochastic biochemical reactions, which leads to expression noise, i.e., the cell-to-cell stochastic fluctuation in protein abundance. Such expression noise can give rise to drastically diverse phenotypes, even within isogenic cell populations. Although numerous biophysical approaches have been proposed to model the origin and propagation of expression noise in biological networks, these models essentially characterize the innate stochastic dynamics in gene regulation in a mechanistic way. In this work, by investigating expression noise in the context of yeast cellular networks, we place the biophysical formalism onto solid genetic ground. At the sequence level, we show that extremely noisy genes are highly conserved in their coding sequences. At the level of cellular networks where natural selection is manifested by the topological constraints, we show that genes with varying expression noise are modularly organized in the protein interaction network and are positioned orderly in the gene regulatory network. We demonstrate that these topological constraints are highly predictive of stochastic gene expression, with which we were able to confidently predict stochastic expression for more than 2,000 yeast genes whose expression noise was previously not known. We validated the predictions by high-content cell imaging. Our approach makes feasible genome-wide prediction of stochastic gene expression, and such predictability in turn suggests that expression noise is an evolvable genetic trait.


Subject(s)
Gene Expression Regulation, Fungal , Genome, Fungal , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA/methods , Gene Regulatory Networks
9.
J Bioinform Comput Biol ; 7(6): 955-72, 2009 Dec.
Article in English | MEDLINE | ID: mdl-20014473

ABSTRACT

Due to the difficulties in identifying microRNA (miRNA) targets experimentally in a high-throughput manner, several computational approaches have been proposed. To date, most leading algorithms are based on sequence information alone. However, there has been limited overlap between these predictions, implying high false-positive rates, which underlines the limitation of sequence-based approaches. Considering the repressive nature of miRNAs at the mRNA translational level, here we describe a probabilistic model to make predictions by combining sequence complementarity, miRNA expression level, and protein abundance. Our underlying assumption is that, given sequence complementarity between a miRNA and its putative mRNA targets, the miRNA expression level should be high and the protein abundance of the mRNA should be low. Having identified a set of confident predictions, we then built a second probabilistic model to trace back to the mRNA expression of the confident targets to investigate the mechanisms of the miRNA-mediated post-transcriptional regulation. Our results suggest that translational repression (which has no effect on mRNA level), instead of mRNA degradation, is the dominant mechanism in miRNA regulation. This observation explains the previously observed discordant correlation between mRNA expression and protein abundance.


Subject(s)
Algorithms , Gene Targeting/methods , MicroRNAs/genetics , Models, Genetic , Models, Statistical , Proteome/genetics , Sequence Analysis, RNA/methods , Base Sequence , Computer Simulation , Molecular Sequence Data
10.
J Comput Biol ; 16(3): 457-74, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19254184

ABSTRACT

Biological sequence classification (such as protein remote homology detection) based solely on sequence data is an important problem in computational biology, especially in the current genomics era, when large amounts of sequence data are becoming available. Support vector machines (SVMs) based on mismatch string kernels were previously applied to solve this problem, achieving reasonable success. However, they still perform poorly on difficult protein families. In this paper, we propose two approaches to solve the protein remote homology detection problem: one uses a convex combination of random-walk kernels to approximate the random-walk kernel with the optimal random steps, and the other constructs an empirical-map kernel using a profile kernel. Both resulting kernels make use of a large amount of pairwise sequence similarity information and unlabeled data, and have much better prediction performance than the best profile kernel directly derived from protein sequences. On a competitive Structural Classification Of Proteins (SCOP) benchmark dataset, the overall mean ROC(50) scores on 54 protein families we obtained using both approaches are above 0.90, significantly outperforming previously published results.
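
The first approach, a convex combination of random-walk kernels, can be illustrated with a small sketch. It rests on assumptions not stated in the abstract: the t-step random-walk matrix is taken to be the t-th power of a row-normalised similarity matrix, the weights are supplied by hand rather than learned, and all names are hypothetical. The row-normalised matrix is generally not symmetric, so this shows only the mixing step, not a kernel ready for an SVM.

```python
def row_normalise(sim):
    """Turn a non-negative similarity matrix into a row-stochastic transition matrix."""
    return [[v / sum(row) for v in row] for row in sim]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def combined_random_walk_matrix(sim, weights):
    """Return sum_t weights[t-1] * P**t for t = 1..len(weights), with P = row_normalise(sim)."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    p = row_normalise(sim)
    n = len(sim)
    combined = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P**0
    for w in weights:
        power = matmul(power, p)  # advance the walk by one step
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * power[i][j]
    return combined
```

Mixing several step counts avoids having to pick a single walk length in advance, which is the motivation the abstract gives for approximating the kernel with the optimal number of random steps.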


Subject(s)
Computational Biology/methods , Proteins/chemistry , Proteins/classification , Amino Acid Sequence , Protein Structure, Tertiary , Sequence Analysis, Protein , Sequence Homology, Amino Acid