Results 1 - 8 of 8
1.
Methods Mol Biol ; 2800: 217-229, 2024.
Article in English | MEDLINE | ID: mdl-38709487

ABSTRACT

High-throughput microscopy has enabled screening of cell phenotypes at unprecedented scale. Systematic identification of cell phenotype changes (such as cell morphology and protein localization changes) is a major analysis goal. Because cell phenotypes are high-dimensional, unbiased approaches to detect and visualize the changes in phenotypes are still needed. Here, we suggest that changes in cellular phenotype can be visualized in reduced dimensionality representations of the image feature space. We describe a freely available analysis pipeline to visualize changes in protein localization in feature spaces obtained from deep learning. As an example, we use the pipeline to identify changes in subcellular localization after the yeast GFP collection was treated with hydroxyurea.


Subject(s)
Image Processing, Computer-Assisted , Phenotype , Image Processing, Computer-Assisted/methods , High-Throughput Screening Assays/methods , Microscopy/methods , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/genetics , Deep Learning , Green Fluorescent Proteins/metabolism , Green Fluorescent Proteins/genetics , Hydroxyurea/pharmacology
2.
Cell Syst ; 15(3): 286-294.e2, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38428432

ABSTRACT

Pretrained protein sequence language models have been shown to improve the performance of many prediction tasks and are now routinely integrated into bioinformatics tools. However, these models largely rely on the transformer architecture, which scales quadratically with sequence length in both run-time and memory. Therefore, state-of-the-art models have limitations on sequence length. To address this limitation, we investigated whether convolutional neural network (CNN) architectures, which scale linearly with sequence length, could be as effective as transformers in protein language models. With masked language model pretraining, CNNs are competitive with, and occasionally superior to, transformers across downstream applications while maintaining strong performance on sequences longer than those allowed in the current state-of-the-art transformer models. Our work suggests that computational efficiency can be improved without sacrificing performance, simply by using a CNN architecture instead of a transformer, and emphasizes the importance of disentangling pretraining task and model architecture. A record of this paper's transparent peer review process is included in the supplemental information.
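
The scaling contrast in this abstract can be illustrated with a rough operation count. The sketch below is not taken from the paper: the function names and the kernel size of 5 are assumptions chosen for illustration, and it counts only the dominant per-layer term (pairwise position interactions for full self-attention, position-by-kernel-offset products for a 1D convolution).

```python
def attention_pair_count(seq_len: int) -> int:
    """Pairwise position interactions in one full self-attention layer.

    Every position attends to every position, so the count grows as L * L.
    """
    return seq_len * seq_len


def conv_window_count(seq_len: int, kernel_size: int = 5) -> int:
    """(position, kernel offset) products in one 1D convolution layer.

    Assumes 'same' padding, so each of the L positions sees kernel_size
    inputs. kernel_size=5 is an illustrative assumption.
    """
    return seq_len * kernel_size


if __name__ == "__main__":
    # Doubling the sequence length quadruples the attention term
    # but only doubles the convolution term.
    for n in (256, 1024, 4096):
        print(n, attention_pair_count(n), conv_window_count(n))
```

Under these assumptions, the attention term overtakes the convolution term as soon as the sequence is longer than the kernel, and the gap widens linearly with length, which is the motivation the abstract gives for trying CNNs on long sequences.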


Subject(s)
Computational Biology , Neural Networks, Computer , Amino Acid Sequence , Peer Review
3.
Nat Commun ; 15(1): 1059, 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-38316764

ABSTRACT

The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting yet incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a diffusion-based generative model that generates protein backbone structures via a procedure inspired by the natural folding process. We describe a protein backbone structure as a sequence of angles capturing the relative orientation of the constituent backbone atoms, and generate structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins natively twist into energetically favorable conformations, the inherent shift and rotational invariance of this representation crucially alleviates the need for more complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally-occurring proteins. As a useful resource, we release an open-source codebase and trained models for protein structure diffusion.
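
As a rough illustration of noising a structure that is stored as a sequence of angles, the sketch below applies one step of a generic DDPM-style forward process and wraps the result back onto the circle. The abstract does not specify the noise schedule, the set of angles, or how wrapping is handled, so `wrap_angle`, `noise_angles`, and the `alpha_bar` parameter are assumptions for illustration and not the authors' implementation.

```python
import math
import random


def wrap_angle(theta: float) -> float:
    """Map an angle in radians onto the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def noise_angles(angles, alpha_bar: float, rng: random.Random):
    """Draw one noised copy of `angles` from a DDPM-style forward process.

    alpha_bar is the cumulative signal fraction at the chosen timestep
    (hypothetical parameter): 1.0 returns the input angles unchanged
    (after wrapping), 0.0 returns wrapped Gaussian noise only.
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must lie in [0, 1]")
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [wrap_angle(signal * a + sigma * rng.gauss(0.0, 1.0)) for a in angles]


if __name__ == "__main__":
    rng = random.Random(0)
    backbone_angles = [-1.05, 2.30, 3.10, -0.80]  # made-up values in radians
    for alpha_bar in (1.0, 0.5, 0.0):
        print(alpha_bar, noise_angles(backbone_angles, alpha_bar, rng))
```

Because the values are relative angles, translating or rotating the whole molecule leaves them unchanged, which is the invariance the abstract refers to. A generative model would be trained to reverse this noising, stepping from low `alpha_bar` back towards 1.0.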


Subject(s)
Protein Folding , Proteins , Proteins/metabolism , Neural Networks, Computer , Protein Conformation
4.
PLoS Comput Biol ; 18(6): e1010238, 2022 06.
Article in English | MEDLINE | ID: mdl-35767567

ABSTRACT

A major challenge to the characterization of intrinsically disordered regions (IDRs), which are widespread in the proteome, but relatively poorly understood, is the identification of molecular features that mediate functions of these regions, such as short motifs, amino acid repeats and physicochemical properties. Here, we introduce a proteome-scale feature discovery approach for IDRs. Our approach, which we call "reverse homology", exploits the principle that important functional features are conserved over evolution. We use this as a contrastive learning signal for deep learning: given a set of homologous IDRs, the neural network has to correctly choose a held-out homolog from another set of IDRs sampled randomly from the proteome. We pair reverse homology with a simple architecture and standard interpretation techniques, and show that the network learns conserved features of IDRs that can be interpreted as motifs, repeats, or bulk features like charge or amino acid propensities. We also show that our model can be used to produce visualizations of what residues and regions are most important to IDR function, generating hypotheses for uncharacterized IDRs. Our results suggest that feature discovery using unsupervised neural networks is a promising avenue to gain systematic insight into poorly understood protein sequences.
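
The contrastive task described here (pick the held-out homolog out of a set of randomly sampled IDRs) can be sketched as a softmax classification over candidates. The code below is a minimal sketch under stated assumptions: embeddings are plain lists of floats, similarity is a dot product, and the function names are hypothetical. The abstract does not give the network, the similarity function, or the exact loss, so none of this should be read as the authors' implementation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_choice_loss(query, candidates, target_index):
    """Cross-entropy of choosing candidates[target_index] given the query.

    query: embedding summarising a set of homologous IDRs (assumed input).
    candidates: embeddings of the held-out homolog plus random IDRs.
    target_index: position of the true held-out homolog in candidates.
    """
    scores = [dot(query, c) for c in candidates]
    max_score = max(scores)  # subtract the max for numerical stability
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_norm - scores[target_index]


def predicted_index(query, candidates):
    """Index of the candidate the model would choose (highest score)."""
    scores = [dot(query, c) for c in candidates]
    return scores.index(max(scores))


if __name__ == "__main__":
    query = [1.0, 0.0]
    candidates = [[0.0, 1.0], [2.0, 0.0], [-1.0, 0.5]]  # index 1 is the homolog
    print(predicted_index(query, candidates))
    print(contrastive_choice_loss(query, candidates, target_index=1))
```

Minimising this loss pushes the query embedding towards the true homolog and away from the random IDRs, so features shared across homologs are the ones that help the model succeed.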


Subject(s)
Intrinsically Disordered Proteins , Proteome , Amino Acid Sequence , Evolution, Molecular , Intrinsically Disordered Proteins/chemistry , Protein Conformation , Proteome/metabolism
5.
PLoS Comput Biol ; 15(9): e1007348, 2019 09.
Article in English | MEDLINE | ID: mdl-31479439

ABSTRACT

Cellular microscopy images contain rich insights about biology. To extract this information, researchers use features, or measurements of the patterns of interest in the images. Here, we introduce a convolutional neural network (CNN) to automatically design features for fluorescence microscopy. We use a self-supervised method to learn feature representations of single cells in microscopy images without labelled training data. We train CNNs on a simple task that leverages the inherent structure of microscopy images and controls for variation in cell morphology and imaging: given one cell from an image, the CNN is asked to predict the fluorescence pattern in a second different cell from the same image. We show that our method learns high-quality features that describe protein expression patterns in single cells in both yeast and human microscopy datasets. Moreover, we demonstrate that our features are useful for exploratory biological analysis, by capturing high-resolution cellular components in a proteome-wide cluster analysis of human proteins, and by quantifying multi-localized proteins and single-cell variability. We believe paired cell inpainting is a generalizable method to obtain feature representations of single cells in multichannel microscopy images.
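
The pairing step of the training task (one cell as input, a second, different cell from the same image as the prediction target) can be sketched as follows. This covers only how training pairs might be drawn. The data layout (a dict from image ID to a list of cell crops) and the function name are assumptions for illustration, and the CNN itself is omitted.

```python
import random


def sample_cell_pairs(cells_by_image, rng):
    """Return one (image_id, source_cell, target_cell) tuple per usable image.

    cells_by_image: dict mapping an image ID to a list of single-cell crops
    (any objects; hypothetical layout). Images with fewer than two cells are
    skipped because no second cell is available as a target.
    """
    pairs = []
    for image_id, cells in cells_by_image.items():
        if len(cells) < 2:
            continue
        # Sample two different list positions, so source and target are
        # distinct cells taken from the same image.
        source, target = rng.sample(cells, 2)
        pairs.append((image_id, source, target))
    return pairs


if __name__ == "__main__":
    cells = {"img1": ["a", "b", "c"], "img2": ["d"], "img3": ["e", "f"]}
    print(sample_cell_pairs(cells, random.Random(1)))
```

Drawing both cells from the same image means they share the same tagged protein and imaging conditions, so the pattern in the target cell is predictable from the source cell even though the two cells differ in shape.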


Subject(s)
Microscopy/methods , Single-Cell Analysis/methods , Unsupervised Machine Learning , Cells, Cultured , Computational Biology , Humans , Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Yeasts/cytology
6.
Bioinformatics ; 35(21): 4525-4527, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31095270

ABSTRACT

SUMMARY: We introduce YeastSpotter, a web application for the segmentation of yeast microscopy images into single cells. YeastSpotter is user-friendly and generalizable, reducing the computational expertise required for this critical preprocessing step in many image analysis pipelines. AVAILABILITY AND IMPLEMENTATION: YeastSpotter is available at http://yeastspotter.csb.utoronto.ca/. Code is available at https://github.com/alexxijielu/yeast_segmentation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Microscopy , Software , Cell Count , Saccharomyces cerevisiae
7.
Elife ; 7, 2018 04 05.
Article in English | MEDLINE | ID: mdl-29620521

ABSTRACT

The evaluation of protein localization changes on a systematic level is a powerful tool for understanding how cells respond to environmental, chemical, or genetic perturbations. To date, work in understanding these proteomic responses through high-throughput imaging has catalogued localization changes independently for each perturbation. To distinguish changes that are targeted responses to the specific perturbation or more generalized programs, we developed a scalable approach to visualize the localization behavior of proteins across multiple experiments as a quantitative pattern. By applying this approach to 24 experimental screens consisting of nearly 400,000 images, we differentiated specific responses from more generalized ones, discovered nuance in the localization behavior of stress-responsive proteins, and formed hypotheses by clustering proteins that have similar patterns. Previous approaches aim to capture all localization changes for a single screen as accurately as possible, whereas our work aims to integrate large amounts of imaging data to find unexpected new cell biology.


Subject(s)
Image Processing, Computer-Assisted/methods , Microscopy, Fluorescence/methods , Proteome/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Subcellular Fractions/metabolism , Computational Biology/methods , Gene Ontology , High-Throughput Screening Assays , Humans , Protein Transport , Proteome/analysis , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae Proteins/genetics
8.
Bio Protoc ; 8(18): e3022, 2018 Sep 20.
Article in English | MEDLINE | ID: mdl-34395810

ABSTRACT

The evaluation of protein localization changes in cells under diverse chemical and genetic perturbations is now possible due to the increasing quantity of screens that systematically image thousands of proteins in an organism. Integrating information from different screens provides valuable contextual information about the protein function. For example, proteins that change localization in response to many different stressful environmental perturbations may have different roles than those that only change in response to a few. We developed, to our knowledge, the first protocol that permits the quantitative comparison and clustering of protein localization changes across multiple screens. Our analysis allows for the exploratory analysis of proteins according to their pattern of localization changes across many different perturbations, potentially discovering new roles by association.
