Results 1 - 12 of 12
1.
Nature ; 629(8014): 1174-1181, 2024 May.
Article in English | MEDLINE | ID: mdl-38720073

ABSTRACT

Phosphorylation of proteins on tyrosine (Tyr) residues evolved in metazoan organisms as a mechanism of coordinating tissue growth [1]. Multicellular eukaryotes typically have more than 50 distinct protein Tyr kinases that catalyse the phosphorylation of thousands of Tyr residues throughout the proteome [1-3]. How a given Tyr kinase can phosphorylate a specific subset of proteins at unique Tyr sites is only partially understood [4-7]. Here we used combinatorial peptide arrays to profile the substrate sequence specificity of all human Tyr kinases. Globally, the Tyr kinases demonstrate considerable diversity in optimal patterns of residues surrounding the site of phosphorylation, revealing the functional organization of the human Tyr kinome by substrate motif preference. Using this information, Tyr kinases that are most compatible with phosphorylating any Tyr site can be identified. Analysis of mass spectrometry phosphoproteomic datasets using this compendium of kinase specificities accurately identifies specific Tyr kinases that are dysregulated in cells after stimulation with growth factors, treatment with anti-cancer drugs or expression of oncogenic variants. Furthermore, the topology of known Tyr signalling networks naturally emerged from a comparison of the sequence specificities of the Tyr kinases and the SH2 phosphotyrosine (pTyr)-binding domains. Finally, we show that the intrinsic substrate specificity of Tyr kinases has remained fundamentally unchanged from worms to humans, suggesting that the fidelity between Tyr kinases and their protein substrate sequences has been maintained across hundreds of millions of years of evolution.


Subject(s)
Phosphotyrosine , Protein-Tyrosine Kinases , Substrate Specificity , Tyrosine , Animals , Humans , Amino Acid Motifs , Evolution, Molecular , Mass Spectrometry , Phosphoproteins/chemistry , Phosphoproteins/metabolism , Phosphorylation , Phosphotyrosine/metabolism , Protein-Tyrosine Kinases/drug effects , Protein-Tyrosine Kinases/metabolism , Proteome/chemistry , Proteome/metabolism , Proteomics , Signal Transduction , src Homology Domains , Tyrosine/metabolism , Tyrosine/chemistry
2.
Front Pharmacol ; 14: 1180962, 2023.
Article in English | MEDLINE | ID: mdl-37781703

ABSTRACT

Background: As artificial intelligence (AI) continues to advance with breakthroughs in natural language processing (NLP) and machine learning (ML), such as the development of models like OpenAI's ChatGPT, new opportunities are emerging for efficient curation of electronic health records (EHR) into real-world data (RWD) for evidence generation in oncology. Our objective is to describe the research and development of industry methods to promote transparency and explainability. Methods: We applied NLP with ML techniques to train, validate, and test the extraction of information from unstructured documents (e.g., clinician notes, radiology reports, lab reports) to output a set of structured variables required for RWD analysis. This research used a nationwide EHR-derived database. Models were selected based on performance. Variables curated with an approach using ML extraction are those where the value is determined solely based on an ML model (i.e., not confirmed by abstraction), which identifies key information from visit notes and documents. These models do not predict future events or infer missing information. Results: We developed an approach using NLP and ML for extraction of clinically meaningful information from unstructured EHR documents and found high performance of output variables compared with variables curated by manually abstracted data. These extraction methods resulted in research-ready variables including initial cancer diagnosis with date, advanced/metastatic diagnosis with date, disease stage, histology, smoking status, surgery status with date, biomarker test results with dates, and oral treatments with dates. Conclusion: NLP and ML enable the extraction of retrospective clinical data in EHR with speed and scalability to help researchers learn from the experience of every person with cancer.

3.
Nature ; 613(7945): 759-766, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36631611

ABSTRACT

Protein phosphorylation is one of the most widespread post-translational modifications in biology [1,2]. With advances in mass-spectrometry-based phosphoproteomics, 90,000 sites of serine and threonine phosphorylation have so far been identified, and several thousand have been associated with human diseases and biological processes [3,4]. For the vast majority of phosphorylation events, it is not yet known which of the more than 300 protein serine/threonine (Ser/Thr) kinases encoded in the human genome are responsible [3]. Here we used synthetic peptide libraries to profile the substrate sequence specificity of 303 Ser/Thr kinases, comprising more than 84% of those predicted to be active in humans. Viewed in its entirety, the substrate specificity of the kinome was substantially more diverse than expected and was driven extensively by negative selectivity. We used our kinome-wide dataset to computationally annotate and identify the kinases capable of phosphorylating every reported phosphorylation site in the human Ser/Thr phosphoproteome. For the small minority of phosphosites for which the putative protein kinases involved have been previously reported, our predictions were in excellent agreement. When this approach was applied to examine the signalling response of tissues and cell lines to hormones, growth factors, targeted inhibitors and environmental or genetic perturbations, it revealed unexpected insights into pathway complexity and compensation. Overall, these studies reveal the intrinsic substrate specificity of the human Ser/Thr kinome, illuminate cellular signalling responses and provide a resource to link phosphorylation events to biological pathways.


Subject(s)
Phosphoproteins , Protein Serine-Threonine Kinases , Proteome , Serine , Threonine , Humans , Phosphorylation , Protein Serine-Threonine Kinases/metabolism , Serine/metabolism , Substrate Specificity , Threonine/metabolism , Proteome/chemistry , Proteome/metabolism , Datasets as Topic , Phosphoproteins/chemistry , Phosphoproteins/metabolism , Cell Line , Phosphoserine/metabolism , Phosphothreonine/metabolism
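
The idea of annotating a phosphosite with candidate kinases from sequence-specificity profiles can be illustrated with a small sketch. It assumes each kinase is summarized as a table of residue preferences at positions relative to the phosphoacceptor, and that a site is scored by summing log2 preferences over its flanking residues. The kinase names, the matrix values, and the function names (`score_site`, `rank_kinases`) are hypothetical and are not data or code from the study; the scoring scheme is a generic position-specific scoring approach, not necessarily the one the authors used.

```python
import math

# Hypothetical toy matrices (not real data): relative preference of each
# kinase for a residue at an offset from the phosphoacceptor (offset 0).
# Values above 1 are favored; values below 1 are disfavored, which is how
# negative selectivity would show up in such a table.
TOY_MATRICES = {
    "KINASE_A": {(-3, "R"): 4.0, (-2, "R"): 2.0, (1, "P"): 0.2},
    "KINASE_B": {(1, "P"): 8.0, (-3, "R"): 0.5},
}


def score_site(matrix, sequence, center, flank=5):
    """Sum log2 preferences over flanking positions.

    Entries missing from the matrix are treated as neutral (preference 1.0),
    and positions that fall outside the sequence are skipped.
    """
    total = 0.0
    for offset in range(-flank, flank + 1):
        idx = center + offset
        if offset == 0 or idx < 0 or idx >= len(sequence):
            continue
        total += math.log2(matrix.get((offset, sequence[idx]), 1.0))
    return total


def rank_kinases(matrices, sequence, center):
    """Return kinase names sorted from highest to lowest score."""
    scores = {name: score_site(m, sequence, center) for name, m in matrices.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Serine at index 5, followed by proline at +1.
    print(rank_kinases(TOY_MATRICES, "AARRASPAA", 5))
```

In this toy setup a proline at +1 penalizes the first kinase and rewards the second, so the same flanking arginines can lead to different rankings depending on a single disfavored residue.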
4.
Bioinformatics ; 38(9): 2381-2388, 2022 Apr 28.
Article in English | MEDLINE | ID: mdl-35191481

ABSTRACT

MOTIVATION: Sequence models based on deep neural networks have achieved state-of-the-art performance on regulatory genomics prediction tasks, such as chromatin accessibility and transcription factor binding. But despite their high accuracy, their contributions to a mechanistic understanding of the biology of regulatory elements are often hindered by the complexity of the predictive model and thus the poor interpretability of its decision boundaries. To address this, we introduce seqgra, a deep learning pipeline that incorporates the rule-based simulation of biological sequence data and the training and evaluation of models whose decision boundaries mirror the rules from the simulation process. RESULTS: We show that seqgra can be used to (i) generate data under the assumption of a hypothesized model of genome regulation, (ii) identify neural network architectures capable of recovering the rules of said model and (iii) analyze a model's predictive performance as a function of training set size and the complexity of the rules behind the simulated data. AVAILABILITY AND IMPLEMENTATION: The source code of the seqgra package is hosted on GitHub (https://github.com/gifford-lab/seqgra). seqgra is a pip-installable Python package. Extensive documentation can be found at https://kkrismer.github.io/seqgra. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics , Neural Networks, Computer , Software , Chromatin , Regulatory Sequences, Nucleic Acid
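
To make "rule-based simulation of biological sequence data" concrete, here is a minimal sketch under a single assumed rule: positive examples contain one planted motif and negative examples are pure random background. It does not use the seqgra API; the function names, the motif `CACGTG`, and the uniform background are illustrative assumptions only.

```python
import random


def simulate_example(rng, length, motif=None):
    """Draw a uniform random DNA background; plant the motif if one is given."""
    bases = [rng.choice("ACGT") for _ in range(length)]
    if motif is not None:
        # Overwrite a randomly chosen window with the motif (length unchanged).
        start = rng.randrange(length - len(motif) + 1)
        bases[start:start + len(motif)] = motif
    return "".join(bases)


def simulate_dataset(n_per_class, length, motif, seed=0):
    """Return (sequence, label) pairs: label 1 has the motif planted, label 0 does not."""
    rng = random.Random(seed)
    positives = [(simulate_example(rng, length, motif), 1) for _ in range(n_per_class)]
    negatives = [(simulate_example(rng, length), 0) for _ in range(n_per_class)]
    return positives + negatives


if __name__ == "__main__":
    for seq, label in simulate_dataset(2, 30, "CACGTG", seed=1):
        print(label, seq)
```

Because the generating rule is known, a model trained on such data can be checked against it directly. Note that a negative example may contain the motif by chance, which a more careful simulator would control for.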
5.
Nucleic Acids Res ; 50(9): e52, 2022 May 20.
Article in English | MEDLINE | ID: mdl-35100401

ABSTRACT

Genomic interactions provide important context to our understanding of the state of the genome. One question is whether specific transcription factor interactions give rise to genome organization. We introduce spatzie, an R package and a website that implement statistical tests for significant transcription factor motif cooperativity between enhancer-promoter interactions. We conducted controlled experiments on realistic simulated ChIP-seq data to confirm that spatzie is capable of discovering co-enriched motif interactions even in noisy conditions. We then used spatzie to investigate cell type-specific transcription factor cooperativity within recent human ChIA-PET enhancer-promoter interaction data. The method is available online at https://spatzie.mit.edu.


Subject(s)
Enhancer Elements, Genetic , Promoter Regions, Genetic , Software , Transcription Factors , Chromatin Immunoprecipitation Sequencing , Genome , Genomics , Humans , Transcription Factors/genetics , Transcription Factors/metabolism
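
One simple way to test whether two motifs co-occur across enhancer-promoter interactions more often than chance would predict is a one-sided hypergeometric (Fisher) test on presence/absence calls. The sketch below is a generic illustration of that idea; the abstract does not specify which tests spatzie implements, so this should not be read as the package's method. The function name and the input encoding are assumptions.

```python
from math import comb


def cooccurrence_p_value(pairs):
    """One-sided hypergeometric p-value for motif co-occurrence.

    pairs: one (motif_a_in_enhancer, motif_b_in_promoter) boolean tuple per
    interaction. Returns the probability of seeing at least the observed
    number of interactions carrying both motifs if the two were independent.
    """
    n = len(pairs)
    a = sum(1 for e, _ in pairs if e)        # enhancers carrying motif A
    b = sum(1 for _, p in pairs if p)        # promoters carrying motif B
    k = sum(1 for e, p in pairs if e and p)  # interactions carrying both
    # Sum the hypergeometric mass from the observed overlap up to its maximum.
    tail = sum(comb(a, i) * comb(n - a, b - i) for i in range(k, min(a, b) + 1))
    return tail / comb(n, b)


if __name__ == "__main__":
    perfectly_paired = [(True, True)] * 5 + [(False, False)] * 5
    print(cooccurrence_p_value(perfectly_paired))
```

A small p-value suggests the two motifs are paired across interactions more often than independence would allow; in practice many motif pairs are tested, so a multiple-testing correction would be needed.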
6.
Genome Res ; 30(10): 1468-1480, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32973041

ABSTRACT

A key mechanism in cellular regulation is the ability of the transcriptional machinery to physically access DNA. Transcription factors interact with DNA to alter the accessibility of chromatin, which enables changes to gene expression during development or disease or as a response to environmental stimuli. However, the regulation of DNA accessibility via the recruitment of transcription factors is difficult to study in the context of the native genome because every genomic site is distinct in multiple ways. Here we introduce the multiplexed integrated accessibility assay (MIAA), an assay that measures chromatin accessibility of synthetic oligonucleotide sequence libraries integrated into a controlled genomic context with low native accessibility. We apply MIAA to measure the effects of sequence motifs on cell type-specific accessibility between mouse embryonic stem cells and embryonic stem cell-derived definitive endoderm cells, screening 7905 distinct DNA sequences. MIAA recapitulates differential accessibility patterns of 100-nt sequences derived from natively differential genomic regions, identifying E-box motifs common to epithelial-mesenchymal transition driver transcription factors in stem cell-specific accessible regions that become repressed in endoderm. We show that a single binding motif for a key regulatory transcription factor is sufficient to open chromatin, and classify sets of stem cell-specific, endoderm-specific, and shared accessibility-modifying transcription factor motifs. We also show that overexpression of two definitive endoderm transcription factors, T and Foxa2, results in changes to accessibility in DNA sequences containing their respective DNA-binding motifs and identify preferential motif arrangements that influence accessibility.


Subject(s)
Chromatin/metabolism , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Animals , Base Composition , DNA/chemistry , DNA/metabolism , Embryonic Stem Cells/metabolism , Endoderm/metabolism , Genomics/methods , Mice , Nucleotide Motifs , Oligonucleotides , Sequence Analysis, DNA
7.
Cell Rep ; 32(8): 108064, 2020 Aug 25.
Article in English | MEDLINE | ID: mdl-32846122

ABSTRACT

RNA-binding proteins (RBPs) play critical roles in regulating gene expression by modulating splicing, RNA stability, and protein translation. Stimulus-induced alterations in RBP function contribute to global changes in gene expression, but identifying which RBPs are responsible for the observed changes remains an unmet need. Here, we present Transite, a computational approach that systematically infers RBPs influencing gene expression through changes in RNA stability and degradation. As a proof of principle, we apply Transite to RNA expression data from human patients with non-small-cell lung cancer whose tumors were sampled at diagnosis or after recurrence following treatment with platinum-based chemotherapy. Transite implicates known RBP regulators of the DNA damage response and identifies hnRNPC as a new modulator of chemotherapeutic resistance, which we subsequently validated experimentally. Transite serves as a framework for the identification of RBPs that drive cell-state transitions and adds additional value to the vast collection of publicly available gene expression datasets.


Subject(s)
DNA Damage/genetics , Gene Expression/genetics , RNA-Binding Proteins/metabolism , Humans
8.
Nucleic Acids Res ; 48(6): e31, 2020 Apr 06.
Article in English | MEDLINE | ID: mdl-32009147

ABSTRACT

Chromatin interaction data from protocols such as ChIA-PET, HiChIP and Hi-C provide valuable insights into genome organization and gene regulation, but can include spurious interactions that do not reflect underlying genome biology. We introduce an extension of the Irreproducible Discovery Rate (IDR) method called IDR2D that identifies replicable interactions shared by chromatin interaction experiments. IDR2D provides a principled set of interactions and eliminates artifacts from single experiments. The method is available as a Bioconductor package for the R community, as well as an online service at https://idr2d.mit.edu.


Subject(s)
Genome , Genomics/methods , Chromatin/metabolism , Chromatin Immunoprecipitation , Chromosomes/genetics , Reproducibility of Results , Software
9.
Nucleic Acids Res ; 47(6): e35, 2019 Apr 08.
Article in English | MEDLINE | ID: mdl-30953075

ABSTRACT

Chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) is a method for the genome-wide de novo discovery of chromatin interactions. Existing computational methods typically fail to detect weak or dynamic interactions because they use a peak-calling step that ignores paired-end linkage information. We have developed a novel computational method called Chromatin Interaction Discovery (CID) to overcome this limitation with an unbiased clustering approach for interaction discovery. CID outperforms existing chromatin interaction detection methods with improved sensitivity, replicate consistency, and concordance with other chromatin interaction datasets. CID also outperforms other methods in discovering chromatin interactions from HiChIP data. We expect that the CID method will be valuable in characterizing 3D chromatin interactions and in understanding the functional consequences of disease-associated distal genetic variations.


Subject(s)
Chromatin Immunoprecipitation/methods , Chromatin/chemistry , Chromatin/metabolism , Computational Biology/methods , Sequence Analysis, DNA/methods , Algorithms , DNA-Binding Proteins/analysis , DNA-Binding Proteins/metabolism , Datasets as Topic , Expressed Sequence Tags , Humans , Protein Binding
10.
Cancer Res ; 79(8): 1952-1966, 2019 Apr 15.
Article in English | MEDLINE | ID: mdl-30755444

ABSTRACT

Acidosis is a fundamental feature of the tumor microenvironment, which directly regulates tumor cell invasion by affecting immune cell function, clonal cell evolution, and drug resistance. Despite the important association of tumor microenvironment acidosis with tumor cell invasion, relatively little is known regarding which areas within a tumor are acidic and how acidosis influences gene expression to promote invasion. Here, we injected a labeled pH-responsive peptide to mark acidic regions within tumors. Surprisingly, acidic regions were not restricted to hypoxic areas and overlapped with highly proliferative, invasive regions at the tumor-stroma interface, which were marked by increased expression of matrix metalloproteinases and degradation of the basement membrane. RNA-seq analysis of cells exposed to low pH conditions revealed a general rewiring of the transcriptome that involved RNA splicing and enriched for targets of RNA binding proteins with specificity for AU-rich motifs. Alternative splicing of Mena and CD44, which play important isoform-specific roles in metastasis and drug resistance, respectively, was sensitive to histone acetylation status. Strikingly, this program of alternative splicing was reversed in vitro and in vivo through neutralization experiments that mitigated acidic conditions. These findings highlight a previously underappreciated role for localized acidification of tumor microenvironment in the expression of an alternative splicing-dependent tumor invasion program. SIGNIFICANCE: This study expands our understanding of acidosis within the tumor microenvironment and indicates that acidosis induces potentially therapeutically actionable changes to alternative splicing.


Subject(s)
Acids/adverse effects , Alternative Splicing , Biomarkers, Tumor/metabolism , Breast Neoplasms/pathology , Gene Expression Regulation, Neoplastic/drug effects , Transcriptome/drug effects , Tumor Microenvironment/drug effects , Animals , Apoptosis , Biomarkers, Tumor/genetics , Breast Neoplasms/chemically induced , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Cell Proliferation , Female , Humans , Hyaluronan Receptors/genetics , Hyaluronan Receptors/metabolism , Mice , Mice, Inbred BALB C , Mice, Nude , Microfilament Proteins/genetics , Microfilament Proteins/metabolism , Neoplasm Invasiveness , Neoplasm Metastasis , Tumor Cells, Cultured , Xenograft Model Antitumor Assays
11.
J Biomol Screen ; 20(8): 985-97, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25918037

ABSTRACT

High-content screening (HCS) using RNA interference (RNAi) in combination with automated microscopy is a powerful investigative tool to explore complex biological processes. However, despite the plethora of data generated from these screens, little progress has been made in analyzing HC data using multivariate methods that exploit the full richness of multidimensional data. We developed a novel multivariate method for HCS, multivariate robust analysis method (M-RAM), integrating image feature selection with ranking of perturbations for hit identification, and applied this method to an HC RNAi screen to discover novel components of the DNA damage response in an osteosarcoma cell line. M-RAM automatically selects the most informative phenotypic readouts and time points to facilitate the more efficient design of follow-up experiments and enhance biological understanding. Our method outperforms univariate hit identification and identifies relevant genes that these approaches would have missed. We found that statistical cell-to-cell variation in phenotypic responses is an important predictor of hits in RNAi-directed image-based screens. Genes that we identified as modulators of DNA damage signaling in U2OS cells include B-Raf, a cancer driver gene in multiple tumor types, whose role in DNA damage signaling we confirm experimentally, and multiple subunits of protein kinase A.


Subject(s)
High-Throughput Screening Assays , Models, Biological , RNA Interference , RNA, Messenger/genetics , RNA, Small Interfering/genetics , Algorithms , Animals , Cell Line , Computer Simulation , DNA Damage , Gene Knockdown Techniques , Humans , Phenotype , Proto-Oncogene Proteins B-raf/genetics
12.
Cell Immunol ; 288(1-2): 31-8, 2014.
Article in English | MEDLINE | ID: mdl-24607567

ABSTRACT

Diversity of B and T cell receptors, achieved by gene recombination and somatic hypermutation, allows the immune system to recognize and react in a targeted way to various threats. Next-generation sequencing for assessment of a cell's gene composition and variation makes deep analysis of one individual's immune spectrum feasible. An easy-to-apply but detailed analysis and visualization strategy is necessary to process all sequences generated. We performed sequencing using the 454 system for CLL and control samples, used the IMGT database and applied the presented analysis tools. With the applied protocol, malignant clones are found and characterized, and mutational status compared to germline identity is elaborated in detail, showing that the CLL mutation status is not as monoclonal as generally thought. Moreover, this strategy is not limited to the 454 sequencing system and can easily be transferred to any other next-generation sequencing platform.


Subject(s)
Genome, Human , High-Throughput Nucleotide Sequencing/standards , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Receptors, Antigen, B-Cell/genetics , Receptors, Antigen, T-Cell/genetics , Base Sequence , Case-Control Studies , Clone Cells , Germ-Line Mutation , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/immunology , Leukemia, Lymphocytic, Chronic, B-Cell/pathology , Molecular Sequence Data , Phylogeny , Receptors, Antigen, B-Cell/classification , Receptors, Antigen, B-Cell/immunology , Receptors, Antigen, T-Cell/classification , Receptors, Antigen, T-Cell/immunology , Sequence Alignment , Sequence Homology, Nucleic Acid