Results 1 - 10 of 10
1.
Bioinformatics ; 40(Supplement_1): i471-i480, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940142

ABSTRACT

MOTIVATION: High-resolution Hi-C contact matrices reveal the detailed three-dimensional architecture of the genome, but high-coverage experimental Hi-C data are expensive to generate. Simultaneously, chromatin structure analyses struggle with extremely sparse contact matrices. To address this problem, computational methods to enhance low-coverage contact matrices have been developed, but existing methods are largely based on resolution enhancement methods for natural images and hence often employ models that do not distinguish between biologically meaningful contacts, such as loops, and other stochastic contacts. RESULTS: We present Capricorn, a machine learning model for Hi-C resolution enhancement that incorporates small-scale chromatin features as additional views of the input Hi-C contact matrix and leverages a diffusion probability model backbone to generate a high-coverage matrix. We show that Capricorn outperforms the state of the art in a cross-cell-line setting, improving on existing methods by 17% in mean squared error and 26% in F1 score for chromatin loop identification from the generated high-coverage data. We also demonstrate that Capricorn performs well in the cross-chromosome setting and cross-chromosome, cross-cell-line setting, improving the downstream loop F1 score by 14% relative to existing methods. We further show that our multiview idea can also be used to improve several existing methods, HiCARN and HiCNN, indicating the wide applicability of this approach. Finally, we use DNA sequence to validate discovered loops and find that the fraction of CTCF-supported loops from Capricorn is similar to those identified from the high-coverage data. Capricorn is a powerful Hi-C resolution enhancement method that enables scientists to find chromatin features that cannot be identified in the low-coverage contact matrix.
AVAILABILITY AND IMPLEMENTATION: Implementation of Capricorn and source code for reproducing all figures in this paper are available at https://github.com/CHNFTQ/Capricorn.
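The two evaluation measures named in the abstract, mean squared error between matrices and F1 score for loop identification, can be illustrated with a minimal sketch. This is not Capricorn's evaluation code: the function names are hypothetical, and treating two loops as matching only when their bin coordinates are identical is a simplifying assumption (a real evaluation may allow some positional tolerance).

```python
def mean_squared_error(predicted, target):
    """Mean squared difference between two equally sized contact matrices."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(predicted, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def loop_f1(predicted_loops, reference_loops):
    """F1 score for loop calls, each given as a (bin_i, bin_j) pair.

    Assumption: a predicted loop counts as correct only if the same
    pair of bins appears in the reference set (exact match).
    """
    predicted = set(predicted_loops)
    reference = set(reference_loops)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)
```

Here the reference loops would be those called on the high-coverage matrix, and the predicted loops those called on the enhanced matrix.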


Subject(s)
Chromatin , Machine Learning , Chromatin/chemistry , Chromatin/metabolism , Humans , Computational Biology/methods , Algorithms , Software
2.
Nat Commun ; 15(1): 1365, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38355719

ABSTRACT

Ribonucleoprotein complexes are composed of RNA, RNA-dependent proteins (RDPs) and RNA-binding proteins (RBPs), and play fundamental roles in RNA regulation. However, in the human malaria parasite, Plasmodium falciparum, identification and characterization of these proteins are particularly limited. In this study, we use an unbiased proteome-wide approach, called R-DeeP, a method based on sucrose density gradient ultracentrifugation, to identify RDPs. Quantitative analysis by mass spectrometry identifies 898 RDPs, including 545 proteins not yet associated with RNA. Results are further validated using a combination of computational and molecular approaches. Overall, this method provides the first snapshot of the Plasmodium protein-protein interaction network in the presence and absence of RNA. R-DeeP also helps to reconstruct Plasmodium multiprotein complexes based on co-segregation and deciphers their RNA-dependence. One RDP candidate, PF3D7_0823200, is functionally characterized and validated as a true RBP. Using enhanced crosslinking and immunoprecipitation followed by high-throughput sequencing (eCLIP-seq), we demonstrate that this protein interacts with various Plasmodium non-coding transcripts, including the var genes and ap2 transcription factors.


Subject(s)
Plasmodium , RNA , Humans , RNA/metabolism , Plasmodium falciparum/genetics , Plasmodium falciparum/metabolism , Proteome/metabolism , RNA-Binding Proteins/metabolism , Plasmodium/genetics
3.
bioRxiv ; 2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37790381

ABSTRACT

Most studies of genome organization have focused on intra-chromosomal (cis) contacts because they harbor key features such as DNA loops and topologically associating domains. Inter-chromosomal (trans) contacts have received much less attention, and tools for interrogating potential biologically relevant trans structures are lacking. Here, we develop a computational framework to identify sets of loci that jointly interact in trans from Hi-C data. This method, trans-C, initiates probabilistic random walks with restarts from a set of seed loci to traverse an input Hi-C contact network, thereby identifying sets of trans-contacting loci. We validate trans-C in three increasingly complex models of established trans contacts: the Plasmodium falciparum var genes, the mouse olfactory receptor "Greek islands", and the human RBM20 cardiac splicing factory. We then apply trans-C to systematically test the hypothesis that genes co-regulated by the same trans-acting element (i.e., a transcription or splicing factor) co-localize in three dimensions to form "RNA factories" that maximize the efficiency and accuracy of RNA biogenesis. We find that many loci with multiple binding sites of the same transcription factor interact with one another in trans, especially those bound by transcription factors with intrinsically disordered domains. Similarly, clustered binding of a subset of RNA binding proteins correlates with trans interaction of the encoding loci. These findings support the existence of trans interacting chromatin domains (TIDs) driven by RNA biogenesis. Trans-C provides an efficient computational framework for studying these and other types of trans interactions, empowering studies of a poorly understood aspect of genome architecture.
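The random walk with restart described above can be sketched generically as follows. This is not the trans-C implementation: the function name, the restart probability of 0.5, and the toy contact matrix are illustrative assumptions, and the sketch does not restrict the walk to trans contacts.

```python
def random_walk_with_restart(weights, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Visiting probabilities of a walk that restarts at the seed loci.

    weights: square matrix (list of lists) of non-negative contact counts.
    seeds:   indices of the seed loci.
    restart: probability of jumping back to a seed at each step (assumed value).
    """
    n = len(weights)
    # Row-normalize contact counts into transition probabilities.
    # A locus with no contacts is given a self-loop so rows still sum to 1.
    transition = []
    for i, row in enumerate(weights):
        total = sum(row)
        if total > 0:
            transition.append([w / total for w in row])
        else:
            transition.append([1.0 if j == i else 0.0 for j in range(n)])

    # The walk starts (and restarts) uniformly over the seed loci.
    start = [0.0] * n
    for s in seeds:
        start[s] = 1.0 / len(seeds)

    probs = start[:]
    for _ in range(max_iter):
        updated = [
            restart * start[j]
            + (1 - restart) * sum(probs[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(updated, probs))
        probs = updated
        if change < tol:
            break
    return probs


# Toy contact network with four loci; locus 0 is the seed.
contacts = [
    [0, 5, 1, 0],
    [5, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(contacts, seeds=[0])
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Loci with high scores are those the walk visits often, so ranking by score gives candidate loci that are closely connected to the seeds in the contact network.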

4.
Nat Commun ; 14(1): 5086, 2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37607941

ABSTRACT

The complex life cycle of Plasmodium falciparum requires coordinated gene expression regulation to allow host cell invasion, transmission, and immune evasion. Increasing evidence now suggests a major role for epigenetic mechanisms in gene expression in the parasite. In eukaryotes, many lncRNAs have been identified to be pivotal regulators of genome structure and gene expression. To investigate the regulatory roles of lncRNAs in P. falciparum we explore the intergenic lncRNA distribution in nuclear and cytoplasmic subcellular locations. Using nascent RNA expression profiles, we identify a total of 1768 lncRNAs, of which 718 (~41%) are novel in P. falciparum. The subcellular localization and stage-specific expression of several putative lncRNAs are validated using RNA-FISH. Additionally, the genome-wide occupancy of several candidate nuclear lncRNAs is explored using ChIRP. The results reveal that lncRNA occupancy sites are focal and sequence-specific with a particular enrichment for several parasite-specific gene families, including those involved in pathogenesis and sexual differentiation. Genomic and phenotypic analysis of one specific lncRNA demonstrates its importance in sexual differentiation and reproduction. Our findings bring a new level of insight into the role of lncRNAs in pathogenicity, gene regulation and sexual differentiation, opening new avenues for targeted therapeutic strategies against the deadly malaria parasite.


Subject(s)
Malaria, Falciparum , Malaria , Parasites , RNA, Long Noncoding , Humans , Animals , Plasmodium falciparum/genetics , RNA, Long Noncoding/genetics , Malaria, Falciparum/genetics
5.
Bioinformatics ; 38(Suppl_2): ii148-ii154, 2022 Sep 16.
Article in English | MEDLINE | ID: mdl-36124797

ABSTRACT

MOTIVATION: A wide variety of experimental methods are available to characterize different properties of single cells in a complex biosample. However, because these measurement techniques are typically destructive, researchers are often presented with complementary measurements from disjoint subsets of cells, providing a fragmented view of the cell's biological processes. This creates a need for computational tools capable of integrating disjoint multi-omics data. Because different measurements typically do not share any features, the problem requires the integration to be done in an unsupervised fashion. Recently, several methods have been proposed that project the cell measurements into a common latent space and attempt to align the corresponding low-dimensional manifolds. RESULTS: In this study, we present an approach, Synmatch, which produces a direct matching of the cells between modalities by exploiting information about neighborhood structure in each modality. Synmatch relies on the intuition that cells which are close in one measurement space should be close in the other as well. This allows us to formulate the matching problem as a constrained supermodular optimization problem over neighborhood structures that can be solved efficiently. We show that our approach successfully matches cells in small real multi-omics datasets and performs favorably when compared with recently published state-of-the-art methods. Further, we demonstrate that Synmatch is capable of scaling to large datasets of thousands of cells. AVAILABILITY AND IMPLEMENTATION: The Synmatch code and data used in this manuscript are available at https://github.com/Noble-Lab/synmatch.


Subject(s)
Cells
6.
Cell Syst ; 10(6): 470-479.e3, 2020 Jun 24.
Article in English | MEDLINE | ID: mdl-32684276

ABSTRACT

Protein interaction networks provide a powerful framework for identifying genes causal for complex genetic diseases. Here, we introduce a general framework, uKIN, that uses prior knowledge of disease-associated genes to guide, within known protein-protein interaction networks, random walks that are initiated from newly identified candidate genes. In large-scale testing across 24 cancer types, we demonstrate that our network propagation approach for integrating both prior and new information not only better identifies cancer driver genes than using either source of information alone but also readily outperforms other state-of-the-art network-based approaches. We also apply our approach to genome-wide association data to identify genes functionally relevant for several complex diseases. Overall, our work suggests that guided network propagation approaches that utilize both prior and new data are a powerful means to identify disease genes. uKIN is freely available for download at: https://github.com/Singh-Lab/uKIN.


Subject(s)
Gene Regulatory Networks/genetics , Protein Interaction Maps/genetics , Humans
7.
Nucleic Acids Res ; 48(5): 2303-2311, 2020 Mar 18.
Article in English | MEDLINE | ID: mdl-32034421

ABSTRACT

Chromatin conformation assays such as Hi-C cannot directly measure differences in 3D architecture between cell types or cell states. For this purpose, two or more Hi-C experiments must be carried out, but direct comparison of the resulting Hi-C matrices is confounded by several features of Hi-C data. Most notably, the genomic distance effect, whereby pairs of genomic loci that are proximal along the chromosome exhibit many more Hi-C contacts than distal pairs of loci, dominates every Hi-C matrix. Furthermore, the form that this distance effect takes often varies between different Hi-C experiments, even between replicate experiments. Thus, a statistical confidence measure designed to identify differential Hi-C contacts must accurately account for the genomic distance effect or risk being misled by large-scale but artifactual differences. ACCOST (Altered Chromatin COnformation STatistics) accomplishes this goal by extending the statistical model employed by DEseq, re-purposing the 'size factors,' which were originally developed to account for differences in read depth between samples, to instead model the genomic distance effect. We show via analysis of simulated and real data that ACCOST provides unbiased statistical confidence estimates that compare favorably with competing methods such as diffHiC, FIND and HiCcompare. ACCOST is freely available with an Apache license at https://bitbucket.org/noblelab/accost.
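The genomic distance effect can be made concrete with a small sketch that averages contact counts along each diagonal of a contact matrix and divides each entry by the average for its distance. This is a plain observed-over-expected normalization offered only as an illustration; it is not ACCOST's statistical model, and the function names are hypothetical.

```python
def distance_expected(matrix):
    """Mean contact count at each genomic distance d = |i - j| (in bins)."""
    n = len(matrix)
    expected = []
    for d in range(n):
        # All entries on the d-th diagonal share the same genomic distance.
        values = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(values) / len(values))
    return expected


def observed_over_expected(matrix):
    """Divide each contact count by the mean count at its genomic distance."""
    n = len(matrix)
    expected = distance_expected(matrix)
    normalized = []
    for i in range(n):
        row = []
        for j in range(n):
            e = expected[abs(i - j)]
            # Distances with no contacts at all are reported as 0.
            row.append(matrix[i][j] / e if e > 0 else 0.0)
        normalized.append(row)
    return normalized
```

Because the shape of the distance effect can differ between experiments, the expected values would be estimated separately for each Hi-C matrix before any comparison.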


Subject(s)
Chromatin/chemistry , DNA/chemistry , Genetic Loci , Genome , Software , Animals , Cell Line , Chromatin/metabolism , DNA/metabolism , Epistasis, Genetic , Epithelial Cells/cytology , Epithelial Cells/metabolism , Humans , Lymphocytes/cytology , Lymphocytes/metabolism , Mice , Molecular Conformation , Plasmodium falciparum/genetics , Sporozoites/genetics , Trophozoites/genetics
8.
Cell Syst ; 5(3): 221-229.e4, 2017 Sep 27.
Article in English | MEDLINE | ID: mdl-28957656

ABSTRACT

A central goal in cancer genomics is to identify the somatic alterations that underpin tumor initiation and progression. While commonly mutated cancer genes are readily identifiable, those that are rarely mutated across samples are difficult to distinguish from the large numbers of other infrequently mutated genes. We introduce a method, nCOP, that considers per-individual mutational profiles within the context of protein-protein interaction networks in order to identify small connected subnetworks of genes that, while not individually frequently mutated, comprise pathways that are altered across (i.e., "cover") a large fraction of individuals. By analyzing 6,038 samples across 24 different cancer types, we demonstrate that nCOP is highly effective in identifying cancer genes, including those with low mutation frequencies. Overall, our work demonstrates that combining per-individual mutational information with interaction networks is a powerful approach for tackling the mutational heterogeneity observed across cancers.


Subject(s)
Computational Biology/methods , Gene Regulatory Networks/genetics , Protein Interaction Maps/genetics , Algorithms , Computer Simulation , Disease Progression , Genomics/methods , Humans , Mutation/genetics , Mutation Rate , Neoplasms/genetics , Oncogenes/genetics
9.
Bioinformatics ; 26(18): i446-52, 2010 Sep 15.
Article in English | MEDLINE | ID: mdl-20823306

ABSTRACT

MOTIVATION: Segmental duplications > 1 kb in length with ≥ 90% sequence identity between copies comprise nearly 5% of the human genome. They are frequently found in large, contiguous regions known as duplication blocks that can contain mosaic patterns of thousands of segmental duplications. Reconstructing the evolutionary history of these complex genomic regions is a non-trivial, but important task. RESULTS: We introduce parsimony and likelihood techniques to analyze the evolutionary relationships between duplication blocks. Both techniques rely on a generic model of duplication in which long, contiguous substrings are copied and reinserted over large physical distances, allowing for a duplication block to be constructed by aggregating substrings of other blocks. For the likelihood method, we give an efficient dynamic programming algorithm to compute the weighted ensemble of all duplication scenarios that account for the construction of a duplication block. Using this ensemble, we derive the probabilities of various duplication scenarios. We formalize the task of reconstructing the evolutionary history of segmental duplications as an optimization problem on the space of directed acyclic graphs. We use a simulated annealing heuristic to solve the problem for a set of segmental duplications in the human genome in both parsimony and likelihood settings. AVAILABILITY: Supplementary information is available at http://www.cs.brown.edu/people/braphael/supplements/.


Subject(s)
Evolution, Molecular , Segmental Duplications, Genomic , Algorithms , Genome, Human , Humans , Models, Genetic , Models, Statistical , Probability
10.
J Natl Compr Canc Netw ; 5(4): 456-66, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17442236

ABSTRACT

Radiotherapy is integral in the multidisciplinary approach to patients with musculoskeletal neoplasms. Multiple studies have established a role for radiotherapy as a definitive local treatment of unresectable lesions or when surgery might yield unacceptable functional outcomes, such as in Ewing's tumor or base of skull chondrosarcoma. Radiotherapy is also used as an adjuvant treatment after surgery with close or positive margins. In the metastatic setting, external beam radiotherapy and bone-seeking intravenous radioisotopes are used on a case-by-case basis for palliation. As radiotherapy and its delivery techniques have evolved, so has its role in treating tumors such as Ewing's sarcoma, chordoma and chondrosarcoma, osteosarcoma, primary lymphoma of bone, malignant fibrous histiocytoma of bone, and vascular tumors. Radiation can also be successfully used to treat unresectable or recurrent benign tumors, such as giant cell tumor and aneurysmal bone cyst. This article reviews the indications for radiotherapy for various bone tumors and summarizes some of the important data supporting its use.


Subject(s)
Bone Neoplasms/radiotherapy , Osteosarcoma/radiotherapy , Sarcoma, Ewing/radiotherapy , Combined Modality Therapy , Humans