Results 1 - 20 of 40
1.
Curr Issues Mol Biol ; 45(12): 9904-9916, 2023 Dec 09.
Article in English | MEDLINE | ID: mdl-38132464

ABSTRACT

Lipids are important modifiers of protein function, particularly as parts of lipoproteins, which transport lipophilic substances and mediate cellular uptake of circulating lipids. As such, lipids are of particular interest as blood biological markers for cardiovascular disease (CVD) as well as for conditions linked to CVD such as atherosclerosis, diabetes mellitus, obesity and dietary states. Notably, lipid research is particularly well developed in the context of CVD because of the relevance and multiple causes and risk factors of CVD. The advent of methods for high-throughput screening of biological molecules has recently resulted in the generation of lipidomic profiles that allow monitoring of lipid compositions in biological samples in an untargeted manner. These and other earlier advances in biomedical research have shaped the knowledge we have about lipids in CVD. To evaluate the knowledge acquired on the multiple biological functions of lipids in CVD and the trends in their research, we collected a dataset of references from the PubMed database of biomedical literature focused on plasma lipids and CVD in human and mouse. Using annotations from these records, we were able to categorize significant associations between lipids and particular types of research approaches, distinguish non-biological lipids used as markers, identify differential research between human and mouse models, and detect the increasingly mechanistic nature of the results in this field. Using known associations between lipids and proteins that metabolize or transport them, we constructed a comprehensive lipid-protein network, which we used to highlight proteins strongly connected to lipids found in the CVD-lipid literature. Our approach points to a series of proteins for which lipid-focused research would bring insights into CVD, including Prostaglandin G/H synthase 2 (PTGS2, a.k.a. COX2) and Acylglycerol kinase (AGK). In this review, we summarize our findings, putting them in a historical perspective of the evolution of lipid research in CVD.

2.
PLoS One ; 17(7): e0270043, 2022.
Article in English | MEDLINE | ID: mdl-35776722

ABSTRACT

MOTIVATION: Single-cell Chromatin ImmunoPrecipitation DNA-Sequencing (scChIP-seq) analysis is challenging due to data sparsity. A high degree of sparsity in biological high-throughput single-cell data is generally handled with imputation methods that complete the data, but specific methods for scChIP-seq are lacking. We present SIMPA, a scChIP-seq data imputation method leveraging predictive information within bulk data from the ENCODE project to impute missing protein-DNA interacting regions of target histone marks or transcription factors. RESULTS: Imputations using machine learning models trained for each single cell, each ChIP protein target, and each genomic region accurately preserve cell type clustering and improve pathway-related gene identification on real human data. Results on bulk data simulating single cells show that the imputations are single-cell specific as the imputed profiles are closer to the simulated cell than to other cells related to the same ChIP protein target and the same cell type. Simulations also show that 100 input genomic regions are already enough to train single-cell specific models for the imputation of thousands of undetected regions. Furthermore, SIMPA enables the interpretation of machine learning models by revealing interaction sites of a given single cell that are most important for the imputation model trained for a specific genomic region. The corresponding feature importance values derived from promoter-interaction profiles of H3K4me3, an activating histone mark, highly correlate with co-expression of genes that are present within the cell-type specific pathways in 2 real human and mouse datasets. SIMPA's interpretable imputation method allows users to gain a deep understanding of individual cells and, consequently, of sparse scChIP-seq datasets. AVAILABILITY AND IMPLEMENTATION: Our interpretable imputation algorithm was implemented in Python and is available at https://github.com/salbrec/SIMPA.


Subject(s)
Genomics , Machine Learning , Animals , Cluster Analysis , DNA , Mice , Sequence Analysis, DNA/methods
3.
BMC Bioinformatics ; 23(Suppl 6): 279, 2022 Jul 14.
Article in English | MEDLINE | ID: mdl-35836114

ABSTRACT

BACKGROUND: The constant evolution and development of next-generation sequencing techniques lead to high-throughput data composed of datasets that include a large number of biological samples. Although a large number of samples are usually experimentally processed in batches, scientific publications are often vague about this information, which can greatly impact the quality of the samples and confound further statistical analyses. Because dedicated bioinformatics methods developed to detect unwanted sources of variance in the data can wrongly detect real biological signals, such methods could benefit from using a quality-aware approach. RESULTS: We recently developed statistical guidelines and a machine learning tool to automatically evaluate the quality of a next-generation-sequencing sample. We leveraged this quality assessment to detect and correct batch effects in 12 publicly available RNA-seq datasets with available batch information. We were able to distinguish batches by our quality score and used it to correct for some batch effects in sample clustering. Overall, the correction was evaluated as comparable to or better than the reference method that uses a priori knowledge of the batches (in 10 and 1 datasets of 12, respectively; total = 92%). When coupled to outlier removal, the correction was more often evaluated as better than the reference (comparable or better in 5 and 6 datasets of 12, respectively; total = 92%). CONCLUSIONS: In this work, we show the capabilities of our software to detect batches in public RNA-seq datasets from differences in the predicted quality of their samples. We also use these insights to correct the batch effect and observe the relation of sample quality and batch effect. These observations reinforce our expectation that while batch effects do correlate with differences in quality, batch effects also arise from other artifacts and are more suitably corrected statistically in well-designed experiments.


Subject(s)
Algorithms , Software , Cluster Analysis , Machine Learning , RNA-Seq
4.
Genes (Basel) ; 13(5)2022 05 20.
Article in English | MEDLINE | ID: mdl-35627304

ABSTRACT

The gene family of insect olfactory receptors (ORs) has expanded greatly over the course of evolution. ORs enable insects to detect volatile chemicals and therefore play an important role in social interactions, enemy and prey recognition, and foraging. The sequences of several thousand ORs are known, but their specific function or their ligands have only been identified for very few of them. To advance the functional characterization of ORs, we have assembled, curated, and aligned the sequences of 3902 ORs from 21 insect species, which we provide as an annotated online resource. Using functionally characterized proteins from the fly Drosophila melanogaster, the mosquito Anopheles gambiae and the ant Harpegnathos saltator, we identified amino acid positions that best predict response to ligands. We examined the conservation of these predicted relevant residues in all OR subfamilies; the results showed that the subfamilies that expanded strongly in social insects had a high degree of conservation in their binding sites. This suggests that the ORs of social insect families are typically finely tuned and exhibit sensitivity to very similar odorants. Our novel approach provides a powerful tool to exploit functional information from a limited number of genes to study the functional evolution of large gene families.


Subject(s)
Receptors, Odorant , Animals , Drosophila melanogaster/metabolism , Insect Proteins/metabolism , Insecta/genetics , Insecta/metabolism , Ligands , Receptors, Odorant/genetics , Receptors, Odorant/metabolism
5.
Proc Natl Acad Sci U S A ; 118(38)2021 09 21.
Article in English | MEDLINE | ID: mdl-34526403

ABSTRACT

The spleen contains phenotypically and functionally distinct conventional dendritic cell (cDC) subpopulations, termed cDC1 and cDC2, which each can be divided into several smaller and less well-characterized subsets. Despite advances in understanding the complexity of cDC ontogeny by transcriptional programming, the significance of posttranslational modifications in controlling tissue-specific cDC subset immunobiology remains elusive. Here, we identified the cell-surface-expressed A-disintegrin-and-metalloproteinase 10 (ADAM10) as an essential regulator of cDC1 and cDC2 homeostasis in the splenic marginal zone (MZ). Mice with a CD11c-specific deletion of ADAM10 (ADAM10ΔCD11c) exhibited a complete loss of splenic ESAMhi cDC2A because ADAM10 regulated the commitment, differentiation, and survival of these cells. The major pathways controlled by ADAM10 in ESAMhi cDC2A are Notch, signaling pathways involved in cell proliferation and survival (e.g., mTOR, PI3K/AKT, and EIF2 signaling), and EBI2-mediated localization within the MZ. In addition, we discovered that ADAM10 is a molecular switch regulating cDC2 subset heterogeneity in the spleen, as the disappearance of ESAMhi cDC2A in ADAM10ΔCD11c mice was compensated for by the emergence of a Clec12a+ cDC2B subset closely resembling cDC2 generally found in peripheral lymph nodes. Moreover, in ADAM10ΔCD11c mice, terminal differentiation of cDC1 was abrogated, resulting in severely reduced splenic Langerin+ cDC1 numbers. Next to the disturbed splenic cDC compartment, ADAM10 deficiency on CD11c+ cells led to an increase in marginal metallophilic macrophage (MMM) numbers. In conclusion, our data identify ADAM10 as a molecular hub on both cDC and MMM regulating their transcriptional programming, turnover, homeostasis, and ability to shape the anatomical niche of the MZ.


Subject(s)
ADAM10 Protein/metabolism , Amyloid Precursor Protein Secretases/metabolism , Dendritic Cells/metabolism , Membrane Proteins/metabolism , ADAM10 Protein/physiology , Amyloid Precursor Protein Secretases/physiology , Animals , Antigen-Presenting Cells/metabolism , CD11c Antigen/metabolism , Cell Differentiation , Cell Proliferation , Female , Homeostasis , Lymphoid Tissue/metabolism , Macrophages/metabolism , Male , Membrane Proteins/physiology , Mice , Mice, Inbred C57BL , Myeloid Cells/metabolism , Phosphatidylinositol 3-Kinases/metabolism , Protein Processing, Post-Translational/genetics , Protein Processing, Post-Translational/physiology , Signal Transduction , Spleen/cytology , Spleen/metabolism
6.
Life Sci Alliance ; 4(11)2021 11.
Article in English | MEDLINE | ID: mdl-34462322

ABSTRACT

More and more next-generation sequencing (NGS) data are made available every day. However, the quality of this data is not always guaranteed. Available quality control tools require profound knowledge to correctly interpret the multiplicity of quality features. Moreover, it is usually difficult to know if quality features are relevant in all experimental conditions. Therefore, the NGS community would highly benefit from condition-specific data-driven guidelines derived from many publicly available experiments, which reflect routinely generated NGS data. In this work, we have characterized well-known quality guidelines and related features in big datasets and concluded that they are too limited for assessing the quality of a given NGS file accurately. Therefore, we present new data-driven guidelines derived from the statistical analysis of many public datasets using quality features calculated by common bioinformatics tools. Thanks to this approach, we confirm the high relevance of genome mapping statistics to assess the quality of the data, and we demonstrate the limited scope of some quality features that are not relevant in all conditions. Our guidelines are available at https://cbdm.uni-mainz.de/ngs-guidelines.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/methods , Computational Biology/methods , Genome, Human , Humans , Quality Control , Sequence Analysis, DNA/statistics & numerical data , Software
7.
Bioinformatics ; 37(21): 3981-3982, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34358314

ABSTRACT

SUMMARY: Lipids play an essential role in cellular assembly and signaling. Dysregulation of these functions has been linked with many complications including obesity, diabetes, metabolic disorders, cancer and more. Investigating lipid profiles in such conditions can provide insights into cellular functions and possible interventions. Hence, the field of lipidomics has been expanding in recent years. Even though the role of individual lipids in diseases has been investigated, there is no resource to perform disease enrichment analysis considering the cumulative association of a lipid set. To address this, we have implemented the LipiDisease web server. The tool analyzes millions of records from the PubMed biomedical literature database discussing lipids and diseases, predicts their association and ranks them according to false discovery rates generated by random simulations. The tool takes into account 4270 diseases and 4798 lipids. Since the tool extracts the information from PubMed records, the number of diseases and lipids will be expanded over time as the biomedical literature grows. AVAILABILITY AND IMPLEMENTATION: The LipiDisease webserver can be freely accessed at http://cbdm-01.zdv.uni-mainz.de:3838/piyusmor/LipiDisease/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
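
The ranking by false discovery rates from random simulations mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe the scoring function or the simulation scheme used by LipiDisease, so everything below is an assumption made for illustration: the names `empirical_fdr`, `observed` and `null_runs`, the toy scores, and the estimator itself (the average number of simulated scores at or above a threshold, divided by the number of observed scores at or above it).

```python
def empirical_fdr(observed, null_runs):
    """Estimate a false discovery rate for each observed score.

    observed:  dict mapping a disease name to its association score
               for the query lipid set (hypothetical scores).
    null_runs: list of lists; each inner list holds the scores obtained
               for one randomly simulated lipid set (hypothetical).
    """
    fdr = {}
    for name, score in observed.items():
        # How many observed diseases score at least this high.
        n_observed = sum(1 for v in observed.values() if v >= score)
        # Average number of simulated scores at least this high, per run.
        n_null = sum(
            sum(1 for v in run if v >= score) for run in null_runs
        ) / len(null_runs)
        fdr[name] = min(1.0, n_null / n_observed)
    return fdr


# Toy input, invented for this sketch only.
observed = {"disease_x": 9, "disease_y": 5, "disease_z": 2}
null_runs = [[1, 2, 3], [2, 6, 1], [0, 3, 2], [4, 1, 2]]

fdr = empirical_fdr(observed, null_runs)
# Diseases ordered from most to least confidently associated.
ranked = sorted(fdr, key=fdr.get)
print(ranked)
```

With real data, the simulated runs would typically come from many random lipid sets of the same size as the query set, so that the estimate reflects what chance alone would produce.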


Subject(s)
Lipids , Software , PubMed , Databases, Factual , Lipids/analysis , Data Mining
8.
Cells ; 10(4)2021 03 29.
Article in English | MEDLINE | ID: mdl-33805436

ABSTRACT

Long intergenic non-coding RNAs (lincRNAs) are long RNAs that do not encode proteins. Functional evidence is lacking for most of them. Their biogenesis is not well-known, but it is thought that many lincRNAs originate from genomic duplication of coding material, resulting in pseudogenes, gene copies that lose their original function and can accumulate mutations. While most pseudogenes eventually stop producing a transcript and become erased by mutations, many of these pseudogene-based lincRNAs keep similarity to the parental gene from which they originated, possibly for functional reasons. For example, they can act as decoys for miRNAs targeting the parental gene. Enrichment analysis of function is a powerful tool to discover the functional effects of a treatment producing differential expression of transcripts. However, in the case of lincRNAs, since their function is not easy to define experimentally, such a tool is lacking. To address this problem, we have developed an enrichment analysis tool for lincRNAs that uses the function of their parental genes as a proxy for lincRNA function and focuses on human diseases.
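
Enrichment analysis of the kind this abstract refers to is often computed as an over-representation test. The abstract does not name the statistic used by the tool, so the hypergeometric test below is an assumption, as are the function name `hypergeom_enrichment_p` and the toy counts. The sketch asks: if a term (for example a disease) annotates `annotated` of `population` parental genes, how likely is it that a selection of `selected` genes contains at least `overlap` annotated ones by chance?

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: total number of genes considered
    annotated:  genes in the population carrying the term of interest
    selected:   genes in the query list (e.g. parental genes of
                differentially expressed lincRNAs)
    overlap:    genes in the query list that carry the term
    """
    total = comb(population, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy counts, invented for this sketch only.
p_value = hypergeom_enrichment_p(population=20, annotated=5, selected=4, overlap=2)
print(round(p_value, 4))
```

In practice one such test is run per term, and the resulting p-values are corrected for multiple testing before terms are reported as enriched.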


Subject(s)
Disease/genetics , Gene Expression Profiling , RNA, Long Noncoding/genetics , Breast Neoplasms/genetics , Female , Gene Expression Regulation, Neoplastic , Humans , Internet , Kaplan-Meier Estimate , Prognosis , RNA, Long Noncoding/metabolism , User-Computer Interface
9.
Genome Biol ; 22(1): 75, 2021 03 05.
Article in English | MEDLINE | ID: mdl-33673854

ABSTRACT

Controlling quality of next-generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterize common NGS quality features and develop a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal and external functional genomics datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at https://github.com/salbrec/seqQscorer .


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing , Machine Learning , Quality Control , Software , Algorithms , Computational Biology/standards , Databases, Genetic , Genomics/methods , Genomics/standards , High-Throughput Nucleotide Sequencing/methods , ROC Curve , Reproducibility of Results , Workflow
10.
Front Neurol ; 11: 573560, 2020.
Article in English | MEDLINE | ID: mdl-33329316

ABSTRACT

Huntington's disease (HD) is an autosomal dominantly inherited neurodegenerative disorder caused by a trinucleotide repeat expansion in the Huntingtin gene. As disease-modifying therapies for HD are being developed, peripheral blood cells may be used to indicate disease progression and to monitor treatment response. In order to investigate whether gene expression changes can be found in the blood of individuals with HD that distinguish them from healthy controls, we performed transcriptome analysis by next-generation sequencing (RNA-seq). We detected a gene expression signature consistent with dysregulation of immune-related functions and inflammatory response in peripheral blood from HD cases vs. controls, including induction of the interferon response genes, IFITM3, IFI6 and IRF7. Our results suggest that it is possible to detect gene expression changes in blood samples from individuals with HD, which may reflect the immune pathology associated with the disease.

11.
Cell Rep ; 32(7): 108050, 2020 08 18.
Article in English | MEDLINE | ID: mdl-32814053

ABSTRACT

Interactome maps are valuable resources to elucidate protein function and disease mechanisms. Here, we report on an interactome map that focuses on neurodegenerative disease (ND), connects ∼5,000 human proteins via ∼30,000 candidate interactions and is generated by systematic yeast two-hybrid interaction screening of ∼500 ND-related proteins and integration of literature interactions. This network reveals interconnectivity across diseases and links many known ND-causing proteins, such as α-synuclein, TDP-43, and ATXN1, to a host of proteins previously unrelated to NDs. It facilitates the identification of interacting proteins that significantly influence mutant TDP-43 and HTT toxicity in transgenic flies, as well as of ARF-GEP100 that controls misfolding and aggregation of multiple ND-causing proteins in experimental model systems. Furthermore, it enables the prediction of ND-specific subnetworks and the identification of proteins, such as ATXN1 and MKL1, that are abnormally aggregated in postmortem brains of Alzheimer's disease patients, suggesting widespread protein aggregation in NDs.


Subject(s)
Brain Mapping/methods , Brain/physiopathology , Neurodegenerative Diseases/genetics , Protein Aggregates/genetics , Protein Interaction Mapping/methods , Humans
12.
Nucleic Acids Res ; 48(9): e53, 2020 05 21.
Article in English | MEDLINE | ID: mdl-32187374

ABSTRACT

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is used to identify genome-wide DNA regions bound by proteins. Given one ChIP-seq experiment with replicates, binding sites not observed in all the replicates will usually be interpreted as noise and discarded. However, the recent discovery of high-occupancy target (HOT) regions suggests that there are regions where binding of multiple transcription factors can be identified. To investigate ChIP-seq variability, we developed a reproducibility score and a method that identifies cell-specific variable regions in ChIP-seq data by integrating replicated ChIP-seq experiments for multiple protein targets on a particular cell type. Using our method, we found variable regions in human cell lines K562, GM12878, HepG2, MCF-7 and in mouse embryonic stem cells (mESCs). These variable-occupancy target regions (VOTs) are CG dinucleotide rich, and show enrichment at promoters and R-loops. They overlap significantly with HOT regions, but are not blacklisted regions producing non-specific binding ChIP-seq peaks. Furthermore, in mESCs, VOTs are conserved among placental species suggesting that they could have a function important for this taxon. Our method can be useful to point to such regions along the genome in a given cell type of interest, to improve the downstream interpretative analysis before follow-up experiments.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Transcription Factors/metabolism , Animals , Binding Sites , Cell Line , Cell Line, Tumor , Chromatin/metabolism , Embryonic Stem Cells/metabolism , Evolution, Molecular , Genetic Variation , Genomics/methods , Humans , K562 Cells , MCF-7 Cells , Mice , Nucleotides/analysis , Principal Component Analysis , Promoter Regions, Genetic , R-Loop Structures
13.
Life Sci Alliance ; 2(4)2019 08.
Article in English | MEDLINE | ID: mdl-31331983

ABSTRACT

Chromatin immunoprecipitation (ChIP) followed by next generation sequencing (ChIP-Seq) is a powerful technique to study transcriptional regulation. However, the requirement of millions of cells to generate results with high signal-to-noise ratio precludes its use in the study of small cell populations. Here, we present a tagmentation-assisted fragmentation ChIP (TAF-ChIP) and sequencing method to generate high-quality histone profiles from low cell numbers. The data obtained from the TAF-ChIP approach are amenable to standard tools for ChIP-Seq analysis, owing to its high signal-to-noise ratio. The epigenetic profiles from the TAF-ChIP approach showed high agreement with conventional ChIP-Seq datasets, thereby underlining the utility of this approach.


Subject(s)
Chromatin Immunoprecipitation Sequencing/methods , Drosophila/genetics , Histones/metabolism , Animals , Epigenesis, Genetic , High-Throughput Nucleotide Sequencing , Humans , K562 Cells , Signal-To-Noise Ratio , Software , Whole Genome Sequencing
14.
PLoS One ; 14(1): e0210467, 2019.
Article in English | MEDLINE | ID: mdl-30640953

ABSTRACT

The study of drug toxicity in human organs is complicated by their complex inter-relations and by the obvious difficulty of testing drug effects on biologically relevant material. Animal models and human cell cultures offer alternatives for systematic and large-scale profiling of drug effects on gene expression level, as typically found in the so-called toxicogenomics datasets. However, the complexity of these data, which includes variable drug doses, time points, and experimental setups, makes it difficult to choose and integrate the data, and to evaluate the appropriateness of one or another model system to study the toxicity of particular drugs in particular human organs. Here, we define a protocol to integrate drug-wise rankings of gene expression changes in toxicogenomics data, which we apply to the TG-GATEs dataset, to prioritize genes for association with drug toxicity in liver or kidney. Contrasting the results with sets of human genes known from the literature to be associated with drug toxicity allows us to compare different rank aggregation approaches for the task at hand. Collectively, ranks from multiple models point to genes not previously associated with toxicity, notably, the PCNA clamp associated factor (PCLAF), and genes regulated by the master regulator of the antioxidant response NFE2L2, such as NQO1 and SRXN1. In addition, comparing gene ranks from different models allowed us to evaluate striking differences in terms of toxicity-associated genes between human and rat hepatocytes or between rat liver and rat hepatocytes. We interpret these results to point to the different molecular functions associated with organ toxicity that are best described by each model. We conclude that the expected production of toxicogenomics panels with larger numbers of drugs and models, in combination with the ongoing increase of the experimental literature in organ toxicity, will lead to increasingly better associations of genes for organism toxicity.


Subject(s)
Databases, Genetic , Gene Expression Regulation , Organ Specificity/genetics , Publications , Toxicogenetics , Animals , Gene Expression Profiling , Humans , ROC Curve , Rats
15.
Methods ; 132: 57-65, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28716510

ABSTRACT

Toxicity affecting humans is studied by observing the effects of chemical substances in animal organisms (in vivo) or in animal and human cultivated cell lines (in vitro). Toxicogenomics studies collect gene expression profiles and histopathology assessment data for hundreds of drugs and pollutants in standardized experimental designs using different model systems. These data are an invaluable source for analyzing genome-wide drug response in biological systems. However, a problem remains: how to evaluate the suitability of heterogeneous in vitro and in vivo systems to model the many different aspects of human toxicity. We propose here that a given model system (cell type or animal organ) is supported as appropriately describing a particular aspect of human toxicity if the set of compounds associated in the literature with that aspect of toxicity causes a change in expression of genes with a particular function in the tested model system. This approach provides candidate genes to explain the toxicity effect (the differentially expressed genes) and the compounds whose effect could be modeled (the ones that both produce the change of expression in the model system and are associated with the human phenotype in the literature). Here we present an application of this approach using a computational pipeline that integrates compound-induced gene expression profiles (from the Open TG-GATEs database) and biomedical literature annotations (from the PubMed database) to evaluate the suitability of (human and rat) in vitro systems as well as rat in vivo systems to model human toxicity.


Subject(s)
Drug Evaluation, Preclinical/methods , Animals , Cells, Cultured , Hepatocytes/drug effects , Hepatocytes/physiology , Humans , Rats , Toxicogenetics , Transcriptome
16.
Cell Syst ; 5(2): 128-139.e4, 2017 08 23.
Article in English | MEDLINE | ID: mdl-28837810

ABSTRACT

Systematic assessment of tyrosine kinase-substrate relationships is fundamental to a better understanding of cellular signaling and its profound alterations in human diseases such as cancer. In human cells, such assessments are confounded by complex signaling networks, feedback loops, conditional activity, and intra-kinase redundancy. Here we address this challenge by exploiting the yeast proteome as an in vivo model substrate. We individually expressed 16 human non-receptor tyrosine kinases (NRTKs) in Saccharomyces cerevisiae and identified 3,279 kinase-substrate relationships involving 1,351 yeast phosphotyrosine (pY) sites. Based on the yeast data without prior information, we generated a set of linear kinase motifs and assigned ∼1,300 known human pY sites to specific NRTKs. Furthermore, experimentally defined pY sites for each individual kinase were shown to cluster within the yeast interactome network irrespective of linear motif information. We therefore applied a network inference approach to predict kinase-substrate relationships for more than 3,500 human proteins, providing a resource to advance our understanding of kinase biology.


Subject(s)
Protein Interaction Maps , Protein-Tyrosine Kinases/metabolism , Saccharomyces cerevisiae/genetics , Amino Acid Motifs , Humans , Phosphorylation , Protein-Tyrosine Kinases/chemistry , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Sequence Alignment
17.
Genome Med ; 8(1): 28, 2016 Mar 17.
Article in English | MEDLINE | ID: mdl-26988706

ABSTRACT

BACKGROUND: NF-κB is widely involved in lymphoid malignancies; however, the functional roles and specific transcriptomes of NF-κB dimers with distinct subunit compositions have been unclear. METHODS: Using combined ChIP-sequencing and microarray analyses, we determined the cistromes and target gene signatures of canonical and non-canonical NF-κB species in Hodgkin lymphoma (HL) cells. RESULTS: We found that the various NF-κB subunits are recruited to regions with redundant κB motifs in a large number of genes. Yet canonical and non-canonical NF-κB dimers up- and downregulate gene sets that are both distinct and overlapping, and are associated with diverse biological functions. p50 and p52 are formed through NIK-dependent p105 and p100 precursor processing in HL cells and are the predominant DNA binding subunits. Logistic regression analyses of combinations of the p50, p52, RelA, and RelB subunits in binding regions that have been assigned to genes they regulate reveal a cross-contribution of p52 and p50 to canonical and non-canonical transcriptomes. These analyses also indicate that the subunit occupancy pattern of NF-κB binding regions and their distance from the genes they regulate are determinants of gene activation versus repression. The pathway-specific signatures of activated and repressed genes distinguish HL from other NF-κB-associated lymphoid malignancies and inversely correlate with gene expression patterns in normal germinal center B cells, which are presumed to be the precursors of HL cells. CONCLUSIONS: We provide insights that are relevant for lymphomas with constitutive NF-κB activation and generally for the decoding of the mechanisms of differential gene regulation through canonical and non-canonical NF-κB signaling.


Subject(s)
Genome-Wide Association Study , Hodgkin Disease/genetics , Hodgkin Disease/metabolism , NF-kappa B/genetics , NF-kappa B/metabolism , Binding Sites , Cell Line, Tumor , Cell Survival , Chromatin Immunoprecipitation , Computational Biology/methods , Databases, Nucleic Acid , Gene Expression Regulation, Neoplastic , High-Throughput Nucleotide Sequencing , Humans , I-kappa B Kinase/genetics , I-kappa B Kinase/metabolism , NF-kappa B p50 Subunit/genetics , NF-kappa B p50 Subunit/metabolism , NF-kappa B p52 Subunit/genetics , NF-kappa B p52 Subunit/metabolism , Nucleotide Motifs , Protein Binding , Protein Multimerization , Signal Transduction , Transcription Factor RelA/genetics , Transcription Factor RelA/metabolism , Transcription Factor RelB/genetics , Transcription Factor RelB/metabolism , Transcriptional Activation
18.
Methods ; 74: 90-6, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25484337

ABSTRACT

Clinical evaluation of patients and diagnosis of disorder is crucial to make decisions on appropriate therapies. In addition, in the case of genetic disorders resulting from gene abnormalities, phenotypic effects may guide basic research on the mechanisms of a disorder to find the mutated gene and therefore to propose novel targets for drug therapy. However, this approach is complicated by two facts. First, the relationship between genes and disorders is not simple: one gene may be related to multiple disorders and a disorder may be caused by mutations in different genes. Second, recognizing relevant phenotypes might be difficult for clinicians working with patients of closely related complex disorders. Neuropsychiatric disorders best illustrate these difficulties since phenotypes range from metabolic to behavioral aspects, the latter extremely complex. Based on our clinical expertise on five neurodegenerative disorders, and from the wealth of bibliographical data on neuropsychiatric disorders, we have built a resource to infer associations between genes, chemicals, phenotypes for a total of 31 disorders. An initial step of automated text mining of the literature related to 31 disorders returned thousands of enriched terms. Fewer relevant phenotypic terms were manually selected by clinicians as relevant to the five neural disorders of their expertise and used to analyze the complete set of disorders. Analysis of the data indicates general relationships between neuropsychiatric disorders, which can be used to classify and characterize them. Correlation analyses allowed us to propose novel associations of genes and drugs with disorders. 
More generally, the results led us to uncover mechanisms of disease that span multiple neuropsychiatric disorders, for example that genes related to synaptic transmission and receptor functions tend to be involved in many disorders, whereas genes related to sensory perception and channel transport functions are associated with fewer disorders. Our study shows that starting from expertise covering a limited set of neurological disorders and using text and data mining methods, meaningful and novel associations regarding genes, chemicals and phenotypes can be derived for an expanded set of neuropsychiatric disorders. Our results are intended for clinicians to help them evaluate patients, and for basic scientists to propose new gene targets for drug therapies. This strategy can be extended to virtually all diseases and takes advantage of the ever-increasing amount of biomedical literature.


Subject(s)
Data Mining/methods , Databases, Genetic , Gene Regulatory Networks/genetics , Mental Disorders/genetics , Phenotype , Databases, Genetic/standards , Humans
19.
Nucleic Acids Res ; 42(Database issue): D950-8, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24304896

ABSTRACT

CellFinder (http://www.cellfinder.org) is a comprehensive one-stop resource for molecular data characterizing mammalian cells in different tissues and in different development stages. It is built from carefully selected data sets stemming from other curated databases and the biomedical literature. To date, CellFinder describes 3394 cell types and 50 951 cell lines. The database currently contains 3055 microscopic and anatomical images, 205 whole-genome expression profiles of 194 cell/tissue types from RNA-seq and microarrays and 553 905 protein expressions for 535 cells/tissues. Text mining of a corpus of >2000 publications followed by manual curation confirmed expression information on ∼900 proteins and genes. CellFinder's data model is capable of seamlessly representing entities from single cells to the organ level, incorporating mappings between homologous entities in different species and describing processes of cell development and differentiation. Its ontological backbone currently consists of 204 741 ontology terms incorporated from 10 different ontologies unified under the novel CELDA ontology. CellFinder's web portal allows searching, browsing and comparing the stored data, interactive construction of developmental trees and navigating the partonomic hierarchy of cells and tissues through a unique body browser designed for life scientists and clinicians.


Subject(s)
Cells/metabolism , Databases, Factual , Animals , Cell Line , Cell Physiological Phenomena , Cells/cytology , Cellular Structures/ultrastructure , Data Mining , Gene Expression Profiling , Humans , Internet , Kidney/cytology , Liver/cytology , Proteins/metabolism , RNA/metabolism
20.
Genes Dev ; 27(17): 1932-46, 2013 Sep 01.
Article in English | MEDLINE | ID: mdl-24013505

ABSTRACT

Understanding how distinct cell types arise from multipotent progenitor cells is a major quest in stem cell biology. The liver and pancreas share many aspects of their early development and possibly originate from a common progenitor. However, how liver and pancreas cells diverge from a common endoderm progenitor population and adopt specific fates remains elusive. Using RNA sequencing (RNA-seq), we defined the molecular identity of liver and pancreas progenitors that were isolated from the mouse embryo at two time points, spanning the period when the lineage decision is made. The integration of temporal and spatial gene expression profiles unveiled mutually exclusive signaling signatures in hepatic and pancreatic progenitors. Importantly, we identified the noncanonical Wnt pathway as a potential developmental regulator of this fate decision and capable of inducing the pancreas program in endoderm and liver cells. Our study offers an unprecedented view of gene expression programs in liver and pancreas progenitors and forms the basis for formulating lineage-reprogramming strategies to convert adult hepatic cells into pancreatic cells.


Subject(s)
Cell Differentiation , Gene Expression Regulation, Developmental , Liver , Pancreas , Signal Transduction , Stem Cells/cytology , Animals , Cell Line , Cell Lineage , Endoderm/cytology , Gene Expression Profiling , Liver/cytology , Liver/embryology , Mice , Pancreas/cytology , Pancreas/embryology , Sequence Analysis, RNA , Time Factors , Wnt Proteins/genetics , Wnt Proteins/metabolism , Xenopus/embryology