1.
Brief Bioinform ; 23(6), 2022 11 19.
Article in English | MEDLINE | ID: mdl-36208175

ABSTRACT

Cell-type composition of intact bulk tissues can vary across samples. Deciphering cell-type composition and its changes during disease progression is an important step toward understanding disease pathogenesis. To infer cell-type composition, existing cell-type deconvolution methods for bulk RNA sequencing (RNA-seq) data often require matched single-cell RNA-seq (scRNA-seq) data, generated from samples with similar clinical conditions, as reference. However, due to the difficulty of obtaining scRNA-seq data in diseased samples, only limited scRNA-seq data in matched disease conditions are available. Using scRNA-seq reference to deconvolve bulk RNA-seq data from samples with different disease conditions may lead to a biased estimation of cell-type proportions. To overcome this limitation, we propose an iterative estimation procedure, MuSiC2, which is an extension of MuSiC, to perform deconvolution analysis of bulk RNA-seq data generated from samples with multiple clinical conditions where at least one condition is different from that of the scRNA-seq reference. Extensive benchmark evaluations indicated that MuSiC2 improved the accuracy of cell-type proportion estimates of bulk RNA-seq samples under different conditions as compared with the traditional MuSiC deconvolution. MuSiC2 was applied to two bulk RNA-seq datasets for deconvolution analysis, including one from human pancreatic islets and the other from human retina. We show that MuSiC2 improves current deconvolution methods and provides more accurate cell-type proportion estimates when the bulk and single-cell reference differ in clinical conditions. We believe the condition-specific cell-type composition estimates from MuSiC2 will facilitate the downstream analysis and help identify cellular targets of human diseases.


Subject(s)
RNA , Single-Cell Analysis , Humans , RNA/genetics , RNA-Seq , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Transcriptome , Sequence Analysis, RNA/methods
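
The core idea of reference-based deconvolution described in the abstract above can be illustrated with a toy example. This is a minimal sketch, not the MuSiC2 procedure: it assumes a hypothetical signature matrix of mean expression per cell type (as might be derived from a scRNA-seq reference) and solves plain non-negative least squares by projected gradient descent. The function name, step size, and all numbers are illustrative assumptions.

```python
# Hypothetical toy example: names and numbers are illustrative, not from MuSiC2.

def deconvolve(signature, bulk, steps=2000, lr=0.005):
    """Estimate non-negative cell-type proportions p so that signature @ p ~ bulk.

    signature: list of rows, one per gene, each holding one mean expression
               value per cell type (assumed to come from a scRNA-seq reference).
    bulk:      list with one bulk expression value per gene.
    """
    n_genes, n_types = len(signature), len(signature[0])
    p = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(steps):
        # Residual of the current fit for every gene.
        resid = [
            sum(signature[g][k] * p[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        # Gradient of 0.5 * ||signature @ p - bulk||^2 with respect to p.
        grad = [
            sum(signature[g][k] * resid[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step, then projection onto the non-negative orthant.
        p = [max(0.0, p[k] - lr * grad[k]) for k in range(n_types)]
    total = sum(p) or 1.0  # avoid division by zero if every estimate is 0
    return [x / total for x in p]  # rescale so the proportions sum to 1


# Four genes, three cell types; the bulk sample is a noise-free mixture.
signature = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
    [2.0, 2.0, 2.0],
]
true_props = [0.6, 0.3, 0.1]
bulk = [sum(s * t for s, t in zip(row, true_props)) for row in signature]
estimated = deconvolve(signature, bulk)
```

On this noise-free mixture the estimate recovers the mixing proportions. If the signature is taken from a reference whose condition differs from that of the bulk sample, the fit can be biased, which is the situation the abstract addresses.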
2.
Nutrients ; 14(8)2022 Apr 08.
Article in English | MEDLINE | ID: mdl-35458125

ABSTRACT

Vitamin A (VA) deficiency and diarrheal diseases are both serious public health issues worldwide. VA deficiency is associated with impaired intestinal barrier function and increased risk of mucosal infection-related mortality. The bioactive form of VA, retinoic acid, is a well-known regulator of mucosal integrity. Using Citrobacter rodentium-infected mice as a model for diarrheal diseases in humans, previous studies showed that VA-deficient (VAD) mice failed to clear C. rodentium as compared to their VA-sufficient (VAS) counterparts. However, the distinct intestinal gene responses that are dependent on the host's VA status still need to be discovered. The mRNAs extracted from the small intestine (SI) and the colon were sequenced and analyzed on three levels: differential gene expression, enrichment, and co-expression. C. rodentium infection interacted differentially with VA status to alter colon gene expression. Novel functional categories downregulated by this pathogen were identified, highlighted by genes related to the metabolism of VA, vitamin D, and ion transport, including improper upregulation of Cl- secretion and disrupted HCO3- metabolism. Our results suggest that derangement of micronutrient metabolism and ion transport, together with the compromised immune responses in VAD hosts, may be responsible for the higher mortality to C. rodentium under conditions of inadequate VA.


Subject(s)
Enterobacteriaceae Infections , Vitamin A Deficiency , Animals , Citrobacter rodentium , Colon/metabolism , Diarrhea/complications , Intestinal Mucosa/metabolism , Intestine, Small/metabolism , Mice , Mice, Inbred C57BL , Vitamin A/metabolism , Vitamin A Deficiency/complications
3.
Sci Rep ; 11(1): 15612, 2021 08 02.
Article in English | MEDLINE | ID: mdl-34341398

ABSTRACT

Age-related macular degeneration (AMD) is a blinding eye disease with no unifying theme for its etiology. We used single-cell RNA sequencing to analyze the transcriptomes of ~93,000 cells from the macula and peripheral retina from two adult human donors and bulk RNA sequencing from fifteen adult human donors with and without AMD. Analysis of our single-cell data identified 267 cell-type-specific genes. Comparison of macula and peripheral retinal regions found no cell-type differences but did identify 50 differentially expressed genes (DEGs) with about 1/3 expressed in cones. Integration of our single-cell data with bulk RNA sequencing data from normal and AMD donors showed compositional changes more pronounced in macula in rods, microglia, endothelium, Müller glia, and astrocytes in the transition from normal to advanced AMD. KEGG pathway analysis of our normal vs. advanced AMD eyes identified enrichment in complement and coagulation pathways, antigen presentation, tissue remodeling, and signaling pathways including PI3K-Akt, NOD-like, Toll-like, and Rap1. These results showcase the use of single-cell RNA sequencing to infer cell-type compositional and cell-type-specific gene expression changes in intact bulk tissue and provide a foundation for investigating molecular mechanisms of retinal disease that lead to new therapeutic targets.


Subject(s)
Macular Degeneration , Phosphatidylinositol 3-Kinases , RNA-Seq , Retina , Gene Expression Profiling , Humans , Sequence Analysis, RNA
4.
J Nutr Biochem ; 98: 108814, 2021 12.
Article in English | MEDLINE | ID: mdl-34242724

ABSTRACT

Vitamin A (VA) deficiency remains prevalent in resource-limited areas. Using Citrobacter rodentium infection in mice as a model for diarrheal diseases, previous reports showed reduced pathogen clearance and survival due to vitamin A-deficient (VAD) status. To characterize the impact of preexisting VA deficiency on gene expression patterns in the intestines, and to discover novel target genes in VA-related biological pathways, VA deficiency in mice was induced by diet. Total mRNAs were extracted from small intestine (SI) and colon, and sequenced. Differentially Expressed Gene (DEG), Gene Ontology (GO) enrichment, and co-expression network analyses were performed. Comparison of the VA-sufficient (VAS) and VAD groups detected 49 DEGs in SI and 94 in colon. According to GO annotation, SI DEGs were significantly enriched in categories relevant to retinoid metabolic process, molecule binding, and immune function. Three co-expression modules showed significant correlation with VA status in SI; these modules contained four known retinoic acid targets. In addition, other SI genes of interest (e.g., Mbl2, Cxcl14, and Nr0b2) in these modules were suggested as new candidate genes regulated by VA. Furthermore, our analysis showed that markers of two cell types in SI, mast cells and Tuft cells, were significantly altered by VA status. In colon, "cell division" was the only enriched category and was negatively associated with VA. Thus, these data suggested that SI and colon have distinct networks under the regulation of dietary VA, and that preexisting VA deficiency could have a significant impact on the host response to a variety of disease conditions.


Subject(s)
Colon/metabolism , Intestine, Small/metabolism , RNA-Seq/methods , Vitamin A Deficiency/genetics , Animals , Citrobacter rodentium , Enterobacteriaceae Infections/genetics , Enterobacteriaceae Infections/microbiology , Gene Expression Profiling/methods , Gene Ontology , Mice , Mice, Inbred C57BL , RNA, Messenger/genetics , Transcriptome , Tretinoin/metabolism , Vitamin A/genetics , Vitamin A/metabolism
5.
Nat Commun ; 11(1): 2338, 2020 05 11.
Article in English | MEDLINE | ID: mdl-32393754

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) can characterize cell types and states through unsupervised clustering, but the ever increasing number of cells and batch effect impose computational challenges. We present DESC, an unsupervised deep embedding algorithm that clusters scRNA-seq data by iteratively optimizing a clustering objective function. Through iterative self-learning, DESC gradually removes batch effects, as long as technical differences across batches are smaller than true biological variations. As a soft clustering algorithm, cluster assignment probabilities from DESC are biologically interpretable and can reveal both discrete and pseudotemporal structure of cells. Comprehensive evaluations show that DESC offers a proper balance of clustering accuracy and stability, has a small footprint on memory, does not explicitly require batch information for batch effect removal, and can utilize GPU when available. As the scale of single-cell studies continues to grow, we believe DESC will offer a valuable tool for biomedical researchers to disentangle complex cellular heterogeneity.


Subject(s)
Cluster Analysis , Deep Learning , RNA-Seq , Single-Cell Analysis , Algorithms , Animals , Bone Marrow/metabolism , Gene Expression Regulation , Humans , Islets of Langerhans/metabolism , Leukocytes, Mononuclear/metabolism , Macaca , Mice , Monocytes/metabolism , Retina/metabolism
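
The soft cluster assignments mentioned in the abstract above can be sketched as follows. This is an illustrative sketch only: it assumes a Student's t kernel over distances between embedded cells and cluster centroids, as commonly used in deep embedded clustering, and it omits the autoencoder and the iterative optimization entirely. The function name, the `alpha` parameter, and the toy coordinates are assumptions, not taken from the DESC software.

```python
# Hypothetical sketch of soft cluster assignment; not the DESC implementation.

def soft_assign(points, centroids, alpha=1.0):
    """Return, for each point, membership probabilities over all centroids.

    Uses a Student's t kernel: weight = (1 + d^2 / alpha) ** (-(alpha + 1) / 2),
    normalized across centroids so each row sums to 1.
    """
    probs = []
    for z in points:
        weights = []
        for mu in centroids:
            sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
            weights.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs


# Three cells in a 2-D embedding and two cluster centroids (toy values).
embedded = [(0.0, 0.1), (2.9, 3.0), (1.5, 1.5)]
centroids = [(0.0, 0.0), (3.0, 3.0)]
q = soft_assign(embedded, centroids)
```

Cells close to one centroid receive a probability near 1 for that cluster, while a cell midway between the two centroids is split evenly. Graded probabilities of this kind are what allow a soft clustering to reflect transitional states as well as discrete groups.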
6.
Nat Mach Intell ; 2(10): 607-618, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33817554

ABSTRACT

Clustering and cell type classification are important steps in single-cell RNA-seq (scRNA-seq) analysis. As more and more scRNA-seq data are becoming available, supervised cell type classification methods that utilize external well-annotated source data are starting to gain popularity over unsupervised clustering algorithms. However, the performance of existing supervised methods is highly dependent on source data quality, and they often have limited accuracy in classifying cell types that are missing in the source data. To overcome these limitations, we developed ItClust, a transfer learning algorithm that borrows ideas from supervised cell type classification algorithms, but also leverages information in target data to ensure sensitivity in classifying cells that are only present in the target data. Through extensive evaluations using data from different species and tissues generated with diverse scRNA-seq protocols, we show that ItClust significantly improves clustering and cell type classification accuracy over popular unsupervised clustering and supervised cell type classification algorithms.

7.
PLoS Comput Biol ; 14(9): e1006436, 2018 09.
Article in English | MEDLINE | ID: mdl-30240439

ABSTRACT

Co-expression network analysis provides useful information for studying gene regulation in biological processes. Examining condition-specific patterns of co-expression can provide insights into the underlying cellular processes activated in a particular condition. One challenge in this type of analysis is that the sample sizes in each condition are usually small, making the statistical inference of co-expression patterns highly underpowered. A joint network construction that borrows information from related structures across conditions has the potential to improve the power of the analysis. One possible approach to constructing the co-expression network is to use the Gaussian graphical model. Though several methods are available for joint estimation of multiple graphical models, they do not fully account for the heterogeneity between samples and between co-expression patterns introduced by condition specificity. Here we develop the condition-adaptive fused graphical lasso (CFGL), a data-driven approach to incorporate condition specificity in the estimation of co-expression networks. We show that this method improves the accuracy with which networks are learned. The application of this method on a rat multi-tissue dataset and The Cancer Genome Atlas (TCGA) breast cancer dataset provides interesting biological insights. In both analyses, we identify numerous modules enriched for Gene Ontology functions and observe that the modules that are upregulated in a particular condition are often involved in condition-specific activities. Interestingly, we observe that the genes strongly associated with survival time in the TCGA dataset are less likely to be network hubs, suggesting that genes associated with cancer progression are likely to govern specific functions or execute final biological functions in pathways, rather than regulating a large number of biological processes. 
Additionally, we observed that the tumor-specific hub genes tend to have few shared edges with normal tissue, revealing a tumor-specific regulatory mechanism.


Subject(s)
Brain/metabolism , Breast Neoplasms/metabolism , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Myocardium/metabolism , Algorithms , Animals , Area Under Curve , Breast Neoplasms/genetics , Computer Graphics , Computer Simulation , Databases, Factual , Female , Heart , Humans , Male , Neoplasms/metabolism , Normal Distribution , Rats , Software
8.
J Am Stat Assoc ; 113(523): 1028-1039, 2018.
Article in English | MEDLINE | ID: mdl-31249430

ABSTRACT

The identification of reproducible signals from the results of replicate high-throughput experiments is an important part of modern biological research. Often little is known about the dependence structure and the marginal distribution of the data, motivating the development of a nonparametric approach to assess reproducibility. The procedure, which we call the maximum rank reproducibility (MaRR) procedure, uses a maximum rank statistic to parse reproducible signals from noise without making assumptions about the distribution of reproducible signals. Because it uses the rank scale this procedure can be easily applied to a variety of data types. One application is to assess the reproducibility of RNA-seq technology using data produced by the sequencing quality control (SEQC) consortium, which coordinated a multi-laboratory effort to assess reproducibility across three RNA-seq platforms. Our results on simulations and SEQC data show that the MaRR procedure effectively controls false discovery rates, has desirable power properties, and compares well to existing methods. Supplementary materials for this article are available online.
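
The maximum rank statistic named in the abstract above can be illustrated on two replicates. This sketch covers only the statistic itself, under the assumed convention that rank 1 is the strongest signal; the rest of the MaRR procedure (separating reproducible from irreproducible signals and controlling the false discovery rate) is not shown. Function names and values are hypothetical.

```python
# Hypothetical sketch of a per-item maximum rank statistic; not the full MaRR procedure.

def ranks(values):
    """Rank values so the largest gets rank 1 (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def max_rank(rep1, rep2):
    """For each item, take the worse (larger) of its ranks in the two replicates."""
    return [max(a, b) for a, b in zip(ranks(rep1), ranks(rep2))]


# Signal strength of five genes in two replicate experiments (toy values).
rep1 = [9.1, 7.4, 0.3, 5.0, 0.9]
rep2 = [8.7, 7.9, 1.2, 0.2, 4.4]
stat = max_rank(rep1, rep2)
```

An item gets a small maximum rank only if it ranks highly in both replicates, so the statistic is small for reproducible signals. Because it depends only on ranks, it applies without assumptions about the scale or distribution of the measurements.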

9.
BMC Bioinformatics ; 17 Suppl 1: 5, 2016 Jan 11.
Article in English | MEDLINE | ID: mdl-26818110

ABSTRACT

BACKGROUND: Determining differentially expressed genes (DEGs) between biological samples is key to understanding how genotype gives rise to phenotype. RNA-seq and microarray are the two main technologies for profiling gene expression levels. However, considerable discrepancy has been found between DEGs detected using the two technologies. Integrating data across these two platforms has the potential to improve the power and reliability of DEG detection. METHODS: We propose a rank-based semi-parametric model to determine DEGs using information across different sources and apply it to the integration of RNA-seq and microarray data. By incorporating both the significance of differential expression and the consistency across platforms, our method effectively detects DEGs with moderate but consistent signals. We demonstrate the effectiveness of our method using simulation studies, MAQC/SEQC data and a synthetic microRNA dataset. CONCLUSIONS: Our integration method is not only robust to noise and heterogeneity in the data, but also adaptive to the structure of data. In our simulations and real data studies, our approach shows higher discriminative power and identifies more biologically relevant DEGs than eBayes, DEseq and some commonly used meta-analysis methods.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Models, Statistical , Oligonucleotide Array Sequence Analysis/methods , RNA/genetics , Sequence Analysis, RNA/methods , Transcriptome , Gene Expression Profiling/methods , Humans , Reproducibility of Results
10.
Brief Bioinform ; 16(1): 24-31, 2015 Jan.
Article in English | MEDLINE | ID: mdl-24335788

ABSTRACT

As an important mechanism for adaptation to heterogeneous environments, plastic responses of correlated traits to environmental alteration may also be genetically correlated, but little is known about the underlying genetic basis. We describe a statistical model for mapping specific quantitative trait loci (QTLs) that control the interrelationship of phenotypic plasticity between different traits. The model is constructed in a bivariate mixture setting and implemented with the EM algorithm to estimate the genetic effects of QTLs on correlative plastic response. We provide a series of procedures that test (1) how a QTL controls the phenotypic plasticity of a single trait; and (2) how the QTL determines the correlation of environment-induced changes of different traits. The model is readily extended to test how epistatic interactions among QTLs play a part in the correlations of different plastic traits. The model was validated through computer simulation and used to analyse multi-environment data of genetic mapping in winter wheat, demonstrating its practical utility.


Subject(s)
Models, Statistical , Quantitative Trait Loci/genetics , Chromosome Mapping , Gene-Environment Interaction , Genes, Plant , Phenotype , Triticum/genetics
11.
Brief Bioinform ; 16(1): 32-8, 2015 Jan.
Article in English | MEDLINE | ID: mdl-24177380

ABSTRACT

As a group of important plant species in agriculture and biology, polyploids have been increasingly studied in terms of their genome structure and organization. There are two types of polyploids, allopolyploids and autopolyploids, each resulting from a different genetic origin and undergoing meiotic divisions of a distinct complexity. Separate statistical models have been developed for linkage analysis of each type, taking into account their unique meiotic behavior, i.e. preferential pairing for allopolyploids and double reduction for autopolyploids. We synthesized these models and modified them to accommodate the linkage analysis of less informative dominant markers. By reanalysing a published data set of varying ploidy in Arabidopsis, we corrected the estimates of meiotic recombination frequency that were intended for studying the significance of polyploidization.


Subject(s)
Arabidopsis/genetics , Genetic Linkage , Models, Genetic , Tetraploidy , Chromosome Mapping , Genes, Plant , Recombination, Genetic
12.
Brief Bioinform ; 15(6): 1044-56, 2014 Nov.
Article in English | MEDLINE | ID: mdl-24177379

ABSTRACT

Polysomic autotetraploids, including potato, sugarcane and rose, are a group of economically important species, but their linkage mapping is difficult to conduct because of their unique meiotic property of double reduction, which allows sister chromatids to enter the same gamete. We describe and assess a statistical model for mapping quantitative trait loci (QTLs) in polysomic autotetraploids. The model incorporates double reduction; it is built in a mixture model-based framework and implemented with the expectation-maximization algorithm. It allows the simultaneous estimation of QTL positions, QTL effects and the degree of double reduction as well as the assessment of the estimation precision of these parameters. We performed computer simulations to examine the statistical properties of the method and validated its use by analyzing real data from tetraploid switchgrass.


Subject(s)
Chromosome Mapping/statistics & numerical data , Models, Genetic , Quantitative Trait Loci , Tetraploidy , Algorithms , Computational Biology , Computer Simulation , Likelihood Functions , Models, Statistical , Monte Carlo Method , Panicum/genetics , Plants/genetics , Polyribosomes/genetics