Search | VHL Regional Portal

1.

Path algorithms for fused lasso signal approximator with application to COVID-19 spread in Korea.

Son, Won; Lim, Johan; Yu, Donghyeon.

Int Stat Rev ; 2022 Oct 19.

Article in English | MEDLINE | ID: mdl-36710888

ABSTRACT

The fused lasso signal approximator (FLSA) is a smoothing procedure for noisy observations that uses fused lasso penalty on unobserved mean levels to find sparse signal blocks. Several path algorithms have been developed to obtain the whole solution path of the FLSA. However, it is known that the FLSA has model selection inconsistency when the underlying signals have a stair-case block, where three consecutive signal blocks are either strictly increasing or decreasing. Modified path algorithms for the FLSA have been proposed to guarantee model selection consistency regardless of the stair-case block. In this paper, we provide a comprehensive review of the path algorithms for the FLSA and prove the properties of the recently modified path algorithms' hitting times. Specifically, we reinterpret the modified path algorithm as the path algorithm for local FLSA problems and reveal the condition that the hitting time for the fusion of the modified path algorithm is not monotone in a tuning parameter. To recover the monotonicity of the solution path, we propose a pathwise adaptive FLSA having monotonicity with similar performance as the modified solution path algorithm. Finally, we apply the proposed method to the number of daily-confirmed cases of COVID-19 in Korea to identify the change points of its spread.
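For orientation, the FLSA described in this abstract is usually written as the optimization problem below. This is the standard textbook form, given as a sketch; the symbols (observations y_i, mean levels beta_i, tuning parameters lambda_1 and lambda_2) are assumptions for illustration and may not match the notation of the paper itself.

```latex
\hat{\beta}(\lambda_1, \lambda_2)
  = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^n}
    \; \frac{1}{2} \sum_{i=1}^{n} (y_i - \beta_i)^2
    + \lambda_1 \sum_{i=1}^{n} \lvert \beta_i \rvert
    + \lambda_2 \sum_{i=2}^{n} \lvert \beta_i - \beta_{i-1} \rvert
```

The last term penalizes differences between neighbouring mean levels, so increasing lambda_2 fuses adjacent coefficients into constant blocks. A path algorithm tracks the values of the tuning parameter at which such fusions occur (the hitting times).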

2.

Truncated rank correlation (TRC) as a robust measure of test-retest reliability in mass spectrometry data.

Lim, Johan; Yu, Donghyeon; Kuo, Hsun-Chih; Choi, Hyungwon; Walmsley, Scott.

Stat Appl Genet Mol Biol ; 18(4), 2019 May 30.

Article in English | MEDLINE | ID: mdl-31145698

ABSTRACT

In mass spectrometry (MS) experiments, more than thousands of peaks are detected in the space of mass-to-charge ratio and chromatographic retention time, each associated with an abundance measurement. However, a large proportion of the peaks consists of experimental noise and low abundance compounds are typically masked by noise peaks, compromising the quality of the data. In this paper, we propose a new measure of similarity between a pair of MS experiments, called truncated rank correlation (TRC). To provide a robust metric of similarity in noisy high-dimensional data, TRC uses truncated top ranks (or top m-ranks) for calculating correlation. A comprehensive numerical study suggests that TRC outperforms traditional sample correlation and Kendall's τ. We apply TRC to measuring test-retest reliability of two MS experiments, including biological replicate analysis of the metabolome in HEK293 cells and metabolomic profiling of benign prostate hyperplasia (BPH) patients. An R package trc of the proposed TRC and related functions is available at https://sites.google.com/site/dhyeonyu/software.

Subject(s)

Mass Spectrometry/methods , HEK293 Cells , Humans , Male , Metabolomics/methods , Prostatic Hyperplasia/metabolism , Reproducibility of Results

3.

GeNeCK: a web server for gene network construction and visualization.

Zhang, Minzhe; Li, Qiwei; Yu, Donghyeon; Yao, Bo; Guo, Wei; Xie, Yang; Xiao, Guanghua.

BMC Bioinformatics ; 20(1): 12, 2019 Jan 07.

Article in English | MEDLINE | ID: mdl-30616521

ABSTRACT

BACKGROUND: Reverse engineering approaches to infer gene regulatory networks using computational methods are of great importance to annotate gene functionality and identify hub genes. Although various statistical algorithms have been proposed, development of computational tools to integrate results from different methods and user-friendly online tools is still lagging. RESULTS: We developed a web server that efficiently constructs gene networks from expression data. It allows the user to use ten different network construction methods (such as partial correlation-, likelihood-, Bayesian- and mutual information-based methods) and integrates the resulting networks from multiple methods. Hub gene information, if available, can be incorporated to enhance performance. CONCLUSIONS: GeNeCK is an efficient and easy-to-use web application for gene regulatory network construction. It can be accessed at http://lce.biohpc.swmed.edu/geneck .

Subject(s)

Computational Biology/methods , Computer Graphics , Gene Regulatory Networks , Internet , Protein Interaction Maps , Software , Algorithms , Bayes Theorem , Humans , Transcriptome

4.

COEX-Seq: Convert a Variety of Measurements of Gene Expression in RNA-Seq.

Kim, Sang Cheol; Yu, Donghyeon; Cho, Seong Beom.

Genomics Inform ; 16(4): e36, 2018 Dec.

Article in English | MEDLINE | ID: mdl-30602097

ABSTRACT

Next generation sequencing (NGS), a high-throughput DNA sequencing technology, is widely used for molecular biological studies. In NGS, RNA-sequencing (RNA-Seq), which is a short-read massively parallel sequencing, is a major quantitative transcriptome tool for different transcriptome studies. To utilize the RNA-Seq data, various quantification and analysis methods have been developed to solve specific research goals, including identification of differentially expressed genes and detection of novel transcripts. Because of the accumulation of RNA-Seq data in the public databases, there is a demand for integrative analysis. However, the available RNA-Seq data are stored in different formats such as read count, transcripts per million, and fragments per kilobase million. This hinders the integrative analysis of the RNA-Seq data. To solve this problem, we have developed a web-based application using Shiny, COEX-seq (Convert a Variety of Measurements of Gene Expression in RNA-Seq) that easily converts data in a variety of measurement formats of gene expression used in most bioinformatic tools for RNA-Seq. It provides a workflow that includes loading data set, selecting measurement formats of gene expression, and identifying gene names. COEX-seq is freely available for academic purposes and can be run on Windows, Mac OS, and Linux operating systems. Source code, sample data sets, and supplementary documentation are available as well.
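The measurement formats named in this abstract are related by simple normalizations. The sketch below uses the commonly cited definitions of FPKM and TPM; it is not taken from the COEX-Seq source code, and the function names and the toy gene counts and lengths are hypothetical.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    # count / (length in kb) / (total in millions) == count * 1e9 / (length_bp * total)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


def fpkm_to_tpm(fpkm_values):
    """Convert FPKM to TPM without raw counts by rescaling to sum to 1e6."""
    total = sum(fpkm_values)
    return [f / total * 1e6 for f in fpkm_values]


# Hypothetical toy data: three genes with raw read counts and lengths in base pairs.
counts = [10, 20, 30]
lengths_bp = [1000, 2000, 3000]

tpm_direct = tpm(counts, lengths_bp)
tpm_via_fpkm = fpkm_to_tpm(fpkm(counts, lengths_bp))
```

Going from counts to either unit needs the gene lengths, while going from FPKM to TPM needs only the FPKM values of the whole sample. Converting TPM or FPKM back to raw counts is not possible without the library size, which is one reason mixed-format public data are hard to combine.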

5.

Fused Lasso Regression for Identifying Differential Correlations in Brain Connectome Graphs.

Yu, Donghyeon; Lee, Sang Han; Lim, Johan; Xiao, Guanghua; Craddock, R Cameron; Biswal, Bharat B.

Stat Anal Data Min ; 11(5): 203-226, 2018 Oct.

Article in English | MEDLINE | ID: mdl-34386148

ABSTRACT

In this paper, we propose a procedure to find differential edges between two graphs from high-dimensional data. We estimate two matrices of partial correlations and their differences by solving a penalized regression problem. We assume sparsity only on differences between two graphs, not graphs themselves. Thus, we impose an ℓ2 penalty on partial correlations and an ℓ1 penalty on their differences in the penalized regression problem. We apply the proposed procedure to finding differential functional connectivity between healthy individuals and Alzheimer's disease patients.

6.

Enhanced construction of gene regulatory networks using hub gene information.

Yu, Donghyeon; Lim, Johan; Wang, Xinlei; Liang, Faming; Xiao, Guanghua.

BMC Bioinformatics ; 18(1): 186, 2017 Mar 23.

Article in English | MEDLINE | ID: mdl-28335719

ABSTRACT

BACKGROUND: Gene regulatory networks reveal how genes work together to carry out their biological functions. Reconstructions of gene networks from gene expression data greatly facilitate our understanding of underlying biological mechanisms and provide new opportunities for biomarker and drug discoveries. In gene networks, a gene that has many interactions with other genes is called a hub gene, which usually plays an essential role in gene regulation and biological processes. In this study, we developed a method for reconstructing gene networks using a partial correlation-based approach that incorporates prior information about hub genes. Through simulation studies and two real-data examples, we compare the performance in estimating the network structures between the existing methods and the proposed method. RESULTS: In simulation studies, we show that the proposed strategy reduces errors in estimating network structures compared to the existing methods. When applied to Escherichia coli, the regulation network constructed by our proposed ESPACE method is more consistent with current biological knowledge than the SPACE method. Furthermore, application of the proposed method in lung cancer has identified hub genes whose mRNA expression predicts cancer progress and patient response to treatment. CONCLUSIONS: We have demonstrated that incorporating hub gene information in estimating network structures can improve the performance of the existing methods.

Subject(s)

Gene Expression Profiling/methods , Gene Regulatory Networks/genetics , Humans

7.

Predicting progression from mild cognitive impairment to Alzheimer's disease using longitudinal callosal atrophy.

Lee, Sang Han; Bachman, Alvin H; Yu, Donghyeon; Lim, Johan; Ardekani, Babak A.

Alzheimers Dement (Amst) ; 2: 68-74, 2016.

Article in English | MEDLINE | ID: mdl-27239537

ABSTRACT

INTRODUCTION: We investigate whether longitudinal callosal atrophy could predict conversion from mild cognitive impairment (MCI) to Alzheimer's disease (AD). METHODS: Longitudinal (baseline + 1-year follow-up) MRI scans of 132 MCI subjects from the Alzheimer's Disease Neuroimaging Initiative were used. A total of 54 subjects did not convert to AD over an average (±SD) follow-up of 5.46 (±1.63) years, whereas 78 converted to AD with an average conversion time of 2.56 (±1.65) years. Annual change in the corpus callosum thickness profile was calculated from the baseline and 1-year follow-up MRI. A logistic regression model with fused lasso regularization for prediction was applied to the annual changes. RESULTS: We found a sex difference. The accuracy of prediction was 84% in females and 61% in males. The discriminating regions of corpus callosum differed between sexes. In females, the genu, rostrum, and posterior body had predictive power, whereas the genu and splenium were relevant in males. DISCUSSION: Annual callosal atrophy predicts MCI-to-AD conversion in females more accurately than in males.

8.

An integrative somatic mutation analysis to identify pathways linked with survival outcomes across 19 cancer types.

Park, Sunho; Kim, Seung-Jun; Yu, Donghyeon; Peña-Llopis, Samuel; Gao, Jianjiong; Park, Jin Suk; Chen, Beibei; Norris, Jessie; Wang, Xinlei; Chen, Min; Kim, Minsoo; Yong, Jeongsik; Wardak, Zabi; Choe, Kevin; Story, Michael; Starr, Timothy; Cheong, Jae-Ho; Hwang, Tae Hyun.

Bioinformatics ; 32(11): 1643-51, 2016 Jun 01.

Article in English | MEDLINE | ID: mdl-26635139

ABSTRACT

MOTIVATION: Identification of altered pathways that are clinically relevant across human cancers is a key challenge in cancer genomics. Precise identification and understanding of these altered pathways may provide novel insights into patient stratification, therapeutic strategies and the development of new drugs. However, a challenge remains in accurately identifying pathways altered by somatic mutations across human cancers, due to the diverse mutation spectrum. We developed an innovative approach to integrate somatic mutation data with gene networks and pathways, in order to identify pathways altered by somatic mutations across cancers. RESULTS: We applied our approach to The Cancer Genome Atlas (TCGA) dataset of somatic mutations in 4790 cancer patients with 19 different types of tumors. Our analysis identified cancer-type-specific altered pathways enriched with known cancer-relevant genes and targets of currently available drugs. To investigate the clinical significance of these altered pathways, we performed consensus clustering for patient stratification using member genes in the altered pathways coupled with gene expression datasets from 4870 patients from TCGA, and multiple independent cohorts confirmed that the altered pathways could be used to stratify patients into subgroups with significantly different clinical outcomes. Of particular significance, certain patient subpopulations with poor prognosis were identified because they had specific altered pathways for which there are available targeted therapies. These findings could be used to tailor and intensify therapy in these patients, for whom current therapy is suboptimal. AVAILABILITY AND IMPLEMENTATION: The code is available at: http://www.taehyunlab.org CONTACT: jhcheong@yuhs.ac or taehyun.hwang@utsouthwestern.edu or taehyun.cs@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Mutation , Neoplasms , DNA Mutational Analysis , Gene Regulatory Networks , Genomics , Humans

9.

Statistical completion of a partially identified graph with applications for the estimation of gene regulatory networks.

Yu, Donghyeon; Son, Won; Lim, Johan; Xiao, Guanghua.

Biostatistics ; 16(4): 670-85, 2015 Oct.

Article in English | MEDLINE | ID: mdl-25837438

ABSTRACT

We study the estimation of a Gaussian graphical model whose dependent structures are partially identified. In a Gaussian graphical model, an off-diagonal zero entry in the concentration matrix (the inverse covariance matrix) implies the conditional independence of two corresponding variables, given all other variables. A number of methods have been proposed to estimate a sparse large-scale Gaussian graphical model or, equivalently, a sparse large-scale concentration matrix. In practice, the graph structure to be estimated is often partially identified by other sources or a pre-screening. In this paper, we propose a simple modification of existing methods to take into account this information in the estimation. We show that the partially identified dependent structure reduces the error in estimating the dependent structure. We apply the proposed method to estimating the gene regulatory network from lung cancer data, where protein-protein interactions are partially identified from the human protein reference database. The application shows that proposed method identified many important cancer genes as hub genes in the constructed lung cancer network. In addition, we validated the prognostic importance of a newly identified cancer gene, PTPN13, in four independent lung cancer datasets. The results indicate that the proposed method could facilitate studying underlying lung cancer mechanisms and identifying reliable biomarkers for lung cancer prognosis.

Subject(s)

Gene Regulatory Networks , Lung Neoplasms/genetics , Models, Statistical , Protein Interaction Maps , Humans

10.

Application of fused lasso logistic regression to the study of corpus callosum thickness in early Alzheimer's disease.

Lee, Sang H; Yu, Donghyeon; Bachman, Alvin H; Lim, Johan; Ardekani, Babak A.

J Neurosci Methods ; 221: 78-84, 2014 Jan 15.

Article in English | MEDLINE | ID: mdl-24121089

ABSTRACT

We propose a fused lasso logistic regression to analyze callosal thickness profiles. The fused lasso regression imposes penalties on both the l1-norm of the model coefficients and their successive differences, and finds only a small number of non-zero coefficients which are locally constant. An iterative method of solving logistic regression with fused lasso regularization is proposed to make this a practical procedure. In this study we analyzed callosal thickness profiles sampled at 100 equal intervals between the rostrum and the splenium. The method was applied to corpora callosa of elderly normal controls (NCs) and patients with very mild or mild Alzheimer's disease (AD) from the Open Access Series of Imaging Studies (OASIS) database. We found specific locations in the genu and splenium of AD patients that are proportionally thinner than those of NCs. Callosal thickness in these regions combined with the Mini Mental State Examination scores differentiated AD from NC with 84% accuracy.
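The penalty structure described in this abstract can be made concrete with a small objective function. This is a sketch under assumptions: it evaluates the penalized negative log-likelihood only and does not reproduce the iterative solver proposed in the paper. The function name, argument names, and the toy inputs are hypothetical.

```python
import math


def fused_lasso_logistic_objective(X, y, beta, intercept, lam1, lam2):
    """Negative log-likelihood of logistic regression plus fused lasso penalties.

    X: list of feature rows (e.g. thickness values along an ordered profile)
    y: list of 0/1 labels
    lam1: weight on the l1-norm of the coefficients (sparsity)
    lam2: weight on successive differences (local constancy)
    """
    nll = 0.0
    for row, label in zip(X, y):
        eta = intercept + sum(b * x for b, x in zip(beta, row))
        # log(1 + exp(eta)) computed in a numerically stable way
        nll += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - label * eta
    sparsity = sum(abs(b) for b in beta)
    smoothness = sum(abs(beta[j] - beta[j - 1]) for j in range(1, len(beta)))
    return nll + lam1 * sparsity + lam2 * smoothness


# Hypothetical toy input: two subjects, three ordered profile positions.
X = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
y = [0, 1]
value = fused_lasso_logistic_objective(X, y, [1.0, 1.0, 0.0], 0.0, lam1=1.0, lam2=1.0)
```

With all-zero features the likelihood term reduces to 2·log 2, so the example isolates the penalties: the l1 term contributes 2 and the successive-difference term contributes 1. The second penalty is only meaningful when the coefficients have a natural ordering, as they do for positions along the callosal profile.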

Subject(s)

Alzheimer Disease/pathology , Corpus Callosum/pathology , Image Interpretation, Computer-Assisted/methods , Aged , Aged, 80 and over , Female , Humans , Logistic Models , Magnetic Resonance Imaging , Male , Middle Aged

11.

Review of biological network data and its applications.

Yu, Donghyeon; Kim, Minsoo; Xiao, Guanghua; Hwang, Tae Hyun.

Genomics Inform ; 11(4): 200-10, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24465231

ABSTRACT

Studying biological networks, such as protein-protein interactions, is key to understanding complex biological activities. Various types of large-scale biological datasets have been collected and analyzed with high-throughput technologies, including DNA microarray, next-generation sequencing, and the two-hybrid screening system, for this purpose. In this review, we focus on network-based approaches that help in understanding biological systems and identifying biological functions. Accordingly, this paper covers two major topics in network biology: reconstruction of gene regulatory networks and network-based applications, including protein function prediction, disease gene prioritization, and network-based genome-wide association study.

12.

Detection of differentially expressed gene sets in a partially paired microarray data set.

Lim, Johan; Kim, Jayoun; Kim, Sang-cheol; Yu, Donghyeon; Kim, Kyunga; Kim, Byung Soo.

Stat Appl Genet Mol Biol ; 11(3): Article 5, 2012 Feb 15.

Article in English | MEDLINE | ID: mdl-22499704

ABSTRACT

Partially paired data sets often occur in microarray experiments (Kim et al., 2005; Liu, Liang and Jang, 2006). Discussions of testing with partially paired data are found in the literature (Lin and Stivers 1974; Ekbohm, 1976; Bhoj, 1978). Bhoj (1978) initially proposed a test statistic that uses a convex combination of paired and unpaired t statistics. Kim et al. (2005) later proposed the t3 statistic, which is a linear combination of paired and unpaired t statistics, and then used it to detect differentially expressed (DE) genes in colorectal cancer (CRC) cDNA microarray data. In this paper, we extend Kim et al.'s t3 statistic to the Hotelling's T²-type statistic Tp for detecting DE gene sets of size p. We employ Efron's empirical null principle to incorporate inter-gene correlation in the estimation of the false discovery rate. Then, the proposed Tp statistic is applied to Kim et al.'s CRC data to detect the DE gene sets of sizes p=2 and p=3. Our results show that for small p, particularly for p=2 and marginally for p=3, the proposed Tp statistic complements the univariate procedure by detecting additional DE genes that were undetected in the univariate test procedure. We also conduct a simulation study to demonstrate that Efron's empirical null principle is robust to the departure from the normal assumption.

Subject(s)

Colorectal Neoplasms/genetics , Gene Expression Profiling/methods , Models, Statistical , Algorithms , Computer Simulation , Gene Expression Regulation, Neoplastic , Humans , Oligonucleotide Array Sequence Analysis
