Results 1 - 20 of 28
1.
Neural Netw ; 167: 309-330, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37666188

ABSTRACT

Sparse deep neural networks have proven to be efficient for predictive model building in large-scale studies. Although several works have studied the theoretical and numerical properties of sparse neural architectures, they have primarily focused on edge selection. Sparsity through edge selection might be intuitively appealing; however, it does not necessarily reduce the structural complexity of a network. Instead, pruning excessive nodes leads to a structurally sparse network with significant computational speedup during inference. To this end, we propose a Bayesian sparse solution using spike-and-slab Gaussian priors to allow for automatic node selection during training. The use of the spike-and-slab prior alleviates the need for an ad hoc thresholding rule for pruning. In addition, we adopt a variational Bayes approach to circumvent the computational challenges of a traditional Markov chain Monte Carlo (MCMC) implementation. In the context of node selection, we establish the fundamental result of variational posterior consistency together with the characterization of the prior parameters. In contrast to previous works, our theoretical development relaxes the assumptions of an equal number of nodes and uniform bounds on all network weights, thereby accommodating sparse networks with layer-dependent node structures or coefficient bounds. With a layer-wise characterization of prior inclusion probabilities, we discuss the optimal contraction rates of the variational posterior. We empirically demonstrate that our proposed approach outperforms the edge selection method in computational complexity with similar or better predictive performance. Our experimental evidence further substantiates that our theoretical work facilitates layer-wise optimal node recovery.


Subject(s)
Algorithms , Neural Networks, Computer , Bayes Theorem , Markov Chains , Monte Carlo Method
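The spike-and-slab construction in this abstract can be illustrated with a minimal sketch, assuming a two-component Gaussian mixture with hypothetical scales (not the paper's exact prior): the posterior probability that a weight comes from the slab component is what drives inclusion.

```python
import numpy as np

def spike_slab_inclusion_prob(w, p=0.5, sigma_slab=1.0, sigma_spike=1e-3):
    """Posterior probability that weight w comes from the slab under a
    two-component Gaussian spike-and-slab mixture prior. The prior
    inclusion probability p and the two scales are illustrative choices."""
    def norm_pdf(x, s):
        return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2 * np.pi))
    slab = p * norm_pdf(w, sigma_slab)          # diffuse component
    spike = (1 - p) * norm_pdf(w, sigma_spike)  # near-zero component
    return slab / (slab + spike)
```

Weights near zero get a low inclusion probability (the spike dominates), so no separate thresholding rule is needed.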
2.
BMC Bioinformatics ; 24(1): 127, 2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37016281

ABSTRACT

BACKGROUND: Characterizing the topology of gene regulatory networks (GRNs) is a fundamental problem in systems biology. The advent of single-cell technologies has made it possible to construct GRNs at finer resolutions than bulk and microarray datasets. However, cellular heterogeneity and the sparsity of single-cell datasets invalidate the standard Gaussian assumptions used for constructing GRNs. Additionally, most GRN reconstruction approaches estimate a single network for the entire dataset, which can cause a loss of information when single-cell datasets are generated from multiple treatment conditions or disease states. RESULTS: To better characterize single-cell GRNs under different but related conditions, we propose the joint estimation of multiple networks using multiple signed graph learning (scMSGL). The proposed method builds on recently developed graph signal processing (GSP) based graph learning, where GRNs and gene expressions are modeled as signed graphs and graph signals, respectively. scMSGL learns multiple GRNs by optimizing the total variation of gene expressions with respect to the GRNs while ensuring that the learned GRNs are similar to each other through regularization with respect to a learned signed consensus graph. We further kernelize scMSGL, with the kernel selected to suit the structure of single-cell data. CONCLUSIONS: scMSGL is shown to have superior performance over existing state-of-the-art methods in GRN recovery on simulated datasets. Furthermore, scMSGL successfully identifies well-established regulators in a mouse embryonic stem cell differentiation study and a cancer clinical study of medulloblastoma.


Subject(s)
Gene Regulatory Networks , Neoplasms , Animals , Mice , Systems Biology , Sequence Analysis, RNA , Algorithms
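The total-variation criterion described above can be sketched for a single graph signal. This is a minimal illustration of smoothness over signed edges, not the scMSGL optimization itself: positive (activating) edges reward similar expression values, negative (inhibitory) edges reward sign-flipped values.

```python
import numpy as np

def signed_total_variation(x, W):
    """Quadratic variation of a signal x over a signed adjacency W.
    Positive edges penalize differences (x_i - x_j)^2; negative edges
    penalize similarity via (x_i + x_j)^2, as in signed graph learning."""
    W_pos = np.maximum(W, 0)
    W_neg = np.maximum(-W, 0)
    tv_pos = 0.5 * np.sum(W_pos * (x[:, None] - x[None, :]) ** 2)
    tv_neg = 0.5 * np.sum(W_neg * (x[:, None] + x[None, :]) ** 2)
    return tv_pos + tv_neg
```

A signal that follows the signed structure (equal across activating edges, opposite across inhibitory ones) has zero variation.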
3.
Toxicol Sci ; 191(1): 135-148, 2023 01 31.
Article in English | MEDLINE | ID: mdl-36222588

ABSTRACT

2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD) dose-dependently induces the development of hepatic fat accumulation and inflammation with fibrosis in mice, initially in the portal region. Conversely, differential gene and protein expression is first detected in the central region. To further investigate cell-specific and spatially resolved dose-dependent changes in gene expression elicited by TCDD, single-nuclei RNA sequencing and spatial transcriptomics were used for livers of male mice gavaged with TCDD every 4 days for 28 days. The proportions of 11 cell (sub)types across 131,613 nuclei changed dose-dependently, with 68% of all portal and central hepatocyte nuclei in control mice being overtaken by macrophages following TCDD treatment. We identified 368 (portal fibroblasts) to 1,339 (macrophages) differentially expressed genes. Spatial analyses revealed an initial loss of portal identity that eventually spanned the entire liver lobule with increasing dose. Induction of R-spondin 3 (Rspo3) and pericentral Apc suggested dysregulation of the Wnt/β-catenin signaling cascade in zonally resolved steatosis. Collectively, the integrated results suggest that disruption of zonation contributes to the pattern of TCDD-elicited NAFLD pathologies.


Subject(s)
Non-alcoholic Fatty Liver Disease , Polychlorinated Dibenzodioxins , Mice , Male , Animals , Polychlorinated Dibenzodioxins/toxicity , Transcriptome , Liver/metabolism , Non-alcoholic Fatty Liver Disease/metabolism , Gene Expression Profiling
4.
Article in English | MEDLINE | ID: mdl-35584070

ABSTRACT

We consider the problem of nonparametric classification from a high-dimensional input vector (the small n, large p problem). To handle the high-dimensional feature space, we propose a random projection (RP) of the feature space followed by training of a neural network (NN) on the compressed feature space. Unlike regularization techniques (lasso, ridge, etc.), which train on the full data, NNs based on a compressed feature space have significantly lower computational complexity and memory storage requirements. Nonetheless, a random compression-based method is often sensitive to the choice of compression. To address this issue, we adopt a Bayesian model averaging (BMA) approach and leverage the posterior model weights to determine 1) the uncertainty under each compression and 2) the intrinsic dimensionality of the feature space (the effective dimension of the feature space useful for prediction). The final prediction is improved by averaging models with projected dimensions close to the intrinsic dimensionality. Furthermore, we propose a variational approach to the aforementioned BMA to allow for simultaneous estimation of both model weights and model-specific parameters. Since the proposed variational solution is parallelizable across compressions, it preserves the computational gain of frequentist ensemble techniques while providing the full uncertainty quantification of a Bayesian approach. We establish the asymptotic consistency of the proposed algorithm under suitable characterizations of the RPs and the prior parameters. Finally, we provide extensive numerical examples for empirical validation of the proposed method.
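A minimal sketch of the compress-then-average idea, assuming Gaussian random projections and a ridge fit per projection; the in-sample weighting below is a crude stand-in for the paper's posterior model weights, not its variational BMA.

```python
import numpy as np

def rp_ensemble_predict(X, y, X_new, dims=(2, 4, 8), n_rep=3, seed=0):
    """Average ridge predictions over random Gaussian projections of the
    feature space. Weights derived from in-sample residual sums of
    squares stand in (crudely) for Bayesian posterior model weights."""
    rng = np.random.default_rng(seed)
    preds, log_w = [], []
    for d in dims:                      # candidate projected dimensions
        for _ in range(n_rep):          # several projections per dimension
            R = rng.standard_normal((X.shape[1], d)) / np.sqrt(d)
            Z, Z_new = X @ R, X_new @ R
            beta = np.linalg.solve(Z.T @ Z + 0.1 * np.eye(d), Z.T @ y)
            resid = y - Z @ beta
            log_w.append(-0.5 * float(resid @ resid))
            preds.append(Z_new @ beta)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())     # stabilized softmax-style weights
    w /= w.sum()
    return np.sum(w[:, None] * np.array(preds), axis=0)
```

Each projection can be fit independently, so the loop parallelizes trivially — the computational point the abstract makes about the variational solution.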

5.
Bioinformatics ; 38(11): 3011-3019, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35451460

ABSTRACT

MOTIVATION: Elucidating the topology of gene regulatory networks (GRNs) from large single-cell RNA sequencing datasets, while effectively capturing their inherent cell-cycle heterogeneity and dropouts, is currently one of the most pressing problems in computational systems biology. Recently, graph learning (GL) approaches based on graph signal processing have been developed to infer graph topology from signals defined on graphs. However, existing GL methods are not suitable for learning signed graphs, a characteristic feature of GRNs that accounts for both activating and inhibitory relationships in the gene network. They are also incapable of handling the high proportion of zero values present in single-cell datasets. RESULTS: To this end, we propose a novel signed GL approach, scSGL, that learns GRNs based on the assumption of smoothness and non-smoothness of gene expressions over activating and inhibitory edges, respectively. scSGL is then extended with kernels to account for the non-linearity of co-expression and to handle the frequently occurring zero values effectively. The proposed approach is formulated as a non-convex optimization problem and solved using an efficient ADMM framework. Performance assessment using simulated datasets demonstrates the superior performance of kernelized scSGL over existing state-of-the-art methods in GRN recovery. The performance of scSGL is further investigated using human and mouse embryonic datasets. AVAILABILITY AND IMPLEMENTATION: The scSGL code and analysis scripts are available on https://github.com/Single-Cell-Graph-Learning/scSGL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Gene Regulatory Networks , Animals , Humans , Mice , Systems Biology
6.
Nucleic Acids Res ; 50(8): e48, 2022 05 06.
Article in English | MEDLINE | ID: mdl-35061903

ABSTRACT

The application of single-cell RNA sequencing (scRNAseq) for the evaluation of chemicals, drugs, and food contaminants presents the opportunity to consider cellular heterogeneity in pharmacological and toxicological responses. Current differential gene expression analysis (DGEA) methods focus primarily on two-group comparisons, not the multi-group dose-response study designs used in safety assessments. To benchmark DGEA methods for dose-response scRNAseq experiments, we propose a multiplicity-corrected Bayesian testing approach and compare it against eight other methods, including two frequentist fit-for-purpose tests, using simulated and experimental data. Our Bayesian test method outperformed all other tests for a broad range of accuracy metrics, including control of false positive error rates. Most notably, the fit-for-purpose and standard multiple-group DGEA methods were superior to the two-group scRNAseq methods for dose-response study designs. Collectively, our benchmarking of DGEA methods demonstrates the importance of considering study design when determining the most appropriate test methods.


Subject(s)
Benchmarking , Research Design , Bayes Theorem , Gene Expression
7.
R Soc Open Sci ; 8(12): 211102, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34925868

ABSTRACT

The responses of plant photosynthesis to rapid fluctuations in environmental conditions are critical for efficient conversion of light energy. These responses are not readily observed under laboratory conditions and are difficult to probe in field environments. We demonstrate an open science approach to this problem that combines multifaceted measurements of photosynthesis and environmental conditions with an unsupervised statistical clustering approach. In a selected set of data on mint (Mentha sp.), we show that 'light potentials' for linear electron flow and non-photochemical quenching (NPQ) upon rapid light increases are strongly suppressed in leaves previously exposed to low ambient photosynthetically active radiation (PAR) or low leaf temperatures, factors that can act both independently and cooperatively. Further analyses allowed us to test specific mechanisms. With decreasing leaf temperature or PAR, limitations to photosynthesis during high light fluctuations shifted from rapidly induced NPQ to photosynthetic control of electron flow at the cytochrome b6f complex. At low temperatures, high light induced lumen acidification but did not induce NPQ, leading to accumulation of reduced electron transfer intermediates and probably inducing photodamage, revealing a potential target for improving the efficiency and robustness of photosynthesis. We discuss the implications of the approach for open science efforts to understand and improve crop productivity.

8.
Stat Methods Med Res ; 30(10): 2207-2220, 2021 10.
Article in English | MEDLINE | ID: mdl-34460337

ABSTRACT

The primary objective of this paper is to develop a statistically valid classification procedure for analyzing brain image volumetrics data obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) in elderly subjects with cognitive impairments. The proposed Bayesian group lasso method for logistic regression efficiently selects an optimal model using a spike-and-slab type prior, and the group lasso penalty encourages selection of whole groups of attributes within a brain subregion. We conduct simulation studies for high- and low-dimensional scenarios in which our method consistently selects the truly predictive parameters from among a large number of candidates. The method is then applied to dichotomous-response ADNI data, where it selects predictive atrophied brain regions and classifies Alzheimer's disease patients against healthy controls. Our analysis achieves an accuracy rate of 80% for classifying Alzheimer's disease and selects 29 brain subregions, all of which the medical literature associates with Alzheimer's patients. The Bayesian approach to model selection further helps retain only the subregions that are statistically significant, thus yielding an optimal model.


Subject(s)
Alzheimer Disease , Cognitive Dysfunction , Aged , Alzheimer Disease/diagnostic imaging , Bayes Theorem , Brain/diagnostic imaging , Cognitive Dysfunction/diagnostic imaging , Humans , Magnetic Resonance Imaging , Neuroimaging
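The group lasso penalty mentioned above, which drives whole groups of coefficients (e.g., the attributes of one brain subregion) to zero together, can be sketched as:

```python
import numpy as np

def group_lasso_penalty(beta, groups, lam=1.0):
    """Group lasso penalty: lam * sum over groups of sqrt(group size)
    times the Euclidean norm of that group's coefficients. The Euclidean
    (not L1) norm within a group is what zeroes groups jointly."""
    return lam * sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups)
```

The sqrt(group size) weighting is a common convention so that larger groups are not unfairly favored; the paper's Bayesian formulation replaces this penalty with a spike-and-slab type prior.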
9.
J Alzheimers Dis ; 83(4): 1859-1875, 2021.
Article in English | MEDLINE | ID: mdl-34459391

ABSTRACT

BACKGROUND: The transition from mild cognitive impairment (MCI) to dementia is of great interest to clinical research on Alzheimer's disease and related dementias. This phenomenon also serves as a valuable data source for quantitative methodological researchers developing new approaches for classification. However, the growth of machine learning (ML) approaches for classification may lead many clinical researchers to underestimate the value of logistic regression (LR), which often demonstrates classification accuracy equivalent or superior to that of other ML methods. Further, when faced with many potential features that could be used for classifying the transition, clinical researchers are often unaware of the relative merits of different approaches to variable selection. OBJECTIVE: The present study sought to compare different methods for statistical classification and for automated and theoretically guided feature selection in the context of predicting conversion from MCI to dementia. METHODS: We used data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) to evaluate the influence of automated feature preselection on LR and support vector machine (SVM) classification of conversion from MCI to dementia. RESULTS: The present findings demonstrate that similar performance can be achieved using user-guided, clinically informed preselection versus algorithmic feature selection techniques. CONCLUSION: These results show that although SVM and other ML techniques are capable of relatively accurate classification, similar or higher accuracy can often be achieved by LR, reducing the necessity or added value of SVM for many clinical researchers.


Subject(s)
Alzheimer Disease/classification , Cognitive Dysfunction/classification , Machine Learning , Aged , Brain/pathology , Female , Humans , Magnetic Resonance Imaging , Male , Support Vector Machine
10.
Neural Netw ; 137: 151-173, 2021 May.
Article in English | MEDLINE | ID: mdl-33607444

ABSTRACT

Despite the popularity of Bayesian neural networks (BNNs) in recent years, their use is somewhat limited in complex and big data situations due to the computational cost associated with full posterior evaluations. Variational Bayes (VB) provides a useful alternative that circumvents the computational cost and time complexity associated with the generation of samples from the true posterior using Markov chain Monte Carlo (MCMC) techniques. The efficacy of VB methods is well established in the machine learning literature. However, their potential broader impact is hindered by a lack of theoretical validity from a statistical perspective. In this paper, we establish the fundamental result of posterior consistency for the mean-field variational posterior (VP) of a feed-forward artificial neural network model. The paper underlines the conditions needed to guarantee that the VP concentrates around Hellinger neighborhoods of the true density function. Additionally, the role of the scale parameter and its influence on the convergence rates is discussed. The paper relies mainly on two results: (1) the rate at which the true posterior grows and (2) the rate at which the Kullback-Leibler (KL) divergence between the posterior and the variational posterior grows. The theory provides a guideline for building prior distributions for BNNs, along with an assessment of the accuracy of the corresponding VB implementation.


Subject(s)
Machine Learning , Bayes Theorem , Markov Chains , Monte Carlo Method
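The KL divergence between the variational posterior and its target is central to the result above. For mean-field Gaussian factors, the per-weight KL term has a closed form (a standard identity, shown here purely as an illustration of the quantity being bounded):

```python
import math

def kl_gaussians(mu_q, s_q, mu_p, s_p):
    """KL(q || p) between univariate Gaussians N(mu_q, s_q^2) and
    N(mu_p, s_p^2): the per-weight building block of a mean-field
    variational posterior's divergence from a Gaussian prior."""
    return (math.log(s_p / s_q)
            + (s_q ** 2 + (mu_q - mu_p) ** 2) / (2 * s_p ** 2)
            - 0.5)
```

Under mean-field factorization, the total KL is the sum of these terms over all weights, which is what makes the bound tractable.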
11.
J Nurs Care Qual ; 35(3): 206-212, 2020.
Article in English | MEDLINE | ID: mdl-32433142

ABSTRACT

BACKGROUND: Negative nurse work environments have been associated with nurse bullying and poor nurse health. However, few studies have examined the influence of nurse bullying on actual patient outcomes. PURPOSE: The purpose of the study was to examine the association between nurse-reported bullying and documented nursing-sensitive patient outcomes. METHODS: Nurses (n = 432) in a large US hospital responded to a survey on workplace bullying. Unit-level data for 5 adverse patient events and nurse staffing were acquired from the National Database of Nursing Quality Indicators. Generalized linear models were used to examine the association between bullying and adverse patient events, and a Bayesian regression analysis was used to confirm the findings. RESULTS: After controlling for nurse staffing and qualifications, nurse-reported bullying was significantly associated with the incidence of central-line-associated bloodstream infections (P < .001). CONCLUSIONS: Interventions to address bullying, a malleable aspect of the nurse practice environment, may help to reduce adverse patient events.


Subject(s)
Bullying/statistics & numerical data , Catheterization, Central Venous/adverse effects , Hospitals , Incidence , Nursing Staff, Hospital , Workplace , Adult , Catheter-Related Infections/complications , Cross-Sectional Studies , Female , Humans , Inpatients/statistics & numerical data , Nursing Staff, Hospital/psychology , Nursing Staff, Hospital/statistics & numerical data , Retrospective Studies , Surveys and Questionnaires , United States
12.
IEEE J Biomed Health Inform ; 23(6): 2537-2550, 2019 11.
Article in English | MEDLINE | ID: mdl-30714936

ABSTRACT

Translating recent advances in abdominal aortic aneurysm (AAA) growth and remodeling (G&R) knowledge into a predictive, patient-specific clinical treatment tool requires a major paradigm shift in computational modeling. The objectives of this study are to develop a prediction framework that calibrates the physical AAA G&R model using patient-specific serial computed tomography (CT) scan images, predicts the future expansion of an AAA, and quantifies the associated uncertainty in the prediction. We adopt a Bayesian calibration method to calibrate the parameters in the G&R computational model and predict the magnitude of AAA expansion. The proposed Bayesian approach can accommodate different sources of uncertainty and is therefore well suited to achieving our aims of predicting the AAA expansion process and computing the propagated uncertainty. We demonstrate how to achieve these aims by solving the formulated Bayesian calibration problems for cases with synthetic G&R model output data and real patient-specific CT data. We compare and discuss the performance of the predictions and the computation time under different sampling cases of the model output data and patient data, both of which are simulated by the G&R computation. Furthermore, we apply our Bayesian calibration to real patient-specific serial CT data and validate our prediction. The accuracy and efficiency of the proposed method are promising, which should appeal to both the computational and medical communities.


Subject(s)
Aortic Aneurysm, Abdominal/diagnostic imaging , Aortic Aneurysm, Abdominal/pathology , Image Interpretation, Computer-Assisted/methods , Patient-Specific Modeling , Bayes Theorem , Computer Simulation , Disease Progression , Humans , Tomography, X-Ray Computed
13.
Stat Methods Med Res ; 28(9): 2801-2819, 2019 09.
Article in English | MEDLINE | ID: mdl-30039745

ABSTRACT

With the rapid aging of the world population, Alzheimer's disease is becoming a leading cause of death after cardiovascular disease and cancer. Nearly 10% of people over 65 years old are affected by Alzheimer's disease. The causes have been studied intensively, but no definitive answer has been found. Genetic predisposition, abnormal protein deposits in the brain, and environmental factors are suspected to play a role in the development of this disease. In this paper, we model the progression of Alzheimer's disease using a multi-state Markov model to investigate the significance of known risk factors such as age, apolipoprotein E4, and brain structural volumetric variables from magnetic resonance imaging scans (e.g., the hippocampus) while predicting transitions between different clinical diagnosis states. Using the Alzheimer's Disease Neuroimaging Initiative data, we found that the effect of age is not significant (p = 0.1733) according to the likelihood ratio test, but apolipoprotein E4 is a significant risk factor, and examination of the apolipoprotein E4-by-sex interaction suggests that the apolipoprotein E4 link to Alzheimer's disease is stronger in women. Given the estimated transition probabilities, the prediction accuracy is as high as 0.7849.


Subject(s)
Alzheimer Disease/diagnostic imaging , Markov Chains , Aged , Disease Progression , Female , Humans , Male , Neuroimaging , Risk Factors , Sex Factors
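A multi-state Markov model of the kind described propagates one-step transition probabilities by matrix powers. The matrix below is purely illustrative (a hypothetical normal / MCI / AD chain with an absorbing AD state), not the paper's fitted estimates:

```python
import numpy as np

# Hypothetical 3-state chain: cognitively normal -> MCI -> AD.
# AD (row 3) is absorbing; rows sum to 1. Values are illustrative only.
P = np.array([[0.90, 0.08, 0.02],
              [0.05, 0.80, 0.15],
              [0.00, 0.00, 1.00]])

def k_step_transitions(P, k):
    """k-step transition probabilities: the k-th matrix power of the
    one-step transition matrix P (Chapman-Kolmogorov)."""
    return np.linalg.matrix_power(P, k)
```

Row i of the result gives the diagnosis-state distribution k visits ahead for a subject currently in state i.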
14.
Spat Spatiotemporal Epidemiol ; 24: 53-62, 2018 02.
Article in English | MEDLINE | ID: mdl-29413714

ABSTRACT

The purpose of this study is to identify regions with a diabetes health-service shortage. American Diabetes Association (ADA)-accredited diabetes self-management education (DSME) is recommended for all people with diabetes. In this study, we focus on demographic patterns and geographic regionalization of the disease by including accessibility and availability of diabetes education resources as a critical component in understanding and confronting differences in diabetes prevalence, as well as addressing regional or sub-regional differences in awareness, treatment, and control. We conducted an ecological county-level study utilizing publicly available secondary data on 3,109 counties in the continental U.S. We used a Bayesian spatial cluster model that enabled spatial heterogeneities across the continental U.S. to be addressed. We used the ADA website to identify 2012 DSME locations and national 2010 county-level diabetes rates estimated by the Centers for Disease Control and Prevention, and identified regions with low DSME program availability relative to their diabetes rates and population density. Only 39.8% of U.S. counties had at least one ADA-accredited DSME program location. Based on our 95% credible intervals, age-adjusted diabetes rates and DSME program locations were associated in only seven of the thirty-five identified clusters, and only two of these seven associations were positive. We identified clusters that were above the 75th percentile of average diabetes rates but below the 25th percentile of average DSME location counts and found that these clusters were all located in the Southeast portion of the country. Overall, there was a lack of relationship between diabetes rates and DSME center locations in the U.S., suggesting resources could be more efficiently placed according to need. Clusters that were high in diabetes rates and low in DSME placements, all in the Southeast, should particularly be considered for additional DSME programming.


Subject(s)
Diabetes Mellitus, Type 2/epidemiology , Health Education , Health Services Accessibility , Self-Management , Age Factors , Aged , Cluster Analysis , Diabetes Mellitus, Type 2/prevention & control , Female , Humans , Male , Spatio-Temporal Analysis , United States/epidemiology
15.
Stat Methods Med Res ; 27(4): 971-990, 2018 04.
Article in English | MEDLINE | ID: mdl-28034170

ABSTRACT

The accelerated failure time model is a popular model for analyzing censored time-to-event data. Analyzing this model without assuming any parametric distribution for the model error is challenging, and the model's complexity increases in the presence of a large number of covariates. We developed a nonparametric Bayesian method for regularized estimation of the regression parameters in a flexible accelerated failure time model. The novelties of our method lie in modeling the error distribution of the accelerated failure time model nonparametrically, modeling the variance as a function of the mean, and adopting a variable selection technique in modeling the mean. The proposed method allows for identifying a set of important regression parameters, estimating survival probabilities, and constructing credible intervals for the survival probabilities. We evaluated the operating characteristics of the proposed method via simulation studies. Finally, we applied our new comprehensive method to analyze the motivating breast cancer data from the Surveillance, Epidemiology, and End Results (SEER) Program and estimated the five-year survival probabilities for women included in the SEER database who were diagnosed with breast cancer between 1990 and 2000.


Subject(s)
Bayes Theorem , Breast Neoplasms/diagnosis , Breast Neoplasms/pathology , Survival Analysis , Adult , Aged , Aged, 80 and over , Algorithms , Female , Humans , Middle Aged , Models, Statistical , Monte Carlo Method , Population Surveillance , Prognosis , SEER Program/statistics & numerical data , Young Adult
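An accelerated failure time model relates covariates to log survival time: log T = x'beta + sigma * eps. As a hedged sketch, the survival function under a parametric log-normal error is shown below; the paper models the error distribution nonparametrically, so the log-normal here is only a stand-in:

```python
import math

def aft_lognormal_survival(t, xbeta, sigma=1.0):
    """Survival probability S(t | x) under a log-normal AFT model:
    log T = x'beta + sigma * eps with eps ~ N(0, 1). Uses the standard
    normal CDF written via the error function."""
    z = (math.log(t) - xbeta) / sigma
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Covariates act by rescaling time: increasing x'beta shifts the whole survival curve toward longer survival, which is the defining feature of the AFT family.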
16.
Scand Stat Theory Appl ; 43(3): 886-903, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27795610

ABSTRACT

Functional data analysis has become an important area of research due to its ability to handle high-dimensional and complex data structures. However, development has been limited in the context of linear mixed effect models, and in particular for small area estimation, where linear mixed effect models are the backbone. In this article, we consider area-level data and fit a varying coefficient linear mixed effect model in which the varying coefficients are semi-parametrically modeled via B-splines. We propose a method for estimating the fixed effect parameters and consider prediction of random effects that can be implemented using standard software. To measure prediction uncertainty, we derive an analytical expression for the mean squared errors and propose a method of estimating them. The procedure is illustrated via a real data example, and the operating characteristics of the method are judged using finite sample simulation studies.

17.
Biometrics ; 72(4): 1164-1172, 2016 12.
Article in English | MEDLINE | ID: mdl-27061299

ABSTRACT

We consider the problem of selecting covariates in a spatial regression model when the response is binary. Penalized likelihood-based approaches have proven effective for simultaneous variable selection and estimation. In the context of a spatially dependent binary variable, a uniquely interpretable likelihood is not available; rather, a quasi-likelihood may be more suitable. We develop a penalized quasi-likelihood with spatial dependence for simultaneous variable selection and parameter estimation, along with an efficient computational algorithm. The theoretical properties, including asymptotic normality and consistency, are studied under an increasing-domain asymptotic framework. An extensive simulation study is conducted to validate the methodology, and real data examples are provided for illustration and applicability. Although we do not provide theoretical justification, we also investigate the empirical performance of the proposed penalized quasi-likelihood approach for spatial count data to explore the suitability of this method for the general exponential family of distributions.


Subject(s)
Likelihood Functions , Models, Statistical , Spatial Regression , Algorithms , Biometry/methods , Computer Simulation , Fires/statistics & numerical data , Michigan
18.
PLoS One ; 6(5): e19640, 2011.
Article in English | MEDLINE | ID: mdl-21611181

ABSTRACT

Microarrays are a powerful tool for genome-wide gene expression analysis. In microarray expression data, the mean and variance often exhibit a systematic relationship. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then based on shrinkage estimation of the posterior means, assuming the variances are known. The methods were applied to simulated datasets in which a variety of mean-variance relationships were imposed. The simulation study showed that NPMVS outperformed two other popular shrinkage estimation methods under some mean-variance relationships and was competitive with them under others. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, was also analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold- and stress-responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation of both means and variances. In addition, NPMVS uses a non-parametric regression between mean and variance instead of assuming a specific parametric relationship between them. The source code, written in R, is available from the authors on request.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Cold Temperature , Oligonucleotide Array Sequence Analysis , Statistics as Topic/methods , Stress, Physiological/genetics , Trans-Activators/genetics , Arabidopsis Proteins/metabolism , Computer Simulation , Gene Expression Regulation, Plant , Genes, Plant/genetics , Regulatory Sequences, Nucleic Acid/genetics , Statistics, Nonparametric , Trans-Activators/metabolism
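The mean-variance smoothing idea can be sketched with a simple moving-average smoother over genes ordered by mean expression; NPMVS itself fits a nonparametric regression curve rather than this crude window average, so treat the sketch as illustrative only.

```python
import numpy as np

def smooth_variance(means, variances, window=5):
    """Moving-average smoother of per-gene variances ordered by their
    means: each gene's variance estimate borrows strength from genes
    with similar mean expression (edge positions are zero-padded)."""
    order = np.argsort(means)                       # sort genes by mean
    kernel = np.ones(window) / window
    smoothed = np.convolve(variances[order], kernel, mode="same")
    out = np.empty_like(smoothed)
    out[order] = smoothed                           # restore input order
    return out
```

Pooling variance information across genes with similar means is the shrinkage idea behind the method's improved stability for small-sample microarray data.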
19.
Stat Med ; 30(4): 348-55, 2011 Feb 20.
Article in English | MEDLINE | ID: mdl-21225897

ABSTRACT

We employ a general bias-preventive approach developed by Firth (Biometrika 1993; 80:27-38) to reduce the bias of an estimator of the log-odds ratio parameter in a matched case-control study by solving a modified score equation. We also propose a method to calculate the standard error of the resultant estimator. A closed-form expression for the estimator of the log-odds ratio parameter is derived in the case of a dichotomous exposure variable. Finite sample properties of the estimator are investigated via a simulation study. Finally, we apply the method to analyze matched case-control data from a low birthweight study.


Subject(s)
Bias , Case-Control Studies , Effect Modifier, Epidemiologic , Logistic Models , Computer Simulation/statistics & numerical data , Humans , Infant, Low Birth Weight , Infant, Newborn
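For a dichotomous exposure in a 1:1 matched design, the log-odds ratio depends only on the discordant pairs. The additive 0.5 below is a simple bias-reducing correction in the spirit of Firth's adjustment, shown as an assumption-laden illustration rather than the paper's exact closed-form estimator:

```python
import math

def matched_pairs_log_or(n10, n01, correct=True):
    """Log-odds ratio from the discordant pairs of a 1:1 matched
    case-control study (n10: case exposed / control unexposed; n01:
    the reverse). The additive 0.5 is a generic bias-reducing
    correction, not necessarily the paper's derived estimator."""
    c = 0.5 if correct else 0.0
    return math.log((n10 + c) / (n01 + c))
```

The uncorrected version is the classical conditional MLE for matched pairs; the correction also keeps the estimator finite when one discordant count is zero.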
20.
Biometrics ; 66(2): 621-9, 2010 Jun.
Article in English | MEDLINE | ID: mdl-19522873

ABSTRACT

In a microarray experiment, one experimental design is used to obtain expression measures for all genes. One popular analysis method involves fitting the same linear mixed model for each gene, obtaining gene-specific p-values for tests of interest involving fixed effects, and then choosing a threshold for significance that is intended to control false discovery rate (FDR) at a desired level. When one or more random factors have zero variance components for some genes, the standard practice of fitting the same full linear mixed model for all genes can result in failure to control FDR. We propose a new method that combines results from the fit of full and selected linear mixed models to identify differentially expressed genes and provide FDR control at target levels when the true underlying random effects structure varies across genes.


Subject(s)
Artifacts , Linear Models , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Gene Expression Profiling/statistics & numerical data , Genes , Oligonucleotide Array Sequence Analysis/methods
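The gene-specific p-value thresholding referenced above is typically the Benjamini-Hochberg step-up procedure; a baseline sketch of that standard procedure follows (it is not the paper's proposed combination of full and selected models):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: return the indices of
    hypotheses rejected at target false discovery rate q. Reject the
    k smallest p-values, where k is the largest rank with
    p_(k) <= q * k / m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])
```

The paper's point is that such thresholds can fail to control FDR when the per-gene mixed models are misspecified (e.g., zero variance components), which motivates combining full- and selected-model results.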