Results 1 - 20 of 28
1.
Neural Netw ; 167: 309-330, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37666188

ABSTRACT

Sparse deep neural networks have proven to be efficient for predictive model building in large-scale studies. Although several works have studied the theoretical and numerical properties of sparse neural architectures, they have primarily focused on edge selection. Sparsity through edge selection might be intuitively appealing; however, it does not necessarily reduce the structural complexity of a network. Instead, pruning excessive nodes leads to a structurally sparse network with significant computational speedup during inference. To this end, we propose a Bayesian sparse solution using spike-and-slab Gaussian priors to allow for automatic node selection during training. The use of the spike-and-slab prior alleviates the need for an ad hoc thresholding rule for pruning. In addition, we adopt a variational Bayes approach to circumvent the computational challenges of a traditional Markov chain Monte Carlo (MCMC) implementation. In the context of node selection, we establish the fundamental result of variational posterior consistency together with the characterization of the prior parameters. In contrast to previous works, our theoretical development relaxes the assumptions of an equal number of nodes and uniform bounds on all network weights, thereby accommodating sparse networks with layer-dependent node structures or coefficient bounds. With a layer-wise characterization of prior inclusion probabilities, we discuss the optimal contraction rates of the variational posterior. We empirically demonstrate that our proposed approach outperforms the edge selection method in computational complexity with similar or better predictive performance. Our experimental evidence further substantiates that our theoretical work facilitates layer-wise optimal node recovery.


Subject(s)
Algorithms , Neural Networks, Computer , Bayes Theorem , Markov Chains , Monte Carlo Method
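The spike-and-slab construction in this abstract can be illustrated with a minimal sketch, assuming a two-component Gaussian mixture with hypothetical scales (not the paper's exact prior): the posterior probability that a weight comes from the slab component is what drives inclusion.

```python
import numpy as np

def spike_slab_inclusion_prob(w, p=0.5, sigma_slab=1.0, sigma_spike=1e-3):
    """Posterior probability that weight w comes from the slab under a
    two-component Gaussian spike-and-slab mixture prior. The prior
    inclusion probability p and the two scales are illustrative choices."""
    def norm_pdf(x, s):
        return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2 * np.pi))
    slab = p * norm_pdf(w, sigma_slab)          # diffuse component
    spike = (1 - p) * norm_pdf(w, sigma_spike)  # near-zero component
    return slab / (slab + spike)
```

Weights near zero get a low inclusion probability (the spike dominates), so no separate thresholding rule is needed.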
2.
BMC Bioinformatics ; 24(1): 127, 2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37016281

ABSTRACT

BACKGROUND: Characterizing the topology of gene regulatory networks (GRNs) is a fundamental problem in systems biology. The advent of single-cell technologies has made it possible to construct GRNs at finer resolutions than bulk and microarray datasets. However, cellular heterogeneity and the sparsity of single-cell datasets invalidate the standard Gaussian assumptions used for constructing GRNs. Additionally, most GRN reconstruction approaches estimate a single network for the entire dataset, which can cause a loss of information when single-cell datasets are generated from multiple treatment conditions or disease states. RESULTS: To better characterize single-cell GRNs under different but related conditions, we propose the joint estimation of multiple networks using multiple signed graph learning (scMSGL). The proposed method builds on recently developed graph signal processing (GSP) based graph learning, where GRNs and gene expressions are modeled as signed graphs and graph signals, respectively. scMSGL learns multiple GRNs by optimizing the total variation of gene expressions with respect to the GRNs while ensuring that the learned GRNs are similar to each other through regularization with respect to a learned signed consensus graph. We further kernelize scMSGL, with the kernel selected to suit the structure of single-cell data. CONCLUSIONS: scMSGL is shown to have superior performance over existing state-of-the-art methods in GRN recovery on simulated datasets. Furthermore, scMSGL successfully identifies well-established regulators in a mouse embryonic stem cell differentiation study and a cancer clinical study of medulloblastoma.


Subject(s)
Gene Regulatory Networks , Neoplasms , Animals , Mice , Systems Biology , Sequence Analysis, RNA , Algorithms
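The total-variation criterion described above can be sketched for a single graph signal. This is a minimal illustration of smoothness over signed edges, not the scMSGL optimization itself: positive (activating) edges reward similar expression values, negative (inhibitory) edges reward sign-flipped values.

```python
import numpy as np

def signed_total_variation(x, W):
    """Quadratic variation of a signal x over a signed adjacency W.
    Positive edges penalize differences (x_i - x_j)^2; negative edges
    penalize similarity via (x_i + x_j)^2, as in signed graph learning."""
    W_pos = np.maximum(W, 0)
    W_neg = np.maximum(-W, 0)
    tv_pos = 0.5 * np.sum(W_pos * (x[:, None] - x[None, :]) ** 2)
    tv_neg = 0.5 * np.sum(W_neg * (x[:, None] + x[None, :]) ** 2)
    return tv_pos + tv_neg
```

A signal that follows the signed structure (equal across activating edges, opposite across inhibitory ones) has zero variation.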
3.
Toxicol Sci ; 191(1): 135-148, 2023 01 31.
Article in English | MEDLINE | ID: mdl-36222588

ABSTRACT

2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD) dose-dependently induces the development of hepatic fat accumulation and inflammation with fibrosis in mice, initially in the portal region. Conversely, differential gene and protein expression is first detected in the central region. To further investigate cell-specific and spatially resolved dose-dependent changes in gene expression elicited by TCDD, single-nuclei RNA sequencing and spatial transcriptomics were used for livers of male mice gavaged with TCDD every 4 days for 28 days. The proportions of 11 cell (sub)types across 131,613 nuclei changed dose-dependently, with 68% of all portal and central hepatocyte nuclei in control mice being overtaken by macrophages following TCDD treatment. We identified 368 (portal fibroblasts) to 1,339 (macrophages) differentially expressed genes. Spatial analyses revealed an initial loss of portal identity that eventually spanned the entire liver lobule with increasing dose. Induction of R-spondin 3 (Rspo3) and pericentral Apc suggested dysregulation of the Wnt/β-catenin signaling cascade in zonally resolved steatosis. Collectively, the integrated results suggest that disruption of zonation contributes to the pattern of TCDD-elicited NAFLD pathologies.


Subject(s)
Non-alcoholic Fatty Liver Disease , Polychlorinated Dibenzodioxins , Mice , Male , Animals , Polychlorinated Dibenzodioxins/toxicity , Transcriptome , Liver/metabolism , Non-alcoholic Fatty Liver Disease/metabolism , Gene Expression Profiling
4.
Article in English | MEDLINE | ID: mdl-35584070

ABSTRACT

We consider the problem of nonparametric classification from a high-dimensional input vector (the small n, large p problem). To handle the high-dimensional feature space, we propose a random projection (RP) of the feature space followed by training of a neural network (NN) on the compressed feature space. Unlike regularization techniques (lasso, ridge, etc.), which train on the full data, NNs based on a compressed feature space have significantly lower computational complexity and memory storage requirements. Nonetheless, a random compression-based method is often sensitive to the choice of compression. To address this issue, we adopt a Bayesian model averaging (BMA) approach and leverage the posterior model weights to determine 1) the uncertainty under each compression and 2) the intrinsic dimensionality of the feature space (the effective dimension of the feature space useful for prediction). The final prediction is improved by averaging models with projected dimensions close to the intrinsic dimensionality. Furthermore, we propose a variational approach to the aforementioned BMA to allow for simultaneous estimation of both model weights and model-specific parameters. Since the proposed variational solution is parallelizable across compressions, it preserves the computational gain of frequentist ensemble techniques while providing the full uncertainty quantification of a Bayesian approach. We establish the asymptotic consistency of the proposed algorithm under suitable characterizations of the RPs and the prior parameters. Finally, we provide extensive numerical examples for empirical validation of the proposed method.
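A minimal sketch of the compress-then-average idea, assuming Gaussian random projections and a ridge fit per projection; the in-sample weighting below is a crude stand-in for the paper's posterior model weights, not its variational BMA.

```python
import numpy as np

def rp_ensemble_predict(X, y, X_new, dims=(2, 4, 8), n_rep=3, seed=0):
    """Average ridge predictions over random Gaussian projections of the
    feature space. Weights derived from in-sample residual sums of
    squares stand in (crudely) for Bayesian posterior model weights."""
    rng = np.random.default_rng(seed)
    preds, log_w = [], []
    for d in dims:                      # candidate projected dimensions
        for _ in range(n_rep):          # several projections per dimension
            R = rng.standard_normal((X.shape[1], d)) / np.sqrt(d)
            Z, Z_new = X @ R, X_new @ R
            beta = np.linalg.solve(Z.T @ Z + 0.1 * np.eye(d), Z.T @ y)
            resid = y - Z @ beta
            log_w.append(-0.5 * float(resid @ resid))
            preds.append(Z_new @ beta)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())     # stabilized softmax-style weights
    w /= w.sum()
    return np.sum(w[:, None] * np.array(preds), axis=0)
```

Each projection can be fit independently, so the loop parallelizes trivially — the computational point the abstract makes about the variational solution.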

5.
Bioinformatics ; 38(11): 3011-3019, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35451460

ABSTRACT

MOTIVATION: Elucidating the topology of gene regulatory networks (GRNs) from large single-cell RNA sequencing datasets, while effectively capturing their inherent cell-cycle heterogeneity and dropouts, is currently one of the most pressing problems in computational systems biology. Recently, graph learning (GL) approaches based on graph signal processing have been developed to infer graph topology from signals defined on graphs. However, existing GL methods are not suitable for learning signed graphs, a characteristic feature of GRNs that accounts for both activating and inhibitory relationships in the gene network. They are also incapable of handling the high proportion of zero values present in single-cell datasets. RESULTS: To this end, we propose a novel signed GL approach, scSGL, that learns GRNs based on the assumption of smoothness and non-smoothness of gene expressions over activating and inhibitory edges, respectively. scSGL is then extended with kernels to account for the non-linearity of co-expression and to handle the frequently occurring zero values effectively. The proposed approach is formulated as a non-convex optimization problem and solved using an efficient ADMM framework. Performance assessment using simulated datasets demonstrates the superior performance of kernelized scSGL over existing state-of-the-art methods in GRN recovery. The performance of scSGL is further investigated using human and mouse embryonic datasets. AVAILABILITY AND IMPLEMENTATION: The scSGL code and analysis scripts are available on https://github.com/Single-Cell-Graph-Learning/scSGL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Gene Regulatory Networks , Animals , Humans , Mice , Systems Biology
6.
Nucleic Acids Res ; 50(8): e48, 2022 05 06.
Article in English | MEDLINE | ID: mdl-35061903

ABSTRACT

The application of single-cell RNA sequencing (scRNAseq) for the evaluation of chemicals, drugs, and food contaminants presents the opportunity to consider cellular heterogeneity in pharmacological and toxicological responses. Current differential gene expression analysis (DGEA) methods focus primarily on two-group comparisons, not the multi-group dose-response study designs used in safety assessments. To benchmark DGEA methods for dose-response scRNAseq experiments, we propose a multiplicity-corrected Bayesian testing approach and compare it against eight other methods, including two frequentist fit-for-purpose tests, using simulated and experimental data. Our Bayesian test method outperformed all other tests for a broad range of accuracy metrics, including control of false positive error rates. Most notably, the fit-for-purpose and standard multiple-group DGEA methods were superior to the two-group scRNAseq methods for dose-response study designs. Collectively, our benchmarking of DGEA methods demonstrates the importance of considering study design when determining the most appropriate test methods.


Subject(s)
Benchmarking , Research Design , Bayes Theorem , Gene Expression
7.
R Soc Open Sci ; 8(12): 211102, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34925868

ABSTRACT

The responses of plant photosynthesis to rapid fluctuations in environmental conditions are critical for efficient conversion of light energy. These responses are not readily observed under laboratory conditions and are difficult to probe in field environments. We demonstrate an open science approach to this problem that combines multifaceted measurements of photosynthesis and environmental conditions with an unsupervised statistical clustering approach. In a selected set of data on mint (Mentha sp.), we show that 'light potentials' for linear electron flow and non-photochemical quenching (NPQ) upon rapid light increases are strongly suppressed in leaves previously exposed to low ambient photosynthetically active radiation (PAR) or low leaf temperatures, factors that can act both independently and cooperatively. Further analyses allowed us to test specific mechanisms. With decreasing leaf temperature or PAR, limitations to photosynthesis during high light fluctuations shifted from rapidly induced NPQ to photosynthetic control of electron flow at the cytochrome b6f complex. At low temperatures, high light induced lumen acidification but did not induce NPQ, leading to accumulation of reduced electron transfer intermediates and probably inducing photodamage, revealing a potential target for improving the efficiency and robustness of photosynthesis. We discuss the implications of the approach for open science efforts to understand and improve crop productivity.

8.
Stat Methods Med Res ; 30(10): 2207-2220, 2021 10.
Article in English | MEDLINE | ID: mdl-34460337

ABSTRACT

The primary objective of this paper is to develop a statistically valid classification procedure for analyzing brain image volumetrics data obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) in elderly subjects with cognitive impairments. The proposed Bayesian group lasso method for logistic regression efficiently selects an optimal model using a spike-and-slab type prior, and the group lasso penalty encourages selection of whole groups of attributes within a brain subregion. We conduct simulation studies for high- and low-dimensional scenarios in which our method consistently selects the truly predictive parameters from among a large number of candidates. The method is then applied to dichotomous-response ADNI data, where it selects predictive atrophied brain regions and classifies Alzheimer's disease patients against healthy controls. Our analysis achieves an accuracy rate of 80% for classifying Alzheimer's disease and selects 29 brain subregions, all of which the medical literature associates with Alzheimer's patients. The Bayesian approach to model selection further helps retain only the subregions that are statistically significant, thus yielding an optimal model.


Subject(s)
Alzheimer Disease , Cognitive Dysfunction , Aged , Alzheimer Disease/diagnostic imaging , Bayes Theorem , Brain/diagnostic imaging , Cognitive Dysfunction/diagnostic imaging , Humans , Magnetic Resonance Imaging , Neuroimaging
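The group lasso penalty mentioned above, which drives whole groups of coefficients (e.g., the attributes of one brain subregion) to zero together, can be sketched as:

```python
import numpy as np

def group_lasso_penalty(beta, groups, lam=1.0):
    """Group lasso penalty: lam * sum over groups of sqrt(group size)
    times the Euclidean norm of that group's coefficients. The Euclidean
    (not L1) norm within a group is what zeroes groups jointly."""
    return lam * sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups)
```

The sqrt(group size) weighting is a common convention so that larger groups are not unfairly favored; the paper's Bayesian formulation replaces this penalty with a spike-and-slab type prior.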
9.
J Alzheimers Dis ; 83(4): 1859-1875, 2021.
Article in English | MEDLINE | ID: mdl-34459391

ABSTRACT

BACKGROUND: The transition from mild cognitive impairment (MCI) to dementia is of great interest to clinical research on Alzheimer's disease and related dementias. This phenomenon also serves as a valuable data source for quantitative methodological researchers developing new approaches for classification. However, the growth of machine learning (ML) approaches for classification may lead many clinical researchers to underestimate the value of logistic regression (LR), which often demonstrates classification accuracy equivalent or superior to that of other ML methods. Further, when faced with many potential features that could be used for classifying the transition, clinical researchers are often unaware of the relative merits of different approaches to variable selection. OBJECTIVE: The present study sought to compare different methods for statistical classification and for automated and theoretically guided feature selection in the context of predicting conversion from MCI to dementia. METHODS: We used data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) to evaluate the influence of automated feature preselection on LR and support vector machine (SVM) classification of conversion from MCI to dementia. RESULTS: The present findings demonstrate that similar performance can be achieved using user-guided, clinically informed preselection versus algorithmic feature selection techniques. CONCLUSION: These results show that although SVM and other ML techniques are capable of relatively accurate classification, similar or higher accuracy can often be achieved by LR, reducing the necessity or added value of SVM for many clinical researchers.


Subject(s)
Alzheimer Disease/classification , Cognitive Dysfunction/classification , Machine Learning , Aged , Brain/pathology , Female , Humans , Magnetic Resonance Imaging , Male , Support Vector Machine
10.
Neural Netw ; 137: 151-173, 2021 May.
Article in English | MEDLINE | ID: mdl-33607444

ABSTRACT

Despite the popularity of Bayesian neural networks (BNNs) in recent years, their use is somewhat limited in complex and big data situations due to the computational cost associated with full posterior evaluations. Variational Bayes (VB) provides a useful alternative that circumvents the computational cost and time complexity associated with the generation of samples from the true posterior using Markov chain Monte Carlo (MCMC) techniques. The efficacy of VB methods is well established in the machine learning literature. However, their potential broader impact is hindered by a lack of theoretical validity from a statistical perspective. In this paper, we establish the fundamental result of posterior consistency for the mean-field variational posterior (VP) of a feed-forward artificial neural network model. The paper underlines the conditions needed to guarantee that the VP concentrates around Hellinger neighborhoods of the true density function. Additionally, the role of the scale parameter and its influence on the convergence rates is discussed. The paper relies mainly on two results: (1) the rate at which the true posterior grows and (2) the rate at which the Kullback-Leibler (KL) divergence between the posterior and the variational posterior grows. The theory provides a guideline for building prior distributions for BNNs, along with an assessment of the accuracy of the corresponding VB implementation.


Subject(s)
Machine Learning , Bayes Theorem , Markov Chains , Monte Carlo Method
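The KL divergence between the variational posterior and its target is central to the result above. For mean-field Gaussian factors, the per-weight KL term has a closed form (a standard identity, shown here purely as an illustration of the quantity being bounded):

```python
import math

def kl_gaussians(mu_q, s_q, mu_p, s_p):
    """KL(q || p) between univariate Gaussians N(mu_q, s_q^2) and
    N(mu_p, s_p^2): the per-weight building block of a mean-field
    variational posterior's divergence from a Gaussian prior."""
    return (math.log(s_p / s_q)
            + (s_q ** 2 + (mu_q - mu_p) ** 2) / (2 * s_p ** 2)
            - 0.5)
```

Under mean-field factorization, the total KL is the sum of these terms over all weights, which is what makes the bound tractable.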
11.
J Nurs Care Qual ; 35(3): 206-212, 2020.
Article in English | MEDLINE | ID: mdl-32433142

ABSTRACT

BACKGROUND: Negative nurse work environments have been associated with nurse bullying and poor nurse health. However, few studies have examined the influence of nurse bullying on actual patient outcomes. PURPOSE: The purpose of the study was to examine the association between nurse-reported bullying and documented nursing-sensitive patient outcomes. METHODS: Nurses (n = 432) in a large US hospital responded to a survey on workplace bullying. Unit-level data for 5 adverse patient events and nurse staffing were acquired from the National Database of Nursing Quality Indicators. Generalized linear models were used to examine the association between bullying and adverse patient events, and a Bayesian regression analysis was used to confirm the findings. RESULTS: After controlling for nurse staffing and qualifications, nurse-reported bullying was significantly associated with the incidence of central-line-associated bloodstream infections (P < .001). CONCLUSIONS: Interventions to address bullying, a malleable aspect of the nurse practice environment, may help to reduce adverse patient events.


Subject(s)
Bullying/statistics & numerical data , Catheterization, Central Venous/adverse effects , Hospitals , Incidence , Nursing Staff, Hospital , Workplace , Adult , Catheter-Related Infections/complications , Cross-Sectional Studies , Female , Humans , Inpatients/statistics & numerical data , Nursing Staff, Hospital/psychology , Nursing Staff, Hospital/statistics & numerical data , Retrospective Studies , Surveys and Questionnaires , United States
12.
IEEE J Biomed Health Inform ; 23(6): 2537-2550, 2019 11.
Article in English | MEDLINE | ID: mdl-30714936

ABSTRACT

Translating recent advances in abdominal aortic aneurysm (AAA) growth and remodeling (G&R) knowledge into a predictive, patient-specific clinical treatment tool requires a major paradigm shift in computational modeling. The objectives of this study are to develop a prediction framework that calibrates the physical AAA G&R model using patient-specific serial computed tomography (CT) scan images, predicts the future expansion of an AAA, and quantifies the associated uncertainty in the prediction. We adopt a Bayesian calibration method to calibrate the parameters in the G&R computational model and predict the magnitude of AAA expansion. The proposed Bayesian approach can accommodate different sources of uncertainty and is therefore well suited to achieving our aims of predicting the AAA expansion process and computing the propagated uncertainty. We demonstrate how to achieve these aims by solving the formulated Bayesian calibration problems for cases with synthetic G&R model output data and real patient-specific CT data. We compare and discuss the performance of the predictions and the computation time under different sampling cases of the model output data and patient data, both of which are simulated by the G&R computation. Furthermore, we apply our Bayesian calibration to real patient-specific serial CT data and validate our prediction. The accuracy and efficiency of the proposed method are promising, which should appeal to both the computational and medical communities.


Subject(s)
Aortic Aneurysm, Abdominal/diagnostic imaging , Aortic Aneurysm, Abdominal/pathology , Image Interpretation, Computer-Assisted/methods , Patient-Specific Modeling , Bayes Theorem , Computer Simulation , Disease Progression , Humans , Tomography, X-Ray Computed
13.
Stat Methods Med Res ; 28(9): 2801-2819, 2019 09.
Article in English | MEDLINE | ID: mdl-30039745

ABSTRACT

With the rapid aging of the world population, Alzheimer's disease is becoming a leading cause of death after cardiovascular disease and cancer. Nearly 10% of people over 65 years old are affected by Alzheimer's disease. The causes have been studied intensively, but no definitive answer has been found. Genetic predisposition, abnormal protein deposits in the brain, and environmental factors are suspected to play a role in the development of this disease. In this paper, we model the progression of Alzheimer's disease using a multi-state Markov model to investigate the significance of known risk factors such as age, apolipoprotein E4, and brain structural volumetric variables from magnetic resonance imaging scans (e.g., the hippocampus) while predicting transitions between different clinical diagnosis states. Using the Alzheimer's Disease Neuroimaging Initiative data, we found that the effect of age is not significant (p = 0.1733) according to the likelihood ratio test, but apolipoprotein E4 is a significant risk factor, and examination of the apolipoprotein E4-by-sex interaction suggests that the apolipoprotein E4 link to Alzheimer's disease is stronger in women. Given the estimated transition probabilities, the prediction accuracy is as high as 0.7849.


Subject(s)
Alzheimer Disease/diagnostic imaging , Markov Chains , Aged , Disease Progression , Female , Humans , Male , Neuroimaging , Risk Factors , Sex Factors
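A multi-state Markov model of the kind described propagates one-step transition probabilities by matrix powers. The matrix below is purely illustrative (a hypothetical normal / MCI / AD chain with an absorbing AD state), not the paper's fitted estimates:

```python
import numpy as np

# Hypothetical 3-state chain: cognitively normal -> MCI -> AD.
# AD (row 3) is absorbing; rows sum to 1. Values are illustrative only.
P = np.array([[0.90, 0.08, 0.02],
              [0.05, 0.80, 0.15],
              [0.00, 0.00, 1.00]])

def k_step_transitions(P, k):
    """k-step transition probabilities: the k-th matrix power of the
    one-step transition matrix P (Chapman-Kolmogorov)."""
    return np.linalg.matrix_power(P, k)
```

Row i of the result gives the diagnosis-state distribution k visits ahead for a subject currently in state i.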
14.
Spat Spatiotemporal Epidemiol ; 24: 53-62, 2018 02.
Article in English | MEDLINE | ID: mdl-29413714

ABSTRACT

The purpose of this study is to identify regions with a diabetes health-service shortage. American Diabetes Association (ADA)-accredited diabetes self-management education (DSME) is recommended for all people with diabetes. In this study, we focus on demographic patterns and geographic regionalization of the disease by including accessibility and availability of diabetes education resources as a critical component in understanding and confronting differences in diabetes prevalence, as well as addressing regional or sub-regional differences in awareness, treatment, and control. We conducted an ecological county-level study utilizing publicly available secondary data on 3,109 counties in the continental U.S. We used a Bayesian spatial cluster model that enabled spatial heterogeneities across the continental U.S. to be addressed. We used the ADA website to identify 2012 DSME locations and national 2010 county-level diabetes rates estimated by the Centers for Disease Control and Prevention, and identified regions with low DSME program availability relative to their diabetes rates and population density. Only 39.8% of U.S. counties had at least one ADA-accredited DSME program location. Based on our 95% credible intervals, age-adjusted diabetes rates and DSME program locations were associated in only seven of the thirty-five identified clusters, and only two of these seven associations were positive. We identified clusters that were above the 75th percentile of average diabetes rates but below the 25th percentile of average DSME location counts and found that these clusters were all located in the Southeast portion of the country. Overall, there was a lack of relationship between diabetes rates and DSME center locations in the U.S., suggesting resources could be more efficiently placed according to need. Clusters that were high in diabetes rates and low in DSME placements, all in the Southeast, should particularly be considered for additional DSME programming.


Subject(s)
Diabetes Mellitus, Type 2/epidemiology , Health Education , Health Services Accessibility , Self-Management , Age Factors , Aged , Cluster Analysis , Diabetes Mellitus, Type 2/prevention & control , Female , Humans , Male , Spatio-Temporal Analysis , United States/epidemiology
15.
Stat Methods Med Res ; 27(4): 971-990, 2018 04.
Article in English | MEDLINE | ID: mdl-28034170

ABSTRACT

The accelerated failure time model is a popular model for analyzing censored time-to-event data. Analyzing this model without assuming any parametric distribution for the model error is challenging, and the model's complexity increases in the presence of a large number of covariates. We developed a nonparametric Bayesian method for regularized estimation of the regression parameters in a flexible accelerated failure time model. The novelties of our method lie in modeling the error distribution of the accelerated failure time model nonparametrically, modeling the variance as a function of the mean, and adopting a variable selection technique in modeling the mean. The proposed method allows for identifying a set of important regression parameters, estimating survival probabilities, and constructing credible intervals for the survival probabilities. We evaluated the operating characteristics of the proposed method via simulation studies. Finally, we applied our new comprehensive method to analyze the motivating breast cancer data from the Surveillance, Epidemiology, and End Results (SEER) Program and estimated the five-year survival probabilities for women included in the SEER database who were diagnosed with breast cancer between 1990 and 2000.


Subject(s)
Bayes Theorem , Breast Neoplasms/diagnosis , Breast Neoplasms/pathology , Survival Analysis , Adult , Aged , Aged, 80 and over , Algorithms , Female , Humans , Middle Aged , Models, Statistical , Monte Carlo Method , Population Surveillance , Prognosis , SEER Program/statistics & numerical data , Young Adult
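An accelerated failure time model relates covariates to log survival time: log T = x'beta + sigma * eps. As a hedged sketch, the survival function under a parametric log-normal error is shown below; the paper models the error distribution nonparametrically, so the log-normal here is only a stand-in:

```python
import math

def aft_lognormal_survival(t, xbeta, sigma=1.0):
    """Survival probability S(t | x) under a log-normal AFT model:
    log T = x'beta + sigma * eps with eps ~ N(0, 1). Uses the standard
    normal CDF written via the error function."""
    z = (math.log(t) - xbeta) / sigma
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Covariates act by rescaling time: increasing x'beta shifts the whole survival curve toward longer survival, which is the defining feature of the AFT family.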
16.
Scand Stat Theory Appl ; 43(3): 886-903, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27795610

ABSTRACT

Functional data analysis has become an important area of research due to its ability to handle high-dimensional and complex data structures. However, development has been limited in the context of linear mixed effect models, and in particular for small area estimation, where linear mixed effect models are the backbone. In this article, we consider area-level data and fit a varying coefficient linear mixed effect model in which the varying coefficients are semi-parametrically modeled via B-splines. We propose a method for estimating the fixed effect parameters and consider prediction of random effects that can be implemented using standard software. To measure prediction uncertainty, we derive an analytical expression for the mean squared errors and propose a method of estimating them. The procedure is illustrated via a real data example, and the operating characteristics of the method are judged using finite sample simulation studies.

17.
Biometrics ; 72(4): 1164-1172, 2016 12.
Article in English | MEDLINE | ID: mdl-27061299

ABSTRACT

We consider the problem of selecting covariates in a spatial regression model when the response is binary. Penalized likelihood-based approaches have proven effective for simultaneous variable selection and estimation. In the context of a spatially dependent binary variable, a uniquely interpretable likelihood is not available; rather, a quasi-likelihood may be more suitable. We develop a penalized quasi-likelihood with spatial dependence for simultaneous variable selection and parameter estimation, along with an efficient computational algorithm. The theoretical properties, including asymptotic normality and consistency, are studied under an increasing-domain asymptotic framework. An extensive simulation study is conducted to validate the methodology, and real data examples are provided for illustration and applicability. Although we do not provide theoretical justification, we also investigate the empirical performance of the proposed penalized quasi-likelihood approach for spatial count data to explore the suitability of this method for the general exponential family of distributions.


Subject(s)
Likelihood Functions , Models, Statistical , Spatial Regression , Algorithms , Biometry/methods , Computer Simulation , Fires/statistics & numerical data , Michigan
18.
PLoS One ; 6(5): e19640, 2011.
Article in English | MEDLINE | ID: mdl-21611181

ABSTRACT

Microarrays are a powerful tool for genome-wide gene expression analysis. In microarray expression data, the mean and variance often exhibit a systematic relationship. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then based on shrinkage estimation of the posterior means, assuming the variances are known. The methods were applied to simulated datasets in which a variety of mean-variance relationships were imposed. The simulation study showed that NPMVS outperformed two other popular shrinkage estimation methods under some mean-variance relationships and was competitive with them under others. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, was also analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold- and stress-responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation of both means and variances. In addition, NPMVS uses a non-parametric regression between mean and variance instead of assuming a specific parametric relationship between them. The source code, written in R, is available from the authors on request.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Cold Temperature , Oligonucleotide Array Sequence Analysis , Statistics as Topic/methods , Stress, Physiological/genetics , Trans-Activators/genetics , Arabidopsis Proteins/metabolism , Computer Simulation , Gene Expression Regulation, Plant , Genes, Plant/genetics , Regulatory Sequences, Nucleic Acid/genetics , Statistics, Nonparametric , Trans-Activators/metabolism
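The mean-variance smoothing idea can be sketched with a simple moving-average smoother over genes ordered by mean expression; NPMVS itself fits a nonparametric regression curve rather than this crude window average, so treat the sketch as illustrative only.

```python
import numpy as np

def smooth_variance(means, variances, window=5):
    """Moving-average smoother of per-gene variances ordered by their
    means: each gene's variance estimate borrows strength from genes
    with similar mean expression (edge positions are zero-padded)."""
    order = np.argsort(means)                       # sort genes by mean
    kernel = np.ones(window) / window
    smoothed = np.convolve(variances[order], kernel, mode="same")
    out = np.empty_like(smoothed)
    out[order] = smoothed                           # restore input order
    return out
```

Pooling variance information across genes with similar means is the shrinkage idea behind the method's improved stability for small-sample microarray data.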
19.
Stat Med ; 30(4): 348-55, 2011 Feb 20.
Article in English | MEDLINE | ID: mdl-21225897

ABSTRACT

We employ a general bias-preventive approach developed by Firth (Biometrika 1993; 80:27-38) to reduce the bias of an estimator of the log-odds ratio parameter in a matched case-control study by solving a modified score equation. We also propose a method to calculate the standard error of the resultant estimator. A closed-form expression for the estimator of the log-odds ratio parameter is derived in the case of a dichotomous exposure variable. Finite sample properties of the estimator are investigated via a simulation study. Finally, we apply the method to analyze matched case-control data from a low birthweight study.


Subject(s)
Bias , Case-Control Studies , Effect Modifier, Epidemiologic , Logistic Models , Computer Simulation/statistics & numerical data , Humans , Infant, Low Birth Weight , Infant, Newborn
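For a dichotomous exposure in a 1:1 matched design, the log-odds ratio depends only on the discordant pairs. The additive 0.5 below is a simple bias-reducing correction in the spirit of Firth's adjustment, shown as an assumption-laden illustration rather than the paper's exact closed-form estimator:

```python
import math

def matched_pairs_log_or(n10, n01, correct=True):
    """Log-odds ratio from the discordant pairs of a 1:1 matched
    case-control study (n10: case exposed / control unexposed; n01:
    the reverse). The additive 0.5 is a generic bias-reducing
    correction, not necessarily the paper's derived estimator."""
    c = 0.5 if correct else 0.0
    return math.log((n10 + c) / (n01 + c))
```

The uncorrected version is the classical conditional MLE for matched pairs; the correction also keeps the estimator finite when one discordant count is zero.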
20.
Biometrics ; 66(2): 621-9, 2010 Jun.
Article in English | MEDLINE | ID: mdl-19522873

ABSTRACT

In a microarray experiment, one experimental design is used to obtain expression measures for all genes. One popular analysis method involves fitting the same linear mixed model for each gene, obtaining gene-specific p-values for tests of interest involving fixed effects, and then choosing a threshold for significance that is intended to control false discovery rate (FDR) at a desired level. When one or more random factors have zero variance components for some genes, the standard practice of fitting the same full linear mixed model for all genes can result in failure to control FDR. We propose a new method that combines results from the fit of full and selected linear mixed models to identify differentially expressed genes and provide FDR control at target levels when the true underlying random effects structure varies across genes.


Subject(s)
Artifacts , Linear Models , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Gene Expression Profiling/statistics & numerical data , Genes , Oligonucleotide Array Sequence Analysis/methods
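The gene-specific p-value thresholding referenced above is typically the Benjamini-Hochberg step-up procedure; a baseline sketch of that standard procedure follows (it is not the paper's proposed combination of full and selected models):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: return the indices of
    hypotheses rejected at target false discovery rate q. Reject the
    k smallest p-values, where k is the largest rank with
    p_(k) <= q * k / m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])
```

The paper's point is that such thresholds can fail to control FDR when the per-gene mixed models are misspecified (e.g., zero variance components), which motivates combining full- and selected-model results.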