Results 1 - 6 of 6
2.
Biometrics ; 72(4): 1246-1254, 2016 Dec.
Article in English | MEDLINE | ID: mdl-26954906

ABSTRACT

We introduce a new Bayesian nonparametric method for estimating the size of a closed population from multiple-recapture data. Our method, based on Dirichlet process mixtures, can accommodate complex patterns of heterogeneity of capture, and can transparently modulate its complexity without a separate model selection step. Additionally, it can handle the massively sparse contingency tables generated by a large number of recaptures with moderate sample sizes. We develop an efficient and scalable MCMC algorithm for estimation. We apply our method to simulated data, and to two examples from the literature of estimation of casualties in armed conflicts.


Subject(s)
Bayes Theorem , Models, Statistical , Population Density , Statistics, Nonparametric , Algorithms , Armed Conflicts/statistics & numerical data , Computer Simulation , Humans , Mass Casualty Incidents/statistics & numerical data , Sample Size
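
To illustrate why many recapture lists combined with a moderate sample size produce sparse contingency tables, the following minimal sketch tabulates capture histories and reports the fraction of observable cells that stay empty. It is not the Dirichlet process mixture method of the abstract; the function names and the toy data are hypothetical, and capture histories are assumed to be 0/1 tuples with one position per list.

```python
from collections import Counter


def capture_pattern_table(histories):
    """Count how many observed individuals share each capture pattern.

    `histories` is an iterable of equal-length 0/1 tuples, one per observed
    individual; position j is 1 if the individual appears on list j.
    (Hypothetical input format, chosen for illustration only.)
    """
    histories = [tuple(h) for h in histories]
    if not histories:
        raise ValueError("need at least one capture history")
    n_lists = len(histories[0])
    if any(len(h) != n_lists for h in histories):
        raise ValueError("all histories must have the same length")
    if any(sum(h) == 0 for h in histories):
        # An individual missed by every list is never observed.
        raise ValueError("the all-zero pattern cannot appear in observed data")
    return Counter(histories), n_lists


def empty_cell_fraction(histories):
    """Fraction of the 2**J - 1 observable cells with a zero count."""
    counts, n_lists = capture_pattern_table(histories)
    observable_cells = 2 ** n_lists - 1
    return 1 - len(counts) / observable_cells


# Toy data: 5 observed individuals, 4 lists (15 observable cells).
toy = [(1, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (1, 1, 0, 0), (0, 0, 0, 1)]
table, n_lists = capture_pattern_table(toy)
frac = empty_cell_fraction(toy)
print(f"{len(table)} non-empty cells out of {2 ** n_lists - 1}; "
      f"empty fraction = {frac:.3f}")
```

Because the number of observable cells doubles with every added list while the number of observed individuals does not, the empty fraction approaches 1 quickly as lists are added.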
3.
Multivariate Behav Res ; 50(4): 383-97, 2015.
Article in English | MEDLINE | ID: mdl-26257437

ABSTRACT

Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related to the constructs of interest. The goal of the present research was to develop a flexible, broadly applicable approach to the integration of disparate data sets that is based on nonparametric multiple imputation and the collection of data from a convenient, de novo calibration sample. We demonstrate proof of concept for the approach by integrating three existing data sets containing items related to the extent of problematic alcohol use and associations with deviant peers. We discuss both necessary conditions for the approach to work well and potential strengths and weaknesses of the method compared to other data set integration approaches.


Subject(s)
Behavioral Research/methods , Retrospective Studies , Statistics, Nonparametric , Adolescent , Adult , Child , Humans , Psychometrics/methods , Reproducibility of Results , Young Adult
4.
Ann Appl Stat ; 8(4): 2268-2291, 2014 Dec.
Article in English | MEDLINE | ID: mdl-26322146

ABSTRACT

We develop new methods for analyzing discrete multivariate longitudinal data and apply them to functional disability data on the U.S. elderly population from the National Long Term Care Survey (NLTCS), 1982-2004. Our models build on a mixed membership framework, in which individuals are allowed multiple membership on a set of extreme profiles characterized by time-dependent trajectories of progression into disability. We also develop an extension that allows us to incorporate birth-cohort effects, in order to assess inter-generational changes. Applying these methods we find that most individuals follow trajectories that imply a late onset of disability, and that younger cohorts tend to develop disabilities at a later stage in life compared to their elders.

5.
J Am Stat Assoc ; 107(500): 1385-1394, 2012 Dec 01.
Article in English | MEDLINE | ID: mdl-25214699

ABSTRACT

Statistical agencies and other organizations that disseminate data are obligated to protect data subjects' confidentiality. For example, ill-intentioned individuals might link data subjects to records in other databases by matching on common characteristics (keys). Successful links are particularly problematic for data subjects with combinations of keys that are unique in the population. Hence, as part of their assessments of disclosure risks, many data stewards estimate the probabilities that sample uniques on sets of discrete keys are also population uniques on those keys. This is typically done using log-linear modeling on the keys. However, log-linear models can yield biased estimates of cell probabilities for sparse contingency tables with many zero counts, which often occur in databases with many keys. This bias can result in unreliable estimates of probabilities of uniqueness and, hence, misrepresentations of disclosure risks. We propose an alternative to log-linear models for datasets with sparse keys based on a Bayesian version of grade of membership (GoM) models. We present a Bayesian GoM model for multinomial variables and offer an MCMC algorithm for fitting the model. We evaluate the approach by treating data from a recent US Census Bureau public use microdata sample as a population, taking simple random samples from that population, and benchmarking estimated probabilities of uniqueness against population values. Compared to log-linear models, GoM models provide more accurate estimates of the total number of uniques in the samples. Additionally, they offer record-level predictions of uniqueness that dominate those based on log-linear models.
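
The benchmarking setup described in this abstract (treat a file as the population, draw a simple random sample, and check which sample uniques are also population uniques) can be sketched as follows. This covers only the ground-truth side of the comparison, not the GoM or log-linear estimators; the record layout, key names, and function names are hypothetical.

```python
import random
from collections import Counter


def uniques(records, keys):
    """Return the key combinations that occur exactly once in `records`."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return {combo for combo, n in combos.items() if n == 1}


def benchmark_uniqueness(population, keys, sample_size, seed=0):
    """Draw a simple random sample and compare its uniques with the population's.

    Returns (number of sample uniques, number that are also population
    uniques, share of sample uniques that are population uniques).
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # sampling without replacement
    sample_uniques = uniques(sample, keys)
    both = sample_uniques & uniques(population, keys)
    share = len(both) / len(sample_uniques) if sample_uniques else float("nan")
    return len(sample_uniques), len(both), share


# Hypothetical toy "population" with three discrete keys.
keys = ["age", "sex", "zip"]
population = [
    {"age": "30-39", "sex": "F", "zip": "A"},
    {"age": "30-39", "sex": "F", "zip": "A"},
    {"age": "40-49", "sex": "M", "zip": "B"},
    {"age": "50-59", "sex": "F", "zip": "C"},
]
n_sample_uniques, n_both, share = benchmark_uniqueness(population, keys, 3, seed=1)
print(n_sample_uniques, n_both, share)
```

A record that is unique in the sample but duplicated in the population is the case a disclosure-risk model has to tell apart from a true population unique; the share returned here is the quantity such a model would be benchmarked against.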

6.
Biom J ; 50(6): 1051-63, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19035548

ABSTRACT

We revisit the heterogeneous closed population multiple recapture problem, modeling individual-level heterogeneity using the Grade of Membership model (Woodbury et al., 1978). This strategy allows us to postulate the existence of homogeneous latent "ideal" or "pure" classes within the population, and construct a soft clustering of the individuals, where each one is allowed partial or mixed membership in all of these classes. We propose a full hierarchical Bayes specification and an MCMC algorithm to obtain samples from the posterior distribution. We apply the method to simulated data and to three real-life examples.


Subject(s)
Bayes Theorem , Models, Biological , Models, Statistical , Population Density , Child , Computer Simulation , Congenital Abnormalities/epidemiology , Diabetes Mellitus/epidemiology , Homicide , Humans , Markov Chains , Monte Carlo Method
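
As a rough illustration of the "partial or mixed membership" idea, the sketch below assumes the usual Grade of Membership form, in which an individual's capture probability on each list is the membership-weighted average of the pure-class capture probabilities. The abstract does not spell out this formula, so treat it as an assumption; the function name and the numbers are hypothetical, and no hierarchical Bayes fitting is attempted.

```python
def capture_probabilities(membership, class_probs):
    """Mix pure-class capture probabilities by an individual's membership weights.

    membership:  K non-negative weights summing to 1 (one per pure class).
    class_probs: class_probs[k][j] is the assumed probability that a pure
                 member of class k is captured on list j.
    """
    if any(g < 0 for g in membership) or abs(sum(membership) - 1.0) > 1e-9:
        raise ValueError("membership weights must be non-negative and sum to 1")
    if len(membership) != len(class_probs):
        raise ValueError("need one row of class_probs per membership weight")
    n_lists = len(class_probs[0])
    return [
        sum(g * probs[j] for g, probs in zip(membership, class_probs))
        for j in range(n_lists)
    ]


# Two hypothetical pure classes observed on three lists.
class_probs = [
    [0.9, 0.8, 0.7],  # easily captured class
    [0.1, 0.2, 0.3],  # rarely captured class
]
pure = capture_probabilities([1.0, 0.0], class_probs)
mixed = capture_probabilities([0.5, 0.5], class_probs)
print(pure, mixed)
```

An individual with all weight on one class reproduces that class's capture probabilities, while intermediate weights interpolate between the classes, which is what makes the clustering soft.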