2.
Proc Natl Acad Sci U S A ; 120(43): e2220558120, 2023 Oct 24.
Article in English | MEDLINE | ID: mdl-37831744

ABSTRACT

The use of formal privacy to protect the confidentiality of responses in the 2020 Decennial Census of Population and Housing has triggered renewed interest and debate over how to measure the disclosure risks and societal benefits of the published data products. We argue that any proposal for quantifying disclosure risk should be based on prespecified, objective criteria. We illustrate this approach to evaluate the absolute disclosure risk framework, the counterfactual framework underlying differential privacy, and prior-to-posterior comparisons. We conclude that satisfying all the desiderata is impossible, but counterfactual comparisons satisfy the most while absolute disclosure risk satisfies the fewest. Furthermore, we explain that many of the criticisms levied against differential privacy would be levied against any technology that is not equivalent to direct, unrestricted access to confidential data. More research is needed, but in the near term, the counterfactual approach appears best-suited for privacy versus utility analysis.


Subject(s)
Confidentiality , Disclosure , Privacy , Risk Assessment , Censuses
3.
J R Stat Soc Ser A Stat Soc ; 184(2): 643-662, 2021 Apr.
Article in English | MEDLINE | ID: mdl-36254262

ABSTRACT

Often, government agencies and survey organizations know the population counts or percentages for some of the variables in a survey. These may be available from auxiliary sources, for example, administrative databases or other high quality surveys. We present and illustrate a model-based framework for leveraging such auxiliary marginal information when handling unit and item nonresponse. We show how one can use the margins to specify different missingness mechanisms for each type of nonresponse. We use the framework to impute missing values in voter turnout in a subset of data from the U.S. Current Population Survey (CPS). In doing so, we examine the sensitivity of results to different assumptions about the unit and item nonresponse.

4.
J Palliat Med ; 23(1): 90-96, 2020 01.
Article in English | MEDLINE | ID: mdl-31424316

ABSTRACT

Background: Hospital referral regions (HRRs) are often used to characterize inpatient referral patterns, but it is unknown how well these geographic regions are aligned with variation in Medicare-financed hospice care, which is largely provided at home. Objective: Our objective was to characterize the variability in hospice use rates among elderly Medicare decedents by HRR and county. Methods: Using 2014 Master Beneficiary File for decedents 65 and older from North and South Carolina, we applied Bayesian mixed models to quantify variation in hospice use rates explained by HRR fixed effects, county random effects, and residual error among Medicare decedents. Results: We found HRRs and county indicators are significant predictors of hospice use in NC and SC; however, the relative variation within HRRs and associated residual variation is substantial. On average, HRR fixed effects explained more variation in hospice use rates than county indicators with a standard deviation (SD) of 10.0 versus 5.1 percentage points. The SD of the residual error is 5.7 percentage points. On average, variation within HRRs is about half the variation between regions (52%). Conclusions: The magnitude of unexplained residual variation in hospice use for NC and SC suggests that novel, end-of-life-specific service areas should be developed and tested to better capture geographic differences and inform research, health systems, and policy.


Subject(s)
Hospice Care , Terminal Care , Aged , Bayes Theorem , Humans , Medicare , Referral and Consultation , South Carolina , United States
5.
Stat Med ; 37(24): 3533-3546, 2018 10 30.
Article in English | MEDLINE | ID: mdl-30069901

ABSTRACT

We develop methodology for causal inference in observational studies when using propensity score subclassification on data constructed with probabilistic record linkage techniques. We focus on scenarios where covariates and binary treatment assignments are in one file and outcomes are in another file, and the goal is to estimate an additive treatment effect by merging the files. We assume that the files can be linked using variables common to both files, eg, names or birth dates, but that links are subject to errors, eg, due to reporting errors in the linking variables. We develop methodology for cases where such reporting errors are independent of the other variables on the files. We describe conceptually how linkage errors can affect causal estimates in subclassification contexts. We also present and evaluate several algorithms for deciding which record pairs to use in estimation of causal effects. Using simulation studies, we demonstrate that case selection procedures can result in improved accuracy in estimates of treatment effects from linked data compared to using only cases known to be true links.


Subject(s)
Medical Record Linkage , Propensity Score , Algorithms , Biostatistics , Causality , Computer Simulation , Data Interpretation, Statistical , Humans , Observational Studies as Topic/statistics & numerical data
6.
J Palliat Med ; 21(8): 1131-1136, 2018 08.
Article in English | MEDLINE | ID: mdl-29762075

ABSTRACT

BACKGROUND: Use of the Medicare hospice benefit has been associated with high-quality care at the end of life, and hospice length of use in particular has been used as a proxy for appropriate timing of hospice enrollment. Quantile regression has been underutilized as an alternative tool to model distributional changes in hospice length of use and hospice payments outside of the mean. OBJECTIVE: To test for heterogeneity in the relationship between patient characteristics and hospice outcomes across the distribution of hospice days. SETTING: Medicare Beneficiary Summary File and survey data (2014) for hospice beneficiaries in North and South Carolina with common terminal diagnoses. MEASUREMENTS: Distributional shifts associated with patient characteristics were evaluated at the 25th and 75th percentiles of hospice days and hospice payments using quantile regressions and compared to the mean shift estimated by ordinary least squares (OLS) regression. PRINCIPAL FINDINGS: Significant (p < 0.001) heterogeneity in the marginal effects on hospice days and costs was observed, with patient characteristics associated with generally larger shifts in the 75th percentile than the 25th percentile. Mean effects estimated by OLS regression overestimate the magnitude of the median marginal effects for all patient characteristics except for race. Results for hospice payments in 2014 were similar. CONCLUSIONS: Methodological decisions can have a meaningful impact in the evaluation of factors influencing hospice length of use or cost.


Subject(s)
Hospice Care/economics , Hospice Care/statistics & numerical data , Length of Stay/economics , Length of Stay/statistics & numerical data , Medicare/economics , Medicare/statistics & numerical data , Aged , Aged, 80 and over , Female , Forecasting , Humans , Male , North Carolina , Regression Analysis , Retrospective Studies , South Carolina , United States
7.
Sci Rep ; 8(1): 116, 2018 01 08.
Article in English | MEDLINE | ID: mdl-29311675

ABSTRACT

Baseball players must be able to see and react in an instant, yet it is hotly debated whether superior performance is associated with superior sensorimotor abilities. In this study, we compare sensorimotor abilities, measured through 8 psychomotor tasks comprising the Nike Sensory Station assessment battery, and game statistics in a sample of 252 professional baseball players to evaluate the links between sensorimotor skills and on-field performance. For this purpose, we develop a series of Bayesian hierarchical latent variable models enabling us to compare statistics across professional baseball leagues. Within this framework, we find that sensorimotor abilities are significant predictors of on-base percentage, walk rate and strikeout rate, accounting for age, position, and league. We find no such relationship for either slugging percentage or fielder-independent pitching. The pattern of results suggests performance contributions from both visual-sensory and visual-motor abilities and indicates that sensorimotor screenings may be useful for player scouting.


Subject(s)
Athletic Performance , Baseball , Psychomotor Performance , Adolescent , Adult , Algorithms , Humans , Models, Theoretical , Young Adult
8.
J Sports Sci ; 36(2): 171-179, 2018 Jan.
Article in English | MEDLINE | ID: mdl-28282749

ABSTRACT

This study aimed to evaluate the possibility that differences in sensorimotor abilities exist between hitters and pitchers in a large cohort of baseball players of varying levels of experience. Secondary data analysis was performed on 9 sensorimotor tasks comprising the Nike Sensory Station assessment battery. Bayesian hierarchical regression modelling was applied to test for differences between pitchers and hitters in data from 566 baseball players (112 high school, 85 college, 369 professional) collected at 20 testing centres. Explanatory variables including height, handedness, eye dominance, concussion history, and player position were modelled along with age curves using basis regression splines. Regression analyses revealed better performance for hitters relative to pitchers at the professional level in the visual clarity and depth perception tasks, but these differences did not exist at the high school or college levels. No significant differences were observed in the other 7 measures of sensorimotor capabilities included in the test battery, and no systematic biases were found between the testing centres. These findings, indicating that professional-level hitters have better visual acuity and depth perception than professional-level pitchers, affirm the notion that highly experienced athletes have differing perceptual skills. Findings are discussed in relation to deliberate practice theory.


Subject(s)
Athletic Performance/physiology , Baseball/physiology , Depth Perception/physiology , Visual Acuity/physiology , Adolescent , Adult , Age Factors , Bayes Theorem , Humans , Male , Motor Skills/physiology , Sensorimotor Cortex/physiology , Task Performance and Analysis , Young Adult
9.
Stat Methods Med Res ; 25(1): 188-204, 2016 Feb.
Article in English | MEDLINE | ID: mdl-22687877

ABSTRACT

In many observational studies, analysts estimate treatment effects using propensity scores, e.g. by matching or sub-classifying on the scores. When some values of the covariates are missing, analysts can use multiple imputation to fill in the missing data, estimate propensity scores based on the m completed datasets, and use the propensity scores to estimate treatment effects. We compare two approaches to implement this process. In the first, the analyst estimates the treatment effect using propensity score matching within each completed data set, and averages the m treatment effect estimates. In the second approach, the analyst averages the m propensity scores for each record across the completed datasets, and performs propensity score matching with these averaged scores to estimate the treatment effect. We compare properties of both methods via simulation studies using artificial and real data. The simulations suggest that the second method has greater potential to produce substantial bias reductions than the first, particularly when the missing values are predictive of treatment assignment.
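
The two approaches compared in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that treatment and outcome are fully observed, that a propensity score has already been estimated for every record in each of the m completed datasets, and that matching is 1-nearest-neighbour with replacement (a choice made here for brevity). The function names and the toy data are hypothetical.

```python
from statistics import mean

def match_effect(scores, treated, outcome):
    """1-nearest-neighbour matching with replacement on the propensity score.

    Returns the mean treated-minus-matched-control outcome difference.
    """
    controls = [i for i, t in enumerate(treated) if not t]
    diffs = []
    for i, t in enumerate(treated):
        if t:
            # Closest control on the propensity score (first one wins on ties).
            j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
            diffs.append(outcome[i] - outcome[j])
    return mean(diffs)

def within_approach(score_sets, treated, outcome):
    """First approach: match within each completed dataset, average the m estimates."""
    return mean(match_effect(s, treated, outcome) for s in score_sets)

def across_approach(score_sets, treated, outcome):
    """Second approach: average each record's m scores, then match once."""
    averaged = [mean(record_scores) for record_scores in zip(*score_sets)]
    return match_effect(averaged, treated, outcome)

# Hypothetical toy data: 5 records, m = 2 completed datasets.
treated = [1, 1, 0, 0, 0]
outcome = [5.0, 7.0, 4.0, 6.0, 3.0]
score_sets = [
    [0.8, 0.4, 0.7, 0.5, 0.1],  # scores estimated from completed dataset 1
    [0.6, 0.6, 0.9, 0.5, 0.2],  # scores estimated from completed dataset 2
]
print(within_approach(score_sets, treated, outcome))
print(across_approach(score_sets, treated, outcome))
```

On this toy input the two approaches pick different matched controls and so return different estimates, which is the kind of divergence the simulation studies examine at scale.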


Subject(s)
Models, Statistical , Propensity Score , Bias , Biostatistics , Breast Feeding/statistics & numerical data , Child , Child Development , Child, Preschool , Computer Simulation , Humans , Infant , Infant, Newborn , Observational Studies as Topic/statistics & numerical data , Treatment Outcome
10.
Multivariate Behav Res ; 50(4): 383-97, 2015.
Article in English | MEDLINE | ID: mdl-26257437

ABSTRACT

Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related to the constructs of interest. The goal of the present research was to develop a flexible, broadly applicable approach to the integration of disparate data sets that is based on nonparametric multiple imputation and the collection of data from a convenient, de novo calibration sample. We demonstrate proof of concept for the approach by integrating three existing data sets containing items related to the extent of problematic alcohol use and associations with deviant peers. We discuss both necessary conditions for the approach to work well and potential strengths and weaknesses of the method compared to other data set integration approaches.


Subject(s)
Behavioral Research/methods , Retrospective Studies , Statistics, Nonparametric , Adolescent , Adult , Child , Humans , Psychometrics/methods , Reproducibility of Results , Young Adult
11.
Stat Med ; 34(26): 3399-414, 2015 Nov 20.
Article in English | MEDLINE | ID: mdl-26095855

ABSTRACT

There are many advantages to individual participant data meta-analysis for combining data from multiple studies. These advantages include greater power to detect effects, increased sample heterogeneity, and the ability to perform more sophisticated analyses than meta-analyses that rely on published results. However, a fundamental challenge is that it is unlikely that variables of interest are measured the same way in all of the studies to be combined. We propose that this situation can be viewed as a missing data problem in which some outcomes are entirely missing within some trials and use multiple imputation to fill in missing measurements. We apply our method to five longitudinal adolescent depression trials where four studies used one depression measure and the fifth study used a different depression measure. None of the five studies contained both depression measures. We describe a multiple imputation approach for filling in missing depression measures that makes use of external calibration studies in which both depression measures were used. We discuss some practical issues in developing the imputation model including taking into account treatment group and study. We present diagnostics for checking the fit of the imputation model and investigate whether external information is appropriately incorporated into the imputed values.


Subject(s)
Antidepressive Agents, Second-Generation/therapeutic use , Depression/drug therapy , Fluoxetine/therapeutic use , Meta-Analysis as Topic , Models, Statistical , Adolescent , Calibration , Child , Female , Humans , Longitudinal Studies , Male , Psychology, Adolescent , Randomized Controlled Trials as Topic , Research Design , Treatment Outcome
12.
J Chem Educ ; 91(2): 165-172, 2014 Feb 11.
Article in English | MEDLINE | ID: mdl-24803686

ABSTRACT

We developed the Alcohol Pharmacology Education Partnership (APEP), a set of modules designed to integrate a topic of interest (alcohol) with concepts in chemistry and biology for high school students. Chemistry and biology teachers (n = 156) were recruited nationally to field-test APEP in a controlled study. Teachers obtained professional development either at a conference-based workshop (NSTA or NCSTA) or via distance learning to learn how to incorporate the APEP modules into their teaching. They field-tested the modules in their classes during the following year. Teacher knowledge of chemistry and biology concepts increased significantly following professional development, and was maintained for at least a year. Their students (n = 14 014) demonstrated significantly higher scores when assessed for knowledge of both basic and advanced chemistry and biology concepts compared to students not using APEP modules in their classes the previous year. Higher scores were achieved as the number of modules used increased. These findings are consistent with our previous studies, demonstrating higher scores in chemistry and biology after students use modules that integrate topics interesting to them, such as drugs (the Pharmacology Education Partnership).

13.
Bayesian Anal ; 8(2), 2013 Jun 01.
Article in English | MEDLINE | ID: mdl-24358073

ABSTRACT

Multinomial outcomes with many levels can be challenging to model. Information typically accrues slowly with increasing sample size, yet the parameter space expands rapidly with additional covariates. Shrinking all regression parameters towards zero, as often done in models of continuous or binary response variables, is unsatisfactory, since setting parameters equal to zero in multinomial models does not necessarily imply "no effect." We propose an approach to modeling multinomial outcomes with many levels based on a Bayesian multinomial probit (MNP) model and a multiple shrinkage prior distribution for the regression parameters. The prior distribution encourages the MNP regression parameters to shrink toward a number of learned locations, thereby substantially reducing the dimension of the parameter space. Using simulated data, we compare the predictive performance of this model against two other recently-proposed methods for big multinomial models. The results suggest that the fully Bayesian, multiple shrinkage approach can outperform these other methods. We apply the multiple shrinkage MNP to simulating replacement values for areal identifiers, e.g., census tract indicators, in order to protect data confidentiality in public use datasets.

14.
Ethn Dis ; 22(1): 85-9, 2012.
Article in English | MEDLINE | ID: mdl-22774314

ABSTRACT

OBJECTIVES: Black women have increased risk of preterm birth compared to white women, and overall black women are in poorer health than white women. Recent recommendations to reduce preterm birth have focused on preconception health care. We explore the associations of indicators of maternal prepregnancy health with preterm birth among a sample of black women. DESIGN: The current study was prospective. SETTING: Enrollment occurred in prenatal clinics in Baltimore. PARTICIPANTS: Women (N=922) aged ≥18 were enrolled in the study. Data on maternal health, behaviors, and pregnancy outcome were abstracted from clinical records. MAIN OUTCOME MEASURE: Logistic regression was used to evaluate associations of behavioral and health status variables with preterm birth. RESULTS: In bivariate analysis, alcohol use, drug use and chronic diseases were associated with preterm birth. In the logistic regression analysis, drug use and chronic diseases were associated with preterm birth. CONCLUSIONS: These results demonstrate an association of maternal health and behaviors prior to pregnancy with preterm birth among black women. Providing access to health care prior to pregnancy to address behavioral and health risks may improve pregnancy outcomes among low-income black women.


Subject(s)
Black or African American , Health Status Indicators , Maternal Behavior , Premature Birth , Adolescent , Adult , Baltimore/epidemiology , Chronic Disease/epidemiology , Chronic Disease/ethnology , Female , Health Behavior , Humans , Logistic Models , Poverty , Pregnancy , Prenatal Care , Risk Factors , Risk-Taking , Substance-Related Disorders/complications , Substance-Related Disorders/epidemiology , Substance-Related Disorders/ethnology
15.
Stat Med ; 31(10): 949-62, 2012 May 10.
Article in English | MEDLINE | ID: mdl-22362635

ABSTRACT

Within causal inference, principal stratification (PS) is a popular approach for dealing with intermediate variables, that is, variables affected by treatment that also potentially affect the response. However, when there exists unmeasured confounding in the treatment arms--as can happen in observational studies--causal estimands resulting from PS analyses can be biased. We identify the various pathways of confounding present in PS contexts and their effects for PS inference. We present model-based approaches for assessing the sensitivity of complier average causal effect estimates to unmeasured confounding in the setting of binary treatments, binary intermediate variables, and binary outcomes. These same approaches can be used to assess sensitivity to unknown direct effects of treatments on outcomes because, as we show, direct effects are operationally equivalent to one of the pathways of unmeasured confounding. We illustrate the methodology using a randomized study with artificially introduced confounding and a sensitivity analysis for an observational study of the effects of physical activity and body mass index on cardiovascular disease.


Subject(s)
Models, Statistical , Population Dynamics , Randomized Controlled Trials as Topic/methods , Cohort Studies , Humans , Surveys and Questionnaires , Treatment Outcome
16.
Biometrics ; 68(1): 92-100, 2012 Mar.
Article in English | MEDLINE | ID: mdl-21689080

ABSTRACT

We describe a Bayesian quantile regression model that uses a confirmatory factor structure for part of the design matrix. This model is appropriate when the covariates are indicators of scientifically determined latent factors, and it is these latent factors that analysts seek to include as predictors in the quantile regression. We apply the model to a study of birth weights in which the effects of latent variables representing psychosocial health and actual tobacco usage on the lower quantiles of the response distribution are of interest. The models can be fit using an R package called factorQR.


Subject(s)
Bayes Theorem , Fetal Growth Retardation/epidemiology , Infant, Very Low Birth Weight , Maternal Exposure/statistics & numerical data , Proportional Hazards Models , Regression Analysis , Tobacco Smoke Pollution/statistics & numerical data , Birth Weight , Causality , Female , Humans , Infant, Low Birth Weight , Infant, Newborn , Prevalence
17.
J Am Stat Assoc ; 107(500): 1385-1394, 2012 Dec 01.
Article in English | MEDLINE | ID: mdl-25214699

ABSTRACT

Statistical agencies and other organizations that disseminate data are obligated to protect data subjects' confidentiality. For example, ill-intentioned individuals might link data subjects to records in other databases by matching on common characteristics (keys). Successful links are particularly problematic for data subjects with combinations of keys that are unique in the population. Hence, as part of their assessments of disclosure risks, many data stewards estimate the probabilities that sample uniques on sets of discrete keys are also population uniques on those keys. This is typically done using log-linear modeling on the keys. However, log-linear models can yield biased estimates of cell probabilities for sparse contingency tables with many zero counts, which often occurs in databases with many keys. This bias can result in unreliable estimates of probabilities of uniqueness and, hence, misrepresentations of disclosure risks. We propose an alternative to log-linear models for datasets with sparse keys based on a Bayesian version of grade of membership (GoM) models. We present a Bayesian GoM model for multinomial variables and offer an MCMC algorithm for fitting the model. We evaluate the approach by treating data from a recent US Census Bureau public use microdata sample as a population, taking simple random samples from that population, and benchmarking estimated probabilities of uniqueness against population values. Compared to log-linear models, GoM models provide more accurate estimates of the total number of uniques in the samples. Additionally, they offer record-level predictions of uniqueness that dominate those based on log-linear models.
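
The benchmark quantity described in this abstract, how many sample uniques are also population uniques, can be computed directly when the full population is available, as it is in the evaluation design. The sketch below shows only that counting step, not the log-linear or GoM models that estimate it from the sample alone. The function name, the key variables, and the records are hypothetical.

```python
from collections import Counter

def uniqueness_summary(population, sample_idx, keys):
    """Count sample uniques and how many of them are also population uniques.

    population: list of dict records; sample_idx: indices of sampled records;
    keys: names of the discrete key variables an intruder could match on.
    """
    def key_of(record):
        return tuple(record[k] for k in keys)

    pop_counts = Counter(key_of(r) for r in population)
    sample = [population[i] for i in sample_idx]
    sample_counts = Counter(key_of(r) for r in sample)

    # A sample unique is the only sampled record with its key combination.
    sample_uniques = [r for r in sample if sample_counts[key_of(r)] == 1]
    # It is a population unique if no other record in the population shares it.
    also_pop_unique = [r for r in sample_uniques if pop_counts[key_of(r)] == 1]
    return len(sample_uniques), len(also_pop_unique)

# Hypothetical population of six records with two keys.
population = [
    {"age": 30, "sex": "F"},
    {"age": 30, "sex": "F"},
    {"age": 45, "sex": "M"},
    {"age": 52, "sex": "F"},
    {"age": 52, "sex": "F"},
    {"age": 61, "sex": "M"},
]
n_sample_unique, n_pop_unique = uniqueness_summary(population, [0, 2, 3], ["age", "sex"])
print(n_sample_unique, n_pop_unique)
```

Here every sampled record is unique in the sample, but only one of them is unique in the population. A data steward who sees only the sample has to estimate that second count with a model.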

18.
Ann Appl Stat ; 6(1): 229-252, 2012 Mar 01.
Article in English | MEDLINE | ID: mdl-23990852

ABSTRACT

When releasing data to the public, data stewards are ethically and often legally obligated to protect the confidentiality of data subjects' identities and sensitive attributes. They also strive to release data that are informative for a wide range of secondary analyses. Achieving both objectives is particularly challenging when data stewards seek to release highly resolved geographical information. We present an approach for protecting the confidentiality of data with geographic identifiers based on multiple imputation. The basic idea is to convert geography to latitude and longitude, estimate a bivariate response model conditional on attributes, and simulate new latitude and longitude values from these models. We illustrate the proposed methods using data describing causes of death in Durham, North Carolina. In the context of the application, we present a straightforward tool for generating simulated geographies and attributes based on regression trees, and we present methods for assessing disclosure risks with such simulated data.

19.
Epidemiology ; 22(6): 859-66, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21968775

ABSTRACT

Covariates may affect continuous responses differently at various points of the response distribution. For example, some exposure might have minimal impact on conditional means, whereas it might lower conditional 10th percentiles sharply. Such differential effects can be important to detect. In studies of the determinants of birth weight, for instance, it is critical to identify exposures like the one above, since low birth weight is a risk factor for later health problems. Effects of covariates on the tails of distributions can be obscured by models (such as linear regression) that estimate conditional means; however, effects on tails can be detected by quantile regression. We present 2 approaches for exploring high-dimensional predictor spaces to identify important predictors for quantile regression. These are based on the lasso and elastic net penalties. We apply the approaches to a prospective cohort study of adverse birth outcomes that includes a wide array of demographic, medical, psychosocial, and environmental variables. Although tobacco exposure is known to be associated with lower birth weights, the analysis suggests an interesting interaction effect not previously reported: tobacco exposure depresses the 20th and 30th percentiles of birth weight more strongly when mothers have high levels of lead in their blood compared with those who have low blood lead levels.
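
The contrast between conditional means and conditional quantiles rests on the loss being minimised. The sketch below uses the standard quantile-regression check ("pinball") loss with an intercept-only model, so the fitted value is simply a sample quantile. The lasso and elastic net variants in the abstract add an L1 (and, for elastic net, L2) penalty on the regression coefficients, which this sketch omits. The function names and the birth weight values are hypothetical.

```python
def check_loss(residual, tau):
    """Check loss at quantile level tau: tau*u if u >= 0, (tau - 1)*u if u < 0."""
    return residual * (tau - (1 if residual < 0 else 0))

def best_constant(values, tau):
    """Constant minimising total check loss, searched over the observed values.

    For an intercept-only model a minimiser always lies at a data point,
    so scanning the observed values is sufficient.
    """
    return min(
        sorted(values),
        key=lambda c: sum(check_loss(v - c, tau) for v in values),
    )

# Hypothetical birth weights in grams, with one very low value.
weights = [2100, 2900, 3200, 3400, 3500, 3600, 3700, 3800, 4100]

low_tail = best_constant(weights, 0.1)   # fit at the 10th percentile
median = best_constant(weights, 0.5)     # fit at the 50th percentile
average = sum(weights) / len(weights)    # what a mean model would fit
print(low_tail, median, average)
```

The low-tail fit lands on the smallest observation while the mean barely registers it, which is why an exposure that mainly shifts the lower tail can be missed by linear regression.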


Subject(s)
Pregnancy Outcome/epidemiology , Regression Analysis , Causality , Data Interpretation, Statistical , Female , Humans , Infant, Low Birth Weight , Infant, Newborn , Linear Models , Pregnancy , Premature Birth/epidemiology , Prenatal Exposure Delayed Effects/epidemiology