ABSTRACT
Analysis of integrated data often requires record linkage to join data residing in separate sources. When linkage errors cannot be avoided, because there is no unique identity key that links the records unequivocally, standard statistical techniques may produce misleading inference if the linked data are treated as if they were true observations. In this paper, we propose methods for categorical data analysis based on linked data that are not prepared by the analyst, such that neither the match-key variables nor the unlinked records are available. The adjustment is based on the proportion of false links in the linked file, and our approach allows the probabilities of correct linkage to vary across the records without requiring an estimate of this probability for each individual record. It also accommodates the general situation where unmatched records that cannot possibly be correctly linked exist in all the sources. The proposed methods are studied by simulation and applied to real data.
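To make the idea of adjusting by the proportion of false links concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes, purely for illustration, that every record is linked and that a false link pairs records at random, so the observed joint proportions are a mixture of the true joint table and the product of its margins. The function name `adjust_for_false_links` and the parameter `alpha` (the proportion of false links) are hypothetical.

```python
def adjust_for_false_links(observed, alpha):
    """Adjust a two-way table of joint proportions for false links.

    Illustrative assumption (not from the paper): a false link pairs
    records at random, so that
        observed = (1 - alpha) * true + alpha * (row margin x column margin).
    Solving for `true` gives the adjustment below.
    """
    if not 0 <= alpha < 1:
        raise ValueError("alpha must lie in [0, 1)")
    row = [sum(r) for r in observed]        # row margins
    col = [sum(c) for c in zip(*observed)]  # column margins
    return [
        [(observed[i][j] - alpha * row[i] * col[j]) / (1 - alpha)
         for j in range(len(col))]
        for i in range(len(row))
    ]


# Hypothetical linked-file proportions with 20% false links.
observed = [[0.37, 0.13], [0.13, 0.37]]
adjusted = adjust_for_false_links(observed, alpha=0.2)
```

Under this simplified model the margins are unaffected by linkage errors, which is why they can be taken directly from the linked file. The setting described in the abstract is more general: it allows linkage probabilities to vary across records and unmatched records to exist in every source.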
ABSTRACT
The identification and treatment of "one-inflation" in estimating the size of an elusive population has received increasing attention in the capture-recapture literature in recent years. The phenomenon occurs when the number of units captured exactly once clearly exceeds the expectation under a baseline count distribution. Ignoring one-inflation has serious consequences for estimation of the population size, which can be drastically overestimated. In this paper we propose a Bayesian approach for Poisson, geometric, and negative binomial one-inflated count distributions. Posterior inference for the population size is obtained using a Gibbs sampler. We also provide a Bayesian approach to model selection. We illustrate the proposed methodology with simulated and real data and propose a new application in official statistics: estimating the number of people implicated in the exploitation of prostitution in Italy.
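As an illustration of what one-inflation means for the observed counts, the sketch below uses one common way of writing a one-inflated, zero-truncated Poisson distribution: extra mass `omega` is placed on the count 1 and the baseline probabilities are scaled by `1 - omega`. This parameterization and the names `ztp_pmf`, `one_inflated_ztp_pmf`, `lam`, and `omega` are assumptions made for the example; the paper's exact formulation may differ.

```python
import math


def ztp_pmf(y, lam):
    """Zero-truncated Poisson pmf for y = 1, 2, ... (baseline distribution)."""
    return math.exp(-lam) * lam ** y / (math.factorial(y) * (1 - math.exp(-lam)))


def one_inflated_ztp_pmf(y, lam, omega):
    """One-inflated zero-truncated Poisson pmf (assumed parameterization).

    With probability omega a captured unit is a structural "one";
    otherwise its count follows the zero-truncated Poisson baseline.
    """
    base = (1 - omega) * ztp_pmf(y, lam)
    return omega + base if y == 1 else base


# Hypothetical parameter values: singletons are more likely than under the baseline.
p1_baseline = ztp_pmf(1, lam=1.5)
p1_inflated = one_inflated_ztp_pmf(1, lam=1.5, omega=0.3)
```

A baseline model fitted to such data must account for the surplus of singletons somehow, and it typically does so by implying a large number of unobserved units, which is the overestimation of population size mentioned in the abstract.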