Search | VHL Regional Portal

Model-agnostic unsupervised detection of bots in a Likert-type questionnaire.

Ilagan, Michael John; Falk, Carl F.

Behav Res Methods ; 2023 Nov 20.

Article in English | MEDLINE | ID: mdl-37985637

ABSTRACT

There is a wealth of literature on detecting bots in online survey data statistically, using only responses to Likert-type items. This literature follows two traditions. One tradition requires labeled data but forgoes strong model assumptions. The other requires a measurement model but forgoes the collection of labeled data. In the present article, we consider the case in which neither requirement can be met, for an inventory whose items all have the same number of Likert-type categories. We propose a bot detection algorithm that is both model-agnostic and unsupervised. Our proposed algorithm involves a permutation test with leave-one-out calculations of outlier statistics. For each respondent, it outputs a p value for the null hypothesis that the respondent is a bot. Such an algorithm offers nominal sensitivity calibration that is robust to the bot response distribution. In a simulation study, we found that our proposed algorithm improved upon naive alternatives in terms of 95% sensitivity calibration and, in many scenarios, in terms of classification accuracy.
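The abstract names the ingredients of the test without giving details. The sketch below is one possible reading, not the authors' implementation. It assumes that a bot's answers are exchangeable across items (which is why every item must share the same number of categories: only then is a shuffled response vector still a valid one), uses the leave-one-out squared Mahalanobis distance as the outlier statistic, and treats a small distance as human-like. The function names `bot_pvalues` and `sq_mahalanobis` are hypothetical.

```python
import numpy as np


def sq_mahalanobis(rows, mu, precision):
    """Squared Mahalanobis distance of each row from mu (precision = inverse covariance)."""
    d = np.atleast_2d(rows) - mu
    return np.einsum("ij,jk,ik->i", d, precision, d)


def bot_pvalues(data, n_perm=200, seed=0):
    """Hypothetical sketch: one p value per respondent for H0 'this respondent is a bot'.

    data: (n_respondents, n_items) array of integer Likert responses, with the
    same number of categories on every item.
    Assumption: a bot's answers are exchangeable across items, so under H0 the
    observed vector is just one draw from the set of its own shuffles.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    pvals = np.empty(n)
    for i in range(n):
        # Leave-one-out: the reference mean and covariance exclude respondent i.
        others = np.delete(data, i, axis=0).astype(float)
        mu = others.mean(axis=0)
        precision = np.linalg.pinv(np.cov(others, rowvar=False))
        observed = sq_mahalanobis(data[i], mu, precision)[0]
        # Permutation null: the same answers shuffled across items.
        shuffles = np.array([rng.permutation(data[i]) for _ in range(n_perm)])
        null_stats = sq_mahalanobis(shuffles, mu, precision)
        # Assumption: small distance = human-like, i.e. evidence against H0.
        pvals[i] = (1 + np.sum(null_stats <= observed)) / (n_perm + 1)
    return pvals
```

Shuffling a human's answers breaks the correlations among items (especially when some items are reverse-keyed), so the shuffles tend to land farther away than the observed vector and the p value is small; for an exchangeable bot the observed vector is unremarkable among its shuffles. A rule in the spirit of the abstract's 95% sensitivity calibration would classify a respondent as human only when p < 0.05, so that, if the exchangeability assumption holds, at most about 5% of bots are misclassified, whichever categories the bots favor.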

Multilevel analysis of matching behavior: A comparison of maximum likelihood and Bayesian estimation.

Ilagan, Michael John; Caron, Pier-Olivier; Miocevic, Milica.

J Exp Anal Behav ; 120(2): 253-262, 2023 09.

Article in English | MEDLINE | ID: mdl-37323053

ABSTRACT

When researchers try to infer laws of behavior, they often overlook the need to account for both within-subjects and between-subjects variance. Multilevel modeling has recently been advocated for analyzing matching behavior. However, using multilevel modeling within behavior analysis poses its own challenges: adequate sample sizes are required at both levels for unbiased parameter estimates. The purpose of the current study is to compare parameter recovery and hypothesis rejection rates of maximum likelihood (ML) estimation and Bayesian estimation (BE) of multilevel models for matching behavior studies. Four factors were investigated through simulations: number of subjects, number of measurements per subject, sensitivity (slope), and variance of the random effect. Results showed that both ML estimation and BE with flat priors yielded acceptable statistical properties for intercept and slope fixed effects. The ML estimation procedure generally had less bias, lower RMSE, more power, and false-positive rates closer to the nominal rate. Given these results, we recommend ML estimation over BE with uninformative priors. For multilevel modeling of matching behavior, BE requires more informative priors, which further studies will need to establish.
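For reference, the generalized matching law relates the log ratio of two behaviors, B_1/B_2, to the log ratio of the reinforcers they earn, R_1/R_2, with slope a (sensitivity) and intercept log b (bias). A multilevel version might be written as follows, assuming measurement i is nested in subject j and that both the intercept and the slope vary across subjects; the abstract does not say which effects were random in the study, so this specification is illustrative only.

```latex
\log\frac{B_{1ij}}{B_{2ij}} = \beta_{0j} + \beta_{1j}\,\log\frac{R_{1ij}}{R_{2ij}} + e_{ij},
\qquad e_{ij} \sim \mathcal{N}(0, \sigma^2)

\beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + u_{1j},
\qquad (u_{0j}, u_{1j})^\top \sim \mathcal{N}(\mathbf{0}, \mathbf{T})
```

Here \gamma_{00} and \gamma_{10} play the role of the intercept (bias) and slope (sensitivity) fixed effects, and \mathbf{T} holds the random-effect variances and covariance. ML estimation maximizes the likelihood over \gamma_{00}, \gamma_{10}, \mathbf{T}, and \sigma^2, whereas BE places priors on these parameters (flat priors in the study) and summarizes their posterior.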

Subject(s)

Models, Statistical , Humans , Bayes Theorem , Multilevel Analysis , Sample Size

Supervised Classes, Unsupervised Mixing Proportions: Detection of Bots in a Likert-Type Questionnaire.

Ilagan, Michael John; Falk, Carl F.

Educ Psychol Meas ; 83(2): 217-239, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36866070

ABSTRACT

Administering Likert-type questionnaires to online samples risks contamination of the data by malicious computer-generated random responses, also known as bots. Although nonresponsivity indices (NRIs) such as person-total correlations or Mahalanobis distance have shown great promise to detect bots, universal cutoff values are elusive. An initial calibration sample constructed via stratified sampling of bots and humans (real or simulated under a measurement model) has been used to empirically choose cutoffs with a high nominal specificity. However, a high-specificity cutoff is less accurate when the target sample has a high contamination rate. In the present article, we propose the supervised classes, unsupervised mixing proportions (SCUMP) algorithm that chooses a cutoff to maximize accuracy. SCUMP uses a Gaussian mixture model to estimate, unsupervised, the contamination rate in the sample of interest. A simulation study found that, in the absence of model misspecification on the bots, our cutoffs maintained accuracy across varying contamination rates.
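The abstract names the ingredients of SCUMP without the details. A minimal sketch of one reading follows, not the authors' code. It assumes a single NRI on which larger values indicate bots (true of Mahalanobis distance, not of person-total correlations), Gaussian class-conditional NRI distributions whose means and standard deviations come from the labeled calibration sample and are held fixed ("supervised classes"), and an EM loop that estimates only the bot mixing proportion in the target sample ("unsupervised mixing proportions"). All function names are hypothetical.

```python
import math

import numpy as np


def normal_pdf(x, mu, sd):
    """Gaussian density, vectorised over x."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sd):
    """Gaussian CDF for a scalar x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def estimate_contamination(nri, human, bot, pi=0.5, n_iter=500):
    """EM that updates only the bot mixing weight pi.

    human, bot: (mean, sd) of the NRI in each class, assumed to be estimated
    beforehand from a labeled calibration sample and held fixed here.
    """
    f_h = normal_pdf(nri, *human)
    f_b = normal_pdf(nri, *bot)
    for _ in range(n_iter):
        post_bot = pi * f_b / (pi * f_b + (1.0 - pi) * f_h)  # E-step: P(bot | NRI)
        pi = float(post_bot.mean())  # M-step for the mixing weight only
    return pi


def accuracy_cutoff(pi, human, bot, grid):
    """Grid cutoff maximising expected accuracy when NRI > cutoff flags a bot (assumption)."""
    acc = [pi * (1.0 - normal_cdf(c, *bot)) + (1.0 - pi) * normal_cdf(c, *human) for c in grid]
    best = int(np.argmax(acc))
    return float(grid[best]), acc[best]
```

With human NRIs distributed N(0, 1) and bot NRIs N(3, 1), raising the contamination rate from 0.5 to 0.8 moves the accuracy-maximizing cutoff down from about 1.5 to about 1.04. This illustrates why a cutoff fixed in advance for high specificity loses accuracy in heavily contaminated samples, and why the sketch plugs in the estimated contamination rate before choosing the cutoff.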
