Applying machine-learning to rapidly analyse large qualitative text datasets to inform the COVID-19 pandemic response: Comparing human and machine-assisted topic analysis techniques
Lauren Towler; Paulina Bondaronek; Trisevgeni Papakonstantinou; Richard Amlôt; Tim Chadborn; Ben Ainsworth; Lucy Yardley.
Affiliation
  • Lauren Towler; University of Southampton
  • Paulina Bondaronek; Office for Health Improvement & Disparities, Department of Health and Social Care
  • Trisevgeni Papakonstantinou; Office for Health Improvement & Disparities, Department of Health and Social Care
  • Richard Amlôt; Behavioural Science and Insights Unit, UK Health Security Agency
  • Tim Chadborn; Office for Health Improvement & Disparities, Department of Health and Social Care
  • Ben Ainsworth; University of Bath
  • Lucy Yardley; University of Bristol
Preprint in English | medRxiv | ID: ppmedrxiv-22274993
ABSTRACT
Background: Machine-assisted topic analysis (MATA) uses artificial intelligence methods to help qualitative researchers analyse large amounts of textual data. This could allow qualitative researchers to inform and update public health interventions in real time, ensuring they remain acceptable and effective in rapidly changing contexts (such as a pandemic). In this novel study we aimed to understand the potential for such approaches to support intervention implementation, by directly comparing MATA and human-only thematic analysis techniques applied to the same dataset (1472 free-text responses from users of the COVID-19 infection control intervention Germ Defence).
Methods: In MATA, the analysis process included an unsupervised topic modelling approach to identify latent topics in the text. The human research team then described the topics and identified broad themes. In human-only codebook analysis, an initial codebook was developed by an experienced qualitative researcher and applied to the dataset by a well-trained research team, who met regularly to critique and refine the codes. To understand similarities and differences, formal triangulation using a convergence coding matrix compared the findings from both methods, categorising them as agreement, complementary, dissonant, or silent.
Results: Human analysis took much longer (147.5 hours) than MATA (40 hours). Both human-only analysis and MATA identified key themes about what users found helpful and unhelpful (e.g. "Boosting confidence in how to perform the behaviours" vs "Lack of personally relevant content"). Formal triangulation of the codes showed high similarity between the findings. All codes developed from MATA were classified as in agreement with or complementary to the human themes. Where findings were classified as complementary, this was typically due to slightly differing interpretations or to nuance present only in the human-only analysis.
Conclusions: Overall, the quality of MATA was as high as the human-only thematic analysis, with substantial time savings. For simple analyses that do not require an in-depth or subtle understanding of the data, MATA is a useful tool that can support qualitative researchers to interpret and analyse large datasets quickly. These findings have practical implications for intervention development and implementation, such as enabling rapid optimisation during public health emergencies.
Contributions to the literature
  • Natural language processing (NLP) techniques have been applied within health research due to the need to rapidly analyse large samples of qualitative data. However, the extent to which these techniques lead to results comparable to human coding requires further assessment.
  • We demonstrate that combining NLP with human analysis to analyse free-text data can be a trustworthy and efficient method to use on large quantities of qualitative data.
  • This method has the potential to play an important role in contexts where rapid descriptive or exploratory analysis of very large datasets is required, such as during a public health emergency.
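The triangulation step and the time comparison reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure. The row labels ("Topic A", "Theme 1", and so on) and the function names `tally_convergence` and `time_saving` are hypothetical, introduced only for illustration. Only the four category names and the two hour totals (147.5 and 40) come from the abstract.

```python
# Categories named in the abstract's triangulation step.
CATEGORIES = ("agreement", "complementary", "dissonant", "silent")


def tally_convergence(matrix):
    """Count how many paired findings fall into each convergence category."""
    counts = {category: 0 for category in CATEGORIES}
    for row in matrix:
        label = row["category"]
        if label not in counts:
            raise ValueError(f"unknown convergence category: {label!r}")
        counts[label] += 1
    return counts


def time_saving(human_hours, mata_hours):
    """Return (hours saved, fraction of the human-only time saved)."""
    saved = human_hours - mata_hours
    return saved, saved / human_hours


# Hypothetical rows for illustration only; these are NOT the study's data.
example_matrix = [
    {"mata_code": "Topic A", "human_theme": "Theme 1", "category": "agreement"},
    {"mata_code": "Topic B", "human_theme": "Theme 2", "category": "agreement"},
    {"mata_code": "Topic C", "human_theme": "Theme 3", "category": "complementary"},
]

if __name__ == "__main__":
    print(tally_convergence(example_matrix))
    # Hour totals as reported in the abstract.
    saved, fraction = time_saving(147.5, 40)
    print(f"{saved} hours saved ({fraction:.1%} of the human-only time)")
```

With the reported totals, the second function gives 107.5 hours saved, roughly 73% of the human-only analysis time.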
License
cc_by
Full text: Available Collection: Preprints Database: medRxiv Type of study: Qualitative research Language: English Year: 2022 Document type: Preprint
...