Results 1 - 2 of 2
1.
PLoS One ; 15(12): e0243300, 2020.
Article in English | MEDLINE | ID: mdl-33370298

ABSTRACT

Data-driven and machine learning-based approaches for detecting, categorising and measuring abusive content such as hate speech and harassment have gained traction due to their scalability, robustness and increasingly high performance. Making effective detection systems for abusive content relies on having the right training datasets, reflecting a widely accepted mantra in computer science: Garbage In, Garbage Out. However, creating training datasets which are large, varied, theoretically informed and that minimize biases is difficult, laborious and requires deep expertise. This paper systematically reviews 63 publicly available training datasets which have been created to train abusive language classifiers. It also reports on the creation of a dedicated website for cataloguing abusive language data, hatespeechdata.com. We discuss the challenges and opportunities of open science in this field, and argue that although more dataset sharing would bring many benefits, it also poses social and ethical risks which need careful consideration. Finally, we provide evidence-based recommendations for practitioners creating new abusive content training datasets.


Subject(s)
Databases, Factual; Hate; Language; Machine Learning; Humans
2.
JMIR Mhealth Uhealth ; 5(11): e168, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-29092810

ABSTRACT

BACKGROUND: The huge increase in smartphone use heralds an enormous opportunity for epidemiology research, but there is limited evidence regarding long-term engagement and attrition in mobile health (mHealth) studies.
OBJECTIVE: The objective of this study was to examine how representative the Cloudy with a Chance of Pain study population is of wider chronic-pain populations and to explore patterns of engagement among participants during the first 6 months of the study.
METHODS: Participants in the United Kingdom who had chronic pain (≥3 months) and enrolled between January 20, 2016 and January 29, 2016 were eligible if they were aged ≥17 years and used the study app to report any of 10 pain-related symptoms during the study period. Participant characteristics were compared with data from the Health Survey for England (HSE) 2011. Distinct clusters of engagement over time were determined using first-order hidden Markov models, and participant characteristics were compared between the clusters.
RESULTS: Compared with the data from the HSE, our sample comprised a higher proportion of women (80.51%, 5129/6370 vs 55.61%, 4782/8599) and fewer persons at the extremes of age (16-34 and 75+). Four clusters of engagement were identified: high (13.60%, 865/6370), moderate (21.76%, 1384/6370), low (39.35%, 2503/6370), and tourists (25.44%, 1618/6370), between which median days of data entry ranged from 1 (interquartile range; IQR: 1-1; tourist) to 149 (124-163; high). Those in the high-engagement cluster were typically older, whereas those in the tourist cluster were mostly male. Few other differences distinguished the clusters.
CONCLUSIONS: Cloudy with a Chance of Pain demonstrates a rapid and successful recruitment of a large, representative, and engaged sample of people with chronic pain and provides strong evidence to suggest that smartphones could provide a viable alternative to traditional data collection methods.
