SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild.

Kossaifi, Jean; Walecki, Robert; Panagakis, Yannis; Shen, Jie; Schmitt, Maximilian; Ringeval, Fabien; Han, Jing; Pandit, Vedhas; Toisoul, Antoine; Schuller, Bjorn; Star, Kam; Hajiyev, Elnar; Pantic, Maja.

IEEE Trans Pattern Anal Mach Intell ; 43(3): 1022-1040, 2021 03.

Article in English | MEDLINE | ID: mdl-31581074

ABSTRACT

Natural human-computer interaction and audio-visual human behaviour sensing systems that achieve robust performance in the wild are needed more than ever, as digital devices are increasingly becoming an indispensable part of our lives. Accurately annotated real-world data are the crux in devising such systems. However, existing databases usually consider controlled settings, low demographic variability, and a single task. In this paper, we introduce the SEWA database of more than 2,000 minutes of audio-visual data of 398 people coming from six cultures, 50 percent female, and uniformly spanning the age range of 18 to 65 years old. Subjects were recorded in two different contexts: while watching adverts and while discussing adverts in a video chat. The database includes rich annotations of the recordings in terms of facial landmarks, facial action units (FAU), various vocalisations, mirroring, and continuously valued valence, arousal, liking, agreement, and prototypic examples of (dis)liking. This database aims to be an extremely valuable resource for researchers in affective computing and automatic human sensing and is expected to push forward the research in human behaviour analysis, including cultural studies. Along with the database, we provide extensive baseline experiments for automatic FAU detection and automatic valence, arousal, and (dis)liking intensity estimation.

Subject(s)

Algorithms , Emotions , Adolescent , Adult , Aged , Attitude , Databases, Factual , Face , Female , Humans , Middle Aged , Young Adult

Applying Artificial Intelligence Methods for the Estimation of Disease Incidence: The Utility of Language Models.

Zhang, Yuanzhao; Walecki, Robert; Winter, Joanne R; Bragman, Felix J S; Lourenco, Sara; Hart, Christopher; Baker, Adam; Perov, Yura; Johri, Saurabh.

Front Digit Health ; 2: 569261, 2020.

Article in English | MEDLINE | ID: mdl-34713043

ABSTRACT

Background: AI-driven digital health tools often rely on estimates of disease incidence or prevalence, but obtaining these estimates is costly and time-consuming. We explored the use of machine learning models that leverage contextual information about diseases from unstructured text to estimate disease incidence. Methods: We used a class of machine learning models, called language models, to extract contextual information relating to disease incidence. We evaluated three different language models: BioBERT, Global Vectors for Word Representation (GloVe), and the Universal Sentence Encoder (USE), as well as an approach that uses all three jointly. The output of these models is a mathematical representation of the underlying data, known as "embeddings." We used these to train neural network models to predict disease incidence. The neural networks were trained and validated using data from the Global Burden of Disease study, and tested using independent data sourced from the epidemiological literature. Findings: A variety of language models can be used to encode contextual information of diseases. We found that, on average, BioBERT embeddings were the best for disease names across multiple tasks. In particular, BioBERT was the best performing model when predicting specific disease-country pairs, whilst a fusion model combining BioBERT, GloVe, and USE performed best on average when predicting disease incidence in unseen countries. We also found that GloVe embeddings performed better than BioBERT embeddings when applied to country names. However, the models were limited in their ability to predict previously unseen diseases. Further limitations included substantial variation in performance across age groups and notably lower performance for diseases that are highly dependent on location and climate. Interpretation: We demonstrate that context-aware machine learning models can be used for estimating disease incidence. This method is quicker to implement than traditional epidemiological approaches. We therefore suggest it complements existing modeling efforts, where data is required more rapidly or at larger scale. This may particularly benefit AI-driven digital health products where the data will undergo further processing and a validated approximation of the disease incidence is adequate.
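
The pipeline described in the Methods (embeddings from several language models, used jointly to train a neural network that predicts incidence) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say how the three embeddings are combined, so concatenation is an assumption here, and the embedding matrices, their dimensions, the target values, and the names `fuse_embeddings` and `train_mlp` are all hypothetical placeholders filled with random numbers rather than real BioBERT, GloVe, USE, or Global Burden of Disease data.

```python
import numpy as np


def fuse_embeddings(*embeddings):
    """Combine per-item embedding matrices by concatenating features.

    Assumption: the abstract does not specify the fusion method;
    concatenation is used here purely for illustration.
    """
    return np.concatenate(embeddings, axis=1)


def train_mlp(x, y, hidden=16, lr=0.05, epochs=300, seed=0):
    """Train a one-hidden-layer regression network with full-batch
    gradient descent on mean squared error. Returns (params, losses)."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.1, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(x @ w1 + b1)
        pred = (h @ w2 + b2).ravel()
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass (gradients of the mean squared error)
        g_pred = (2.0 / len(y)) * err[:, None]
        g_w2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_h = (g_pred @ w2.T) * (1.0 - h ** 2)
        g_w1 = x.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Parameter update
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), losses


# Placeholder inputs: random stand-ins for three embedding sources.
# Real embeddings would come from the respective language models.
rng = np.random.default_rng(42)
n_items = 40
biobert_like = rng.normal(size=(n_items, 8))
glove_like = rng.normal(size=(n_items, 6))
use_like = rng.normal(size=(n_items, 4))

fused = fuse_embeddings(biobert_like, glove_like, use_like)

# Placeholder target: a synthetic score standing in for incidence.
target = 0.3 * (fused @ rng.normal(size=fused.shape[1]))

params, losses = train_mlp(fused, target)
print(f"fused shape: {fused.shape}, "
      f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

A single-source variant is obtained by passing only one matrix to `train_mlp`, which is one way to compare a fusion model against individual embeddings, as the Findings do.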
