Multimodal Semantic Mismatch Detection in Social Media Posts

Wang, K. H.; Zhao, S. Z.; Chan, D.; Zakhor, A.; Canny, J.

2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP); 2022.

Article in English | Web of Science | ID: covidwho-2192021

ABSTRACT

Short videos have become the most popular form of social media in recent years. In this work, we focus on the threat scenario where video, audio, and their text description are semantically mismatched to mislead the audience. We develop self-supervised methods to detect semantic mismatch across multiple modalities, namely video, audio, and text. We use state-of-the-art language, video, and audio models to extract dense features from each modality, and explore transformer architectures together with contrastive learning methods on a dataset of one million Twitter posts from 2021 to 2022. Our best-performing method benefits from the robustness of Noise-Contrastive loss and the context provided by fusing modalities together using a cross-transformer. It outperforms the state of the art by over 9% in accuracy. We further characterize the performance of our system on topic-specific datasets containing COVID-19 and Russia-Ukraine related tweets, and show that it outperforms the state of the art by over 17% in accuracy.
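
As a rough illustration of the contrastive idea mentioned in the abstract, the sketch below computes an InfoNCE-style loss over paired embeddings and a simple mismatch score. It is not the authors' implementation: InfoNCE is assumed here as one common noise-contrastive objective, and the function names, the temperature value, and the toy two-dimensional embeddings are all hypothetical. The feature extractors and the cross-transformer fusion are omitted entirely.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(video_embs, text_embs, temperature=0.1):
    """Mean InfoNCE loss over a batch (assumed objective, not from the paper).

    Row i of video_embs and row i of text_embs are treated as a matched
    pair; every other row in the batch serves as a negative.
    """
    n = len(video_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(video_embs[i], text_embs[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denom - logits[i]
    return total / n


def mismatch_score(video_emb, text_emb):
    """Higher values suggest the two modalities disagree (hypothetical helper)."""
    return 1.0 - cosine(video_emb, text_emb)


# Toy embeddings: the first text batch is aligned, the second is swapped.
video = [[1.0, 0.0], [0.0, 1.0]]
text_matched = [[1.0, 0.0], [0.0, 1.0]]
text_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = info_nce_loss(video, text_matched)
loss_swapped = info_nce_loss(video, text_swapped)
```

In this toy setup the aligned batch yields a much lower loss than the swapped one, which is the signal a mismatch detector trained this way would rely on.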

Keywords

Multimedia Forensics; Semantic Mismatch; Multimodal Representation Learning; Deep Learning for Videos; Social Media

Full text: Available; Collection: Databases of international organizations; Database: Web of Science; Language: English; Journal: 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP); Year: 2022; Document Type: Article
