ABSTRACT
This work proposes a methodology that standardises the extraction, processing and analysis of natural language data for the study of gender-based violence evidenced on the Twitter social network. We develop a tool that may be used by organisations, foundations, corporations, associations or state institutions that promote, exercise and disseminate human rights in Colombia and elsewhere. As a case study, we take ten prominent female public figures in Colombia from the artistic, political and journalistic spheres. We extract a total of 39,629 tweet responses posted during a turbulent national strike amid the COVID-19 pandemic, and carry out topic identification and sentiment analysis. While natural language processing with different libraries shows differences between the roles, there are notable negative terms in the topics identified, which are of concern as they may incite gender-based violence. We expect the proposed tool to benefit the decision-making of these institutions in issuing early warnings, together with the exercise of the protection, prevention and defence of women's rights.

© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)