Annotated Data in Spanish for Toxicity and Insults in Digital Social Networks

Abstract

This repository contains datasets and materials for building a human-coded gold standard on toxicity and incivility in the digital sphere, intended to benchmark algorithmic classification tasks with transformers and LLMs. Labelling is currently 62% complete. We are labelling samples from two novel datasets of political digital interactions on Twitter (rebranded as X). The first set comprises almost 5 million data points from three Latin American protest events: (a) protests against coronavirus and judicial reform measures in Argentina during August 2020; (b) protests against education budget cuts in Brazil in May 2019; and (c) the social outburst in Chile stemming from protests against the underground fare hike in October 2019. Because our aim is a gold standard for digital interactions in Spanish, we prioritise the Argentinian and Chilean data. The second set contains more than 31 million messages and more than 9 million interactions between 2010 and 2022, covering the election of members of the first Constitutional Convention in Chile, the drafting process, and the referendum in which the proposal was rejected.

Publication
Dataset, pre-release version v0.3.3 – Purple Butterfly, Leiden University, Universidad Diego Portales, University of California Irvine and Training Data Lab
Bastián González-Bustamante
Post-doctoral Researcher

Post-doctoral Researcher in Computational Social Science and Lecturer in Governance and Development at the Institute of Public Administration, Faculty of Governance and Global Affairs, Leiden University, Netherlands. Lecturer at the School of Public Administration at Universidad Diego Portales and Research Associate at Training Data Lab, Chile.

Sebastián Rivera
Assistant Professor

Assistant Professor in the School of Government and Public Administration at Universidad Mayor, Chile. Research Associate at Training Data Lab, Chile.
