IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Mastodon Content Warnings: Inappropriate Contents in a Microblogging Platform / M. Zignani, C. Quadri, A. Galdeman, S. Gaito, G.P. Rossi (PROCEEDINGS OF THE ... INTERNATIONAL AAAI CONFERENCE ON WEBLOGS AND SOCIAL MEDIA). - In: Proceedings of the International AAAI Conference on Web and Social Media [s.l] : Association for the Advancement of Artificial Intelligence, 2019. - ISBN 9781577358060. - pp. 639-645. (Paper presented at the 13th International AAAI Conference on Web and Social Media, held in Munich in 2019.)

Mastodon Content Warnings: Inappropriate Contents in a Microblogging Platform

M. Zignani;C. Quadri;A. Galdeman;S. Gaito;G.P. Rossi

2019

Abstract

Our social communications and the expression of our beliefs and thoughts are becoming increasingly mediated and diffused by online social media. Beyond countless other advantages, this democratization and freedom of expression also entails the transfer of unpleasant offline behaviors to online life, such as cyberbullying, sexting, hate speech and, in general, any behavior not suitable for the online community people belong to. To mitigate or even remove these threats from their platforms, most social media providers are implementing solutions for the automatic detection and filtering of such inappropriate content. However, the data they use to train their tools are not publicly available. In this context, we release a dataset gathered from Mastodon, a distributed online social network which is formed by communities that impose their own rules of publication, and which allows its users to mark their posts as inappropriate if they perceive them as not suitable for the community they belong to. The dataset consists of all the posts with public visibility published by users hosted on servers which support the English language. These data have been collected by implementing an ad-hoc tool for downloading the public timelines of the servers, called instances, that form the Mastodon platform, along with the metadata associated with them. The overall corpus contains over 5 million posts, spanning the entire life of Mastodon. We associate with each post a label indicating whether or not its content is inappropriate, as perceived by the user who wrote it. Moreover, we also provide the full description of each instance. Finally, we present some basic statistics about the production of inappropriate posts and the characteristics of their associated textual content.

Scientific-disciplinary sectors of the contribution (display only): Sector INF/01 - Computer Science

Publication date: 2019

Type: Book Part (author)

Appears in types: 03 - Contribution in a volume

Files in this item:

File	Access	Type	Size	Format
2018_NotSafeForWork_ICWSM19_DatasetPaper (1).pdf	restricted access	Post-print, accepted manuscript etc. (version accepted by the publisher)	1.57 MB	Adobe PDF
3262-Article Text-6311-1-10-20190531.pdf	open access	Publisher's version/PDF	1.63 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/641570
