The Rise of GoodFATR: A Novel Accuracy Comparison Methodology for Indicator Extraction Tools

Caballero, J.; Gomez, G.; Matic, S.; Sánchez, G.; Sebastián, S.; Villacañas, A.

doi:10.1016/j.future.2023.02.012

To adapt to a constantly evolving landscape of cyber threats, organizations need to actively collect Indicators of Compromise (IOCs), i.e., forensic artifacts that signal that a host or network might have been compromised. IOCs can be collected through open-source and commercial structured IOC feeds, but they can also be extracted from a myriad of unstructured threat reports written in natural language and distributed through a wide array of sources such as blogs and social media. Multiple indicator extraction tools can identify IOCs in natural language reports, but it is hard to compare their accuracy because large ground truth datasets are difficult to build. This work presents a novel majority vote methodology for comparing the accuracy of indicator extraction tools, which does not require a manually built ground truth. We implement our methodology in GoodFATR, an automated platform for collecting threat reports from a wealth of sources, extracting IOCs from the collected reports using multiple tools, and comparing their accuracy. GoodFATR supports 6 threat report sources: RSS, Twitter, Telegram, Malpedia, APTnotes, and ChainSmith. GoodFATR continuously monitors the sources, downloads new threat reports, extracts 41 indicator types from the collected reports, and filters out non-malicious indicators to output the IOCs. We run GoodFATR for 15 months to collect 472,891 reports from the 6 sources, extract 978,151 indicators from the reports, and identify 618,217 IOCs. We analyze the collected data to identify the top IOC contributors and the IOC class distribution. We apply GoodFATR to compare the IOC extraction accuracy of 7 popular open-source tools with that of GoodFATR's own indicator extraction module.
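The abstract names a majority vote methodology but does not spell out its details, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that an indicator counts as correct when a strict majority of the tools extract it from the same set of reports, and that each tool's precision and recall are then computed against that consensus set. The function name `majority_vote_scores`, the tool names, and the sample indicators are hypothetical and do not come from the paper.

```python
from collections import Counter


def majority_vote_scores(extractions):
    """Score each tool against a consensus built by majority vote.

    extractions: dict mapping a tool name to the set of indicators
    that tool extracted from the same reports (hypothetical input shape).
    """
    n_tools = len(extractions)

    # Count how many tools reported each indicator.
    votes = Counter()
    for indicators in extractions.values():
        votes.update(set(indicators))

    # Assumption: an indicator is accepted when a strict majority agrees.
    consensus = {ioc for ioc, n in votes.items() if n > n_tools / 2}

    scores = {}
    for tool, indicators in extractions.items():
        found = set(indicators)
        true_positives = len(found & consensus)
        precision = true_positives / len(found) if found else 0.0
        recall = true_positives / len(consensus) if consensus else 0.0
        scores[tool] = {"precision": precision, "recall": recall}
    return consensus, scores


# Hypothetical outputs of three tools over the same report.
example = {
    "tool_a": {"evil.example", "198.51.100.7", "readme.txt"},
    "tool_b": {"evil.example", "198.51.100.7"},
    "tool_c": {"evil.example"},
}
consensus, scores = majority_vote_scores(example)
for tool, score in sorted(scores.items()):
    print(tool, score)
```

In this toy input, `readme.txt` is reported by only one of the three tools, so it falls outside the consensus and lowers the precision of `tool_a`, while `tool_c` loses recall for missing the IP address. The consensus is only an approximation of ground truth: an indicator that most tools miss, or a false positive that most tools share, would be misjudged under this assumption.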

The Rise of GoodFATR: A Novel Accuracy Comparison Methodology for Indicator Extraction Tools / J. Caballero, G. Gomez, S. Matic, G. Sánchez, S. Sebastián, A. Villacañas. - In: FUTURE GENERATION COMPUTER SYSTEMS. - ISSN 0167-739X. - 144:(2023), pp. 74-89. [10.1016/j.future.2023.02.012]

Caballero, Juan; Gomez, Gibran; Matic, S.; Sánchez, Gustavo; Sebastián, Silvia; Villacañas, Arturo

2023

Keywords: Cyber Threat Intelligence; Indicators of Compromise; IOC; RSS; Telegram; Twitter
Scientific-disciplinary sectors of the article (valid from 9 May 2024): Sector INFO-01/A - Informatica (Computer Science)
Publication date: 2023
Journal (in ANCE): FUTURE GENERATION COMPUTER SYSTEMS
DOI: https://dx.doi.org/10.1016/j.future.2023.02.012
Type: Article (author)
Appears in types: 01 - Journal article

Files in this record:

File	Access	Type	Size	Format
1-s2.0-S0167739X23000535-main.pdf	Restricted access	Publisher's version/PDF	934.63 kB	Adobe PDF
2208.00042v2.pdf	Open access	Pre-print (manuscript submitted to the publisher)	720.37 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1145175
