The Dataset-Similarity-Based Approach to Select Datasets for Evaluation in Similarity Retrieval

Matiazzo, M.A.L.; de Castro-Silva, V.; Oyamada, R.S.; Kaster, D.S.

doi:10.1007/978-3-031-46994-7_11

Most papers on similarity retrieval present experiments executed on an assortment of complex datasets. However, no work focuses on analyzing the selection of datasets to evaluate the techniques proposed in the related literature. Ideally, the datasets chosen for experimental analysis should cover a variety of properties to ensure a proper evaluation; however, this is not always the case. This paper introduces the dataset-similarity-based approach, a new conceptual view of datasets that explores how they vary according to their characteristics. The approach is based on extracting a set of features from the datasets to represent them in a similarity space and analyze their distribution in this space. We present an instantiation of our approach using datasets gathered by surveying the dataset usage in papers published in relevant conferences on similarity retrieval, together with sample analyses. Our analyses show that datasets often used together in experiments are more similar than they seem at first glance, which reduces the variability of the evaluation. The proposed representation of datasets in a similarity space allows future work to improve the choice of datasets for running experiments in similarity retrieval.

The Dataset-Similarity-Based Approach to Select Datasets for Evaluation in Similarity Retrieval / M.A.L. Matiazzo, V. de Castro-Silva, R.S. Oyamada, D.S. Kaster (LECTURE NOTES IN COMPUTER SCIENCE). - In: Similarity Search and Applications / edited by Oscar Pedreira, Vladimir Estivill-Castro. - Cham : Springer, 2023. - ISBN 978-3-031-46993-0. - pp. 125-132 (Paper presented at the 16th International Conference on Similarity Search and Applications, SISAP, held in A Coruña, October 9–11, 2023) [10.1007/978-3-031-46994-7_11].



Keywords: Similarity Retrieval; Datasets; Experimental Analysis; Similarity Space of Datasets

Scientific-disciplinary sector: INF/01 - Informatica (Computer Science)

Publication date: 2023

DOI: https://dx.doi.org/10.1007/978-3-031-46994-7_11

Type: Book Part (author)

Appears in types: 03 - Contribution in volume


Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1021769
