Query reformulation mining: models, patterns, and applications

Boldi, P.; Bonchi, F.; Castillo, C.; Vigna, S.

doi:10.1007/s10791-010-9155-3

Understanding query reformulation patterns is a key step towards next-generation web search engines: it makes it possible to build systems that understand, and possibly predict, user intent, providing the needed assistance at the right time and thus helping users locate information more effectively and improving their web-search experience. As a step in this direction, we build a model for classifying user query reformulations into broad classes (generalization, specialization, error correction or parallel move), achieving 92% accuracy. We then apply the model to automatically label two very large query logs, sampled from different geographic areas and containing a total of approximately 17 million query reformulations. We study the resulting reformulation patterns, matching some results from previous studies performed on smaller, manually annotated datasets, and discovering new reformulation patterns, including connections between reformulation types and topical categories. We annotate two large query-flow graphs with reformulation-type information and run several graph-characterization experiments on them, extracting new insights about the relationships between the different query reformulation types. Finally, we study query recommendations based on short random walks on the query-flow graphs. Our experiments show that these methods can match, and often improve on, the precision of recommendations based on query-click graphs, without needing users’ clicks. Our experiments also show that taking the transition-type labels on edges into account is important for obtaining good-quality recommendations.
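
To make the last step of the abstract concrete, here is a minimal sketch of a short walk over a query-flow graph whose edges carry reformulation-type labels. It is not the authors' implementation: the abstract does not give the walk's details, so the toy graph, the edge weights, the label codes (`G`, `S`, `C`, `P`), and the function names `short_walk_scores` and `recommend` are all assumptions made for illustration. The walk is computed by propagating probability mass for a fixed number of steps rather than by sampling, which keeps the result deterministic.

```python
from collections import defaultdict

# Hypothetical toy query-flow graph: query -> list of (next_query, weight, label).
# Label codes are assumptions mapped to the four classes named in the abstract:
# G = generalization, S = specialization, C = error correction, P = parallel move.
TOY_GRAPH = {
    "jaguar": [("jaguar car", 3.0, "S"), ("jaguar animal", 1.0, "S"), ("jagaur", 1.0, "C")],
    "jaguar car": [("jaguar xf price", 2.0, "S"), ("bmw", 2.0, "P")],
    "jaguar animal": [("leopard", 1.0, "P")],
}


def short_walk_scores(graph, start, steps=2, allowed_labels=None):
    """Propagate probability mass from `start` for a fixed number of steps.

    Only edges whose label is in `allowed_labels` are followed (all edges if None).
    Returns the accumulated visit probability per query, excluding `start`.
    """
    current = {start: 1.0}
    scores = defaultdict(float)
    for _ in range(steps):
        nxt = defaultdict(float)
        for query, mass in current.items():
            edges = [(q, w) for q, w, label in graph.get(query, [])
                     if allowed_labels is None or label in allowed_labels]
            total = sum(w for _, w in edges)
            if total == 0:
                continue  # dead end: this mass is dropped
            for q, w in edges:
                nxt[q] += mass * w / total
        for q, mass in nxt.items():
            if q != start:
                scores[q] += mass
        current = nxt
    return dict(scores)


def recommend(graph, start, k=3, **walk_options):
    """Return the k highest-scoring queries, breaking ties alphabetically."""
    scores = short_walk_scores(graph, start, **walk_options)
    return sorted(scores, key=lambda q: (-scores[q], q))[:k]


if __name__ == "__main__":
    print(recommend(TOY_GRAPH, "jaguar"))                        # all edge types
    print(recommend(TOY_GRAPH, "jaguar", allowed_labels={"S"}))  # specializations only
```

Restricting the walk to one label changes which queries are reachable at all, which is one simple way the edge labels can affect the resulting recommendations in this toy setting.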

Query reformulation mining: models, patterns, and applications / P. Boldi, F. Bonchi, C. Castillo, S. Vigna. - In: INFORMATION RETRIEVAL. - ISSN 1386-4564. - 14:3(2011), pp. 257-289.

Keywords: Query log mining; Query flow graph; Session segmentation; Query recommendation

Scientific-disciplinary sectors of the article: Sector INF/01 - Computer Science

Publication date: 2011

Journal in ANCE: INFORMATION RETRIEVAL

DOI: https://dx.doi.org/10.1007/s10791-010-9155-3

URL: http://www.springerlink.com/content/0n121l8q6328k737/

Type: Article (author)

Appears in types: 01 - Journal article

Files in this item: there are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/156643

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca