Effects of central tendency measures on term weighting in textual information retrieval / F. Ghahramani, H. Tahayori, A. Visconti. - In: SOFT COMPUTING. - ISSN 1432-7643. - 25:11(2021), pp. 7341-7378. [10.1007/s00500-021-05694-5]

Effects of central tendency measures on term weighting in textual information retrieval

A. Visconti
2021

Abstract

Term weighting is known to have a significant effect on the retrieval of relevant documents, and many weighting methods have been proposed. The main open question is which weighting method is best. This paper shows that properly aggregating the weights generated by carefully selected basic weighting methods improves the retrieval of documents relevant to the user's needs. In particular, applying even simple central tendency measures such as the average, median, or mid-range over an appropriate subset of basic weighting methods yields term weights that not only outperform each individual basic weighting method but are also more effective than recently proposed, more complicated weighting methods. By applying the proposed method to various datasets, we study the effects of normalizing the basic weights, normalizing the vector lengths, using different components in the term frequency factor, and other factors. The results reveal criteria for selecting an appropriate subset of basic weighting methods to feed to the aggregator in order to achieve higher retrieval precision.
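
The aggregation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the method names (`method_a`, `method_b`, `method_c`), the sample weights, and the choice of min-max normalization are all assumptions made for illustration. Each basic weighting method is assumed to have already produced one weight per term; the sketch normalizes each method's weights and then combines them per term with the average, median, or mid-range.

```python
from statistics import mean, median


def min_max_normalize(weights):
    """Rescale one method's weights to [0, 1] (an assumed normalization)."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # A constant weight vector carries no ranking information.
        return [0.0 for _ in weights]
    return [(w - lo) / (hi - lo) for w in weights]


def mid_range(values):
    """Mid-range: the midpoint between the smallest and largest value."""
    return (min(values) + max(values)) / 2


AGGREGATORS = {"average": mean, "median": median, "mid-range": mid_range}


def aggregate_term_weights(weights_by_method, measure="average"):
    """Combine per-term weights from several basic methods into one weight.

    weights_by_method maps a (hypothetical) method name to a list holding
    one weight per term, with all lists in the same term order.
    """
    aggregate = AGGREGATORS[measure]
    normalized = [min_max_normalize(w) for w in weights_by_method.values()]
    # zip(*normalized) regroups the values so each tuple holds one term's
    # weights across all methods.
    return [aggregate(per_term) for per_term in zip(*normalized)]


# Hypothetical weights for three terms under three basic weighting methods.
basic_weights = {
    "method_a": [1.0, 3.0, 5.0],
    "method_b": [0.0, 2.0, 8.0],
    "method_c": [2.0, 2.5, 4.0],
}

for measure in AGGREGATORS:
    print(measure, aggregate_term_weights(basic_weights, measure))
```

Which basic methods to include and how to normalize them are exactly the selection criteria the paper investigates, so the fixed choices here should be read only as placeholders.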
Central tendency measures; Information retrieval; Term weight normalization; Term weighting
Sector INF/01 - Informatica (Computer Science)
2021
Mar-2021
Article (author)
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/844365