A ‘non-parametric’ version of the naive Bayes classifier

Soria, D.; Garibaldi, J.M.; Ambrogi, F.; Biganzoli, E.; Ellis, I.O.

doi:10.1016/j.knosys.2011.02.014

Many algorithms have been proposed for the machine learning task of classification. One of the simplest methods, the naive Bayes classifier, has often been found to give good performance despite the fact that its underlying assumptions (of independence and a normal distribution of the variables) are perhaps violated. In previous work, we applied naive Bayes and other standard algorithms to a breast cancer database from Nottingham City Hospital in which the variables are highly non-normal and found that the algorithm performed well when predicting a class that had been derived from the same data. However, when we then applied naive Bayes to predict an alternative clinical variable, it performed much worse than other techniques. This motivated us to propose an alternative method, based on naive Bayes, which removes the requirement for the variables to be normally distributed, but retains the essential structure and other underlying assumptions of the method. We tested our novel algorithm on our breast cancer data and on three UCI datasets which also exhibited strong violations of normality. We found our algorithm outperformed naive Bayes in all four cases and outperformed multinomial logistic regression (MLR) in two cases. We conclude that our method offers a competitive alternative to MLR and naive Bayes when dealing with data sets in which non-normal distributions are observed.
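The idea described in the abstract, keeping the naive Bayes structure (class priors and conditional independence of the variables) while dropping the normality assumption, can be illustrated with a small sketch. The abstract does not say which density estimator the authors use, so the Gaussian kernel density estimate, the rule-of-thumb bandwidth, the class name `KDENaiveBayes`, and the toy data below are all assumptions made for illustration, not the published method.

```python
import math
from collections import defaultdict


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def rule_of_thumb_bandwidth(values):
    """Normal-reference bandwidth 1.06 * std * n^(-1/5) (an assumed choice)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / max(n - 1, 1)
    std = math.sqrt(var)
    return 1.06 * std * n ** (-0.2) if std > 0 else 1.0


class KDENaiveBayes:
    """Hypothetical naive Bayes variant: each per-class, per-variable
    density is a kernel density estimate instead of a fitted normal."""

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        n = len(y)
        # Class priors estimated from class frequencies.
        self.log_prior = {c: math.log(len(r) / n) for c, r in rows_by_class.items()}
        # Training values stored per class and per variable (column).
        self.columns = {c: [list(col) for col in zip(*r)] for c, r in rows_by_class.items()}
        self.bandwidths = {
            c: [rule_of_thumb_bandwidth(col) for col in cols]
            for c, cols in self.columns.items()
        }
        return self

    def _log_density(self, x, col, h):
        dens = sum(gaussian_kernel((x - v) / h) for v in col) / (len(col) * h)
        return math.log(max(dens, 1e-300))  # floor avoids log(0)

    def predict(self, X):
        predictions = []
        for row in X:
            scores = {}
            for c in self.log_prior:
                # Independence assumption: sum the per-variable log densities.
                scores[c] = self.log_prior[c] + sum(
                    self._log_density(x, col, h)
                    for x, col, h in zip(row, self.columns[c], self.bandwidths[c])
                )
            predictions.append(max(scores, key=scores.get))
        return predictions


# Toy data: class "a" is bimodal (clearly non-normal), class "b" sits in between.
X_train = [[-3.2], [-3.0], [-2.8], [2.8], [3.0], [3.2],
           [-0.3], [-0.1], [0.0], [0.1], [0.3], [0.2]]
y_train = ["a"] * 6 + ["b"] * 6
model = KDENaiveBayes().fit(X_train, y_train)
print(model.predict([[0.1], [3.1], [-2.9]]))
```

The only change from a Gaussian naive Bayes is the density term in `_log_density`; the prior and the sum over variables are untouched, which mirrors the abstract's statement that the method "retains the essential structure and other underlying assumptions".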

A ‘non-parametric’ version of the naive Bayes classifier / D. Soria, J. M. Garibaldi, F. Ambrogi, E. Biganzoli, I. O. Ellis. - In: KNOWLEDGE-BASED SYSTEMS. - ISSN 0950-7051. - 24:6(2011), pp. 775-784. [10.1016/j.knosys.2011.02.014]



Scientific-disciplinary sector: MED/01 - Medical Statistics
Publication date: 2011
Journal (listed in ANCE): KNOWLEDGE-BASED SYSTEMS
DOI: https://dx.doi.org/10.1016/j.knosys.2011.02.014
Type: Article (author)
Appears in types: 01 - Journal article


Use this identifier to cite or link to this document: https://hdl.handle.net/2434/165610
