Introduction. Classification and regression trees (CART) are binary recursive partitioning methods that build prediction models from data for categorical (classification) or continuous (regression) outcomes. A key element of a classification tree is the rule that assigns each terminal node (leaf) to an outcome class.
Objectives. To evaluate the performance of ‘median trees’, built with a novel approach to class assignment, against ‘modal trees’, built with the majority rule, when the outcome (y) is ordinal.
Materials and Methods. Modal trees assigned each leaf the modal class of the observations falling into it, whereas median trees assigned the median class. The predictive power of the trees was evaluated according to the assignment rule adopted: modal trees minimized the total number of errors; median trees minimized the sum of absolute distances between predicted and observed classes. Tree performance was measured with the gamma statistic, which quantifies the association between observed and predicted classes. Three real datasets with different numbers of y-levels (from four to six) were analyzed. Each dataset was divided into a training set, for building the trees, and a testing set, for evaluating prediction accuracy. The testing set was resampled (n=30) to derive robust estimates. The binomial test and the paired t-test were used to assess the significance of differences between tree performances.
Results. Median trees performed significantly better than modal trees with five and six y-classes. No significant differences were observed with four outcome levels. Regardless of the number of y-classes, median trees had a simpler structure (fewer leaves) than modal trees.
Conclusion. Median trees outperformed modal trees as the number of y-levels increased and generally had a simpler structure, which makes the patterns and connections among groups of interest easier to interpret.
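
The two assignment rules and their evaluation criteria can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that ordinal classes are coded as consecutive integers, that ties in the mode are broken toward the lowest class, that the median of an even-sized leaf is the lower median, and that the gamma statistic is Goodman and Kruskal's gamma. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def modal_class(labels):
    """Majority rule: most frequent class in the leaf (assumed tie-break: lowest class)."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)


def median_class(labels):
    """Median rule: lower median of the ordered classes in the leaf (assumed convention)."""
    ordered = sorted(labels)
    return ordered[(len(ordered) - 1) // 2]


def misclassification_count(observed, predicted):
    """Total number of errors, the criterion associated with modal trees."""
    return sum(o != p for o, p in zip(observed, predicted))


def absolute_distance(observed, predicted):
    """Sum of absolute distances between predicted and observed classes (median trees)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted))


def goodman_kruskal_gamma(observed, predicted):
    """(C - D) / (C + D) over all pairs; pairs tied on either variable are ignored."""
    concordant = discordant = 0
    for (o1, p1), (o2, p2) in combinations(zip(observed, predicted), 2):
        sign = (o1 - o2) * (p1 - p2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    if concordant + discordant == 0:
        return float("nan")  # undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical leaf with a five-level ordinal outcome coded 1..5.
leaf = [1, 1, 1, 3, 4, 4, 5]
modal_pred = [modal_class(leaf)] * len(leaf)    # every observation predicted as class 1
median_pred = [median_class(leaf)] * len(leaf)  # every observation predicted as class 3

print(misclassification_count(leaf, modal_pred), absolute_distance(leaf, modal_pred))    # 4 12
print(misclassification_count(leaf, median_pred), absolute_distance(leaf, median_pred))  # 6 10
```

In this made-up leaf the modal class gives fewer errors (4 versus 6), while the median class gives a smaller total absolute distance (10 versus 12). This shows how the two rules can disagree when the classes are ordered.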

A different class-assignment rule to build classification trees for ordinal outcomes / M. Di Maso, A. Lugo. - In: Joint Meeting of the International Biometric Society (IBS). - [s.l] : International Biometric Society (IBS), 2015 Jun 16. - pp. 10-10 (conference: Joint Meeting of the International Biometric Society (IBS) Austro-Swiss and Italian Regions, held in Milan in 2015).

A different class-assignment rule to build classification trees for ordinal outcomes

M. Di Maso; A. Lugo
2015

Sector SECS-S/01 - Statistics
16 Jun 2015
Book Part (author)

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/470389