A New Goodness-Of-Fit Statistical Test / B. Apolloni, S. Bassis. - In: INTELLIGENT DECISION TECHNOLOGIES. - ISSN 1872-4981. - 1:4(2007), pp. 205-218. [10.3233/IDT-2007-1404]

A New Goodness-Of-Fit Statistical Test

B. Apolloni; S. Bassis
2007

Abstract

We introduce a new nonparametric test for deciding statistically whether a model fits a data sample well. The test is built on the empirical cumulative distribution function (e.c.d.f.) of the measure of the blocks determined by the ordered sample. Whatever distribution law underlies the data, this e.c.d.f. is distributed around a Beta cumulative distribution function (c.d.f.), so the shift between the two curves is the statistic on which the test is based. The distribution of this shift is computed through a new bootstrap procedure from a population of free parameters of the model that are compatible with the sampled data according to the model. Closing the loop, we may expect that if the model fits the data well, the Beta c.d.f. constitutes a template for the block e.c.d.f.s that are compatible with the observed data. In the paper we show how to assess this template functionality in the case of a good fit and how to discriminate bad models. We show the test's potential in comparison with conventional tests, both in case studies and on a well-known benchmark for the semiparametric logistic model widely used in database analysis.
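To make the block statistic concrete, the following is a minimal sketch, not the authors' procedure. It relies on one standard fact: if the model c.d.f. is the true one, the m sample points map to uniform variates, and each of the m + 1 blocks they cut has a Beta(1, m) marginal measure. The function names, the sup-distance form of the shift, and the exponential models are all assumptions made for illustration. The paper's bootstrap over compatible model parameters is not reproduced here.

```python
import math
import random


def block_measures(sample, model_cdf):
    """Probability measure, under model_cdf, of the blocks cut by the ordered sample."""
    u = sorted(model_cdf(x) for x in sample)
    edges = [0.0] + u + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]


def beta_1_m_cdf(t, m):
    """C.d.f. of Beta(1, m): marginal law of one block among the m + 1 cut by m points."""
    t = min(max(t, 0.0), 1.0)
    return 1.0 - (1.0 - t) ** m


def shift_statistic(sample, model_cdf):
    """Sup distance between the block e.c.d.f. and the Beta(1, m) c.d.f. (assumed form)."""
    m = len(sample)
    blocks = sorted(block_measures(sample, model_cdf))
    k = len(blocks)
    shift = 0.0
    for i, b in enumerate(blocks):
        ref = beta_1_m_cdf(b, m)
        # Compare the reference curve with the e.c.d.f. just after and just before its jump.
        shift = max(shift, abs((i + 1) / k - ref), abs(i / k - ref))
    return shift


def exp_cdf(rate):
    """Hypothetical candidate model: exponential c.d.f. with the given rate."""
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0


if __name__ == "__main__":
    random.seed(0)
    data = [random.expovariate(1.0) for _ in range(200)]
    print("shift under well-fitting model:", shift_statistic(data, exp_cdf(1.0)))
    print("shift under badly fitting model:", shift_statistic(data, exp_cdf(5.0)))
```

In this sketch a badly specified rate crowds the transformed points near 1, producing many blocks far smaller than Beta(1, m) predicts, so the shift grows. Turning the shift into an accept/reject decision still requires its reference distribution, which the paper obtains through its own bootstrap.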
Nonparametric statistical test; goodness-of-fit test; beta distribution; tolerance regions; algorithmic inference; decision theory
Sector INF/01 - Computer Science
Article (author)

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/55463