
Algorithmic approach to the statistical inference of non-Boolean function classes / D. Malchiodi ; B. Apolloni, V. Capasso. DIPARTIMENTO DI MATEMATICA, DIPARTIMENTO DI SCIENZE DELL'INFORMAZIONE, 2000. (12. ciclo, Anno Accademico 2008/2009.)

### Algorithmic approach to the statistical inference of non-Boolean function classes

##### *D. Malchiodi*

##### 2000

#### Abstract

The problem of statistical inference deals with the approximation of one or more unknown parameters on the basis of random variables related to them, usually through the elements of a sample. The foundations of this kind of problem were studied in depth by the mathematical community between 1940 and 1960, which built families of estimators for the unknown parameters whose statistical properties could guarantee good performance in terms of the minimization of suitable risk functions. Several researchers working in the field of artificial intelligence, on the other hand, began to focus on this problem towards the end of that period, when advances in computer science made available devices able to perform large amounts of computation in relatively little time. These facilities enabled them to study statistics of growing complexity and therefore to tackle much more difficult estimation problems. They laid the foundations of the discipline now called computational learning, focusing mainly on ensuring the effective computability of the approximating functions. Their main aim was to build efficient algorithms and to guarantee bounds on the sample size necessary to achieve the desired generalisation: an algorithmic counterpart of the concept of risk confidence. The scope of this work is to propose an inference model built by integrating the statistical and algorithmic properties described above of the estimators (or, equivalently, of the algorithms) used to achieve the parameter estimation goals. A twisting argument is introduced as a way to obtain the distribution of the parameters to be inferred as a function of the distribution of a minimal sufficient statistic (actually a relaxation of this concept). This procedure is quite general since, under standard regularity conditions, every family of distributions admits such a statistic for its free parameters.
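To make the twisting argument concrete, the following is a minimal illustrative sketch for the simplest case, a Bernoulli parameter, and is not taken from the thesis: given the sufficient statistic $k$ (the number of successes in $m$ trials), the roles of sample and parameter are twisted, so that the probability that the statistic exceeds the observed value under a candidate parameter $p$ is read as the cumulative distribution of the parameter itself; a confidence interval then follows by inverting this distribution.

```python
from math import comb

def twisted_cdf(p, k, m):
    # P(K >= k) under parameter p; by the twisting argument this is read
    # as F_P(p) = P(P <= p) given the observed sufficient statistic k.
    return sum(comb(m, i) * p**i * (1 - p)**(m - i) for i in range(k, m + 1))

def solve(target, k, m, lo=0.0, hi=1.0, tol=1e-9):
    # twisted_cdf is increasing in p, so invert it by bisection.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if twisted_cdf(mid, k, m) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example: k = 7 successes observed in m = 20 trials;
# a 90% confidence interval for p from the 0.05 and 0.95 quantiles.
k, m = 7, 20
lower, upper = solve(0.05, k, m), solve(0.95, k, m)
```

The interval brackets the empirical frequency $7/20$, and its endpoints are exact quantiles of the parameter distribution induced by the statistic, rather than bounds from an asymptotic approximation.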
This model is applied to the fields of Boolean learning and linear regression. In the former case, the model introduces the notions of sentry points and detail of the studied Boolean classes, and allows one to deal with various kinds of errors affecting both the sample and the statistics. The performance of our estimators is then compared, in terms of confidence intervals, with that of support vector machines, a recently introduced and widely used model, obtaining more reliable shapes for these intervals. In the second case, the problem of finding a regression line in $R^2$ is considered, and the model is shown to be able to find confidence regions, both for the regression line and for the single points, without requiring the measurement errors to be distributed according to a Gaussian law: the sole constraint is that a sufficient statistic for the unknown parameters can be found. In particular, error distributions belonging to a wide subclass of the exponential family can be dealt with. Some insights into non-linear regression are then given, noting that some properties of the model make it applicable also when the class of approximating functions has a subsymbolic description, for instance when dealing with neural networks. A theoretical explanation for the local minima at the basis of the overfitting phenomenon is given and, as an application of our approach, it is shown that the test phase of a neural network, that is, the estimation of its generalisation capabilities, does not require more sample points than the training phase, in which the neural hypothesis is built.
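The regression setting above rests on the fact that, for a line fit, the estimators depend on the sample only through a handful of sufficient statistics. The sketch below illustrates this under stated assumptions: the statistics $(n, \sum x, \sum y, \sum x^2, \sum xy)$ determine the fitted line, and a resampling step (a stand-in for the thesis's confidence-region construction, which is not reproduced here) yields a cloud of candidate lines whose spread sketches a confidence region. The helper names are hypothetical.

```python
import random

def suff_stats(xs, ys):
    # Sufficient statistics for a line fit: the estimators below
    # depend on the sample only through these five numbers.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return n, sx, sy, sxx, sxy

def line_from_stats(n, sx, sy, sxx, sxy):
    # Least-squares intercept a and slope b computed from the statistics alone.
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

def candidate_lines(xs, ys, n_rep=1000, seed=0):
    # Illustrative resampling: recompute the statistics on bootstrap
    # replicas and collect the induced (a, b) pairs as candidate
    # parameters; their empirical spread sketches a confidence region.
    rng = random.Random(seed)
    n, lines = len(xs), []
    for _ in range(n_rep):
        idx = [rng.randrange(n) for _ in range(n)]
        rx, ry = [xs[i] for i in idx], [ys[i] for i in idx]
        lines.append(line_from_stats(*suff_stats(rx, ry)))
    return lines
```

Nothing in the construction assumes Gaussian errors: only the reduction of the sample to a sufficient statistic is used, which is the property the model requires.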
