
At the discovery of the software process through machine learning techniques / E. Damiani, G. Gianini. - In: Proceedings of the IASTED International Conference on Software Engineering, as part of the 24th IASTED International Multi-Conference on Applied Informatics, February 14-16, 2006, Innsbruck, Austria / edited by P. Kokol. - Anaheim: ACTA Press, 2006. - ISBN 0889865728. (IASTED International Conference on Software Engineering, held in Innsbruck, 2006.)

At the discovery of the software process through machine learning techniques

E. Damiani;G. Gianini
2006

Abstract

Manual metrics collection often produces low-quality data, and some processes, such as lightweight processes, by definition should not be tracked with heavyweight, invasive metrics. These two facts naturally lead to automatic, non-invasive collection of process data directly from software process tools (such as requirements management, development, and configuration management environments). The most apparent consequence is the emergence of very large data sets, on the order of several thousand events per developer per day. While such data may not be immediately interpretable, they embed enough information to reconstruct some of the traditional process metrics and, depending on how far the automatic collection probes are allowed to reach, can also contain a far more accurate description of the process than any manually gathered metrics can afford (e.g., rich tracking of navigation within the file system or within the software structure itself, fine time granularity, tracing of application usage and of switching between applications, tracing of the data they exchange, and so on). To make this wealth of information useful, synthetic knowledge has to be drawn from the raw data, either in the form of exemplary local patterns (good practices) or in terms of global features of the process dynamics, so as to create new metrics that can later be used for software process validation and conformity assessment, or for process control and improvement. In this crucial interpretation activity, given the quantity, high dimensionality, and complexity of the data, the educated guesses of trained statisticians and domain experts must be supported by automatic knowledge extraction techniques and tools.
In this talk we review the history of automatic knowledge extraction from software process data, describe the features of current approaches, and outline potential future developments and benefits for Software Process Engineering.
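As a minimal illustration of the kind of local-pattern extraction the abstract describes (not the authors' actual method), the sketch below counts frequent short event sequences in a per-developer tool-event stream; all event names, the window size, and the support threshold are hypothetical choices for the example.

```python
from collections import Counter

# Hypothetical per-developer event stream, as collected non-invasively
# from process tools (editor, build system, VCS); names are illustrative.
events = [
    "open_file", "edit", "run_tests", "edit", "run_tests",
    "commit", "open_file", "edit", "run_tests", "commit",
    "open_file", "edit", "edit", "run_tests", "commit",
]

def frequent_patterns(stream, n=3, min_support=2):
    """Count sliding n-grams of events; n-grams that recur at least
    min_support times are candidate 'local patterns' (good practices)."""
    grams = Counter(tuple(stream[i:i + n]) for i in range(len(stream) - n + 1))
    return {gram: count for gram, count in grams.items() if count >= min_support}

patterns = frequent_patterns(events)
# The recurring edit -> run_tests -> commit sequence surfaces a
# test-before-commit practice directly from the raw event data.
```

A real pipeline would replace the toy n-gram count with sequence-mining or process-mining algorithms over millions of timestamped events, but the shape of the task is the same: raw events in, candidate practice patterns out.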
Settore INF/01 - Informatica
2006
IASTED
Book Part (author)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/30078