This thesis investigates the semantic analysis of the human behaviour captured by video streaming, both from the theoretical and technological points of view. The video analysis based on the semantic content is in fact still an open issue for the computer vision research community, especially when real-time analysis of complex scenes is concerned. Automated video analysis can be described and performed at different abstraction levels, from the pixel analysis up to the human behaviour understanding. Similarly, the organisation of computer vision systems is often hierarchical with low-level image processing techniques feeding into tracking algorithms and, then, into higher level scene analysis and/or behaviour analysis modules. Each level of this hierarchy has its open issues, among which the main ones are: - motion and object detection: dynamic background modelling, ghosts, suddenly changes in illumination conditions; - object tracking: modelling and estimating the dynamics of moving objects, presence of occlusions; - human behaviour identification: human behaviour patterns are characterized by ambiguity, inconsistency and time-variance. Researchers proposed various approaches which partially address some aspects of the above issues from the perspective of the semantic analysis and understanding of the video streaming. Many progresses were achieved, but usually not in a comprehensive way and often without reference to the actual operating situations. A popular class of approaches has been devised to enhance the quality of the semantic analysis by exploiting some background knowledge about scene and/or the human behaviour, thus narrowing the huge variety of possible behavioural patterns by focusing on a specific narrow domain. In general, the main drawback of the existing approaches to semantic analysis of the human behaviour, even in narrow domains, is inefficiency due to the high computational complexity related to the complex models representing the dynamics of the moving objects and the patterns of the human behaviours. In this perspective this thesis explores an innovative, original approach to human behaviour analysis and understanding by using the syntactical symbolic analysis of images and video streaming described by means of strings of symbols. A symbol is associated to each area of the analysed scene. When a moving object enters an area, the corresponding symbol is appended to the string describing the motion. This approach allows for characterizing the motion of a moving object with a word composed by symbols. By studying and classifying these words we can categorize and understand the various behaviours. The main advantage of this approach consists in the simplicity of the scene and motion descriptions so that the behaviour analysis will have limited computational complexity due to the intrinsic nature both of the representations and the related operations used to manipulate them. Besides, the structure of the representations is well suited for possible parallel processing, thus allowing for speeding up the analysis when appropriate hardware architectures are used. The theoretical background, the original theoretical results underlying this approach, the human behaviour analysis methodology, the possible implementations, and the related performance are presented and discussed in the thesis. To show the effectiveness of the proposed approach, a demonstrative system has been implemented and applied to a real indoor environment with valuable results. Furthermore, this thesis proposes an innovative method to improve the overall performance of the object tracking algorithm. This method is based on using two cameras to record the same scene from different point of view without introducing any constraint on cameras’ position. The image fusion task is performed by solving the correspondence problem only for few relevant points. This approach reduces the problem of partial occlusions in crowded scenes. Since this method works at a level lower than that of semantic analysis, it can be applied also in other systems for human behaviour analysis and it can be seen as an optional method to improve the semantic analysis (because it reduces the problem of partial occlusions).

SEMANTIC ANALYSIS AND UNDERSTANDING OF HUMAN BEHAVIOUR IN VIDEO STREAMING / A. Amato ; relatore: Vincenzo Piuri ; correlatore: Vincenzo Di Lecce ; coordinatore: Ernesto Damiani. Universita' degli Studi di Milano, 2011 Mar 24. 23. ciclo, Anno Accademico 2010. [10.13130/amato-alberto_phd2011-03-24].

SEMANTIC ANALYSIS AND UNDERSTANDING OF HUMAN BEHAVIOUR IN VIDEO STREAMING

A. Amato
2011

Abstract

This thesis investigates the semantic analysis of the human behaviour captured by video streaming, both from the theoretical and technological points of view. The video analysis based on the semantic content is in fact still an open issue for the computer vision research community, especially when real-time analysis of complex scenes is concerned. Automated video analysis can be described and performed at different abstraction levels, from the pixel analysis up to the human behaviour understanding. Similarly, the organisation of computer vision systems is often hierarchical with low-level image processing techniques feeding into tracking algorithms and, then, into higher level scene analysis and/or behaviour analysis modules. Each level of this hierarchy has its open issues, among which the main ones are: - motion and object detection: dynamic background modelling, ghosts, suddenly changes in illumination conditions; - object tracking: modelling and estimating the dynamics of moving objects, presence of occlusions; - human behaviour identification: human behaviour patterns are characterized by ambiguity, inconsistency and time-variance. Researchers proposed various approaches which partially address some aspects of the above issues from the perspective of the semantic analysis and understanding of the video streaming. Many progresses were achieved, but usually not in a comprehensive way and often without reference to the actual operating situations. A popular class of approaches has been devised to enhance the quality of the semantic analysis by exploiting some background knowledge about scene and/or the human behaviour, thus narrowing the huge variety of possible behavioural patterns by focusing on a specific narrow domain. In general, the main drawback of the existing approaches to semantic analysis of the human behaviour, even in narrow domains, is inefficiency due to the high computational complexity related to the complex models representing the dynamics of the moving objects and the patterns of the human behaviours. In this perspective this thesis explores an innovative, original approach to human behaviour analysis and understanding by using the syntactical symbolic analysis of images and video streaming described by means of strings of symbols. A symbol is associated to each area of the analysed scene. When a moving object enters an area, the corresponding symbol is appended to the string describing the motion. This approach allows for characterizing the motion of a moving object with a word composed by symbols. By studying and classifying these words we can categorize and understand the various behaviours. The main advantage of this approach consists in the simplicity of the scene and motion descriptions so that the behaviour analysis will have limited computational complexity due to the intrinsic nature both of the representations and the related operations used to manipulate them. Besides, the structure of the representations is well suited for possible parallel processing, thus allowing for speeding up the analysis when appropriate hardware architectures are used. The theoretical background, the original theoretical results underlying this approach, the human behaviour analysis methodology, the possible implementations, and the related performance are presented and discussed in the thesis. To show the effectiveness of the proposed approach, a demonstrative system has been implemented and applied to a real indoor environment with valuable results. Furthermore, this thesis proposes an innovative method to improve the overall performance of the object tracking algorithm. This method is based on using two cameras to record the same scene from different point of view without introducing any constraint on cameras’ position. The image fusion task is performed by solving the correspondence problem only for few relevant points. This approach reduces the problem of partial occlusions in crowded scenes. Since this method works at a level lower than that of semantic analysis, it can be applied also in other systems for human behaviour analysis and it can be seen as an optional method to improve the semantic analysis (because it reduces the problem of partial occlusions).
24-mar-2011
Settore ING-INF/05 - Sistemi di Elaborazione delle Informazioni
Semantic analysis of video streaming ; human behavior ; grammar based approach
PIURI, VINCENZO
DAMIANI, ERNESTO
Doctoral Thesis
SEMANTIC ANALYSIS AND UNDERSTANDING OF HUMAN BEHAVIOUR IN VIDEO STREAMING / A. Amato ; relatore: Vincenzo Piuri ; correlatore: Vincenzo Di Lecce ; coordinatore: Ernesto Damiani. Universita' degli Studi di Milano, 2011 Mar 24. 23. ciclo, Anno Accademico 2010. [10.13130/amato-alberto_phd2011-03-24].
File in questo prodotto:
File Dimensione Formato  
phd_unimi_R07643.pdf

Open Access dal 29/03/2011

Tipologia: Tesi di dottorato completa
Dimensione 2.08 MB
Formato Adobe PDF
2.08 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/155497
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact