An architectural model for combining spatial-based and object-based information for attentive video analysis