A spatio-temporal learning approach for crowd activity modelling to detect anomalies

Rao, Arjun

145031_Rao2009.pdf (2.219Mb)

Access Status

Open access

Authors

Rao, Arjun

Date

2009

Supervisor

Dr. Tele Tan

Dr. Mihai Lazarescu

Type

Thesis

Award

MSc

Metadata

Show full item record

School

Department of Computing

URI

http://hdl.handle.net/20.500.11937/636

Collection

Curtin Theses

Abstract

With security and surveillance gaining paramount importance in recent years, it has become important to reliably automate some surveillance tasks for monitoring crowded areas. The need to automate this process also supports human operators who are overwhelmed with a large number of security screens to monitor. Crowd events like excess usage throughout the day, sudden peaks in crowd volume, chaotic motion (obvious to spot) all emerge over time which requires constant monitoring in order to be informed of the event build up. To ease this task, the computer vision community has been addressing some surveillance tasks using image processing and machine learning techniques. Currently tasks such as crowd density estimation or people counting, crowd detection and abnormal crowd event detection are being addressed. Most of the work has focused on crowd detection and estimation with the focus slowly shifting on crowd event learning for abnormality detection.This thesis addresses crowd abnormality detection. However, by way of the modelling approach used, implicitly, the tasks of crowd detection and estimation are also handled. The existing approaches in the literature have a number of drawbacks that keep them from being scalable for any public scene. Most pieces of work use simple scene settings where motion occurs wholly in the near-field or far-field of the camera view. Thus, with assumptions on the expected location of person motion, small blobs are arbitrarily filtered out as noise when they may be legitimate motion in the far-field. Such an approach makes it difficult to deal with complex scenes where entry/exit points occur in the centre of the scene or multiple pathways running from the near to the far-field of the camera view that produce blobs of differing sizes. Further, most authors assume the number of directions people motion should exhibit rather than discover what these may be. Approaches with such assumptions would result in loss of accuracy while dealing with (say) a railway platform which shows a number of motion directions, namely two-way, one-way, dispersive, etc. Finally, very few contributions of work use time as a video feature to model the human intuitiveness of time-of-day abnormalities. That is certain motion patterns may be abnormal if they have not been seen for a given time of day. Most works use it (time) as an extra qualifier to spatial data for trajectory definition.In this thesis most of these drawbacks have been addressed by dealing with these in the modelling of crowd activity. Firstly, no assumptions are made on scene structure or blob sizes resulting therefrom. The optical flow algorithm used is robust and even the noise presented (which is infact unwanted motion of swaying hands and legs as opposed to that from the torso) is fairly consistent and therefore can be factored into the modelling. Blobs, no matter what the size are not discarded as they may be legitimate emerging motion in the far-field. The modelling also deals with paths extending from the far to the near-field of the camera view and segments these such that each segment contains self-comparable fields of motion. The need for a normalisation factor for comparisons across near and far field motion fields implies prior knowledge of the scene. As the system is intended for generic public locations having varying scene structures, normalisation is not an option in the processing used and yet the near & far-field motion changes are accounted for. Secondly, this thesis describes a system that learns the true distribution of motion along the detected paths and maintains these. The approach is such that doing so does not generalise the direction distributions which would cause loss in precision. No impositions are made on expected motion and if the underlying motion is well defined (one-way or two-way), then this is represented as a well defined distribution and as a mixture of directions if the underlying motion presents itself as so.Finally, time as a video feature is used to allow for activity to re-enforce itself on a daily basis such that motion patterns for a given time and space begin to define themselves through re-enforcement which acts as the model used for abnormality detection in time and space (spatio-temporal). The system has been tested with real-world data datasets with varying fields of camera view. The testing has shown no false negatives, very few false positives and detects crowd abnormalities quite well with respect to the ground truths of the datasets used.