Mapping the semantic landscape of film: computational extraction of indices through film grammar
This thesis presents work that exploits the grammar of film for automated film understanding, and addresses the semantic gap between the simplicity of the features that automated content indexing systems can currently compute and the richness of the semantics in user queries posed for media search and retrieval. The problem is set within the broader need for enabling technologies for multimedia content management, and arises in response to the growing presence of multimedia data made possible by advances in storage, processing, and transmission technologies. The first demonstration of this approach uses the attributes of motion and shot length to define and compute a novel measure of film tempo. Tempo flow plots are defined and derived for a number of full-length movies, and edge analysis is performed, leading to the extraction of dramatic story sections and events signaled by their unique tempo. In addition to developing this computable tempo measure, the thesis studies the usefulness of biasing it toward either of its constituents, namely motion or shot length. A refinement is also made to the shot length normalizing mechanism, driven by the peculiar characteristics of the shot length distributions exhibited by movies. The next aspect of film examined is film rhythm. In the rhythm model presented, the motion behaviour of a given shot is classified as nonexistent, fluid, or staccato. Shot neighbourhoods in movies are then grouped by their proportional makeup of these motion behavioural classes to yield seven high-level rhythmic arrangements that prove adept at indicating likely scene content (e.g., dialogue or chase sequence).
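As an illustration only, a per-shot tempo score built from motion and shot length might be sketched as below. The abstract does not give the thesis's formula, so everything here is an assumption: the function name `tempo`, the weights `alpha` and `beta` (standing in for the bias toward shot length or motion), the use of the median as the shot length reference, and the convention that shorter shots and more motion mean higher tempo.

```python
from statistics import mean, median, pstdev


def tempo(shot_lengths, motions, alpha=1.0, beta=1.0):
    """Hypothetical per-shot tempo score (not the thesis's definition).

    shot_lengths: length of each shot (e.g. in frames or seconds)
    motions:      one motion magnitude per shot
    alpha, beta:  assumed weights for the shot length and motion terms
    """
    if len(shot_lengths) != len(motions):
        raise ValueError("need one motion value per shot")

    # Assumed normalisation: distance below the median shot length,
    # scaled by the standard deviation (fall back to 1.0 if it is zero).
    ref_len = median(shot_lengths)
    sd_len = pstdev(shot_lengths) or 1.0

    # Assumed normalisation for motion: a plain z-score.
    mu_mot = mean(motions)
    sd_mot = pstdev(motions) or 1.0

    return [
        alpha * (ref_len - s) / sd_len + beta * (m - mu_mot) / sd_mot
        for s, m in zip(shot_lengths, motions)
    ]


# Two short, busy shots followed by two long, calm ones.
scores = tempo([2, 2, 10, 10], [5, 5, 1, 1])
```

Raising `alpha` relative to `beta` biases the score toward shot length, and the reverse biases it toward motion, which is the kind of trade-off the study described above examines.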
The second part of the investigation presents a novel computational model to detect editing patterns as either metric, accelerated, decelerated, or free. It is also found that combined motion and editing rhythms make it possible to determine that the media content has changed and to hypothesize as to why. Three such categories are presented along with their efficacy for capturing useful film elements (e.g., scene change precipitated by plot event). Finally, the first attempt to extract narrative structure, specifically the prevalent 3-Act storytelling paradigm in film, is detailed. Identifying act boundaries in the narrative allows film to be structured at a level far higher than existing segmentation frameworks, which include shot detection and scene identification, and provides a reliable basis for inferences about the semantic content of dramatic events in film. Additionally, the narrative constructs identified have analogues in many other domains, including news, training video, and sitcoms, making these ideas widely applicable. A novel act boundary posterior function for Acts 1 and 2 is derived using a Bayesian formulation under guidance from film grammar and tested under many configurations, and results are reported for experiments involving 25 full-length movies. The framework is shown to have a role in both automatic and semi-interactive settings for semantic analysis of film.
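To make the four editing pattern labels mentioned above concrete, the following is a minimal sketch that assigns one of them to a short run of shot lengths. It is not the thesis's computational model, which the abstract does not describe. The function name `editing_pattern`, the tolerance `tol`, and the rules themselves (near-constant lengths are metric, strictly shrinking lengths are accelerated, strictly growing lengths are decelerated, anything else is free) are assumptions made for illustration.

```python
def editing_pattern(shot_lengths, tol=0.1):
    """Hypothetical labelling of a run of shot lengths (illustrative rules only)."""
    if len(shot_lengths) < 3:
        return "free"  # too few shots to call a pattern

    avg = sum(shot_lengths) / len(shot_lengths)

    # Metric: every shot is within tol * average of the average length.
    if all(abs(s - avg) <= tol * avg for s in shot_lengths):
        return "metric"

    # Differences between consecutive shot lengths.
    diffs = [b - a for a, b in zip(shot_lengths, shot_lengths[1:])]

    if all(d < 0 for d in diffs):
        return "accelerated"  # shots keep getting shorter
    if all(d > 0 for d in diffs):
        return "decelerated"  # shots keep getting longer
    return "free"


labels = [
    editing_pattern([4.0, 4.1, 3.9, 4.0]),
    editing_pattern([8, 6, 4, 2]),
    editing_pattern([2, 4, 6, 8]),
    editing_pattern([2, 8, 3, 9]),
]
```

A real detector would likely need to tolerate noise in the trend rather than require strictly monotonic lengths; the strict rule is used here only to keep the sketch short.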
|dc.subject||automatic film interrogation|
|dc.subject||semantic gap indexing|
|dc.title||Mapping the semantic landscape of film: computational extraction of indices through film grammar|
|curtin.department||School of Computing|