TIWO: update on progress
November 2004
Reports summarising the work done in each of the three main workpackages
are now available from this website.
Also available are the MPhil-PhD transfer reports by Elia Tomadaki, Andrew
Vassiliou and Yan Xu, which detail some of the research findings mentioned
below.
In TIWO we have developed
models and algorithms for generating machine-executable representations of
semantic video content from different kinds of text that describe the moving
image. Previously, systems dealing
with semantic video content have treated it as an inventory of events and
existents, organised in space and time, but have not dealt with the narrative aspects
of moving images. Video retrieval
systems have tended to use visual features, or information extracted from one
kind of text – typically subtitles or closed captions. We have focussed on films, where dealing with
semantic content involves modelling and generating representations of a film’s
narrative, i.e. a sequence of events connected by cause-effect relationships
where the agents of cause-effect are often characters with mental states,
goals, beliefs and desires. Our
approach is to extract and integrate information from different kinds of texts
associated with films, including film scripts, plot summaries and audio
description. Results will be
applied to assist audio description professionals and film viewers in
retrieving and navigating digital film libraries. Progress has been made with respect to three main
challenges: cross-document co-reference; extraction of information about
characters’ emotions; and novel kinds of video browsing.
Cross-Document Co-reference
A first step in integrating
information from different texts is to identify cross-document co-reference, i.e. fragments of different texts that
refer to the same entity or event.
Most previous work has concentrated on information about entities
extracted from different texts of the same type, e.g. news stories. We are working on information about
events in two very different text types – plot summaries (typically about 200
words long, referring to about 10 major events in a film) and audio description
(typically about 5000-8000 words long, describing the on-screen action for the
visually impaired). Our method is
to select keywords for each event in the first text (plot summary) and do an
IR-like search in the second text (audio description). Selecting and matching verbs directly
is not possible: for example, a ‘murder’ event mentioned in a plot summary is
described as a sequence of smaller actions in the audio description. Selecting and matching the participants
of events, and their grammatical roles, achieves about 50-60% precision and recall. Ongoing work concerns ‘query expansion’
of verbs, and we are evaluating existing schemes for event decomposition and
knowledge representation for this task.
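The participant-and-role matching described in this section can be illustrated with a small scoring function. This is a minimal sketch under stated assumptions, not the TIWO implementation: each event is reduced to a set of (participant, grammatical role) pairs, candidates are scored by plain Jaccard overlap rather than any tuned IR weighting, and the names Event, overlap_score, match_events, the role labels and the sample data are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical representation: an event is a set of (participant, role) pairs.
# Role labels such as "subj" / "obj" are illustrative assumptions.
@dataclass(frozen=True)
class Event:
    label: str
    args: frozenset  # frozenset of (participant, role) tuples


def overlap_score(a: Event, b: Event) -> float:
    """Jaccard overlap of (participant, role) pairs; 0.0 if both are empty."""
    union = a.args | b.args
    if not union:
        return 0.0
    return len(a.args & b.args) / len(union)


def match_events(summary_events, description_events, threshold=0.5):
    """For each plot-summary event, return the label of the best-scoring
    audio-description event, or None if no candidate reaches the threshold."""
    matches = {}
    for s in summary_events:
        best, best_score = None, 0.0
        for d in description_events:
            score = overlap_score(s, d)
            if score > best_score:
                best, best_score = d, score
        if best is not None and best_score >= threshold:
            matches[s.label] = best.label
        else:
            matches[s.label] = None
    return matches


# Toy data: one plot-summary event and two audio-description fragments.
summary = [Event("murder", frozenset({("Anna", "subj"), ("Ben", "obj")}))]
description = [
    Event("ad_0412", frozenset({("Anna", "subj"), ("rope", "obj")})),
    Event("ad_0413", frozenset({("Anna", "subj"), ("Ben", "obj")})),
]
result = match_events(summary, description)
print(result)  # {'murder': 'ad_0413'}
```

Matching on participants and roles rather than on the verb is what lets a ‘murder’ in the summary line up with an audio-description fragment that never uses that word; query expansion of verbs would add a second, complementary signal to such a score.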
Affect in Text Describing Films
We have found that one way to
access information about a film’s narrative is to concentrate on characters’ emotional states. A character’s emotional state can be
considered as their reaction to events unfolding around them, and their
reaction is determined by how they think those events impact on their goals. Thus, information about characters’
emotional states can reveal much about a film’s narrative, modelled as a sequence
of events connected by cause-effect relationships. We have developed a method for extracting information about
characters’ emotions from time-aligned texts such as audio description and film
scripts. This information appears
to be useful for video retrieval by story similarity, and for reasoning about a
film’s narrative.
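To make ‘time-aligned’ concrete, the sketch below spots emotion words in time-coded audio description lines and attributes each one to the first known character named in the same line, yielding (timecode, character, emotion) triples. The keyword lexicon, the "HH:MM:SS text" line format and the function extract_affect are hypothetical, and simple keyword spotting is a stand-in for illustration, not the extraction method developed in TIWO.

```python
import re

# Hypothetical mini-lexicon mapping surface words to emotion types.
EMOTION_LEXICON = {
    "smiles": "joy", "laughs": "joy",
    "frowns": "displeasure", "glares": "anger",
    "trembles": "fear", "weeps": "distress",
}

# Assumed line format: "HH:MM:SS text", e.g. "00:12:05 Anna smiles at Ben."
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(.*)$")


def extract_affect(lines, characters):
    """Return (timecode, character, emotion) triples.

    The experiencer is taken to be the first known character name in the
    line, a crude heuristic used here for illustration only.
    """
    results = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines without a timecode
        timecode, text = m.groups()
        tokens = re.findall(r"[A-Za-z]+", text)
        who = next((t for t in tokens if t in characters), None)
        if who is None:
            continue  # no known character mentioned
        for tok in tokens:
            emotion = EMOTION_LEXICON.get(tok.lower())
            if emotion:
                results.append((timecode, who, emotion))
    return results


ad_lines = [
    "00:12:05 Anna smiles at Ben.",
    "00:12:09 Ben glares at her and trembles.",
    "00:12:15 The car drives away.",
]
triples = extract_affect(ad_lines, {"Anna", "Ben"})
print(triples)
```

Because each triple keeps its timecode, a sequence of such triples per character can be laid against the film’s timeline, which is what makes affect information usable for comparing stories or reasoning about the events that triggered a change of state.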
Video Browsing via Characters’ Affect States and Goals
One motivation for generating
machine-executable representations of a film’s semantic content is to
facilitate novel kinds of video browsing. We are developing a video browsing
system based on representations of characters’ affect states and goals. At any point in the film, the user is
shown key-frames from other scenes that are related to the current scene. The system will be evaluated in terms
of how it helps users find answers to questions they have about a film,
particularly of the kind ‘Why did X do Y?’.
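One way the notion of a ‘related scene’ could be operationalised is sketched below. It assumes each scene is annotated with (character, affect state or goal) labels and ranks the other scenes by how many labels they share with the current one. The annotation scheme, the scene identifiers and the function related_scenes are illustrative assumptions, not the browsing system under development.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical label scheme: (character, affect state or goal).
Label = Tuple[str, str]


def related_scenes(
    current: str, scenes: Dict[str, Set[Label]], k: int = 3
) -> List[Tuple[str, int]]:
    """Rank other scenes by the number of (character, state/goal) labels
    shared with the current scene; scenes sharing nothing are omitted."""
    target = scenes[current]
    scored = [
        (scene_id, len(target & labels))
        for scene_id, labels in scenes.items()
        if scene_id != current
    ]
    scored = [pair for pair in scored if pair[1] > 0]
    # Highest overlap first; ties broken by scene id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


scenes = {
    "s01": {("Anna", "goal:escape"), ("Anna", "fear")},
    "s07": {("Anna", "fear"), ("Ben", "anger")},
    "s12": {("Anna", "goal:escape"), ("Anna", "fear"), ("Ben", "distress")},
    "s20": {("Ben", "joy")},
}
print(related_scenes("s12", scenes))  # [('s01', 2), ('s07', 1)]
```

In such a design, the key-frames of the top-ranked scenes would be the ones shown alongside the current scene; counting shared goal labels gives a simple way to surface earlier scenes that may explain why a character acts as they do.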