TIWO - Television in Words: FINAL REPORT
EPSRC
GR/R67194/01, January 2002-5, principal investigator – Dr Andrew Salway
www.computing.surrey.ac.uk/personal/pg/A.Salway/tiwo/TIWO.htm
The overall aim was to develop computational
understanding of narrative in multimedia systems with applications for digital
libraries. The novel research scenario
was that of audio description whereby stories told in moving images (film and
television) are retold in words for visually impaired audiences. Two general advances have been made. First, we have shown how semantic video
content can be analysed and described in terms of narrative structures. Second, we have opened up audio description
as a scenario for interdisciplinary research.
The main research results comprise: basic knowledge about how experts
put moving images into words, new methods for extracting information about a
film’s story, and an evaluation of a story-based hypervideo structure for
browsing films. So far there have been
six publications [1-6] in the areas of multimedia, language engineering and
artificial intelligence; five further papers are under review [7-11] and four
journal papers are in preparation [12-15].
TIWO has also had cross-disciplinary impact with interest generated in
the fields of information science, audiovisual translation, film theory,
narratology, and new media. We envisage
two ways in which project results will mean more audio description is made for
television programmes and films, above the amounts required by legislation, to
the benefit of visually impaired audiences.
First, some results might be applied to part-automate the writing and
editing of audio description scripts to reduce costs and improve quality. Second, we have demonstrated how audio
description can be reused to generate metadata for video retrieval and browsing
in digital libraries: this should give broadcasters and film distributors more
incentive to produce audio description.
The industry-based TIWO Round Table was active throughout the project
with partners contributing the time of senior personnel worth £23,500 and audio
description data. Discussions about
exploitation and knowledge transfer are in progress with BBC, ITFC, Softel,
RNIB and Philips Research.
International Research Context
Narrative is a multi-faceted phenomenon studied by
philosophers, literature and film scholars, linguists, cognitive scientists and
computer scientists [23]. The study of
narrative explores what the media-independent features of stories are, how
different kinds of media can convey (the same) stories, and how stories are
understood [24]. Narrative involves
chains of events in cause-effect relationships occurring in space and time,
where the agents of cause-effect are characters with goals, beliefs and
emotions [25]. This definition applies
to actual and fictional stories told in any medium. For some, narrative abilities, considered both as a mode of
thought and of discourse, are fundamental to intelligence [26, 27]. Within computer science, narrative is
important for story understanding and generation systems, human-computer
interaction, and virtual worlds for entertainment and education [28-31]. We have proposed a further reason to study
narrative within the context of computing.
It has been noted that intelligent multimedia information retrieval
systems must analyse and describe the content of multimedia data at different
levels of abstraction, in order to give users intuitive retrieval, browsing and
summarization functionality [32]. In
[4] we argue that to deal with the semantic content of multimedia data it is
essential to analyse and describe a level of abstraction relating to narrative structures;
the MPEG-7 standard for a multimedia content description interface includes
descriptors for semantic entities in narrative worlds [33]. As an example application, consider a video
player that visualizes a video’s storyline, explains why a character behaves in
a certain way and retrieves videos depicting similar stories. The ‘semantic gap’ is a major challenge for
multimedia computing and the mapping of visual features to the meanings
conveyed by image/video data requires the integration of information from
associated text [34]. The integration
of visual and verbal information is a longstanding issue for AI [35, 36].
Intuitive and innovative forms of video retrieval,
browsing and re-use in digital libraries require video data to be made into a
structured medium by analysis and description of its content [37]. The description of video content requires
video data models and knowledge representation formalisms [38-41]. The analysis of video content requires the
extraction and fusion of features from visual, audio and textual data
streams. A small set of semantic video
features can be detected automatically from the visual component of video data
[42]. Most solutions also make some use
of text associated with moving images such as closed captions and scripts
[43-45]; the recently launched Video
Google system appears to base its retrieval of television programmes on
keywords in closed captions [46]. In MUMIS, information was integrated from
multiple texts, like commentaries and match reports, to index videos of soccer
matches [47]. Most research in
multimedia information retrieval and extraction has dealt with the semantic
content of multimedia data as spatio-temporal inventories of entities and
events, which neglects important narrative aspects. However, over the last 2-3 years there has been growing interest,
within the multimedia community, in analysing films in digital libraries and
this has led to more attention being paid to story-related features. Films are interesting and challenging examples
of multimedia data for a number of reasons.
The content of a film is communicated to the audience by patterns of
light and shade, dialogue, sound effects, film editing techniques, music, and
the actions of characters. To
appreciate a film, a viewer must not only recognise what is depicted, but also
understand cause-effect relationships, characters’ emotional reactions and
ideas about the filmmaker’s intention.
Film browsing systems based on representations of story structures have
been proposed, but the representations were hand-crafted [48, 49]. An important first step in structuring the
semantic video content of films automatically is the segmentation of shots and
scenes [50]. Pixel motion and shot
length were combined to measure the tempo of films: changes in tempo were found
to coincide with key points in a film’s story [51]. The presence and absence of characters gives a rhythm that may be
used for topic segmentation and film classification [52]. Colour features have been used to classify the
moods of scenes and film genres [53, 54].
In July 2004, the use of film screenplays as descriptions of video data
was discussed in [55]. In October 2004
researchers presented a ‘narrative abstraction model’ that generates video
summaries based on measures of dramatic importance by analysing the patterns of
characters’ appearances and interactions [56].
In TIWO we have identified audio description as a
scenario to study narrative in multimedia systems and as a new source of
information to structure video data. We
believe that as a surrogate for the moving image, audio description is a more
computationally-tractable source of information about a film’s story than the
video data itself, and as such can provide a stepping stone for crossing the
semantic gap. Compared with closed
captions and scripts, we believe that audio description gives more reliable
information about what is depicted in films and television programmes. Audio description is made to enhance the
enjoyment of television and cinema for visually impaired viewers. In the gaps between existing speech, audio
description gives key information about scenes, characters’ appearances and
actions via an extra audio track. Audio
description is scripted before it is recorded so it is available as time-coded
written text. Audio description is made
by trained professionals: it may take 60 hours to describe a 2-hour film. In the UK, the 2003 Communications Act
requires established terrestrial, satellite and cable television broadcasters
to provide audio description for 10% of their output. There are 150 cinemas in the UK that provide
it for most major films. Audio
description is also available in the US, Canada, France, Germany, Japan,
Australia, Ireland and Spain.
Key Advances made by TIWO
We have introduced and developed the idea that
narrative is important for understanding the relationship between visual and
verbal information in computational systems, and for intelligent multimedia
knowledge management applications [4].
We believe that this will lead to a theoretical underpinning for
existing and future approaches to bridging the semantic gap.
We have developed algorithms that generate new
descriptions of semantic video content relating to narrative structures from
different kinds of text associated with films [1-5]. This work is unique in dealing with films’ stories in terms of
characters’ emotions. When viewers
watch a film they make sense of, and anticipate, the unfolding events depicted
on-screen, based at least in part on what they think about characters’
cognitive states, e.g. their goals, beliefs and emotions. In [2] we presented a method to
extract and visualise information about characters’ emotions in films from
audio description with 83% precision and 63% recall. This
information may be useful for video retrieval by story similarity, video
summarisation and for reasoning about a film’s story. The method is based on Ortony’s cognitive
theory of emotions that links a character’s emotional states to the events in
their environment [57]. In further work
we proposed a metric for comparing the similarity of two stories (films) based
on the distribution of characters’ emotions [1], and showed how emotions can be
assigned to specific characters [18]. We were also
interested in extracting and integrating information from plot summaries and
film scripts. A first step in
information integration is to identify cross-document co-reference (CDCR), i.e.
fragments of different texts that refer to the same entity or event [58]. We proposed algorithms to detect
cross-document co-reference between mentions of events in two very different
text types – plot summaries and audio description. This is hard because matching verbs directly is not possible, for
example a ‘murder’ event mentioned in a plot summary is described as a sequence
of smaller actions in the audio description.
We found that simply selecting and matching the participants of events,
and their grammatical roles, achieves about 50-60% precision and recall
[3]. Our ongoing work is trying to
automatically correlate verbs in plot summaries with verbs/phrases in audio
description in order to do query expansion for CDCR. Another kind of integration needs to take place between
information extracted from texts and the intervals of video data it refers to:
our investigation of temporal information in audio description specified three
distinct tasks and new requirements for temporal information annotation schemes
[5].
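As a rough illustration of the participant-matching idea for cross-document event co-reference described above, the following Python sketch pairs event mentions from a plot summary with mentions from audio description when they share participants in the same grammatical role, and then scores the result against a small gold standard. The `Event` structure, the role names, the example mentions and the gold pairs are all invented for illustration; this is not the algorithm reported in [3].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An event mention: an id, a verb and its participants by grammatical role."""
    eid: str
    verb: str
    subject: str
    obj: str = ""


def same_participants(a: Event, b: Event) -> bool:
    """True if both mentions share a subject and, where both give one, an object."""
    if a.subject.lower() != b.subject.lower():
        return False
    if a.obj and b.obj:
        return a.obj.lower() == b.obj.lower()
    return True


def match_events(summary, description):
    """Return the set of (summary id, description id) pairs judged co-referent."""
    return {(s.eid, d.eid) for s in summary for d in description
            if same_participants(s, d)}


def precision_recall(predicted, gold):
    """Standard precision and recall of predicted pairs against gold pairs."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical mentions: the verbs differ, so only participants are compared.
summary = [Event("s1", "murder", "Tom", "Jack")]
description = [
    Event("d1", "raise", "Tom", "knife"),
    Event("d2", "stab", "Tom", "Jack"),
    Event("d3", "fall", "Jack"),
    Event("d4", "greet", "Tom", "Jack"),
]
# Hypothetical gold standard: the murder is told as three smaller actions.
gold = {("s1", "d1"), ("s1", "d2"), ("s1", "d3")}

predicted = match_events(summary, description)
precision, recall = precision_recall(predicted, gold)
```

In this toy case the matcher finds the stabbing, misses the other smaller actions that make up the murder, and wrongly accepts an unrelated greeting, which suggests why participant matching on its own may give only moderate precision and recall.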
We have investigated the use of Lehnert’s plot units to structure hypervideo links
between intervals of video data [10, 20].
Hypervideo offers new ways to watch and to interact with video data
[59], but little research has been done into how hypervideo can be used to
watch and re-watch feature films. To be
engaging and to facilitate intuitive interaction, it
is important that hypervideo structures reflect story structures. Previous researchers have proposed knowledge
representation formalisms to structure films for hypervideo browsing [48, 49],
but this work did not include any detailed analysis of story structures, nor
extensive user evaluation. We have
analysed two full-length feature films in terms of plot units. Plot units represent cause-effect
relationships between characters’ affect states and the events in a story
[60]. We developed the NAFI system
(Navigating Films) to store and edit data about plot units, to allow users to
navigate films by following hypervideo links based on the structure of the plot
units, and to record users’ actions for subsequent analysis by researchers;
NAFI is available to researchers from the TIWO website. The effect of plot units was evaluated by
having 30 users complete question-answering tasks using the system. Results suggest that with the hypervideo
links subjects gave better answers to questions about the
film’s story. Feedback showed that they
found the navigation experience to be engaging and enjoyable. Now we are looking to partially automate the
generation of plot unit data, encouraged by recent automatic analyses of story
structure, including our own [2] and others [51, 56].
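To make the idea of hypervideo links based on plot units more concrete, the sketch below models affect states and causal links in Python and follows links from one video interval to related intervals. The affect-state types (positive event, negative event, mental state) and link types (motivation, actualisation, termination, equivalence) follow Lehnert [60]; the class names, time codes and example story are hypothetical and do not reflect the actual NAFI data model.

```python
from dataclasses import dataclass, field

AFFECT_TYPES = {"+", "-", "M"}     # positive event, negative event, mental state
LINK_TYPES = {"m", "a", "t", "e"}  # motivation, actualisation, termination, equivalence


@dataclass(frozen=True)
class AffectState:
    """One affect state of a character, tied to an interval of video (seconds)."""
    sid: str
    character: str
    affect: str
    start: float
    end: float
    gloss: str


@dataclass
class PlotUnitGraph:
    """Affect states plus typed causal links between them."""
    states: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (source id, link type, target id)

    def add_state(self, state: AffectState) -> None:
        if state.affect not in AFFECT_TYPES:
            raise ValueError(f"unknown affect type: {state.affect}")
        self.states[state.sid] = state

    def add_link(self, source: str, link_type: str, target: str) -> None:
        if link_type not in LINK_TYPES:
            raise ValueError(f"unknown link type: {link_type}")
        if source not in self.states or target not in self.states:
            raise KeyError("both ends of a link must be known states")
        self.links.append((source, link_type, target))

    def hyperlinks_from(self, sid: str):
        """Intervals a viewer could jump to from the given state, with link type."""
        return [(ltype, self.states[tgt].start, self.states[tgt].end)
                for src, ltype, tgt in self.links if src == sid]


# Hypothetical story fragment: a loss motivates a goal, which is then achieved.
graph = PlotUnitGraph()
graph.add_state(AffectState("a1", "Anna", "-", 120.0, 150.0, "loses her job"))
graph.add_state(AffectState("a2", "Anna", "M", 150.0, 175.0, "wants a new job"))
graph.add_state(AffectState("a3", "Anna", "+", 2400.0, 2460.0, "is hired"))
graph.add_link("a1", "m", "a2")  # the negative event motivates the mental state
graph.add_link("a2", "a", "a3")  # the mental state is actualised by the positive event

jumps = graph.hyperlinks_from("a2")
```

A player built on such a structure could offer, while the viewer watches the interval for `a2`, a link to the later interval in which the goal is achieved.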
A crucial part of the basic research in TIWO,
underlying the developments described above, has been our corpus-based
investigations into the language used to describe moving images in audio
description, film scripts and plot summaries.
We wanted to determine whether these texts exhibit
idiosyncratic linguistic features.
Such linguistic variance is predicted because the texts are written by
trained experts for a specific purpose [61].
On the one hand, distinctive linguistic features would be interesting to
give insights into how experts put moving images into words, and on the other
hand they can be exploited by information extraction techniques. Results show an unusually high number of
open-class words among the most frequent words in corpora of audio description,
plot summaries and film scripts, some of which relate to the emotions that
characters are experiencing [19, 21]. In audio description there is a tendency for
verbs to refer to material processes, whereas in plot summaries more verbs
refer to mental processes [3]. There is
a preponderance of words relating to temporal information in audio description
[5]. We found that phrases with the
words looks, turns, smiles and door were especially
frequent in screenplays and audio description and argued that they are frequent
because the basic actions they describe are important story-telling elements
for filmed narrative [11]. Previous
work in mapping audio-visual features to high-level film content has drawn on
film theory and the conventions of film grammar [50, 51]. In ongoing work we are
exploring whether the depiction of basic actions follows patterns in line with
film-making and story-telling conventions.
As well as helping information extraction, we hope that our kind of
analysis can provide empirical data to test theories about film-making and
story-telling.
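The kind of frequency comparison behind these findings can be sketched as follows: compute the relative frequency of each word in a special-purpose corpus and divide it by its relative frequency in a general-language reference corpus, so that words unusually common in the special corpus stand out. This is a generic illustration in Python with toy word lists and an assumed smoothing choice; it is not a description of how System Quirk works.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in the corpus."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}


def frequency_ratios(special_tokens, general_tokens):
    """Ratio of relative frequency in the special corpus to the general corpus.

    Words absent from the general corpus are treated as if they occurred once
    there (an assumption made here purely to avoid division by zero).
    """
    special = relative_frequencies(special_tokens)
    general = relative_frequencies(general_tokens)
    floor = 1 / len(general_tokens)
    return {word: freq / general.get(word, floor)
            for word, freq in special.items()}


# Toy corpora invented for illustration only.
audio_description = "she looks at the door he turns and smiles she looks away".split()
general_language = "the cat sat on the mat and the dog looks at the cat".split()

ratios = frequency_ratios(audio_description, general_language)
ranked = sorted(ratios, key=ratios.get, reverse=True)
```

With these toy lists, action and character words such as "looks" rank above the closed-class word "the", which is the pattern one would look for in a real corpus.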
The only research project about audio description prior
to TIWO was AUDETEL which dealt with technical issues for the broadcast of
audio description and developed guidelines for audio describers [62]. We believe that we have been successful in
promoting audio description as a research scenario for a number of research
communities. For multimedia computing,
audio description is a novel kind of text to use for structuring video data and
there is potential to develop and apply multimedia processing techniques in
systems that semi-automate the production of audio description, e.g. to detect
quiet audio spaces, scene changes, characters and actions, and to summarise
descriptions to fit available time. For
artificial intelligence, audio description allows researchers to study story
understanding with examples of complex stories that are told in a relatively
constrained language. TIWO has also
stimulated interest in audio description from corpus linguists and in the field
of audiovisual translation.
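One of the semi-automation tasks mentioned above, finding spaces in the soundtrack where a description could be placed, can be sketched from time-coded dialogue alone. The Python example below assumes that dialogue intervals are already available (for instance from subtitle time codes); the minimum gap length and the speaking rate are hypothetical parameters, not values from the project.

```python
def find_gaps(dialogue, programme_end, min_gap=2.0):
    """Return (start, end) gaps between dialogue intervals of at least min_gap seconds.

    dialogue is a list of (start, end) times in seconds; intervals may overlap.
    """
    gaps = []
    cursor = 0.0
    for start, end in sorted(dialogue):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if programme_end - cursor >= min_gap:
        gaps.append((cursor, programme_end))
    return gaps


def fits(description, gap, words_per_second=3.0):
    """Estimate whether a description can be spoken within a gap.

    The speaking rate is an assumed constant chosen for illustration.
    """
    start, end = gap
    return len(description.split()) / words_per_second <= end - start


# Hypothetical dialogue time codes for a 30-second clip.
dialogue = [(1.0, 4.0), (3.5, 9.0), (15.0, 20.0), (21.0, 26.0)]
gaps = find_gaps(dialogue, programme_end=30.0)
short_enough = fits("She turns and looks at the door", gaps[0])
```

A description that does not fit its gap would be a candidate for the summarisation step mentioned above.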
TIWO has contributed to other research on
visual-verbal information that the PI has been involved with during the last
three years, including: the analysis of painting captions [6]; the
classification of different ways images and texts combine in communication [7,
8]; and the analysis of surveillance video [9].
Review of Project Plan and Description of Work Carried Out
Eleftheria Tomadaki focused
on corpus analysis and the extraction of information from audio description and
plot summaries. Yan Xu focused on video
data modelling, knowledge representation and the development of the NAFI
system. Andrew Vassiliou extended and
synthesised results from the language engineering and knowledge representation
strands.
Like previous work in intelligent multimedia
information retrieval [32], we sought to combine data modelling, knowledge
representation and the automatic analysis of multimedia content – in our case
via text associated with moving images, and we evaluated our work with metrics
such as precision and recall. We
grounded our work in theories about media, specifically narrative, analogous to
recent proposals for computational media aesthetics [63]. Software was developed following an
object-oriented approach with UML and Java, with reuse of existing software
where possible. The project was managed
effectively with all objectives being met.
A report was written for each of Workpackages 1-3 to facilitate
dissemination and technology transfer by summarizing and contextualizing the
deliverables for a general readership [16-18].
Workpackage 1: Adapt and Apply Video Data
Modelling and Knowledge Representation
The report for WP1 [16] reviews video data compression
formats, metadata standards, video data models and the use of knowledge
representation formalisms in systems that process video data. It also shows our modelling of filmic
narrative content: this work comprises a series of UML models of narrative
based on a theory of narratology [64], and the representation of two feature
films using a knowledge representation formalism [60]. With feedback from the Round Table, the
outcome was an ‘ideal’ representation of narrative content which fed into
Workpackage 3.
Workpackage 2: Adapt and Apply Language Engineering
Techniques
A 500,000 word corpus of audio description scripts (60 feature films and a selection of television
programmes) was gathered from audio describers at ITFC, RNIB and BBC. To ensure a representative corpus, we
consulted two audio description experts and established nine categories of
films in terms of how the experts thought audio description would vary. We also gathered and analysed corpora of plot summaries
(114 films, 15,500 words) and film scripts (71 films, 1,930,000 words) from the
web. Surrey’s System Quirk was used to
analyse the idiosyncratic linguistic features in these corpora. Other existing packages were applied for
information extraction including Sheffield’s GATE, the Connexor tagger and
WordNet. The corpus gathering, analysis
and results are summarised in the report for WP2 [17]; results fed into
Workpackage 3.
Workpackage 3: Specification and Prototyping of an
Audio Description System
At each Round Table meeting there was discussion about
how current and future technology could be applied to make better quality audio
description more efficiently, and how audio description could be reused as a
source of descriptions for video indexing and browsing. These discussions were in the context of
results from WP1 and WP2 presented by the research team and the Round Table’s knowledge
of current audio description practice and technology. The report for WP3 [18] contains a detailed specification of user
requirements for a system to support the production and the reuse of audio
description, a review of currently available systems and technologies and
summaries of our prototypes. Four
systems were prototyped and evaluated to address some of the desired
functionality whilst relating to the overall project aim of understanding
narrative in multimedia systems. The
prototypes were for: keyword-based video retrieval; extraction of information
about characters’ emotions; the integration of plot summaries and audio
descriptions; and, hypervideo browsing.
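As an indication of how time-coded audio description could support keyword-based video retrieval, the following Python sketch builds an inverted index from description lines to the video intervals they cover. The script lines, time codes and stop-word list are invented, and the sketch makes no claim about how the TIWO prototype was implemented.

```python
import re
from collections import defaultdict

# A small, assumed stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "at", "the", "to", "of", "in", "on", "his", "her"}


def tokenize(text):
    """Lower-case words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def build_index(script):
    """Map each keyword to the (start, end) intervals whose description uses it."""
    index = defaultdict(list)
    for start, end, text in script:
        for word in set(tokenize(text)):
            index[word].append((start, end))
    return index


def search(index, query):
    """Intervals whose description contains every keyword in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Hypothetical time-coded description lines: (start seconds, end seconds, text).
script = [
    (62.0, 66.5, "Anna opens the door and looks out."),
    (130.0, 134.0, "A man in a grey coat smiles at Anna."),
    (415.5, 419.0, "Anna slams the door."),
]
index = build_index(script)
results = search(index, "Anna door")
```

Because each description line carries its own time code, a hit can be turned directly into a playback position in the video.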
Workpackage 4: TIWO Industry-based Round Table
There were seven meetings of the Round Table, each
about six hours long, at the university, ITFC, RNIB and Softel between
17/12/2001 and 7/2/2005. In between
these the research team visited individual organisations. Meetings were used to elicit expertise about
audio description practice and technology, audio description data and user
requirements and to communicate research findings and identify opportunities
for exploitation. For dissemination
purposes, the project website makes available all publications, reports,
presentations and software. There was a
high level of attendance at meetings by senior personnel (see Final Report form
for details of the partners’ participation).
Research Impact of TIWO and Benefits to Society
TIWO is timely as the multimedia community focuses
more on how to analyse and describe higher levels of ‘semantic content’ in
multimedia data. TIWO is also timely
regarding efforts to raise the profile of audio description, notably by the
RNIB, following the 2003 Communications Act.
The review letters from the project partners encourage our belief that
TIWO has had, and will continue to have, social and commercial benefit by showing how
technology can help to make better quality audio description more cheaply, and
by showing the added-value of reusing audio description for video retrieval and
browsing. Our findings about the
language of audio description are relevant for those training audio describers
and for those who develop and maintain standards for audio description. We expect that the kinds of information about
narrative structures that can be extracted automatically from audio description
will support novel ways for users to interact with digital video libraries of
film and television.
Invited Lecture, Talks and Seminars given about TIWO
· “Computing Moving Images: beyond the pixel”, Cross-overs in Audiovisual Arts and Interactive Media, University of Art and Design Helsinki, June 2004.
· “Computer Science R&D and Relevance to New Media”, Interactive Screen, Banff New Media Institute, Canada, July 2003.
· “Intelligent Multimedia Data Systems: adding structure and meaning to data”, Symp. on AI and Games, BNMI, August 2002.
· “A Corpus-based Analysis of the Language of Audio Description”, Transmedia International Conference on Audiovisual Translation, Barcelona, June 2005.
· “Television in Words: the TIWO project”, International Conference on Audiovisual Translation, London, February 2004.
· “The analysis of an audio description corpus”, Faculty of Translation, University of Granada, April 2005.
· “Image-Text Relations in Multimedia Systems”, School of Comp., Math. and Info. Sciences, Uni. of Brighton, April 2004.
· “Collateral Media and Narrative”, Centre for Digital Video Processing, Dublin City University, August 2003.
· “Explicating the Image-Text Link for Cross-modal IR”, Department of Information Studies, University of Sheffield, July 2003.
· “Narrative Intelligence for Multimedia”, Department of Computer Science, University of Exeter, May 2003.
· “Audio Description for Film and TV: computing narrative”, Centre for Corpus Linguistics, Uni. of Birmingham, Feb. 2003.
· “What’s in a Link Between an Image and a Text?”, Institute for Language, Speech and Hearing, Uni. of Sheffield, May 2002.
TIWO Publications and Reports, including under review and in preparation
[1] Vassiliou, Salway and Pitt (2004), ‘Formalising Stories: sequences of events and state changes’, IEEE Conference on Multimedia and Expo, ICME 2004.
[2] Salway and Graham (2003), ‘Extracting Information about Emotions in Films’, ACM Multimedia 2003, pp. 299-302.
[3] Tomadaki and Salway (2005), ‘Matching verb attributes for cross-document event co-reference’, in Erk et al. (eds.) Procs. Interdisciplinary Workshop on the Identification and Representation of Verb Features and Verb Classes, pp. 127-132.
[4] Salway, Graham, Tomadaki and Xu (2003), ‘Linking Video and Text via Representations of Narrative’, AAAI Spring Symposium on Intelligent Multimedia Knowledge Management, pp. 104-112. ISBN 1-57735-190-8.
[5] Salway and Tomadaki (2002), ‘Temporal Information in Collateral Texts for Indexing Moving Images’, in Setzer and Gaizauskas (eds.) Procs. LREC 2002 W’shop on Annotation Standards for Temporal Information in Natural Language, pp. 36-43.
[6] Salway and Frehen (2002), ‘Words for Pictures: analysing a corpus of art texts’, in Procs. TKE 2002 – Terminology and Knowledge Engineering.
[7] Martinec and Salway, ‘A system for image-text relations in new (and old) media’, submitted to J. of Visual Communication.
[8] Salway and Martinec, ‘Image-text relations for multimedia information extraction’, submitted to 5th ACM Symp. Doc. Eng.
[9] Ahmad, Bennett, Mountstephens, Cheng, Vassiliou and Salway, ‘Video Summarisation: Variation in Attention and Linguistic Description of Surveillance Videos’, submitted to IEE International Symposium on Imaging for Crime Detection and Prevention.
[10] Salway and Xu, ‘Navigating Stories in Films: a case study in hypervideo’, submitted to IEEE ICME 2005.
[11] Salway, Vassiliou and Ahmad, ‘What Happens in Films?’, submitted to IEEE ICME 2005.
[12] Salway, Vassiliou, Tomadaki and Ahmad, ‘Extracting narrative information’, to be submitted to IEEE Trans. on Multimedia.
[13] Xu and Salway, ‘Using plot units to structure films as hypervideo’, to be submitted to ACM Computers in Entertainment.
[14] Salway, ‘Audio Description: challenges and opportunities for multimedia computing’, to be submitted to IEEE Multimedia.
[15] Salway, ‘Experts’ descriptions of visual information’, to be submitted to Computers and the Humanities.
[16] Salway, Vassiliou and Xu (2005), ‘Data Models and Knowledge Representation for Video Data’. TIWO report for WP 1.
[17] Salway, Tomadaki and Vassiliou (2005), ‘Building and Analysing a Corpus of Audio Description’. TIWO report for WP 2.
[18] Salway (2005), ‘AuDesc System Specification and Prototypes’. TIWO report for WP 3.
[19] Tomadaki (2003), ‘Integrating Information from Collateral Media’, MPhil-PhD Transfer Report, University of Surrey.
[20] Xu (2003), ‘Video Retrieval by Semantic Content’, MPhil-PhD Transfer Report, University of Surrey.
[21] Vassiliou (2004), ‘Representing Narrative in Multimedia Systems’, MPhil-PhD Transfer Report, University of Surrey.
[22] A. Salway and K. Ahmad, ‘Talking Pictures: Indexing and Representing Video with Collateral Texts’, Procs. 14th TWLT, 85-94, 1998.
[23] D. Herman, Story Logic: problems and possibilities of narrative, Uni. of Nebraska Press, 2002.
[24] M.-L. Ryan (ed.), Narrative Across Media: the languages of storytelling. Uni. of Nebraska Press, 2004.
[25] D. Bordwell and K. Thompson, Film Art: An Introduction. McGraw-Hill 5th Edition, New York, 1997.
[26] J. Bruner, ‘The Narrative Construction of Reality’, Critical Inquiry 18, pp. 1-21, 1991.
[27] R. Schank, Tell me a Story: narrative and intelligence. Northwestern University Press, 1990.
[28] R. Schank and C. K. Riesbeck, Inside Computer Understanding: five programs plus miniatures. Lawrence Erlbaum Associates: Hillsdale, NJ, 1981.
[29] E. T. Mueller, ‘Story understanding through multi-representation model construction’, Procs HLT-NAACL 2003 Workshop, 46-53.
[30] C. B. Callaway and J. C. Lester, ‘Narrative Prose Generation’, Artificial Intelligence 139, 213-252, 2002.
[31] M. Mateas and P. Sengers (eds.), Narrative Intelligence. John Benjamins, 2002.
[32] M. Maybury, Intelligent Multimedia Information Retrieval. AAAI Press / The MIT Press, 1997.
[33] B. S. Manjunath, P. Salembier and T. Sikora (eds.), Introduction to MPEG-7: multimedia content description interface. John Wiley and Sons, 2002.
[34] ‘Content-based image retrieval: the end of the early years’, IEEE Trans. PAMI, 22 (12), 1349-1380, 2000.
[35] R. K. Srihari, ‘Computational Models for Integrating Linguistic and Visual Information: A Survey’, Artificial Intelligence Review, 8(5-6), 349-369, 1995.
[36] K. Barnard et al., ‘Matching Words and Pictures’, Journal of Machine Learning Research, 3, 1107-1135, 2003.
[37] N. Dimitrova et al., ‘Applications of Video-Content Analysis and Retrieval’, IEEE Multimedia 7(3), 42-55, 2002.
[38] S.-C. Chen, R. L. Kashyap and A. Ghafoor, Semantic Models for Multimedia Database Searching and Browsing. Kluwer Academic Publishers, 2000.
[39] H. W. Agius and M. C. Angelides, ‘Modelling Content for Semantic-Level Querying of Multimedia’, Multimedia Tools and Applications, 15, 5-37, 2001.
[40] F. Kokkoras et al., ‘Smart VideoText: a Video Data Model based on Conceptual Graphs’, Multimedia Systems, 8, 328-338, 2002.
[41] A. Parkes, ‘The Prototype CLORIS System’, Information Processing and Management 25 (2), 171-186, 1989.
[42] A. F. Smeaton, P. Over and W. Kraaij, ‘TRECVID: Evaluating the Effectiveness of Information Retrieval Tasks on Digital Video’, ACM Multimedia 2004.
[43] H. D. Wactlar et al., ‘Lessons Learned from Building a Terabyte Digital Video Library’, Computer, Feb 1999, 66-73.
[44] S. Satoh, Y. Nakamura and T. Kanade, ‘Name-it: Naming and detecting faces in news videos’, IEEE Multimedia, 6 (1), 22-35, 1999.
[45] J. Wachman and R.W. Picard, ‘Tools for Browsing a TV Situation Comedy Based on Content Specific Attributes’, MM Tools and Apps, 13, 255-284, 2001.
[47] J. Kuper, ‘Intelligent Multimedia Indexing and Retrieval through Multi-source Information Extraction and Merging’, IJCAI 2003, 409-414.
[48] V. Roth, ‘Content-based retrieval from digital video’, Image and Vision Computing, 17, 531-540, 1999.
[49] R. B. Allen and J. Acheson, ‘Browsing the Structure of Multimedia Stories’, Procs. 5th ACM Conference on Digital Libraries, 11-18, 2000.
[50] H. Sundaram and S.-F. Chang, ‘Computable Scenes and Structures in Films’, IEEE Trans. Multimedia 4 (4), 482-491, 2002.
[51] B. Adams, C. Dorai, and S. Venkatesh, ‘Towards Automatic Extraction of Expressive Elements for Motion Pictures: Tempo’, IEEE Trans. Multimedia 4 (4), 472-481, 2002.
[52] K. Shirahama, K. Iwamoto, and K. Uehara, ‘Video Data Mining: Rhythms in a Movie’, Procs. IEEE Int. Conf. Multimedia and Expo, ICME 2004.
[53] C.-Y. Wei, N. Dimitrova and S.-F. Chang, ‘Color-Mood Analysis of Films Based on Syntactic and Psychological Models’, ICME 2004.
[54] H.-B. Kang, ‘Affective Content Detection using HMMs’, ACM Multimedia 2003, 259-262.
[55] R. Turetsky and N. Dimitrova, ‘Screenplay Alignment for Closed-System Speaker Identification and Analysis of Feature Films’, Procs. ICME 2004.
[56] B. Jung et al., ‘Narrative Abstraction Model for Story-oriented Video’, ACM Multimedia 2004, 828-835.
[57] A. Ortony, G. L. Clore and A. Collins, The Cognitive Structure of Emotions. Cambridge University Press, 1988.
[58] A. Bagga and B. Baldwin, ‘Cross-Document Event-Coreference’, ACL'99 Workshop on Coreference and Its Applications, 1-8, 1999.
[59] H. T. Jiang and A. K. Elmagarmid, ‘Spatial and Temporal Content-Based Access to Hypervideo Databases’, VLDB Journal 7 (4), 226-238, 1998.
[60] W. G. Lehnert, ‘Plot Units and Narrative Summarization’, Cognitive Science 4, 293-331, 1981.
[61] K. Ahmad and M. Rogers, ‘The Analysis of Text Corpora for the Creation of Advanced Terminology Databases’, in S. E. Wright and G. Budin, The Handbook of Terminology Management. Amsterdam: John Benjamins, 2001.
[62] N. K. Lodge, N. W. Green and J. P. Nunn, ‘Audetel, Audio Described Television’, International Broadcasting Convention, 140-145, 1994.
[63] C. Dorai and S. Venkatesh, ‘Computational Media Aesthetics: Finding Meaning Beautiful!’, IEEE Multimedia 8(4), 10-12, 2001.
[64] S. Chatman, Story and Discourse: narrative structure in fiction and film. Ithaca: Cornell University Press, 1978.