TIWO - Television in Words: FINAL REPORT

 

EPSRC GR/R67194/01, January 2002-5, principal investigator – Dr Andrew Salway

www.computing.surrey.ac.uk/personal/pg/A.Salway/tiwo/TIWO.htm

 

The overall aim was to develop computational understanding of narrative in multimedia systems with applications for digital libraries.  The novel research scenario was that of audio description whereby stories told in moving images (film and television) are retold in words for visually impaired audiences.  Two general advances have been made.  First, we have shown how semantic video content can be analysed and described in terms of narrative structures.  Second, we have opened up audio description as a scenario for interdisciplinary research.  The main research results comprise: basic knowledge about how experts put moving images into words, new methods for extracting information about a film’s story, and an evaluation of a story-based hypervideo structure for browsing films.  So far there have been six publications [1-6] in the areas of multimedia, language engineering and artificial intelligence; five further papers are under review [7-11] and four journal papers are in preparation [12-15].  TIWO has also had cross-disciplinary impact with interest generated in the fields of information science, audiovisual translation, film theory, narratology, and new media.  We envisage two ways in which project results will mean more audio description is made for television programmes and films, above the amounts required by legislation, to the benefit of visually impaired audiences.  First, some results might be applied to part-automate the writing and editing of audio description scripts to reduce costs and improve quality.  Second, we have demonstrated how audio description can be reused to generate metadata for video retrieval and browsing in digital libraries: this should give broadcasters and film distributors more incentive to produce audio description.  The industry-based TIWO Round Table was active throughout the project with partners contributing the time of senior personnel worth £23,500 and audio description data.  
Discussions about exploitation and knowledge transfer are in progress with BBC, ITFC, Softel, RNIB and Philips Research.

                                         

International Research Context

Narrative is a multi-faceted phenomenon studied by philosophers, literature and film scholars, linguists, cognitive scientists and computer scientists [23].  The study of narrative explores what the media-independent features of stories are, how different kinds of media can convey (the same) stories, and how stories are understood [24].  Narrative involves chains of events in cause-effect relationships occurring in space and time, where the agents of cause-effect are characters with goals, beliefs and emotions [25].  This definition applies to actual and fictional stories told in any media.  For some, narrative abilities considered both as a mode of thought and of discourse are fundamental to intelligence [26, 27].  Within computer science narrative is important for story understanding and generation systems, human-computer interaction, and virtual worlds for entertainment and education [28-31].  We have proposed a further reason to study narrative within the context of computing.  It has been noted that intelligent multimedia information retrieval systems must analyse and describe the content of multimedia data at different levels of abstraction, in order to give users intuitive retrieval, browsing and summarization functionality [32].  In [4] we argue that to deal with the semantic content of multimedia data it is essential to analyse and describe a level of abstraction relating to narrative structures; the MPEG-7 standard for a multimedia content description interface includes descriptors for semantic entities in narrative worlds [33].  As an example application, consider a video player that visualizes a video’s storyline, explains why a character behaves in a certain way and retrieves videos depicting similar stories.  The ‘semantic gap’ is a major challenge for multimedia computing and the mapping of visual features to the meanings conveyed by image/video data requires the integration of information from associated text [34].  
The integration of visual and verbal information is a longstanding issue for AI [35, 36]. 

 

Intuitive and innovative forms of video retrieval, browsing and re-use in digital libraries require video data to be made into a structured medium by analysis and description of its content [37].  The description of video content requires video data models and knowledge representation formalisms [38-41].  The analysis of video content requires the extraction and fusion of features from visual, audio and textual data streams.  A small set of semantic video features can be detected automatically from the visual component of video data [42].  Most solutions also make some use of text associated with moving images such as closed captions and scripts [43-45]; the recently launched Video Google system appears to base its retrieval of television programmes on keywords in closed captions [46].  In MUMIS, information was integrated from multiple texts, like commentaries and match reports, to index videos of soccer matches [47].  Most research in multimedia information retrieval and extraction has dealt with the semantic content of multimedia data as spatio-temporal inventories of entities and events, which neglects important narrative aspects.  However, over the last 2-3 years there has been growing interest, within the multimedia community, in analysing films in digital libraries and this has led to more attention being paid to story-related features.  Films are interesting and challenging examples of multimedia data for a number of reasons.  The content of a film is communicated to the audience by patterns of light and shade, dialogue, sound effects, film editing techniques, music, and the actions of characters.  To appreciate a film, a viewer must not only recognise what is depicted, but also understand cause-effect relationships, characters’ emotional reactions and ideas about the filmmaker’s intention.  Film browsing systems based on representations of story structures have been proposed, but the representations were hand-crafted [48, 49].  
An important first step in structuring the semantic video content of films automatically is the segmentation of shots and scenes [50].  Pixel motion and shot length were combined to measure the tempo of films: changes in tempo were found to coincide with key points in a film’s story [51].  The presence and absence of characters gives a rhythm that may be used for topic segmentation and film classification [52].  Colour features have been used to classify the moods of scenes and film genres [53, 54].  In July 2004, the use of film screenplays as descriptions of video data was discussed in [55].  In October 2004 researchers presented a ‘narrative abstraction model’ that generates video summaries based on measures of dramatic importance by analysing the patterns of characters’ appearances and interactions [56].

 

In TIWO we have identified audio description as a scenario in which to study narrative in multimedia systems and as a new source of information to structure video data.  We believe that as a surrogate for the moving image, audio description is a more computationally tractable source of information about a film’s story than the video data itself, and as such can provide a stepping stone for crossing the semantic gap.  Compared with closed captions and scripts, we believe that audio description gives more reliable information about what is depicted in films and television programmes.  Audio description is made to enhance the enjoyment of television and cinema for visually impaired viewers.  In the gaps between existing speech, audio description gives key information about scenes, characters’ appearances and actions via an extra audio track.  Audio description is scripted before it is recorded so it is available as time-coded written text.  Audio description is made by trained professionals: it may take 60 hours to describe a 2-hour film.  In the UK, the 2003 Communications Act requires established terrestrial, satellite and cable television broadcasters to provide audio description for 10% of their output.  There are 150 cinemas in the UK that provide it for most major films.  Audio description is also available in the US, Canada, France, Germany, Japan, Australia, Ireland and Spain.

 

Key Advances made by TIWO

 

Narrative for the Description of Multimedia Data

We have introduced and developed the idea that narrative is important for understanding the relationship between visual and verbal information in computational systems, and for intelligent multimedia knowledge management applications [4].  We believe that this will lead to a theoretical underpinning for existing and future approaches to bridging the semantic gap. 

 

Extraction of Information about Narrative Structures in Films

We have developed algorithms that generate new descriptions of semantic video content relating to narrative structures from different kinds of text associated with films [1-5].  This work is unique in dealing with films’ stories in terms of characters’ emotions.  When viewers watch a film they make sense of, and anticipate, the unfolding events depicted on-screen, based at least in part on what they think about characters’ cognitive states, e.g. their goals, beliefs and emotions.  In [2] we presented a method to extract and visualise information about characters’ emotions in films from audio description with 83% precision and 63% recall.  This information may be useful for video retrieval by story similarity, video summarisation and for reasoning about a film’s story.  The method is based on Ortony’s cognitive theory of emotions that links a character’s emotional states to the events in their environment [57].  In further work we proposed a metric for comparing the similarity of two stories (films) based on the distribution of characters’ emotions [1], and showed how emotions can be assigned to specific characters [18].  We were also interested in extracting and integrating information from plot summaries and film scripts.  A first step in information integration is to identify cross-document co-reference (CDCR), i.e. fragments of different texts that refer to the same entity or event [58].  We proposed algorithms to detect cross-document co-reference between mentions of events in two very different text types – plot summaries and audio description.  This is hard because matching verbs directly is not possible: for example, a ‘murder’ event mentioned in a plot summary is described as a sequence of smaller actions in the audio description.  We found that simply selecting and matching the participants of events, and their grammatical roles, achieves about 50-60% precision and recall [3].  
Our ongoing work is trying to automatically correlate verbs in plot summaries with verbs/phrases in audio description in order to do query expansion for CDCR.  Another kind of integration needs to take place between information extracted from texts and the intervals of video data it refers to: our investigation of temporal information in audio description specified three distinct tasks and new requirements for temporal information annotation schemes [5].
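
The idea of extracting emotion information from time-coded audio description, and of comparing films by the distribution of emotions, can be illustrated with a minimal sketch.  It is not the method published in [1] or [2]: the cue-word lexicon, the (start time, text) script format and the choice of cosine similarity are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical mini-lexicon mapping cue words to emotion types; a real
# lexicon would be far larger and is not specified in this report.
EMOTION_CUES = {
    "smiles": "joy", "laughs": "joy",
    "sobs": "distress", "cries": "distress",
    "screams": "fear", "trembles": "fear",
}

def extract_emotions(script):
    """Return (start_seconds, emotion) for each cue word found.

    `script` is a list of (start_seconds, text) audio description lines
    (an assumed format).
    """
    hits = []
    for start, text in script:
        for token in text.lower().split():
            emotion = EMOTION_CUES.get(token.strip(".,;:!?"))
            if emotion:
                hits.append((start, emotion))
    return hits

def emotion_distribution(hits):
    """Relative frequency of each emotion type in one film."""
    counts = Counter(emotion for _, emotion in hits)
    total = sum(counts.values())
    return {e: n / total for e, n in counts.items()} if total else {}

def cosine_similarity(p, q):
    """Cosine similarity between two emotion distributions (an assumed metric)."""
    dot = sum(p[e] * q.get(e, 0.0) for e in p)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0
```

Because each hit keeps its time-code, the same output could be plotted along the film's timeline or aggregated per film before comparison.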

 

Hypervideo Browsing of Films by Narrative Structures

We have investigated the use of Lehnert’s plot units to structure hypervideo links between intervals of video data [10, 20].  Hypervideo offers new ways to watch and to interact with video data [59], but little research has been done into how hypervideo can be used to watch and re-watch feature films.  To be engaging and to facilitate intuitive interaction, it is important that hypervideo structures reflect story structures. Previous researchers have proposed knowledge representation formalisms to structure films for hypervideo browsing [48, 49], but this work did not include any detailed analysis of story structures, nor extensive user evaluation.  We have analysed two full-length feature films in terms of plot units.  Plot units represent cause-effect relationships between characters’ affect states and the events in a story [60].  We developed the NAFI system (Navigating Films) to store and edit data about plot units, to allow users to navigate films by following hypervideo links based on the structure of the plot units, and to record users’ actions for subsequent analysis by researchers; NAFI is available to researchers from the TIWO website.  The effect of plot units was evaluated by having 30 users complete question-answering tasks using the system.  Results suggest that with the hypervideo links subjects gave better answers to questions about the film’s story.  Feedback showed that they found the navigation experience to be engaging and enjoyable.  Now we are looking to partially automate the generation of plot unit data, encouraged by recent automatic analyses of story structure, including our own [2] and others [51, 56]. 
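
A rough sketch of how plot-unit data might drive hypervideo links is given below.  It is not the NAFI data model, which is not described here: the class and method names, the character name in the usage note and the link labels are illustrative assumptions.  Affect states are marked "+", "-" or "M" in the style commonly used to summarise Lehnert's scheme [60].

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AffectState:
    """One character's affect state, anchored to an interval of video."""
    state_id: str
    character: str
    kind: str      # "+" positive event, "-" negative event, "M" mental state
    start: float   # seconds from the start of the film
    end: float

@dataclass
class PlotGraph:
    """Affect states plus labelled causal links between them (hypothetical model)."""
    states: dict = field(default_factory=dict)  # state_id -> AffectState
    links: dict = field(default_factory=dict)   # state_id -> [(link_type, target_id)]

    def add_state(self, state):
        self.states[state.state_id] = state
        self.links.setdefault(state.state_id, [])

    def add_link(self, source_id, link_type, target_id):
        """link_type is a label such as 'motivation' or 'actualization'."""
        if source_id not in self.states or target_id not in self.states:
            raise KeyError("both ends of a link must be known states")
        self.links[source_id].append((link_type, target_id))

    def jump_targets(self, time):
        """Intervals reachable by one link from any state on screen at `time`."""
        targets = []
        for state in self.states.values():
            if state.start <= time <= state.end:
                for link_type, target_id in self.links[state.state_id]:
                    target = self.states[target_id]
                    targets.append((link_type, target.start, target.end))
        return targets
```

For example, after adding a negative event for a character "Anna" at 100-130 seconds and a mental state at 400-420 seconds joined by a 'motivation' link, calling jump_targets(110.0) offers the viewer a jump to the later interval.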

 

Analysis of Collateral Text Corpora: Audio Description, Film Scripts and Plot Summaries

A crucial part of the basic research in TIWO, underlying the developments described above, has been our corpus-based investigations into the language used to describe moving images in audio description, film scripts and plot summaries.  We have been interested to determine whether these texts exhibit idiosyncratic linguistic features.  Such variance is expected because the texts are written by trained experts for a specific purpose [61].  On the one hand, distinctive linguistic features would give insights into how experts put moving images into words, and on the other hand they can be exploited by information extraction techniques.  Results show an unusually high number of open-class words among the most frequent words in corpora of audio description, plot summaries and film scripts, some of which relate to the emotions that characters are experiencing [19, 21].  In audio description there is a tendency for verbs to refer to material processes, whereas in plot summaries more verbs refer to mental processes [3].  There is a preponderance of words relating to temporal information in audio description [5].  We found that phrases with the words looks, turns, smiles and door were especially frequent in screenplays and audio description and argued that they are frequent because the basic actions they describe are important story-telling elements for filmed narrative [11].  Previous work in mapping audio-visual features to high-level film content has drawn on film theory and the conventions of film grammar [50, 51].  In ongoing work we are exploring whether the depiction of basic actions follows patterns in line with film-making and story-telling conventions.  As well as helping information extraction, we hope that our kind of analysis can provide empirical data to test theories about film-making and story-telling.
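
The observation about open-class words among the most frequent words can be checked with a very simple frequency count.  The sketch below is only an illustration using the Python standard library, not the analysis carried out in the project: the closed-class word list is a small hand-picked assumption and the function names are hypothetical.

```python
from collections import Counter
import re

# Small illustrative list of closed-class (function) words; an assumption,
# not the list used in the project.
CLOSED_CLASS = {
    "the", "a", "an", "and", "of", "to", "in", "on", "at", "his", "her",
    "he", "she", "it", "is", "with", "as", "up", "from", "into", "him",
}

def top_words(text, n):
    """The n most frequent lower-cased word forms with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(n)

def open_class_among_top(text, n):
    """Words among the n most frequent that are not in the closed-class list."""
    return [word for word, _ in top_words(text, n) if word not in CLOSED_CLASS]
```

In general-language corpora the most frequent words are almost all closed-class, so a non-empty result for a small n is the kind of signal described above.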

 

Audio Description as a Novel Research Scenario

The only research project about audio description prior to TIWO was AUDETEL which dealt with technical issues for the broadcast of audio description and developed guidelines for audio describers [62].  We believe that we have been successful in promoting audio description as a research scenario for a number of research communities.  For multimedia computing, audio description is a novel kind of text to use for structuring video data and there is potential to develop and apply multimedia processing techniques in systems that semi-automate the production of audio description, e.g. to detect quiet audio spaces, scene changes, characters and actions, and to summarise descriptions to fit available time.  For artificial intelligence, audio description allows researchers to study story understanding with examples of complex stories that are told in a relatively constrained language.  TIWO has also stimulated interest in audio description from corpus linguists and in the field of audiovisual translation.
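
Two of the semi-automation ideas mentioned above, finding spaces for description and checking that a description fits the available time, can be sketched as follows.  The sketch assumes that dialogue timings are already known (for example from subtitle time-codes) rather than detected from the audio signal; the function names, the 2-second minimum gap and the speaking rate of 3 words per second are illustrative assumptions, not figures from the project.

```python
def find_gaps(speech, duration, min_gap=2.0):
    """Return (start, end) intervals without speech lasting at least min_gap seconds.

    `speech` is a list of (start, end) dialogue intervals in seconds, which
    may overlap; `duration` is the programme length in seconds.
    """
    gaps = []
    cursor = 0.0  # end of the latest speech seen so far
    for start, end in sorted(speech):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if duration - cursor >= min_gap:
        gaps.append((cursor, duration))
    return gaps

def fits(description, gap, words_per_second=3.0):
    """Rough check that a description can be spoken within a gap."""
    start, end = gap
    return len(description.split()) / words_per_second <= end - start
```

A description that does not fit its gap would be a candidate for summarisation or for moving to a neighbouring gap.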

 

Contributions To Related Work

TIWO has contributed to other research on visual-verbal information that the PI has been involved with during the last three years, including: the analysis of painting captions [6]; the classification of different ways images and texts combine in communication [7, 8]; and the analysis of surveillance video [9].

 

Review of Project Plan and Description of Work Carried Out

Eleftheria Tomadaki focused on corpus analysis and the extraction of information from audio description and plot summaries.  Yan Xu focused on video data modelling, knowledge representation and the development of the NAFI system.  Andrew Vassiliou extended and synthesised results from the language engineering and knowledge representation strands. 

 

Like previous work in intelligent multimedia information retrieval [32], we sought to combine data modelling, knowledge representation and the automatic analysis of multimedia content – in our case via text associated with moving images, and we evaluated our work with metrics such as precision and recall.  We grounded our work in theories about media, specifically narrative, analogous to recent proposals for computational media aesthetics [63].  Software was developed following an object-oriented approach with UML and Java, with reuse of existing software where possible.  The project was managed effectively with all objectives being met.  A report was written for each workpackage 1-3 to facilitate dissemination and technology transfer by summarizing and contextualizing the deliverables for a general readership [16-18].
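
For reference, precision and recall as used in evaluations of this kind can be computed as below.  The sketch assumes that the system output and a hand-made gold standard are both sets of comparable items; the function name and the item identifiers in the usage note are hypothetical.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted items against a gold standard."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

With four extracted items of which three appear in a gold standard of five, precision is 3/4 and recall is 3/5.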

 

Workpackage 1: Adapt and Apply Video Data Modelling and Knowledge Representation

The report for WP1 [16] reviews video data compression formats, metadata standards, video data models and the use of knowledge representation formalisms in systems that process video data.  It also shows our modelling of filmic narrative content: this work comprises a series of UML models of narrative based on a theory of narratology [64], and the representation of two feature films using a knowledge representation formalism [60].  With feedback from the Round Table, the outcome was an ‘ideal’ representation of narrative content which fed into Workpackage 3.

 

Workpackage 2: Adapt and Apply Language Engineering Techniques

A 500,000 word corpus of audio description scripts (60 feature films and a selection of television programmes) was gathered from audio describers at ITFC, RNIB and BBC.  To ensure a representative corpus, we consulted two audio description experts and established nine categories of films in terms of how the experts thought audio description would vary.  We also gathered and analysed corpora of plot summaries (114 films, 15,500 words) and film scripts (71 films, 1,930,000 words) from the web.  Surrey’s System Quirk was used to analyse the idiosyncratic linguistic features in these corpora.  Other existing packages were applied for information extraction including Sheffield’s GATE, the Connexor tagger and WordNet.  The corpus gathering, analysis and results are summarised in the report for WP2 [17]; results fed into Workpackage 3.

 

Workpackage 3: Specification and Prototyping of an Audio Description System

At each Round Table meeting there was discussion about how current and future technology could be applied to make better quality audio description more efficiently, and how audio description could be reused as a source of descriptions for video indexing and browsing.  These discussions were in the context of results from WP1 and WP2 presented by the research team and the Round Table’s knowledge of current audio description practice and technology.  The report for WP3 [18] contains a detailed specification of user requirements for a system to support the production and the reuse of audio description, a review of currently available systems and technologies, and summaries of our prototypes.  Four systems were prototyped and evaluated to address some of the desired functionality whilst relating to the overall project aim of understanding narrative in multimedia systems.  The prototypes were for: keyword-based video retrieval; extraction of information about characters’ emotions; the integration of plot summaries and audio descriptions; and hypervideo browsing.
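
As a rough illustration of keyword-based retrieval over time-coded audio description, the sketch below builds an inverted index from words to video intervals.  It is not the prototype described in the WP3 report: the (start, end, text) script format and the function names are assumptions.

```python
from collections import defaultdict

def build_index(script):
    """Map each word to the (start, end) intervals whose description uses it.

    `script` is a list of (start, end, text) time-coded description lines
    (an assumed format).
    """
    index = defaultdict(list)
    for start, end, text in script:
        # Strip punctuation before de-duplicating words within a line.
        words = {w.strip(".,;:!?") for w in text.lower().split()}
        for word in words:
            if word:
                index[word].append((start, end))
    return index

def search(index, query):
    """Intervals whose description contains every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)
```

Each returned interval can be passed to a video player as a start and end time, so that a query leads directly to the described moment in the film.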

 

Workpackage 4: TIWO Industry-based Round Table

There were seven meetings of the Round Table, each about six hours long, at the university, ITFC, RNIB and Softel between 17/12/2001 and 7/2/2005.  In between these the research team visited individual organisations.  Meetings were used to elicit expertise about audio description practice and technology, audio description data and user requirements and to communicate research findings and identify opportunities for exploitation.  For dissemination purposes, the project website makes available all publications, reports, presentations and software.  There was a high level of attendance at meetings by senior personnel (see Final Report form for details of the partners’ participation). 

 

 

Research Impact of TIWO and Benefits to Society

TIWO is timely as the multimedia community focuses more on how to analyse and describe higher levels of ‘semantic content’ in multimedia data.  TIWO is also timely regarding efforts to raise the profile of audio description, notably by the RNIB, following the 2003 Communications Act.  The review letters from the project partners encourage our belief that TIWO has had, and will continue to have, social and commercial benefit by showing how technology can help to make better quality audio description more cheaply, and by showing the added value of reusing audio description for video retrieval and browsing.  Our findings about the language of audio description are relevant for those training audio describers and for those who develop and maintain standards for audio description.  We expect that the kinds of information about narrative structures that can be extracted automatically from audio description will support novel ways for users to interact with digital video libraries of film and television.

 

 

Invited Lecture, Talks and Seminars given about TIWO

 
Invited Keynote Lecture

 “Computing Moving Images: beyond the pixel”, Cross-overs in Audiovisual Arts and Interactive Media, University of Art and Design Helsinki, June 2004.

 
Invited Talks at International Meetings

 “Computer Science R&D and Relevance to New Media”, Interactive Screen, Banff New Media Institute, Canada, July 2003.

 “Intelligent Multimedia Data Systems: adding structure and meaning to data”, Symp. on AI and Games, BNMI, August 2002.

 

 
Talks at International Conferences with Refereed Abstracts

 “A Corpus-based Analysis of the Language of Audio Description”, Transmedia International Conference on Audiovisual Translation, Barcelona, June 2005.

"Television in Words: the TIWO project", International Conference on Audiovisual Translation, London, February 2004.

 
Invited Seminars

·   “The analysis of an audio description corpus”, Faculty of Translation, University of Granada, April 2005.

·   "Image-Text Relations in Multimedia Systems", School of Comp., Math. and Info. Sciences, Uni. of Brighton, April 2004.

·   “Collateral Media and Narrative”, Centre for Digital Video Processing, Dublin City University, August 2003.

·   “Explicating the Image-Text Link for Cross-modal IR”, Department of Information Studies, University of Sheffield, July 2003.

·   “Narrative Intelligence for Multimedia”, Department of Computer Science, University of Exeter, May 2003.

·   “Audio Description for Film and TV: computing narrative.”  Centre for Corpus Linguistics, Uni. of Birmingham, Feb. 2003.

·   “What’s in a Link Between an Image and a Text?”  Institute for Language, Speech and Hearing, Uni. of Sheffield, May 2002.

 

TIWO Publications and Reports, including under review and in preparation

[1]     Vassiliou, Salway and Pitt (2004), ‘Formalising Stories: sequences of events and state changes’, IEEE Conference on Multimedia and Expo, ICME 2004.

[2]     Salway and Graham (2003), ‘Extracting Information about Emotions in Films’, ACM Multimedia 2003, pp. 299-302. 

[3]     Tomadaki and Salway (2005), ‘Matching verb attributes for cross-document event co-reference’, in Erk et al. (eds.) Procs. Interdisciplinary Workshop on the Identification and Representation of Verb Features and Verb Classes, pp. 127-132.

[4]     Salway, Graham, Tomadaki and Xu (2003), ‘Linking Video and Text via Representations of Narrative’, AAAI Spring Symposium on Intelligent Multimedia Knowledge Management, pp. 104-112.  ISBN 1-57735-190-8.

[5]     Salway and Tomadaki (2002), ‘Temporal Information in Collateral Texts for Indexing Moving Images’, in Setzer and Gaizauskas (eds.) Procs. LREC 2002 W’shop on Annotation Standards for Temporal Information in Natural Language, pp. 36-43.

[6]     Salway and Frehen (2002), ‘Words for Pictures: analysing a corpus of art texts’, in Procs. TKE 2002 – Terminology and Knowledge Engineering.

[7]     Martinec and Salway, ‘A system for image-text relations in new (and old) media’, submitted to J. of Visual Communication.

[8]     Salway and Martinec, ‘Image-text relations for multimedia information extraction’, submitted to 5th ACM Symp. Doc. Eng.

[9]     Ahmad, Bennett, Mountstephens, Cheng, Vassiliou and Salway, ‘Video Summarisation: Variation in Attention and Linguistic Description of Surveillance Videos’, submitted to IEE International Symposium on Imaging for Crime Detection and Prevention.

[10]  Salway and Xu, ‘Navigating Stories in Films: a case study in hypervideo’, submitted to IEEE ICME 2005.

[11]  Salway, Vassiliou and Ahmad, ‘What Happens in Films?’, submitted to IEEE ICME 2005.

[12]  Salway, Vassiliou, Tomadaki and Ahmad, ‘Extracting narrative information’, to be submitted to IEEE Trans. on Multimedia.

[13]  Xu and Salway, ‘Using plot units to structure films as hypervideo’, to be submitted to ACM Computers in Entertainment.

[14]  Salway, ‘Audio Description: challenges and opportunities for multimedia computing’, to be submitted to IEEE Multimedia.

[15]  Salway, ‘Experts’ descriptions of visual information’, to be submitted to Computers and the Humanities.

[16]  Salway, Vassiliou and Xu (2005), ‘Data Models and Knowledge Representation for Video Data’.  TIWO report for WP 1.

[17]  Salway, Tomadaki and Vassiliou (2005), ‘Building and Analysing a Corpus of Audio Description’.  TIWO report for WP 2.

[18]  Salway (2005), ‘AuDesc System Specification and Prototypes’.  TIWO report for WP 3.

[19]  Tomadaki (2003), ‘Integrating Information from Collateral Media’, MPhil-PhD Transfer Report, University of Surrey.

[20]  Xu (2003), ‘Video Retrieval by Semantic Content’, MPhil-PhD Transfer Report, University of Surrey.

[21]  Vassiliou (2004), ‘Representing Narrative in Multimedia Systems’, MPhil-PhD Transfer Report, University of Surrey.

 

References

[22]  A. Salway and K. Ahmad, ‘Talking Pictures: Indexing and Representing Video with Collateral Texts’, Procs. 14th TWLT, 85-94, 1998.

[23]  D. Herman, Story Logic: problems and possibilities of narrative, Uni. of Nebraska Press, 2002.

[24]  M.-L. Ryan (ed.), Narrative Across Media: the languages of storytelling.  Uni. of Nebraska Press, 2004.

[25]  D. Bordwell and K. Thompson, Film Art: An Introduction.  McGraw-Hill 5th Edition, New York, 1997.

[26]  J. Bruner, ‘The Narrative Construction of Reality.’  Critical Inquiry 18, pp. 1-21, 1991.

[27]  R. Schank, Tell me a Story: narrative and intelligence. Northwestern University Press, 1990.

[28]  R. Schank and C. K. Riesbeck, Inside Computer Understanding: five programs plus miniatures.  Lawrence Erlbaum Associates: Hillsdale, NJ, 1981.

[29]  E. T. Mueller, ‘Story understanding through multi-representation model construction’, Procs HLT-NAACL 2003 Workshop, 46-53.

[30]  C. B. Callaway and J. C. Lester, ‘Narrative Prose Generation’, Artificial Intelligence 139, 213-252, 2002.

[31]  M. Mateas and P. Sengers (eds.), Narrative Intelligence. John Benjamins, 2002.

[32]  M. Maybury, Intelligent Multimedia Information Retrieval.  AAAI Press / The MIT Press, 1997.

[33]  B. S. Manjunath, P. Salembier and T. Sikora (eds.), Introduction to MPEG-7: multimedia content description interface.  John Wiley and Sons, 2002.

[34]  A. W. M. Smeulders et al., ‘Content-based image retrieval: the end of the early years’, IEEE Trans. PAMI, 22 (12), 1349-1380, 2000.

[35]  R. K. Srihari, ‘Computational Models for Integrating Linguistic and Visual Information: A Survey’, Artificial Intelligence Review, 8(5-6), 349-369, 1995.

[36]  K. Barnard et al., ‘Matching Words and Pictures’, Journal of Machine Learning Research, 3, 1107-1135, 2003.

[37]  N. Dimitrova et al., ‘Applications of Video-Content Analysis and Retrieval’, IEEE Multimedia 7(3), 42-55, 2002.

[38]  S.-C. Chen, R. L. Kashyap and A. Ghafoor, Semantic Models for Multimedia Database Searching and Browsing.  Kluwer Academic Publishers, 2000.

[39]  H. W. Agius and M. C. Angelides, ‘Modelling Content for Semantic-Level Querying of Multimedia’, Multimedia Tools and Applications, 15, 5-37, 2001.

[40]  F. Kokkoras et al., ‘Smart VideoText: a Video Data Model based on Conceptual Graphs’, Multimedia Systems, 8, 328-338, 2002.

[41]  A. Parkes, ‘The Prototype CLORIS System’, Information Processing and Management 25 (2), 171-186, 1989.

[42]  A. F. Smeaton, P. Over and W. Kraaij, ‘TRECVID: Evaluating the Effectiveness of Information Retrieval Tasks on Digital Video’, ACM Multimedia 2004.

[43]  H. D. Wactlar et al., ‘Lessons Learned from Building a Terabyte Digital Video Library’, Computer, Feb 1999, 66-73.

[44]  S. Satoh, Y. Nakamura and T. Kanade, ‘Name-it: Naming and detecting faces in news videos’, IEEE Multimedia, 6 (1), 22-35, 1999.

[45]  J. Wachman and R.W. Picard, ‘Tools for Browsing a TV Situation Comedy Based on Content Specific Attributes’, MM Tools and Apps, 13, 255-284, 2001.

[46]  http://video.google.com/

[47]  J. Kuper, ‘Intelligent Multimedia Indexing and Retrieval through Multi-source Information Extraction and Merging’, IJCAI 2003, 409-414.

[48]  V. Roth, ‘Content-based retrieval from digital video’, Image and Vision Computing, 17, 531-540, 1999.

[49]  R. B. Allen and J. Acheson, ‘Browsing the Structure of Multimedia Stories’, Procs. 5th ACM Conference on Digital Libraries, 11-18, 2000.

[50]  H. Sundaram and S.-F. Chang, ‘Computable Scenes and Structures in Films’, IEEE Trans. Multimedia 4 (4), 482-491, 2002.

[51]  B. Adams, C. Dorai, and S. Venkatesh, ‘Towards Automatic Extraction of Expressive Elements for Motion Pictures: Tempo’, IEEE Trans. Multimedia 4 (4), 472-481, 2002.

[52]  K. Shirahama, K. Iwamoto, and K. Uehara, ‘Video Data Mining: Rhythms in a Movie’, Procs. IEEE Int. Conf. Multimedia and Expo, ICME 2004.

[53]  C.-Y. Wei, N. Dimitrova and S.-F. Chang, ‘Color-Mood Analysis of Films Based on Syntactic and Psychological Models’, ICME 2004.

[54]  H.-B. Kang, ‘Affective Content Detection using HMMs’, ACM Multimedia 2003, 259-262.

[55]  R. Turetsky and N. Dimitrova, ‘Screenplay Alignment for Closed-System Speaker Identification and Analysis of Feature Films’, Procs. ICME 2004.

[56]  B. Jung et al., ‘Narrative Abstraction Model for Story-oriented Video’, ACM Multimedia 2004, 828-835.

[57]  A. Ortony, G. L. Clore and A. Collins, The Cognitive Structure of Emotions. Cambridge University Press, 1988.

[58]  A. Bagga and B. Baldwin, ‘Cross-Document Event-Coreference’, ACL'99 Workshop on Coreference and Its Applications, 1-8, 1999.

[59]  H. T. Jiang and A. K. Elmagarmid, ‘Spatial and Temporal Content-Based Access to Hypervideo Databases’, VLDB Journal 7 (4), 226-238, 1998.

[60]  W. G. Lehnert, ‘Plot Units and Narrative Summarization’, Cognitive Science 4, 293-331, 1981.

[61]  K. Ahmad and M. Rogers, ‘The Analysis of Text Corpora for the Creation of Advanced Terminology Databases’, in S. E. Wright and G. Budin, The Handbook of Terminology Management.  Amsterdam: John Benjamins, 2001.

[62]  N. K. Lodge, N. W. Green and J. P. Nunn, ‘Audetel, Audio Described Television’, International Broadcasting Convention, 140-145, 1994.

[63]  C. Dorai and S. Venkatesh, ‘Computational Media Aesthetics: Finding Meaning Beautiful!’,  IEEE Multimedia 8(4), 10-12, 2001.

[64]  S. Chatman, Story and Discourse: narrative structure in fiction and film.  Ithaca: Cornell University Press, 1978.