
    Aims and objectives of the technology research

    IMPRESS – IMPRession Evidence & Serial-crime profiling System

    Previous Research and Track Records – Sheffield, Surrey & Strathclyde Universities

     

    NLP Group, Dept. of Computer Science, University of Sheffield

    Sheffield NLP group is an RAE-2001 5* group in a 5 grade Department of Computer Science (DCS), and together with the DCS Speech group and the Information Retrieval group in the 5* Information Studies Department, Sheffield has the largest language and information research grouping (about 70 people with seven professors) in England, and possibly Britain. Active research areas relevant to this proposal include:

     

    The EPSRC-supported GATE infrastructure for NLE R&D is being used extensively at Sheffield and abroad (Cun 02, http://gate.ac.uk/).  Dialogue modelling and the modelling of belief structures in time is another important area of work (Wil 02 and EU projects AMITIES, COMIC and FaSIL).  Multilingual and multimodal information extraction from the web, newswires, scientific journals, legal and medical documents has been successfully carried out (see Hum 00 and May 02, supported by past EPSRC projects GATE GR/M13473 and GATE1 GR/M31699/01, EU projects NAMIC, AVENTINUS and MUMIS, and UK Government project MUSE).  GATE plays a major role as the language engineering infrastructure of the EPSRC 5-university IRC AKT on Knowledge Management (GR/N15764/01) (Cir 02).  Research has been undertaken on empirical semantics, word sense disambiguation and semantic induction from data structures (Ste 01, EPSRC project MALT GR/M73521/01 and EU project ECRAN), and on measures of text reuse and similarity (Clo 02 and EPSRC project METER GR/M3404/01).  Question-answering systems in a journalism environment are the subject of the new EPSRC project CUBREPORTER (GR/R91465/01), which competes in the US TREC QA competition.  GATE infrastructure and modules play key roles at Sheffield in EPSRC projects MyGrid, MIAKT and EMILLE, and in EU projects DOT.KOM, CLARITY and CLEF.

     

    Of direct relevance to the present proposal are the joint Sheffield-Surrey EPSRC SOCIS project (GR/M89676) and the UK Government MUSE project, which have linked extended lexical material directly to the retrieval of structured images (crime images in SOCIS) and video segments (football sequences in MUSE) retrieved from structured commentaries (see Pas 02, Pas 03).  Sheffield has also emerged as a leading centre in the development of adaptive IE, having developed algorithms for wrapper-induction IE and obtained excellent experimental results on publicly available corpora (Cir 01).  The Amilcare adaptive IE system is emerging as a reference IE system in the Semantic Web field and is part of the EPSRC AKT IRC Consortium.  Sheffield has also had considerable experience, in the EU COMIC 5FP project, in adaptive dialogue management, particularly dialogue content resulting from the fusion of multi-modal inputs (e.g. speech, text, writing, gesture and vision) into a single communicative representation.

     

    Yorick Wilks is Professor of Computer Science at the University of Sheffield and Director of ILASH, the Institute of Language, Speech and Hearing.  He has published numerous articles and seven books in the areas of Artificial Intelligence and NLP.  He has also been (three times) a member of the EPSRC College of Computing, and is a Fellow of the European and American Associations for Artificial Intelligence, on advisory committees for the National Science Foundation, and on the boards of some fifteen AI-related journals.  He created the DIDEROT IE system for ARPA in the US in 1990-93 and was the principal investigator of the GATE / LaSIE / ECRAN / MUSE / MUMIS / NAMIC IE projects at Sheffield.

     

    Centre for Knowledge Management, Dept. of Computing, University of Surrey

    The Centre is within an RAE-2001 6 rated Unit of Assessment (Elec. Eng).  The Centre currently has 7 academics, 4 RAs and 20 PhD students.  The Centre is amongst the most active terminology and knowledge acquisition centres in Europe.  Active research areas relevant to this proposal include:

     

    The Centre has developed a knowledge management system, System Quirk (with funding from the EPSRC, the EU and the DTI), for investigating the link between different modalities of communication: namely, text, image, and time-serial data (Ahm 01a, Gil 02, Sal 98, www.surrey.ac.uk/Quirk).  The system has been downloaded by over 900 organisations worldwide.  In the EPSRC-supported SOCIS project, images and collateral captions were fused so that one (say, an image) could be retrieved from the other (collateral text): terms were automatically extracted from the collateral texts and used to index images automatically, so that descriptions of crime-scene images provided by investigators were used to index those images (Scene of Crime Information System – SOCIS, GR/M89041/01).  Four UK Police Forces have evaluated the web-enabled SOCIS system and the results have been encouraging enough for SOLCARA PLC to explore the commercial viability of the system (www.computing.surrey.ac.uk/SOCIS).  Qualitative, opinion-related data is being extracted from (financial) news wires and correlated with time-serial data (share price movement) to generate buy and sell signals (EU-IST GIDA Project No. 2000-31123, EU-IST Project ACE No. 22271).  In a current EPSRC project (Television in Words – TiWO, GR/R67194/01) audio descriptions of moving images for the visually impaired are being processed by exploring narrative structures of films (Sal 03 & www.computing.surrey.ac.uk/TIWO).  The automatic extraction of the conceptual structure of a domain by examining texts of the domain (EU-IST SALT Project 1999-10951, ESPRIT-LE Projects Interval No. 4002 and Transterm No. 62-055) is being used to track the emergence of concepts in semiconductor technology, in artificial intelligence and in health care (Alt 02).  Text categorisation is a key interest of the Centre: automatic terminology extraction is used to facilitate text categorisation (Ahm 01b).  Work has been carried out on learning: multi-net neural computing systems have been developed that learn to classify images and to categorise collateral texts separately, and simultaneously learn the relationship between the two (Ahm 02a).

     

    Work directly related to this project concerns the automatic construction of a thesaurus of terms, i.e. conceptually organised terms, within sub-specialisms (forensic science → crime scene photography → footwear impression) (Ahm 03a); this is an example of data mining on textual sources.  The use of multi-net neural classifier systems, with each classifier specialising in a specific task, will be crucial for collating images and texts from different sources, as in the case of impression evidence about a single individual from different scenes of crime (Ahm 03b).  This classifier is based on an earlier data mining project sponsored under a TCS grant (TCS 1940).

     

    Khurshid Ahmad is Professor of Artificial Intelligence at the University of Surrey and Head of Department.  He has published over 100 articles and two books in the area of computer-assisted learning and terminology extraction.  He is a member of the EPSRC College of Peer Review.  He is responsible for the development of System Quirk, is the Principal Investigator of GIDA and SOCIS, and was previously the Principal Investigator of SALT, INTERVAL and TRANSTERM.  He has served as a visiting professor at the Copenhagen Business School.

     

    Andrew Salway is Lecturer in Multi-media systems.  He is the Principal Investigator on the EPSRC-sponsored TIWO project.  He has given invited lectures in Japan, Canada, Australia and the UK on the relationship between moving images and their textual description.  He works closely with the BBC, ITFC, the Tate Gallery, the Royal National Institute for the Blind and the Banff Centre for New Media (Canada).

     

    Chris Handy is Tutor in Information Extraction and has previously worked for Surrey Police.  His research is on how forensic scientists examine, classify and report images.  He works in close co-operation with the Met’s Crime Academy and is exploring the exploitation of the SOCIS system with Solcara PLC with help from the DTI.  He is currently setting up an MSc course in Forensic Information Systems with the Met Police’s Crime Academy.

     

    Forensic Science Unit, University of Strathclyde

    The Forensic Science Unit (FSU) at the University of Strathclyde established the first UK postgraduate degree course in the forensic sciences about 30 years ago.  The FSU plays a key role in teaching and research in the area of forensic science.  Research in the FSU is focused on forensic issues such as drug profiling and crime scene reconstruction.  The FSU is a founder member of the European Network of Forensic Science Institutes (ENFSI) and its research is funded by the BBSRC, the EPSRC and pharmaceutical companies.  Academics within the FSU are practitioners and are authorised for the forensic examination of drugs, dyes, documents (inks) and criminalistics (physical evidence).

     

    Dr Adrian Linacre is a Senior Lecturer in the FSU.  He specialises in the application of DNA analysis to forensic examination, especially in cases of drug trafficking.  He has pioneered the use of grass as evidential material.  His research informs the course on DNA profiling he has established within the FSU.   Dr Linacre has published over 25 papers in international journals and written 7 book chapters.  Dr Linacre is secretary and treasurer for the Competence Assurance Project of ENFSI, with the work funded by a grant from the joint action programme of the EU for co-operation between law enforcement agencies in the Union (EU Joint Action OISIN 96/636/JAI of 20 December 1996).  Recently, he became an assessor for the Council for the Registration of Forensic Practitioners (CRFP) in the area of human contact traces.

     


    CASE FOR SUPPORT

    Introduction

    The SOCIS project provided us with a wholly novel way to link the structured information in pictures and captions together directly.  In IMPRESS we intend to build on this by investigating a range of data mining (DM) techniques applied to this result, so as to identify patterns that can be linked directly to individuals, in this case persistent offenders whose patterns of offence have not been obvious to investigators.  The initial success of multimedia computing was the integration of text, image, video and audio data at the level of the bit-stream so that they could be stored, accessed and processed by the same system.  However, integration at higher levels of abstraction remains an unsolved problem; cf. the EPSRC’s ‘grand challenge’ of capturing and storing digital human memories.  One aspect of this problem is the fusion of information from heterogeneous sources, including different kinds of media, different coding formats, different languages and different points of view.  Crime detection and prevention is an interesting scenario for exploring these problems.  Scene-of-crime officers gather records of impression evidence (photographs and textual descriptions) which on their own make little sense, but taken together can be interpreted by investigators to solve a case.  The explanation of the chain of events and individual(s) that resulted in the crime scene relies on credible and robust evidence.

    Accomplished investigators learn not only to identify individual items of impression evidence, but also to correlate the different items collected in different places/times/modalities.  This assimilation of disparate data to produce one significant item of information – credible, robust evidence – is an abstraction worthy of the grand challenge.  The IMPRESS project will attempt to mimic the behaviour of an accomplished investigator.

     

    In SOCIS we dealt only with the integration of visual and textual information related to items of impression evidence and their description.  IMPRESS will extend the integration of multimedia crime-scene information at three distinct but interrelated levels, incorporating both impression evidence and Modus Operandi (MO) reports.  First, by using established information extraction techniques we will link the MOs, which are lengthier, more interpretive free texts.  Second, our focus will be on crimes where both the MO and impression evidence are available; in this connection, IMPRESS will address the integration of descriptions and MOs given by different members of the same police force.  Third, IMPRESS will address the integration of information gathered by different police forces.  Each of the three levels depends crucially on maintaining and continually updating the inventory of concepts and terms, and the relationships amongst terms.  IMPRESS will not only be able to learn idiosyncratic words and image fragments, but will also learn to correlate the words and images.

     

    The development of the IMPRESS system will incorporate findings from these three strands of research in order to integrate heterogeneous crime scene information into common machine-executable representations – a task, in this context, akin to data preparation which is a crucial precursor to data mining.  The project will go on to evaluate a range of data mining techniques in the IMPRESS system for identifying habitual criminals from impression evidence and MOs across many crime scenes. 

     

    SOCIS established strong co-operative links with police forces and software companies and, having favourably evaluated the SOCIS system, they are keen to participate in the IMPRESS project: accompanying this proposal are 9 letters of support from 5 police forces, a software company and a university department of forensic science, who will form the IMPRESS Round Table (including three letters from different departmental heads in the Metropolitan Police).  The Round Table will serve a number of functions including: (i) provision of multimedia crime scene data; (ii) user requirements and feedback for the IMPRESS system; and (iii) dissemination of project results.  Five police forces have already supplied impression evidence for the SOCIS project and this data will be used in IMPRESS along with MOs from the West Midlands Police; IMPRESS complements the West Midlands FLINTS system.  The distributed nature of data related to habitual criminals is an ideal test case for EPSRC’s GRID for supporting e-Science activities and for Internet II emerging in the USA.

     

    Background

    Property-related offences, especially theft offences, criminal damage and burglary, accounted for over 75% of all the 5.5 million offences reported in the UK in 2001/02; many of these crimes are committed by habitual criminals.  These habitual criminals are the key targets of the Association of Chief Police Officers (ACPO), which has stressed the need for intelligence-led policing: its Chair, David Phillips, has noted that “police activity has shifted its centre of balance away from reactive investigation after events towards targeting active criminals on the basis of intelligence.”  Edmond Locard, one of the pioneering forensic scientists, suggested that whenever any two objects come into contact there is invariably a transfer of material from each object onto the other.  Whatever the criminal steps on, he or she leaves a footwear impression; whatever he or she touches, there is a finger-mark.  The criminal leaves behind biological ‘impressions’ (hair and bodily fluids), tool-marks and tyre marks, and more recently forensic scientists have developed techniques to record ear prints and clothes prints.  The impression evidence is photographed by trained personnel and described by investigators in the idiosyncratic language of forensic science.  The image and the description form an integral part of the evidence but are seldom used together.  A system to identify commonalities in impression evidence, gathered at different places and at different times, would assist in detecting and apprehending serial criminals.  The correlations of impression evidence found at different crime scenes related to the same offender will yield robust evidence.  The awareness amongst criminals that tiny amounts of impression evidence left at the scene of crime can be traced back to them will act as a deterrent.  The ways in which investigators learn to identify individual impressions and learn to correlate the different impressions, from different places collected at different times, are fascinating in themselves for those building learning systems.

     

    Large volumes of data related to the modus operandi (MO) are being collected: the West Midlands Police Force collects 3,000 free-text MOs daily, amounting to 1 million items per year.  The Force is currently using an intelligent workflow system, FLINTS, to combine ‘forensic and physical evidential “hits”’ to display links between criminals and the evidence.  FLINTS produces a profile of offenders and of crimes committed.  There are evidence tracking systems that deal with the movement of crime-related exhibits (cf. LOCARD™ Evidence Tracking System) from ‘crime scene through to court’; this tracking is again performed through a class description, much like that used by freight handling organisations.  The Metropolitan Police (North London Branch) has developed a profiling system based on a detailed categorization of crime scenes and suspects/convicted criminals.  There are image-oriented workflow systems, used mainly by US Police Forces, that help in the categorization of images according to their intrinsic visual features.  FLINTS, LOCARD and imaging workflow systems can all be used as systems for building the profile of a habitual criminal.

     

    A future workflow system should be able to process and fuse together the impression evidence in the two modalities, free text descriptions and images.  Given the large volume of impression evidence, it is important for such a system to learn to fuse the different items of impression evidence into a coherent whole.  The workflow system should use high-performance, high-speed and secure broad-bandwidth data/communications networks (e.g. Internet II or the GRID).  There are problems related to the variance in the ways different Police Forces describe and image a scene of crime, and indeed there are variations within a large Force.  Nevertheless, there are similarities in descriptions as manifested by terminology, reflecting the conceptual structure of forensic science and criminalistics on the one hand and the specialized nature of the images of impressions on the other.  There is a need to have a holistic view of the impression evidence: different types of evidence and two different modalities.  Training of forensic scientists and officers should reflect this.  An intelligent workflow system for impression evidence, one that pro-actively fuses, and learns to fuse, descriptions and images, will require active co-operation between the end-users (the Police Forces), the academic researchers in information extraction, in text and image mining, and in Grid technologies, and software vendors specialized in working with the Police Forces.

     

    There are a number of projects currently sponsored by the EPSRC in areas as diverse as Psychology, Medicine and Computing (Crime VUS - GR/N09701/01, IXI - GR/S21526/01, Spectral Retrieval - GR/N33348/01, Freedom To Forget - GRA91364/01), which focus on the analysis and management of images with some support for language engineering techniques.  In other projects (MIAS - GR/R83972/01, GR/M66233/01) one of the important research questions is the use of (manually-created) ontologies and thesauri.  The IMPRESS project will benefit from lessons learned in these projects about the management of image repositories and inform them in return about image-text interactions and how adaptive learning systems can be beneficial.  The research work undertaken already in building image management systems that beneficially use texts collateral to the image has been documented (Sri 00).  The key question of inter-indexer variability has not been extensively discussed (Eak 99).  Multiple classifier systems have been used to deal with properties of an individual image (colours, shapes, texture) and there is some work on relating images to sound (Rol 02).

     

    Aims and objectives of the technology research

    The project’s technological thrust is to build on the achievements in this domain of the SOCIS project: the direct linking of data in the forensic domain drawn from both text and image fields, in a way we believe to be original. The aim is then to consolidate that investment and research achievement by extending it to the IMPRESS system so that a range of machine learning techniques, applied to the multi-modal SOCIS data, will allow the emergence of data clusters that correspond not only, as in other systems, to geographical areas, but to individuals seen as complex constructions of correlated traits that will enable that individual to be identified, located and caught.  The system will be embellished with a learning component that learns to correlate the textual descriptions and image features of individual impressions, and subsequently learns to correlate different types of impression evidence.  Our objectives are:

    1.      To develop a computational method for fusing information from different types of impression evidence, and from accompanying descriptions and MOs provided by experts; and to investigate how the link between  images and texts can be learnt by machines.

    2.      To investigate the similarities and differences in how experts describe impression evidence and articulate Modi Operandi (MOs); and to automatically generate a domain ontology from their texts.

    3.      To develop a system that will take the fused information and the domain ontology, and then apply existing data mining techniques in order to detect patterns relating to serial offenders; and to seek early exploitation of the IMPRESS system by demonstrating prototypes to the Round Table throughout.

    4.      To incorporate computer-based evidence methods within the forensic science teaching curriculum.

     

    The excitement and novelty of the research challenges to be addressed

    Our approach is different from the FLINTS system (West Midlands Police), which helps to visualise highly structured data.  Our effort will be more general, both as regards techniques and as regards data (in that it extends over both text and image data), starting with Kohonen maps (a neural computing method) over all features covered by SOCIS in an effort to develop individual profiles from the data automatically.  The information extraction techniques developed in SOCIS will be used in IMPRESS to process the Modus Operandi data (in free natural-language text) accessible to FLINTS.  Furthermore, the storage and processing of images of impression evidence, together with the automatic correlation with the linguistic description of the images, will result in an exciting and novel impression-evidence management system.  This novel system will complement existing systems like FLINTS.
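
    As an illustration of the Kohonen-map idea mentioned above, the following is a minimal sketch and not the SOCIS or IMPRESS implementation.  The class name, the training schedule and the three-element feature vectors (imagined as normalised impression features) are all assumptions made for the example.  Scenes with similar features should settle on the same or nearby map units, while dissimilar scenes settle on different units.

```python
import math
import random


class KohonenMap:
    """Tiny self-organising map on a rows x cols grid (illustrative only)."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # One weight vector per grid unit, randomly initialised in [0, 1).
        self.weights = {
            (r, c): [rng.random() for _ in range(dim)]
            for r in range(rows)
            for c in range(cols)
        }

    def bmu(self, x):
        """Return the grid coordinates of the best-matching unit for x."""
        return min(self.weights, key=lambda u: math.dist(self.weights[u], x))

    def train(self, data, epochs=100, lr0=0.5, sigma0=None):
        sigma0 = sigma0 or max(self.rows, self.cols) / 2
        for epoch in range(epochs):
            frac = epoch / epochs
            lr = lr0 * (1 - frac)                   # learning rate decays
            sigma = max(sigma0 * (1 - frac), 0.5)   # neighbourhood shrinks
            for x in data:
                br, bc = self.bmu(x)
                for (r, c), w in self.weights.items():
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2 * sigma ** 2))
                    for i in range(len(w)):
                        w[i] += lr * h * (x[i] - w[i])


# Hypothetical, hand-made feature vectors for four crime scenes.
scenes = {
    "scene_a1": [0.10, 0.15, 0.90],
    "scene_a2": [0.12, 0.10, 0.85],
    "scene_b1": [0.90, 0.85, 0.10],
    "scene_b2": [0.88, 0.92, 0.15],
}

som = KohonenMap(rows=3, cols=3, dim=3)
som.train(list(scenes.values()))
placement = {name: som.bmu(vec) for name, vec in scenes.items()}
print(placement)
```

    Inspecting `placement` shows which map unit each scene falls on; in a real setting the feature vectors would come from the text and image analysis rather than being typed in by hand.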

     

    Forensic scientists perform a crucially important real-world task where heuristics abound and much knowledge is personal knowledge; in some respects, forensic scientists behave like other diagnosticians that have been investigated in the AI literature and in other respects they behave like hypothesis-makers as in law and cognate subjects.  The literature on content management, knowledge management and multimedia systems constantly refers to the intellectual challenge of dealing with a mixture of perceptual and cognitive modalities.  Information fusion, where it helps in the enterprise of forensic intelligence, will throw light on intelligent information processing in human beings on the one hand and will help in the development of robust systems on the other.  We believe our approach to building repositories of structured information from otherwise unstructured data (descriptions comprising strings of text, and images comprising pixel clusters) is different from that of conventional prescriptive or highly theoretical approaches in the literature for building knowledge bases; we have used pre-existing semi-structured repositories of knowledge, including free text, lexical and terminological resources, to maximally exploit the content in them with our integrated extraction techniques.  The proposed research will lead to a unique combination of advanced and developing technologies enabling the fusion of information and extraction of key facts from this fusion.  An understanding of how experts describe visual information, both in terms of their expert knowledge and of the special language they use to articulate their descriptions, will ground the technological advances.  Such understanding will be informed by, and may in turn inform, the training of forensic scientists.

     

    Relevance of research to the TCPD vision, and other beneficiaries

    The proposed research will address problems of crime detection by the intelligent management of crime scene data, offender identification by predictive techniques considering crime patterns and person detection by novel data acquisition and processing techniques.  These problems have been identified in the TCPD Vision (Nov 2001).  The three Universities in the project will help undertake an investigation of the variation in the way in which impression evidence is described, stored and retrieved.  Strathclyde will provide a forensic science framework for describing and linking impression evidence and will use the results to inform their teaching and learning programmes.  We will raise awareness of research in our specialised areas within the forensic science and the police community through novel manners of interaction and dissemination.

     

    What we propose is in some respects blue sky research: text mining appears blue sky in that our premise is that we can process natural language texts despite their deep grounding in human nature and culture, but by carefully targeting a specialist enterprise such difficulties can be alleviated, as we and others have demonstrated in dealing with texts in science, medicine, engineering, and the arts.  Computer systems that can learn patterns, textual or image patterns, and indeed that can learn to correlate patterns in different modalities, also appear to be blue sky research: again, it has been demonstrated that by carefully selecting categories of images and texts, systems can indeed endeavour to learn characteristics of images and texts.  The variation in the perception of images and texts is an open question in the cognitive sciences: training and experience can reduce this variance.

     

    Nature of the research team and its ability to deliver the research project aims

    The specific research advance from the existing EPSRC SOCIS cooperation (Universities of Surrey and Sheffield, EPSRC Grant No. GR/M89041/01) has resulted in an integrated working prototype that links the searching of structured images to the searching of structured captions; this bringing together of meaningful structures in both language and vision has been a holy grail of artificial intelligence for over thirty years and we believe we have made real progress as partners in SOCIS, rather than just using one modality (e.g. language) as an index for searching another (e.g. pictures or video).  The current work in two separate projects (Surrey’s TIWO and Sheffield’s MUSE) was planned in the light of experience gained from SOCIS.  Sheffield and Surrey together have brought extensive and evaluated experience in information extraction, a widely distributed architecture for modular natural language processing applications of this sort, and techniques for building and integrating ontologies with semantic structural representations.  Work has also been successfully carried out in neural network based learning of images and texts that describe these images.  Strathclyde FSU has made original contributions to the areas of forensic science and criminalistics.  The Universities of Surrey and Strathclyde have begun negotiations about a joint MSc course in Forensic Information Systems.  The three universities have established an excellent rapport with major Police Forces in the UK, and indeed a significant proportion of their recent work has benefited from such an interaction.  This is manifested by expressions of support; the Manager of the Kent Police’s Forensic Science Service and President-elect of the UK Forensic Science Society (2003-04) has agreed to chair the project management committee, the IMPRESS Round Table.

     

    In the SOCIS project, Surrey and Sheffield relied on the scene-of-crime officers for understanding how they interpret images, focusing mainly on footwear impressions and tool-marks.  This input will be broadened and formalized by the Forensic Science Unit, University of Strathclyde.  The Unit will organise the input of the expertise for the IMPRESS system, and will lead the evaluation of the IMPRESS system at regular intervals during the lifetime of the project.  The Met’s Crime Academy, part of the Met’s Specialist Crime Directorate, is helping Surrey in understanding the effect of training of forensic professionals on the variation in their description of scene-of-crime images.

     

    Detailed Work Plan

    The work will be split into five work packages. As part of each work package each partner will plan and manage the work to be completed, prepare progress reports and participate in meetings (both technical and Round Tables).

     

    Work Package 1: Domain Modelling (10 person months: Surrey 4, Strathclyde 6)

    An understanding of the domain and knowledge of the needs of the users will be made explicit through techniques of knowledge acquisition such as brainstorming, structured interviews and case studies.  Domain specific data models such as the National Intelligence Model will be investigated to determine their utility.

    Milestone: User Requirements Specification (Month 6)

     

    Work Package 2: Terminology and Ontology Building (12 person months: Surrey 6, Strathclyde 6)

    This work package will test the hypothesis that experts share a special language and will examine how their descriptions of impression evidence and their articulations of MOs vary, both within and across police forces.  The automated generation of a domain ontology will be evaluated as a means of alleviating problems caused by such variance for multimedia information integration.  Methods developed in SOCIS, for automatically building glossaries and conceptual structures from text corpora, will be incorporated in IMPRESS.  The SOCIS modules will be enhanced to build the terminology of emerging sub-domains, like ear and clothes impressions, by using search engines as text providers.  The SOCIS text corpus (a collection of texts with 0.75 million words) will be enhanced by the 1 million plus MO reports (c. 10 million words) provided mainly by the West Midlands Force.  The SOCIS image repository, provided by the Met’s Crime Academy, will be used to focus on the key visual features essential for linking crime scenes.  This repository will be expanded under the guidance of Strathclyde and the Forces.  An ontology of the domain will be semi-automatically constructed based on a domain-specific corpus and aligned with the NIM and PITO Common Data Model.  This ontology will help address the issue of terminology variation amongst the police forces by acting as a translation filter as well as generating term clusters for information extraction and retrieval purposes.

    Milestone: Automatic terminology/ontology component (Month 10)

     

    Work Package 3: Text-Image Data Mining and Multi-modal Data Fusion (16 person months: Surrey 16, Sheffield 12)

    Information-extraction-based methods for text mining, developed in the SOCIS project, will be tested on large-volume data provided by West Midlands and other Forces. Image and text analysis techniques developed in SOCIS will be evaluated and extended. Components of GATE and QUIRK will be adapted for text mining in the IMPRESS prototype, and image analysis toolboxes, e.g. MATLAB, will be used for image analysis. Multi-modal data fusion is the integration of disparate data, from a variety of sources and of differing modalities, into a formal framework. The fused data can be greater than the sum of its parts, providing a rich, homogeneous data representation. A variety of existing multi-net neural systems and adaptive IE systems will be evaluated to produce a representation of the data acquired in WPs 1 and 2.

    Milestones: Text-Image data mining components (Month 12), Multi-modal data fusion component (Month 16)

     

    Work Package 4: System Development (33 person months: Surrey 18, Sheffield 9, Strathclyde 6)

    A system will be developed to extract patterns of similar criminal behaviour across crime scenes in order to identify serial criminals. The IMPRESS system will be based on the SOCIS web-based system, enhanced to include image analysis, multi-modal processing, a learning capability and GRID-enablement. The system will incorporate results from WPs 2 and 3, along with existing data mining techniques. The prototype will be continually tested and will be released to the various Police Forces. Standard software engineering methodologies will be followed, with a continuous prototyping approach that allows domain experts to evaluate, and provide feedback on, system design and functionality.

    Milestones: Prototype I (Month 16), Prototype II (Month 20)

     

    Work Package 5: Round Table Meetings and Knowledge Dissemination (13 person months: Surrey 4, Sheffield 3, Strathclyde 6)

    IMPRESS will follow three dissemination routes. The first is publication in journals and at peer-reviewed conferences in the areas of information extraction/NLP, data mining and multimedia systems, and at the bi-annual meetings of the UK Forensic Science Society. Second, Strathclyde, in collaboration with Surrey and Sheffield, will produce a training programme that can be delivered either at universities teaching forensic science or at recognised training establishments. The third, crucial route is the IMPRESS Round Table. A round table (RT), comprising forensic professionals and associated software houses from across the UK, will be formed at the outset of the project. The RT will be chaired by a leading forensic professional and will meet at quarterly intervals for the validation of knowledge gathered, and for user testing and evaluation of the IMPRESS prototype. Solcara will be on the RT to address questions related to the commercial potential of the IMPRESS system.  The Round Table will be expanded to include other Police Forces and software houses in the UK.

    Milestones: Web Site (Month 1), Training Programme (Month 22), Evaluation Report / Final Report (Month 24)

     

    Justification of Resources

    The University of Surrey will appoint one RA and one PhD student, Sheffield will appoint an RA, and Strathclyde one PhD student. In addition, funding is requested for a part-time Knowledge Dissemination Officer for editorial and clerical work.  The main focus of the Research Assistants’ work will be on designing and implementing the IMPRESS system.  The RAs will first evaluate, and subsequently implement, a system comprising data mining, adaptive information extraction, neural computing, data fusion and image processing techniques.  The existing SoCIS software will be used as a starting point. The Project Students will undertake domain modelling and terminology and ontology construction.  Funding is requested for travel, to allow the collaborating Universities to meet to discuss the ongoing work and to enable the regular Round Table meetings to be held at each University in turn. Funding for technical support and library provision is also requested.

     

    Evidence of connectivity and explanation of the value of the proposed Collaboration

    We have formed a consortium that includes 5 police forces (Kent Constabulary, the Metropolitan Police and the Metropolitan Police Crime Academy, Strathclyde Police, Surrey, West Midlands), a software systems house (Solcara) specializing in police computer systems, and the Forensic Science Department of London Metropolitan University. The consortium will act both as a source of domain knowledge and as one channel for knowledge dissemination. Additional dissemination will be achieved by employing a knowledge dissemination officer, who will provide secretarial, editorial and presentation support for the dissemination of information to interested parties. Solcara PLC will be involved in exploring the commercial potential of the IMPRESS system.

     

    Outline of management structure

    The proposed consortium includes UK police forces, three Universities, and a software systems developer specializing in crime prevention and detection technologies.  The project will be co-ordinated by a Round Table chaired by an experienced forensic science practitioner from one of the five Police forces.  The Round Table will monitor progress against the work plan and will suggest changes or modifications accordingly.  The universities together will produce quarterly progress reports for the duration of the project. The software houses will explore possible exploitation routes with the support of the Universities.

                  

    Arrangements for take up (IPR ownership)

    During the life of the project IPR ownership will rest jointly with the Universities of Surrey, Sheffield and Strathclyde.  This could be transferred, in whole or in part, subject to discussion, to any interested parties on completion of the project. 

     

    References

    Ahm 01a: Ahmad, K. (2001).  ‘The Role of Specialist Terminology in Artificial Intelligence and Knowledge Acquisition’.  In (Eds.) S-E. Wright & G. Budin.  Handbook of Terminology Management.  Amsterdam: John Benjamins Pub. Co.  pp. 809-844.

    Ahm 01b: Ahmad, K., Vrusias, B. & Ledford, A. (2001).  Choosing Feature Sets for Training and Testing Self-Organising Maps: A Case Study.  Neural Comp & App, Volume 10, pp. 56-66.

    Ahm 02a: Ahmad, K., Bale, T. & Casey, M. (2002).  Connectionist Simulation of Quantification Skills.  Connection Science Vol. 14 (No. 3).  pp. 165-201.

    Ahm 02b: Ahmad, K., Vrusias, B. & Tariq, M. (2002).  Co-operative Neural Networks and Integrated Classification.  Proc. 2002 Int. Joint Conf. on Neural Networks.  Piscataway: IEEE Press.  pp. 1546-1551.

    Ahm 03a: Ahmad, K., Tariq, M., Vrusias, B. & Handy, C. (2003).  Corpus-Based Thesaurus Construction for Image Retrieval in Specialist Domains.  In (Ed.) Proc. 25th European Conf. on Inf. Retrieval Research (ECIR-03, Pisa, Italy).  LNCS 2633.  Heidelberg: Springer-Verlag.  pp. 502-510.

    Ahm 03b: Ahmad, K., Casey, M. & Vrusias, B.  Combining Multiple Modes of Information using Unsupervised Neural Classifiers.  Proc. MCS 03.  LNCS 2709.  Heidelberg: Springer-Verlag.

    Alt 02: Al-Thubaity, A. & Ahmad, K. (2002).  Tracking the Knowledge of Emergent Domains.  Proc. 6th Int. Conf. on Inf. Visualisation (London).  Los Alamitos: IEEE Comp. Press.  pp. 685-690.

    Cir 01: F. Ciravegna (2001).  (LP)2, an Adaptive Algorithm for Information Extraction from Web-related Texts.  In Proc. IJCAI-2001 Workshop on Adaptive Text Extraction and Mining, Seattle, 2001.

    Cir 02: F. Ciravegna, A. Dingli, Y. Wilks, D. Petrelli (2002).  Timely and Non-Intrusive Active Document Annotation via Adaptive Information Extraction.  In Proc. of ECAI Workshop on Semantic Authoring, Annotation and Knowledge Markup, Lyon, France, 2002.

    Clo 02: P. Clough, R. Gaizauskas, S. Piao and Y. Wilks (2002).  METER: MEasuring TExt Reuse.  Proceedings of the Association for Comp. Linguistics.  July 2002.

    Cun 02: H. Cunningham (2002).  GATE, a General Architecture for Text Engineering.  Journal of Computers and the Humanities, Vol. 36, pp. 223-254.

    Eak 99: Eakins, J.P. & Graham, M.E.  Content-based Image Retrieval: A Report to the JISC Technology Applications Programme.  Image Data Research Institute Newcastle, Northumbria.

    Gil 02: Gillam, L. (2002).  (Ed.) Workshop on Financial News: Making Money in the Financial Services Industry.  Int. Conf. on Terminology and Knowledge Eng. (August 2002, Nancy, France).

    Hum 00: K. Humphreys, G. Demetriou and R. Gaizauskas (2000).  Two Applications of Information Extraction to Biological Science Journal Articles: Enzyme Interactions and Protein Structures.  In Proc. Pacific Symposium on Biocomputing, Honolulu.

    May 02: D. Maynard, H. Cunningham, K. Bontcheva and M. Dimitrov (2002).  Adapting A Robust Multi-Genre NE System for Automatic Content Extraction.  In Proc. of the 10th Int. Conf. on Art. Int.: Methodology, Systems, Applications (AIMSA 2002).

    Pas 02: Pastra, K., Saggion, H. & Wilks, Y.  Extracting relational facts for indexing and retrieval of crime-scene photographs.  Knowledge-Based Systems, Elsevier Science (forthcoming).

    Pas 03: Pastra, K., Saggion, H. & Wilks, Y. (2003).  Intelligent Indexing of Crime-Scene Photographs.  IEEE Intelligent Systems, Special Issue on "Advances In Natural Language Processing", vol. 18 (1), pp. 55-61.

    Rol 02: F. Roli, J. Kittler (Eds.).  Multiple Classifier Systems: Proceedings (LNCS), Third International Workshop, MCS 2002, Cagliari, Italy, June 24-26, 2002.

    Sal 03: Salway, A., Graham, M., Tomadaki, E. & Xu, J. (2003).  ‘Linking Video and Text via Representations of Narrative’.  AAAI Spring Symposium on Intelligent Multimedia Knowledge Management, Palo Alto, 24-26 March 2003.

    Sal 98: Salway, A. & Ahmad, K. (1998).  Talking Pictures: Indexing and Representing Video with Collateral Texts.  In (Eds.) D. Hiemstra, F. de Jong and K. Netter.  Twente Workshop on Lang. Tech. in Multimedia Info. Retrieval, December 7-8, 1998.  Enschede: Univ. Twente.  pp. 85-94.

    Sri 00: Srihari, R.K. & Zhang, Z. (2000).  Show&Tell: A Semi-Automated Image Annotation System.  IEEE MultiMedia 7(3): 61-71.

    Ste 01: M. Stevenson and Y. Wilks (2001).  The Interaction of Knowledge Sources in Word Sense Disambiguation.  Journal of Comp. Linguistics.


    IMPRESS Detailed Work Plan – Gantt Chart

    WP1: Domain Modelling.  Milestone: User Requirements Specification

    WP2: Terminology and Ontology Building.  Milestone: Automatic terminology/ontology component

    WP3: Text-Image Data Mining and Multi-modal Data Fusion.  Milestones: Text-Image data components, Multi-modal data fusion component

    WP4: System Development.  Milestones: Prototype I (Month 16), Prototype II (Month 20)

    WP5 (a): Knowledge Dissemination.  Milestones: Web Site (Month 1), Training Programme (Month 22), Evaluation Report / Final Report (Month 24)

    WP5 (b): Round Table Meetings