TERMINOLOGY RESOURCES


SCOPE AND QUALITY OF RESOURCES


General Terminology Resources

Present Situation


In the course of the survey conducted as part of the POINTER Project, a large number of terminological resources were discovered, although many gaps and problems also became apparent. Some of the most important basic types of resource are texts, dictionaries, glossaries and thesauri, with monolingual and multilingual examples of each category being found. In practice, terminology resources are concentrated on technical and scientific areas (UDC 6), with some coverage of UDC 3 (social and economic sciences). As regards multilingual resources, the market is dominated by bilingual, trilingual, or quadrilingual lexica, although in the area of information and documentation a wide range of multilingual thesauri are available for different subject fields. Multilingual terminological glossaries, on the other hand, are poorly disseminated and only a few of them fulfil minimum quality requirements. In addition, multilingual terminology in the eleven languages of the EU is practically non-existent, except in the case of EURODICAUTOM.

A quick analogy may serve to make the difference between these types of resource clear: thesauri can be likened to tools for finding knowledge, while dictionaries and glossaries are keys to understanding it. Dictionaries and terminology resources contain similar items of data, but these are presented and organised in different ways as a result of the different methods employed in lexicography and terminography respectively (see Chap. I: "Introduction" for a more detailed discussion). Dictionaries inform us of the different meanings which can be expressed by a single word, while terminology resources provide us with the different words which express the same meaning.

Dictionaries are produced using a purely descriptive process. Terminology glossaries, by contrast, are normally produced in two phases. The first phase is descriptive: all terms designating the same concept are grouped together. The second phase is prescriptive: a preferred term is selected from those identified as synonyms, i.e. as designating the same concept. Such decisions involve varying degrees of prescription, according to the requirements of the subject field, among other criteria. This is illustrated in Figure 3 below.


Figure 3 : Degrees of Prescription in Terminology Work
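The two phases can be sketched in a few lines of code. The concept identifiers, terms and selection rule below are invented for illustration; real prescriptive decisions rest on subject-field criteria that no simple rule can capture:

    from collections import defaultdict

    # Descriptive phase: group all recorded designations by the concept
    # they designate (here identified by a simple concept ID).
    observed = [
        ("C042", "hard disk"), ("C042", "hard drive"),
        ("C042", "fixed disk"), ("C007", "mouse"),
    ]

    synonym_sets = defaultdict(list)
    for concept_id, term in observed:
        synonym_sets[concept_id].append(term)

    # Prescriptive phase: select one preferred term per concept.
    # The selection policy is a stand-in; real criteria depend on the
    # subject field and the degree of prescription required.
    def select_preferred(terms):
        return min(terms, key=len)  # e.g. prefer the shortest synonym

    for concept_id, terms in synonym_sets.items():
        preferred = select_preferred(terms)
        deprecated = [t for t in terms if t != preferred]
        print(concept_id, "preferred:", preferred, "| deprecated:", deprecated)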

Problems


The problems affecting terminology resources in Europe have already been discussed in some detail in, among other places, Chap. 3.2: "National and Regional Aspects of Terminology Work". For the sake of clarity, and because of the importance of the subject, the most important points are repeated here.

Lack of Terminology Resources
Lack of Reutilisation
Quality
Neologisms
Multilingual Resources

Solutions


It is obvious that the aim of any solution must ultimately be the efficient provision of an adequate body of good-quality (machine-readable) resources in all languages and for all domains. This involves the creation of resources where they do not already exist, their distribution, and the provision of information on them, in particular on how to access them. In turn, this depends on adequate training and on quality awareness and validation schemes. These themes are the subject of separate chapters in this Final Report (cf. Chap. 2.1: "Impact and Awareness", Chap. 4.5: "Quality and Validation of Terminological Resources", and Chap. 5.2: "Training"), and are not dealt with in detail here. Nevertheless, a number of both general and specific recommendations can be made.

Recommendations


Lesser-used and Minority Languages

Present Situation


Lesser-Used Languages

Minority Languages

Problems


Lesser-Used Languages

Minority Languages

The following problems can be identified:

Solutions


In the case of both lesser-used and minority languages, national member states and the EU itself are already aware of the situation and have supported initiatives aimed at improvement. It should, however, be stressed that more far-reaching support is needed.

Recommendations


Specific Domains - Case Studies

Present Situation


Three domain-specific case studies were performed as part of Phase II of the POINTER Project, in order to examine in detail how domain issues could be approached and, in particular, what measures could sensibly be adopted in innovative and/or politically sensitive areas. The three domains chosen were:

Although all three subject areas studied are highly topical, the situations and problems encountered were found to differ a great deal.

Social Protection and Labour Law

Telecommunications

Environment

Problems


The most striking problem common to all three areas is the lack of co-ordination and, in particular, the fact that even when tasks are allocated, the partners involved rarely work together in practice.

Another problem lies in the basic lack of awareness of how important terminology work really is. This problem is common to all three areas, but is most apparent in social protection and labour law.

The following domain-specific problems were also identified:

Social Protection and Labour Law

Telecommunications

Environment

Solutions


A process should be developed which enables the growth of networks serving as the basis for shared research and development activities and mutual services. In this process, industry and commerce, academia, politics and administration should work together and provide the initiative for further process chains.

Recommendations


Social Protection and Labour Law
Telecommunications
Environment

TERMINOLOGY CREATION AND TERMINOGRAPHY


Terminology Creation

Present Situation


Terminological resources are created both consciously, in controlled processes, and unconsciously or spontaneously in all facets of life, at all language levels and in all fields of speciality. This takes place through very diverse processes, and the results vary greatly in quality.

This chapter will treat the creation of terms or groups of terms and the creation of terminology databases for specific purposes or tasks.

Private Sector

Monolingual and multilingual terminological resources come into existence in connection with various activities and for various kinds of applications in the framework of complex processes. The diversity of these terminologies becomes especially clear in the context of translation. A simplified process chain for the creation of documentation in private industry was selected as a model to elucidate terminology creation phenomena and actions. This process chain is assumed to include the following stages:

A product comes into existence in the process chain; its documentation is created in parallel. The concept of the product, its design, all data and technical properties, etc., are devised and elaborated in the engineering stages (research and development, and manufacturing). The information required for tailoring the product to market needs and for marketing it is produced in the marketing stage.

These stages produce text fragments which contain all the significant terminology, including new terms. In engineering departments, new terms may result from inventions, new technical requirements, new developments, and market and user acceptance criteria. The product documentation, such as user manuals, descriptions, etc., must be written for the customers and must use their language (or rather, that of the market). Information on this aspect is supplied during the marketing stage of the product or service.

Text fragments, technical data, background information, etc., are the input for the authoring processes, i.e. for technical writing. In the engineering departments, the new terms have most likely not been documented, let alone processed terminologically. This means that new terms appear without any explanation of their meaning, the underlying concept, etc. Terminological explanations are necessary, and are generally created by technical writers together with specialists in each particular subject field. This work results in monolingual terminological records, which are integrated into an existing database of a terminology management system. This is the ideal situation; in practice, card indexes can still be found.

In this manner, a terminology database comes into being or is expanded which serves as a basis for technical writing and also as the foundation for creating the multilingual terminologies required in translation work.

Where terminology already exists, it can consequently be assumed that the new terminologies must be processed and integrated into an existing terminology database. All unknown terms in the texts to be translated (not always the new terms) are clarified during the translation process and the foreign-language equivalents are retrieved or determined. Ideally, the translators and terminologists collaborate with specialists in the engineering departments and/or with technical writers to process the terminology, clarify the meaning of the terms and possibly to formulate a foreign-language definition. The further terminological processing of new terms involves a series of various procedures and steps.

If equivalents for the unknown terms cannot be found by consulting the authors of the texts or specialists, terminology databases can be accessed. This is rarely successful: there are very few generally accessible terminology databases which provide terminology from rapidly-changing fields. The next step is to "scan" the literature - on paper or in databases. In general, translators and terminologists are not specialists in the subject field in question. They therefore search by means of words rather than on a conceptual basis; the latter is only possible in collaboration with specialists. This is also true of the search for foreign-language equivalents in monolingual literature of the target language.

This very time-consuming search is usually in vain, and not only for truly new terms; the result is incomplete entries in terminology databases or card indexes. The terminology database is nevertheless expanded with the results of the current work, and provisional entries are later verified in terms of content and language.

The pressure of deadlines means the results of time-consuming searches cannot be included in current translation work. Instead, the term is paraphrased, or hasty, tentative "translations" of the unknown terms are employed.

This picture is especially characteristic of large corporations. In smaller companies, terminology management systems and similar technologies are seldom in use.

Public Sector

The model described for private industry can easily be transferred to other environments. The "production" processes in the public sector are analogous to those found in the private sector, i.e. "products" (e.g. regulations, laws and other documents) are developed in a conceptual process and tailored to external requirements and circumstances, and especially to particular target groups. Depending on the internal structure of the agencies, the authoring process is carried out by editorial staff, publication departments, etc.

Translation needs are considerable in multinational agencies representing member nations with different languages, as well as in national governments with more than one official language, and in international organisations.

The internal consultations between the above-mentioned stages of the model tend to follow a common pattern, especially as regards translation processes. Terminology databases and terminology management systems are in use. Terminology work is quite comparable, in particular in the context of terminology creation. However, preparatory terminology work is done on a broader scale and more consistently. The multilingual institutions provide excellent preconditions for this.

Controlled and Uncontrolled Terminology Creation

Standardisation

Private industry

Public sector

Neologisms in Terminology

New terms are intentionally coined or come into being accidentally in all domains in special as well as general language. Whenever a concrete or abstract object is discovered, invented, created, or introduced into an existing environment, a new concept or a number of new concepts will come into existence and will consequently be named. They will be named more or less at random, but will be strongly affected by the subjective view of the "inventor". Of course, many other factors, and especially the influence of foreign languages, also play a role in this process. The new terms thus coined will not always be accompanied by an explanation, much less a definition, and the concept structure in which they are embedded may be apparent only to the creator of the terms. Moreover, some such newly coined terms may already exist within the same or different subject fields or domains, and may have the same, a similar, or a completely different meaning. Highly ambiguous terms will thus be generated.

This is a worst-case description of reality, which explains in part why communication, for example among research and development units, is not always satisfactory, is sometimes even misleading and rarely maximises efficiency.

There are, however, quite a few examples of proper terminology work in research and development, e.g. in large and medium-size enterprises where special language communication is managed as a controlled process.

In addition, there are observatories of neology(1) in several European countries, including France, Portugal, Spain (Spanish and Catalan) and Denmark. These observatories have databases of neologisms which deserve to be better known by terminologists and translators. There are also new initiatives seeking to exploit the possibilities offered by electronic networks for exchanging information on neology. One such is the French-language project Balneo, sponsored by Rint, and accessible on Wais.

The terminological deficiencies outlined above are partly remedied only long after the new terms have been coined. This type of work is performed by glossary committees, terminology working groups within large companies, administrative and standardisation bodies, etc.

In this context, it should be noted that the phenomena related to neologism or the coining of terms in science and technology are comparable with the terminology situation and processes in administrative and governmental institutions, and in politics. New ideas, new regulations or deregulations, new taxes, etc. bring about new concepts, which in turn require that new terms be coined. The deficiencies inherent in this process are similar to those outlined in connection with research and development activities.

Over time, special language terms migrate into general language. Their concepts become increasingly fuzzy, and very often new terms, sometimes derived from the original ones, are coined to cover the same concept.

In multilingual communication, problems of neologism occur at the language interfaces where concepts are to be transferred from a source language into a target language. This task is a difficult one, above all in cases where new terms - neologisms - are contained in highly specialised source-language texts.

Problems


The situation outlined above shows that there is a large number of isolated, disparate sources generating considerable volumes of new terms almost continuously. Apart from a few exceptions, it is not possible to harmonise or co-ordinate these processes, and a lack of cohesion is often the result. A relatively small percentage of the "new" terminology is stored in databases or terminology management systems; most of it is documented on paper, rendering efficient dissemination difficult. Some existing terminologies are also not made available, while others are not immediately accessible. Most of the individual "collections" are not compatible, making it difficult to share them, and copyright issues further limit the interchange of results. The impact of these problems becomes clear when one considers the high rate of innovation in engineering and science, the rapid structural changes in the business sector, in the public sector and in national, European and international economies, and the tidal changes in the political realm.

The numbers involved clearly illustrate the order of magnitude of the problem. The overall number of terms is estimated to be approximately 50 to 100 times greater than the number of general-language words in each language.

Solutions


Solutions should be sought in the context of the origins and effects of the problems outlined above. It will hardly be possible to change the fundamental behavioural patterns of those involved. However, by constantly monitoring changes in languages and their terminology components, the overall situation can certainly be improved. In the processes connected with controlled terminology creation, the consistent use of terminology management systems or comparable database technologies is desirable. The terminological (terminographic) records should be expanded to include additional data categories, allowing all kinds of neologisms to be treated specifically. This could help reduce the number of redundant neologisms created.
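As a sketch of what such expanded records might look like, the structure below adds illustrative neologism-specific data categories; the category names are assumptions for the example, not standardised ones:

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class TermRecord:
        term: str
        language: str
        subject_field: str
        definition: Optional[str] = None
        synonyms: List[str] = field(default_factory=list)
        # Illustrative neologism-specific categories (invented names):
        is_neologism: bool = False
        neologism_type: Optional[str] = None   # e.g. "borrowing", "derivation"
        coined_in: Optional[str] = None        # source document or department
        status: str = "provisional"            # "provisional" | "verified"

    record = TermRecord(
        term="teleworking",
        language="en",
        subject_field="labour law",
        is_neologism=True,
        neologism_type="compound",
        status="provisional",
    )
    print(record)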

In general, an attempt should be made to co-ordinate individual activities at least in related fields, in the public sector, in private industry, universities, etc. Additionally, the harmonisation of procedures for terminology work should be encouraged wherever sensible in order to facilitate evaluation and to improve the quality of the results, and to reduce overall expenditure on terminology work.

Spontaneous term creation cannot be influenced. Here, it is only possible to monitor language change continuously. This requires intensive analysis of text corpora drawn from marketing literature, conference proceedings, journals, etc. Above all, it is necessary to analyse texts from innovative disciplines, such as research and development reports, but also product documentation texts, such as operating manuals.
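A minimal sketch of such monitoring is given below, under the simplifying assumption that neologism candidates are recurrent word forms absent from a reference lexicon; real corpus analysis would also have to handle multi-word terms, inflection and noise:

    import re
    from collections import Counter

    # Reference lexicon: words already documented (stand-in for a real
    # terminology database or general-language word list).
    known = {"the", "a", "of", "network", "terminal", "user"}

    # Recent text to be monitored (stand-in for R&D reports, manuals, etc.).
    corpus = """The new hypernode links each terminal to the backbone.
    A hypernode aggregates user sessions across the network."""

    tokens = re.findall(r"[a-z]+", corpus.lower())
    freq = Counter(tokens)

    # Candidate neologisms: recurrent forms absent from the reference lexicon.
    candidates = [(w, n) for w, n in freq.items() if w not in known and n > 1]
    print(candidates)   # [('hypernode', 2)]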

Terms from specialised vocabularies are increasingly finding their way into everyday language. A new type of dictionary needs to be developed which would systematically take this aspect into consideration, i.e., by explaining the usage and concepts of terms in new technologies to a lay audience. This dictionary type should be made available via a variety of on-line and off-line media.

Recommendations


Terminography

Present Situation


The following section will attempt to detail some significant steps in the processes of creating and using specialised dictionaries.

Specialised Dictionaries

The concept of a specialised dictionary or LSP dictionary (on paper, computer, microfiche, CD-ROM, or other storage medium) is taken here in a very broad sense, to include the following types of dictionary (cf. [Picht/Drask 85]):

In recent years, the dictionary market has been characterised by a large number of general monolingual and multilingual dictionaries and by an increasingly large demand for specialised dictionaries.

With the increasing volume of information, publishers have begun to produce electronic dictionaries, i.e. machine-readable collections of terms.

Available terminology repositories provide linguistic and conceptual information on terms, e.g. equivalents, homonyms, etc. This information may be insufficient to support effective human translation or, in particular, accurate and efficient machine translation, a process demanding much more explicit data. With the development of large-scale NLP (Natural Language Processing) systems for real-world applications, there is a need for large-scale terminological resources.

Terminological lexica have to be used in many varied applications, such as

The Structure of Specialised Dictionaries

The structure of terminological entries differs considerably from one specialised dictionary to another, e.g. as regards the ordering systems, the treatment of ambiguities, and the treatment of word groups, i.e. multi-word terms.

The terminological entry should ideally contain:

The advantages of computer assistance in the production and maintenance of specialised dictionaries are well known. Today, terminological repositories are usually implemented by means of term banks based on the relational model. Since the majority of current term banks are term-oriented (as opposed to concept-oriented):
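To make the distinction concrete, the sketch below implements a minimal concept-oriented schema, in which terms in any language hang off a concept record rather than the concept information hanging off a headword. All table names, column names and data are illustrative assumptions, not a description of any existing term bank:

    import sqlite3

    con = sqlite3.connect(":memory:")
    cur = con.cursor()

    # Concept-oriented schema: the concept is the primary unit; terms in
    # any language are attached to it.
    cur.executescript("""
    CREATE TABLE concept (id INTEGER PRIMARY KEY, definition TEXT);
    CREATE TABLE term (
        concept_id INTEGER REFERENCES concept(id),
        lang TEXT,
        term TEXT,
        preferred INTEGER DEFAULT 0
    );
    """)
    cur.execute("INSERT INTO concept VALUES (1, 'rotating magnetic storage device')")
    cur.executemany("INSERT INTO term VALUES (?, ?, ?, ?)", [
        (1, "en", "hard disk", 1),
        (1, "en", "fixed disk", 0),
        (1, "de", "Festplatte", 1),
    ])

    # All designations of one concept, across languages, in a single query --
    # the lookup a term-oriented (headword-keyed) layout cannot answer directly.
    for row in cur.execute("SELECT lang, term FROM term WHERE concept_id = 1"):
        print(row)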

Users of Terminographic Products

Users of specialised dictionaries can be divided into two main categories:

Producers of Terminographic Products

The authors of terminographic products constitute a heterogeneous group:

There are various types of publishers who regularly or occasionally publish terminology repositories:

Publishing of Terminographic Products

The terminographic work process can be characterised as a complex activity consisting of sequential, iterative and interwoven steps, such as planning, editing, correction, revision and production. Not all steps are always equally relevant (revised editions vs. processing of complete databases).

The final products of these terminological activities are all kinds of specialised data collections, such as systematic specialised dictionaries, glossaries and thesauri. When a specialised dictionary is created, ambitious goals are often unattainable because of financial and other constraints (size, restrictions on content).

"Publishing on demand" and "printing on profile" are becoming increasingly popular in the field of dictionary publishing.

Publishing companies often utilise the following means to make printed materials available electronically:

Terminological data is increasingly marked up with a standard mark-up language such as SGML (Standard Generalised Mark-up Language), which makes it possible to interchange terminologies and documents between different systems.
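As a purely illustrative sketch of the principle - the element names below are invented, not taken from any published DTD (cf. "Terminology Exchange" below for the MARTIF standard) - terminological data marked up in SGML style can be processed by any system that understands the tags:

    # A terminological entry marked up in SGML style.
    entry = """<entry id="e1">
      <term lang="en">thesaurus</term>
      <term lang="de">Thesaurus</term>
      <def lang="en">A controlled vocabulary showing relations between
      concepts, used for indexing and retrieval.</def>
    </entry>"""

    # Because the markup is declarative and system-independent, the same
    # data can be loaded into a termbase or typeset in a document pipeline.
    import re
    terms = re.findall(r'<term lang="(\w+)">(.*?)</term>', entry)
    print(terms)   # [('en', 'thesaurus'), ('de', 'Thesaurus')]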

Problems


Solutions


Recommendations


Reusability of lexicographic resources

Present Situation and Problems


An important statistic in the context of a European terminological infrastructure is the fact that specialist terms account for as much as 40% of the total number of entries in some monolingual, unabridged general-language (LGP) dictionaries [Lan 89]. Terms included in such dictionaries tend to be those which are used and encountered by both experts and laypeople. Hence, some general-language dictionaries may be regarded as a potentially valuable terminology resource.

At least three projects in this field exist in the French-speaking world: one by the Réseau Lexicologie, Terminologie, Traduction, Industries de la langue of the AUPELF-UREF, another by the CNRS (CRIN-INaLF), and a third, DELA (Dictionnaire électronique de langue française), by the LADL (Laboratoire d'Automatique Documentaire et Linguistique de l'Université de Paris 7).

In discussing the occurrence of terms in LGP (Language for General Purposes) dictionaries, it is important to consider how such terms can be defined and identified. The differentiation of terms from words is not straightforward, however, since the relationship between general language and sublanguages (or special languages - LSPs) is an interdependent one. Not only do terms and words draw on the same set of phonological, morphological and morpho-syntactic rules, as well as - in many cases - on the same inventory of word forms, but particular forms may also cross the boundary between the two language varieties.


LSP -> LGP: a special-language term becomes field-external and is used
in the general language.
    Characteristics: lessening of precision; not subject-field specific;
    broader and ill-defined range of users; different linguistic
    characteristics (more flexible?).
    Examples: parameter (mathematics); paranoid (psychology).

LGP -> LSP: a general-language word is adopted for a special language.
    Characteristics: increase in precision; subject-field specific;
    narrower and clearly-defined range of users; different linguistic
    characteristics (more restricted?).
    Examples: window (information technology); squish (linguistics).

LSP 1 -> LSP 2: a term from one special language is adopted by another
special language.
    Characteristics: maintenance of precision, but specific to a
    different subject field (treated as different lemmata); different
    range of users; different linguistic characteristics (differently
    restricted?).
    Examples: virus (microbiology/information technology); morphology
    (biology; linguistics).



Table 2 : Borrowing Relationships: Special Languages and General Language

Table 2 shows that there are three principal relationships that are crucial for importing terminology from dictionaries or from other specialist disciplines.

If data from LGP dictionaries is to be re-used for terminological purposes (however broadly defined), then particular senses, or even sub-senses, need to be identified within the entry, since the headword itself is often polysemous (i.e. it has multiple meanings) in the lexicographical approach. The headword cannot therefore be easily characterised as a term or as a word, since it potentially represents both. Furthermore, the same entry may contain terms from several subject fields. The consistent and precise use of subject-field labels is therefore of some importance. The selection of subject-field labels also needs to be addressed, including, for instance, the often ad hoc expansion of an established inventory at varying levels of specificity.
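A minimal sketch of such sense-level re-use is given below; the entry, its senses and the labels are invented for illustration, and real dictionary data would be far less regular:

    # A polysemous LGP dictionary entry, reduced to sense level. Subject-
    # field labels ("label") are the hook for terminological re-use.
    entry = {
        "headword": "morphology",
        "senses": [
            {"label": "linguistics", "def": "the study of word structure"},
            {"label": "biology", "def": "the study of the form of organisms"},
            {"label": None, "def": "shape or form in general"},  # general language
        ],
    }

    def extract_terms(entry, wanted_field):
        """Return candidate term records for one subject field only."""
        return [
            {"term": entry["headword"], "field": s["label"], "definition": s["def"]}
            for s in entry["senses"]
            if s["label"] == wanted_field
        ]

    print(extract_terms(entry, "linguistics"))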

The definition - or definitions in the case of polysemous headwords - is a key element of any entry distinguishing senses and linking entries semantically. For those headwords (or senses of headwords) which may be classed as "terms", the definitions help to map out an area of knowledge within the boundary indicated by the subject-field label. Practice in LGP dictionaries varies with regard to the technicality of the register in which the definition is formulated, depending on the perceived needs of the users.

The automatic extraction of terminological data from currently available LGP dictionaries (both monolingual and bilingual) is problematic for a variety of reasons including:

Solutions


Solutions to these problems are likely to be medium- rather than short-term, since they depend on the availability of machine-readable dictionaries in which data is represented more consistently. In addition, the emergence of computational lexicography and the use of corpora mean that dictionaries, including general-language dictionaries, can be updated more frequently and in a more representative way than is possible with manual methods. Each update will contain scientific and technological neologisms, as well as more established terms.

In elaborating a term, it is important that the relations between terms and concepts be made clearer. A corpus-based approach will be of some help here: the corpus used for extracting terms and their contexts can be structured so that terminologists know the type of data they are dealing with. In this way, the information becomes more meaningful, as its origins and its target (i.e. its users) are clear.

Recommendations


Phraseological units in subject field texts as a resource for termbases

Present Situation


In the fields of terminology research and terminography, the present situation regarding LSP phraseological units is characterised by deficiencies both in the recognition of the problems they pose and in the formulation of suitable solutions.

Specialists are usually familiar with the phraseological units of specialised language, i.e. phrases and more or less fixed multi-word expressions. However, these units cause serious problems for non-specialist translators and technical writers, as well as for machine-aided and machine translation systems.

It must be kept in mind that these units are relevant in understanding, creating and translating specialist texts and for successful subject-oriented communication in general.

When extracting units of knowledge and subject field information from special language texts, extensive knowledge of the theories and methods of LSP phraseology is essential. On the one hand, there are a few attempts to represent phraseological knowledge in terminological databases. On the other hand, intensive research and development work in computational lexicography is going on with the goal of better analysis and translation of LSP phraseologisms.

Problems


Solutions


Recommendations


TERMINOLOGY DISTRIBUTION AND EXCHANGE


Terminology Distribution

Present situation


The distribution and dissemination of terminology had been identified as one of the major bottlenecks to efficient terminological activities even before the POINTER Project commenced - indeed, it was a major reason for setting up the project in the first place. However obvious the problem may be, it still requires differentiated analysis if concrete, viable suggestions for improvement are to be made.

The results of the POINTER investigations suggest that, in broad terms, distribution mechanisms can be divided into two distinct modes (that may sometimes interact):

Active terminology distribution covers situations in which terminology resources are consciously and specifically exported to one or more target users or groups in response to perceived or actual market demand. This process may take a number of forms, including:

In theory at least, the objective of active terminology distribution is to ensure the provision of the right terminology, at the right time, at the right place and at an acceptable price.

The POINTER Project has been able to highlight a number of problems affecting both forms of terminology distribution, and to identify potential solutions.

Problems


Many of the problems hindering effective distribution are generic. These are described elsewhere in this report, for instance:

Above and beyond this, one of the most important problems is the fragmentation of resources and comparative difficulty in locating them. Even within the dictionary publishing industry, the provisions for sourcing works (e.g. via "Books in Print") are limited, especially on a European or international scale; outside this sector, the situation is much worse. In addition, the problem of "grey literature" (i.e. terminology lists, etc. without ISBN or other reference and cataloguing numbers, for instance because they appear as part of another work) is viewed by many as particularly acute. The increasing use of on-line processes and services adds to the problem, since the potential for creation and dissemination now vastly exceeds the capacity for sorting and registering data. Within individual enterprises, terminology is often not made available beyond departmental boundaries, e.g. because of lack of funding, status problems and generally poor communication flows within some organisations.

To some extent, though, some of these problems are offset by terminological resources available on-line on a "user collect" basis. This occurs both internally and externally (in the latter case it was pioneered by leading IT sector companies), and may be free of charge or based on a pay-per-view or fixed-charge concept. Where terminology is provided free of charge to external users, the primary objective of the resource owner is to make a market for this terminology, encouraging developers and users to standardise on the vendor's own terminology for generic applications.

Despite this encouraging development, however, it is clear that very often, the "right" terminology is not being provided "at the right time, at the right place and at the right cost". Since the underlying problem is essentially one of organisation, it is possibly best tackled at a structural level. Thus there is a distinct need - expressed by a wide range of resource owners, service providers and users - for the establishment of an infrastructure to catalogue and distribute terminological resources available either to the general market or to specific user groups. Equally, the European Commission, DGXIII, has identified an urgent need for a mechanism to validate and distribute what it terms "generic" resources, resulting from publicly (co-)funded projects, to facilitate and improve information transfer and resource re-usability.

This leads on to the problem of the quality of terminology. The terminology market appears to be characterised by what is in classic business terms a most unusual phenomenon. The relative scarcity of high-quality terminological resources would normally result in a seller's market, but in fact the opposite is the case: buyers appear to enjoy a disproportionate level of power on the market, and frequently choose to dispense with external terminology, either developing their own, despite often inadequate subject-area knowledge, or turning to sources they know are unreliable. Apart from cost factors - a problem which can be overcome by a more professional approach to cost-benefit analysis by resource providers, on the one hand, and customer/user education, on the other (cf. Chap. 2.2: "Economic Aspects of Terminology and Terminology Work") - quality appears to be a key consideration.

There are also problems involved with the passive distribution methods described above. The lack of terminology-related skills of many authors and users means that they are frequently unable to make the best possible use of the terminology incorporated in the various texts and other dissemination media they are exposed to, a situation which hampers communication and prevents re-usability.

In addition, as far as innovative forms of distribution - both active and passive - are concerned, there is a lack of awareness in the terminology community of the opportunities now available through global communication networks, combined with a lack of skills and know-how needed to exploit these networks.

Solutions


The proposals to establish a European Association for Terminology (EAFT) and a European Terminology Information Server (ETIS) are given in detail in Section 6. They are supported by the POINTER Consortium and other leading members of the terminology community (both producers and users), with a broad readiness among all groups to join an efficient terminology distribution network. If such an infrastructure can also help ensure the quality assurance and validation of the terminology provided, the likelihood of its long-term success will be significantly greater.

In addition, solutions are already under development for the identification, creation, cataloguing and evaluation of resources. These include innovative national initiatives, such as the DIT in Germany, CRAIE in France and the DTG in Denmark, which in many cases aim to provide terminology information and brokerage services, among other things. In all cases, however, these institutions are underfunded and understaffed, and are able to perform only a fraction of their necessary role. National and regional authorities are therefore urged to provide a variety of support measures for the establishment of such centres, which will accelerate terminology distribution and provide tangible economic benefits in the various regions. In addition to addressing the urgent problem of generic terminology distribution and re-use, ELRA, working together with the EAFT and all other relevant organisations, should create as part of the proposed European Terminology Information Server (ETIS) a "catalogue of catalogues", directing those seeking access to particular terminology and/or related skills to the right address. The terminology members of ELRA are fully aware of the problems affecting the distribution of terminology, and ELRA intends to co-operate fully with the future EAFT to establish effective distribution channels tailored not only to the various types of terminological resource, but also to specific user groups. Moreover, ELRA will also play a major role in seeking solutions to problems such as IPR, copyright, costing and billing, as one of its major strengths - and attractions - is its ability to pool the experience and expertise of all three key sectors in the field of language resources.

In addition, the development and implementation of a standardised system for the validation of terminology quality would help considerably to enhance the effectiveness and efficiency of terminology distribution. The ELRA concept ensures that those who are willing to play an active role will be in a position to help mould the concepts and policies for implementing, amongst others, mechanisms for the effective validation (including in collaboration with the INTERVAL project) and distribution of all forms of terminological resources in close co-operation with the EAFT and its members at national and regional level. This will result over time in the emergence of a set of validated resources with a "semi-standardised" function.

In the longer term, the EAFT and ELRA will also make it possible to identify "grey literature" and prepare it for collation and distribution, thus meeting demands from many quarters for greater transparency and improved re-usability of the various forms of terminological resource.

However, these extremely valuable initiatives will only be successful if they succeed in attracting as active members as many terminology creators, service providers, resource holders and users as possible, and it is to be hoped that the outstanding opportunity offered by these infrastructures will be given due recognition by all sectors of the terminology community.

The lack of terminology-related skills amongst authors and users, the need to enhance awareness of the Internet and similar networks in the terminology community, as well as the transfer of the skills and know-how to make the most effective use of these opportunities, can best be solved through selective training measures, including seminars and workshops, and demonstrator projects (cf. Chap. 5.2: "Training").

One of the most effective tools for overcoming bottlenecks and breaking down barriers in both active and passive terminology flows is the dissemination of terminology using electronic media. A variety of options is already available, and new developments are a frequent occurrence. For off-line terminology distribution, CD-ROMs have proven their robustness for supporting large databases, and the continued improvement in data transfer speeds for CD-ROM drives is making this option increasingly attractive.

For on-line resources, the Internet, together with the World Wide Web and service provider networks such as CompuServe, offers a wide variety of options for the rapid transport of small to medium-sized terminology collections. However, on-line costs vary widely from country to country in Europe, and in many countries still dominated by public-sector PTT monopolies, on-line users may still suffer serious price penalties.

Despite this, there is evidence that individuals and institutions seeking terminology are increasingly turning to on-line services. One of the best known is CompuServe, where mono- and multilingual specialist terminology questions are fielded in near-real time in the Foreign Language and Education Forum (FLEFO). Questions on specialist terminology are also welcome on many other professional forums. Terminology collections, including substantial data from leading IT vendors, are also available for downloading.

As far as "classic" character-oriented Internet services are concerned, mailing lists such as "lantra-l" and "NETGLOS" (a continuously updated, multilingual glossary of the Internet), and discussion groups such as "sci.lang.translation" on Usenet, are established sources of terminology, in particular in response to specific queries. Easy access to the Internet has also encouraged interest groups to communicate with each other across the globe. Taken overall, they represent a vast repository of terminology knowledge, which is, however, often unstructured and difficult to locate. They are supplemented by organised terminology collections, varying from large-scale termbases such as the European Union's EURODICAUTOM to expert-produced glossary lists, for example NASA's aeronautics and aerospace glossary. These collections have often been produced for or by large organisations that employ terminologists and documentation experts, although smaller specialist groups have also published (or rather broadcast) their terminology collections on the Internet, for instance the "Free On-line Dictionary of Computing" from the Imperial College of Science and Technology in the UK. Many of these resources are now available via the World Wide Web (WWW), a much more user-friendly graphical interface for Internet knowledge access and dissemination. In addition, a wide range of more "amateur" terminology is also available.

The number of WWW sites now offering terminological information appears to be growing on an almost daily basis, and is certainly far too great to be described in the context of this Report. There is an increasing number of sites dedicated to translation, terminology and language engineering - examples of the latter include the "NLP/CL Universe" operated by the US Association for Computational Linguistics(3) and the RELATOR Linguistic Resources Server(4). These resources represent a significant step towards the objective of a free, unimpeded and bi-directional flow of terminology. It is vital, however, that both providers and users acquire the skills to be able to use these technologies, a process of continuous learning that should also be encouraged and supported at European, national and regional level.

There are also other networks operating, or designed to operate, on a commercial basis with direct or indirect links to the Internet. These include the LINGO project in Germany (a joint venture by, amongst others, Siemens-Nixdorf AG, Deutsche Telekom AG and Eutelis Consult GmbH) which, in a future development phase, plans to support a terminology service (provided by external service suppliers) in addition to its core translation and distance learning services. There are also projects supported by the European Commission, such as TELELANG, which has completed the initial analysis phase and is now preparing to submit a proposal for implementation. The TELELANG consortium has stated that it too plans to include terminology among the services to be offered, but no concrete proposals backed by effective actors appear to have been publicised so far.

One example of an innovative solution to many of these problems is the "Term Bazaar", which demonstrates how terminology can be treated as an electronic commodity. The Term Bazaar, which can be accessed via the World Wide Web, was developed during the POINTER Project as an example of how to facilitate the exchange of terminology across the World Wide Web. Users can search through various termbases, and use hyperlinks to jump between terms, definitions of terms, and sources (of definitions). By its very nature, the process must be bi-directional, allowing users not only to access terminology, but also to enter terminology into the system. The relatively simple entry structure can be expanded as required to meet future market demand.

Figure 4 : The "Term Bazaar" page

A sample of text-based terminology resources is given in Appendix 8: "The POINTER Home Page and Term Bazaar".

The POINTER WWW Home Page aims to disseminate a wide range of information pertaining to the project. Currently there is information about the various members of the POINTER Project, and their organisations. Furthermore, the WWW page provides access to the various reports and surveys produced for the project. The Term Bazaar can also be accessed. The current address for the page is:

Once the project has ended, it is expected that the pages will continue growing to include all the major reports published from the project, and will serve as a focal point for further discussions and projects concerned with an infrastructure for terminology in Europe.

Terminology Exchange

Present Situation


Targeted, efficient communication employing LSPs (Languages for Special Purposes) is inconceivable without correct terminology. This is why domain experts, technical writers, archivists and information brokers require access to principally monolingual dictionaries containing definitions and explanations. Where communication using LSPs involves a multilingual element, translators and interpreters must be enabled to transmit the information into the target language in a form appropriate for the target readership or audience. The ability to search through multilingual terminology resources is a precondition for high-quality translations. Language planners, standardisation specialists, specialist lexicographers and terminologists support these terminology users by preparing and documenting mono- and multilingual LSP vocabularies.

Problems


Traditional media for the elaboration, transmission and use of terminology, such as (specialist) dictionaries, glossaries and card indexes, have increasingly been supplanted by developments in the field of electronic data processing. As these traditional media have been supplemented or replaced - a trend which started in the mid-sixties with the development of termbanks on mainframe systems - a variety of mostly PC-based terminology management programmes has come to be employed by the user groups described above.

With language and information category requirements varying widely by user group and organisational environment, the various terminology resources also display a wide structural diversity. This applies not only to traditional termbanks held by large institutions, such as EURODICAUTOM, TEAM, LEXIS, NORMATERM and TERMIUM: the wide variety of smaller terminology resources available in electronic form is also characterised by heterogeneous data categories and terminological entry structures. Even MULTITERM for Windows, one of the most widespread terminology management systems currently in use, has an underlying system design (user-defined entry structures) that as good as encourages user-specific structures for terminological data. This complicates the exchange of terminological resources between different users and/or systems enormously, requiring the creation of custom conversion programmes for each system or user combination. In many cases, an additional, labour-intensive process of intellectual post-editing is unavoidable if all terminological information is to be transferred from one resource to another. Section 5.3 discusses the origins of these heterogeneous data categories, some of which arise from the use of ad-hoc linguistic categories and features, and from a similarly ad-hoc approach to semantic/knowledge categories.
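The following deliberately small sketch illustrates why such conversions must be custom-built and why post-editing remains necessary; all category names and data are invented:

    # Two terminology management systems with heterogeneous data
    # categories (all names invented). Exchanging data requires a custom
    # mapping per system pair -- and some categories have no counterpart.
    source_record = {"Benennung": "Festplatte", "Sprache": "de",
                     "Fachgebiet": "EDV", "Quelle": "Handbuch 3.1"}

    category_map = {          # system A -> system B
        "Benennung": "term",
        "Sprache": "language",
        "Fachgebiet": "subject_field",
        # "Quelle" has no equivalent in system B: information is lost
    }

    converted = {category_map[k]: v for k, v in source_record.items()
                 if k in category_map}
    unmapped = {k: v for k, v in source_record.items() if k not in category_map}

    print(converted)   # what system B can hold
    print(unmapped)    # what needs intellectual post-editing or is lost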

Concept-oriented models

Despite the apparent success in achieving an internationally accepted standard interchange format (cf. 5.2.3 below), one critical area where no workable solution has yet been found concerns the many legacy termbanks currently installed at enterprises and institutions world-wide. Many of these systems feature structures and ordering systems which make data export an extremely lengthy and expensive - and thus unattractive - burden, with the result that valuable terminological resources are frequently unavailable in re-usable form outside the organisation. A solution to this problem would also provide valuable input for the design of new terminology database structures and for more sophisticated and effective methods of data interchange.

The ordering system of a TR (terminological resource) is the result of individual, domain-oriented thinking processes (by individuals or groups of persons) geared to the primary purpose for which the TR is intended. These processes are influenced by factors such as the organisational environment, cognitive aspects and external parameters. The purpose of the ordering system reflects both the primary task and the nature of the TR's use (e.g. support for technical writing, documentation and translation, standardisation, or specialised communication).

It can safely be assumed that the vast majority of these diverse ordering systems are not structured on a concept-oriented basis in accordance with the principles of terminology work and thesaurus construction. Such deficiencies characterise the majority of TRs in all fields of speciality and disciplines. A further major shortcoming of these TRs is the fact that only a small percentage of them contain definitions of the concepts. Thus, even at the level of individual terms and records, the most essential criterion for conceptual orientation is lacking.

One main objective of current efforts to create a European terminology infrastructure must be to allow content-based, conceptually-oriented access to vast numbers of TRs of all European countries and language regions. Access to world-wide TRs is an equally urgent goal. There is no sense in making TRs available in fast, large and powerful computer networks if no aids, tools, or systems are provided for assisting knowledge-based retrieval. The results would not advance the goal of promoting special language communication. It should be remembered that the terminological information must be equally accessible to persons with varying levels of education in all fields, including specialists. The large numbers of homonyms and synonyms generated by uncontrolled merging of TRs would create massive problems, making the system impractical. Due to intellectual and financial considerations, it is not feasible to revise all ordering systems of TRs. The magnitude of the problem becomes evident when one takes into consideration the large number of ordering systems used, the number of major subject fields involved - estimated at between several hundred and several thousand - and the number of terms in each domain (commonly estimated at around 50 million for each of the highly developed languages, not including product names).

Solutions


This problem has been recognised by standardisation institutions at both national and international (ISO - International Organisation for Standardisation) level. A standard for the exchange of terminological data was defined in the early 1980s (ISO 6156: Magnetic tape exchange format for terminological/lexicographical records (MATER), also published as DIN 2341-1). However, this standard proved to be poorly suited to data exchange between modern terminology management systems, due in part - but not only - to the already outmoded choice of magnetic tape for file transfer. For this reason, ISO will shortly be concluding the definition phase of a new, up-to-date exchange format based on SGML (Standard Generalised Markup Language), an internationally accepted documentation description language (ISO 12200: Terminology - Computer applications - Machine-readable Terminology Interchange Format (MARTIF)). The MARTIF standard makes use of the data categories defined in the parallel standard ISO 12620.

One of the advantages of using SGML as the basis for terminology exchange is that the structural power of the language allows even complex terminology resource structures to be transferred successfully to an exchange format. Another is that SGML has evolved into a standard for the interchange of all forms of documents, so that terminological resources exported in MARTIF can be transferred without problem into print processing systems, and terminology resources available (only) in document form can be imported into terminology management systems with minimal effort. Moreover, because HTML (HyperText Markup Language), the language used for displaying information on the World Wide Web (WWW), is itself based on SGML, the transfer of terminological data encoded in MARTIF to the WWW should present few problems.
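As a rough illustration of this last point, the sketch below rewrites a simplified, MARTIF-like record as an HTML table. The element inventory shown is abridged for the example and should not be read as the full structure defined in ISO 12200:

    import re

    # A simplified, MARTIF-like SGML fragment (element names abridged
    # for illustration; ISO 12200 defines the full inventory).
    martif = """<termEntry id="ID1">
      <langSet lang="en"><term>hard disk</term></langSet>
      <langSet lang="de"><term>Festplatte</term></langSet>
    </termEntry>"""

    # Because HTML shares SGML's tagged, declarative form, rendering the
    # record for the WWW is little more than a tag-to-tag rewrite.
    html_rows = "".join(
        f"<tr><td>{lang}</td><td>{term}</td></tr>"
        for lang, term in re.findall(
            r'<langSet lang="(\w+)"><term>(.*?)</term>', martif)
    )
    print(f"<table>{html_rows}</table>")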

However, awareness of the existence and capabilities of both MARTIF and SGML needs to be reinforced, both amongst resource providers and the market, to allow the potential offered by these two solutions to be realised to the greatest possible extent.

Although the two standards ISO 12200 and ISO 12620 define the exchange of terminological data with a high degree of precision, it is probable that additional information will have to be supplied about the content of the data categories of the terminology management systems involved, so that interchange can be implemented without data loss or corruption. Even with the implementation of the MARTIF standard, "blind interchange" - without details of the content and meaning of the data categories of the systems involved in the exchange process - is not likely to be friction-free between all systems. There is therefore a need for extensive tests with MARTIF, in which terminological data is exchanged between a wide variety of heterogeneous systems, to indicate what needs to be defined and/or standardised above and beyond the MARTIF standard itself.

Another area where a solution is urgently required is that of user-controlled terminology interchange processes. As it stands, the specification of the MARTIF standard is all-embracing, but the level of detail available could make its application less attractive to users of less complex terminology management systems wishing to populate and exchange termbanks with a relatively simple structure. ISO 12200 is addressed to software developers or system engineers, and database managers with advanced computer know-how, and ISO 12620 is designed for end users wishing to structure their own database with categories of terminological information, in the hope that others will use the same, or at least compatible categories. There is now growing pressure in the marketplace for the implementation of a less complex tool, more readily understandable to average users without comprehensive knowledge of computing and encoding, to allow the exchange of terminology data across systems and platforms. A less comprehensive protocol ("MARTIF Light") with a limited number of standardised entry fields, and with integrated compatibility with HTML, would overcome these problems. As a subset of the MARTIF standard, it would not in itself be a standard, but rather a MARTIF-compliant set of tools available to end users. In particular, it would encourage greater interchange of terminological resources between users (in particular translators) employing various low-level terminology management and glossary systems. Such a tool - or set of tools - could be developed quickly and at very little cost by an ad-hoc working party, including users and application developers, which would guarantee MARTIF conformity in its application.

Concept-oriented models

As far as the problems associated with data export from legacy termbases are concerned, a concept-oriented model could provide a potential solution. The current situation certainly cannot be changed retroactively; however, it can be improved step by step on a medium- to long-term basis.

All of those involved should work to raise awareness of the need for content-based representations of concepts and concept systems. Large-scale integration of many heterogeneous TRs can only be accomplished on a conceptual basis. The introduction of standard ordering systems for all purposes is unrealistic; rather, solutions must endeavour to provide access to the contents of TRs despite their diverse and partially inconsistent ordering systems. Hypermedia can be used to relate the ordering systems of TRs by means of knowledge-based intellectual tools and systems. It must also be determined to what extent technologies and systems for analysing terminology-relevant text corpora can be used for the automatic classification of terminology records and groups of records, on the basis of special-language texts corresponding to the subject fields covered by the TR.

In this context, a review should be made of all research and development projects dealing with methods of representing and relating items of knowledge, automatically analysing the contents of terminology databases and individual records, concept-oriented retrieval, equivalence analyses, semantic analyses of phrasal units, chaotic ordering systems and knowledge-based systems. The proposed European Association for Terminology (EAFT) would be an excellent platform for fostering co-operation and openness in this area in order to achieve workable, cross-system solutions. Together with ELRA and other interested bodies, it should design a medium-term research project to develop tools and methodologies for workable concept-oriented prototypes.

Recommendations

STANDARDISATION


De jure standards

Present situation


General Remarks

The standardisation of terminology is often crucial to many domains and levels of communication (particularly scientific, technical, medical, legal and expert-to-expert), but is not necessarily appropriate, or even feasible, elsewhere (e.g. in the social sciences). The vertical layering of terminology according to the level of communication is also an integral part of the structure of Languages for Special Purposes (LSPs).

In general, terminology work may follow either a descriptive or a prescriptive approach (cf. 4.1: "Terminology Resources"). In both cases, a more or less strict methodology can be followed. Although the results may often look quite similar, the purpose and objectives of descriptive and prescriptive terminology work may differ considerably - not to mention the degree of authoritativeness of the outcome.

There are several kinds of prescriptive terminology, each with more or less legally effective (formal or informal) authority. Legally prescribed determinations have inherent de jure authority, although they often deviate considerably from the terminological usage of experts in the subject field concerned. Standardised terminologies can have different authoritative status, depending on the field of standardisation in which they occur. If they come from a field of "basic standards", they represent only strong recommendations; if they come from "normal" technical standards, they also represent the scientific-technical state of the art, which is considered the authoritative level immediately below legal regulations. Some standards, however, are taken over into European or national law and then acquire the nature of statutory provisions. For the sake of completeness, it should be mentioned that terminological determinations exist at all levels of law and in technical regulations of all kinds.

Standardised terminologies (i.e. standardised systems of concepts to which standardised terms or graphical symbols are assigned) are created in many working groups (WGs) or sub-committees at the national, European and international level. An estimate by DIN, the German Standards Institute(5), of the amount of terminology work by standardisation bodies arrives at a figure of

Standards on terminological principles and methods developed by ISO/TC 37 "Terminology (principles and co-ordination)" constitute the theoretical and methodological basis of terminology standardisation at international level.

The majority of standardised terminologies are contained in subject standards. Only a minority of standardised terminologies are contained in standards exclusively devoted to terminology.

Standardised terminology is a rich source of terminological information, yet it is still frequently underestimated as a tool for interpreting individual statements in standards. An estimate of terminological entries shows that, as a rule, about 30-50% of all standardised terminology is contained in terminology standards, while the majority of the rest is contained in subject standards. In some countries, substantial volumes of quasi-standardised terminology are contained in "technical rules" (or technical regulations) issued by authorities other than the official standards bodies, whose authority can be equivalent to or - from a legal point of view - even higher than that of the standards institutes which are members of ISO (International Organisation for Standardisation). A more detailed description of the standardisation activities of ISO is contained in Chapter 3.4.

The process of harmonisation takes place after parallel standardisation work has already been done, either at national level in several countries, or within several scientific disciplines and branches of industry and commerce. Harmonisation is therefore usually carried out by international organisations or by special harmonising committees set up by several institutions precisely for this purpose. In order to harmonise such terminologies, their conceptual structures must first be made explicit and compared with each other. Once the differences have been identified, they can be levelled out in a co-ordinated and consensus-based way by adapting the concept systems to one another.
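
The comparison step can be sketched in a few lines (the two concept systems shown are invented examples from the field of waste classification):

    # Sketch: two concept systems made explicit as child -> parent
    # ("broader concept") relations, then compared (invented examples).
    system_a = {"waste": None, "hazardous waste": "waste",
                "toxic waste": "hazardous waste"}
    system_b = {"waste": None, "special waste": "waste",
                "toxic waste": "special waste"}

    def differences(a, b):
        """Concepts placed differently in the two systems."""
        return sorted(c for c in set(a) | set(b) if a.get(c) != b.get(c))

    print(differences(system_a, system_b))
    # -> ['hazardous waste', 'special waste', 'toxic waste']: the points
    #    which must be levelled out by consensus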

Terminology harmonisation may have varying legal status, ranging from international, regional, national or local binding regulations (e.g. in the case of hazardous waste terminology) to mere recommendations without any legal implication.

Terminology standardisation can be viewed from a number of perspectives, which demonstrate dependencies not only on one another, but also on standardisation in general:

Usage of Standardised Terminologies

Official standards, such as those issued by ISO, the United Nations, the European Community and the national standardisation bodies(6), are generally used most frequently and most systematically by the user groups most closely associated with their authorship.

Particularly in those sectors requiring an extremely high degree of conformity and reproducibility for reasons of safety, functionality and product liability - for example mechanical, electrical and nuclear engineering - the use of nationally or internationally prescribed terminology is generally required in all enterprise processes. Although such terminology may differ to a certain extent from preferred in-house nomenclatures, it facilitates communication with suppliers and allows the preparation of bids, tenders and bills of material which are readily understood outside the organisation. This is particularly evident at European level, where legislation requires most public-sector tenders above a certain value to be published on an open market (in the "Tenders Electronic Daily" system - TED). If tendering agencies and bidders were unable to speak a "common language" based on the terminologies laid down in national and international standards, these market mechanisms would be ineffective.

The importance of standardised terminologies for master parts data management and MRPII (Manufacturing Resource Planning), with consequent implications, for instance for technical writing, has also been well documented(7).

In addition, more recent developments are changing the way in which users work together to create standards - and associated standardised terminology - with widespread acceptance. One example of this is "STEP", the STandard for Exchange of Product model data, now published as "ISO 10303, Industrial automation systems - Product data representation and exchange."(8) STEP is essentially a common language for computer interchange of product-related information, and it appears that the results of the standardisation work are being used to influence terminology creation itself.

Problems


There seem to be four categories of problems: cost, accessibility, co-ordination, and a gap with the "real world".

Cost and Accessibility

Co-ordination

Gap with "Real World"

Solutions


The introduction of state-of-the-art methodology for terminology work, terminology project management and terminological tools into prescriptive terminology work would considerably improve both the quality and the (re)usability of existing standardised and other authoritative terminologies in the short to medium term.

Co-operation and co-ordination between standardising terminology WGs and standards bodies at international, European and national levels would not only save human resources, but also result in improvements to the quality of existing and future terminologies in the medium to long term.

In principle, national standards bodies could take over (or at least consider) standardised terminology from sister organisations. But this would require

Requirements for making authoritative terminologies multifunctional, i.e. (re)usable for multiple purposes, could - wherever possible - be translated into "default values" in terminological tools for use in the preparation of authoritative terminologies.

Finalised legal terminology at European and national level, supplemented by references to the respective legislation and jurisdictions and thereafter made available via networks to the general public under as few restrictions as possible, would greatly improve the situation of legal uncertainty felt by both legal experts and the general public in Europe.

There is a requirement for the development and preparation of practical guidelines on terminological principles and methods designed to meet specific needs (e.g. quality management systems).

The preparation of new standards in the field of terminology (e.g. on criteria for terminology validation, computer-assisted co-operative and distributed terminology work) should be encouraged.

The adoption of a common standard for the storage and interchange of lexical terminology data should be promoted (cf. 5.1: "Terminology Management and Extraction Tools").

De facto standards

Present situation


In a large number of sectors, de facto standards, many of them well known as "industry standards", enjoy much broader application and usage than de jure standards. In many cases, this is because de jure standards do not cover the area involved, or because suppliers and users consider the prescriptive standards inadequate or deficient. Many de facto standards start life as in-house standards at a particular enterprise and then spread rapidly throughout the relevant industry. A topical example is "Java", the cross-platform programming language developed by Sun Microsystems for the World Wide Web, which at the time of writing (1995/96) is rapidly establishing itself as the standard programming language for Internet and intranet applications. De facto standards may also be amended and adopted by standardisation bodies; the ISO Open Systems Interconnection (OSI) 7-layer reference model, for example, drew on IBM's proprietary Systems Network Architecture (SNA).

Other examples of de facto sector standards include:

Cross-border payments

Data on a comprehensive range of international financial transactions and services - including customer and financial institution transfers, foreign exchange and securities transactions, and documentary credits and guarantees - is transmitted between financial institutions world-wide using the harmonised message types (MTs) adopted by SWIFT, the Society for World-wide Interbank Financial Telecommunication. The terminology defined for these messaging standards has been standardised (in English) throughout the global financial services industry, facilitating telecommunication and largely eliminating the scope for errors and misinterpretation - a particularly important factor given the crucial importance of such transactions to the global economy.
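
The principle can be sketched as follows; the message fragment and tag semantics shown are simplified illustrations rather than an authoritative rendering of the SWIFT standards.

    # Sketch: an MT-style message carries its data in numbered tags with
    # standardised meanings (fragment simplified for illustration).
    import re

    MESSAGE = ":20:REF12345\n:32A:960401GBP1000,00\n:59:BENEFICIARY NAME"

    def parse_mt(text):
        """Split an MT-style message into its tagged fields."""
        return dict(re.findall(r":(\w+):([^\n]*)", text))

    fields = parse_mt(MESSAGE)
    print(fields["32A"])  # value date, currency and amount in a single field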

EDIFACT

Although EDIFACT (Electronic Data Interchange For Administration, Commerce and Transport) itself is an internationally standardised set of rules and directories (ISO 9735, ISO 7372 and other directories and guidelines) for Electronic Data Interchange (EDI), an increasing number of sectoral and cross-sectoral industrial user groups, including banks, the insurance industry, wholesalers, textiles and many others, are developing and implementing additional directories to cover specific requirements.
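
The syntactic basis of EDIFACT can be sketched in a few lines (the sample interchange fragment is illustrative; the default service characters follow ISO 9735):

    # Sketch: EDIFACT segments are terminated by an apostrophe, with "+"
    # separating data elements and ":" separating components. (The escape
    # character "?" defined in ISO 9735 is ignored here for simplicity.)
    SAMPLE = "UNH+1+ORDERS:D:96A:UN'BGM+220+128576'"

    def parse_segments(data):
        """Split an EDIFACT string into segments and their data elements."""
        return [seg.split("+") for seg in data.rstrip("'").split("'")]

    for segment in parse_segments(SAMPLE):
        print(segment[0], segment[1:])
    # UNH ['1', 'ORDERS:D:96A:UN']
    # BGM ['220', '128576']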

Enterprise processes

The growing world-wide success of R/3, the client-server enterprise resource planning system developed by Germany's SAP AG, has prompted many other software vendors to tailor their applications for seamless integration into R/3. One consequence has been the need to ensure that the terminology employed in these third-party systems is R/3-compliant. SAP itself continues to invest heavily in terminology creation and management, and approved third-party vendors are encouraged, or even required, to adopt the terminology developed by SAP for their own applications.
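
Such compliance can be supported by very simple tooling; the sketch below (all term pairs are invented examples, not SAP terminology) flags deprecated variants in third-party documentation.

    # Sketch: checking third-party documentation against a vendor-approved
    # termbase by naive substring matching (invented term pairs).
    DEPRECATED = {"purchase request": "purchase requisition",
                  "goods entry": "goods receipt"}

    def check_compliance(text):
        """Flag deprecated variants and suggest the approved term."""
        lowered = text.lower()
        return [(variant, preferred)
                for variant, preferred in DEPRECATED.items()
                if variant in lowered]

    print(check_compliance("Create a purchase request for each order."))
    # -> [('purchase request', 'purchase requisition')]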

Although the examples given above are of very limited scope, it is evident in general terms that the industries and sectors relying most heavily on de facto standardised terminology are those which depend on networked information interchange. The rapid pace of change in these sectors, and the associated need to produce reliable, up-to-date terminology in the shortest possible time, are the prime factors driving the development of "industry standard" terminologies. The lead time for the development of de jure terminology by formal standardisation bodies is often years, or even decades; such processes are ill-equipped to deal with the fast-moving technological changes affecting many sectors of the economy.

Some sectors (such as the aircraft, automobile, software and pharmaceutical industries) have also been using controlled languages to standardise document production and increase its effectiveness. The following factors have led to this development:

The recognised advantages of using controlled language are increased quality and reduced cost:

Problems


Most of the potential problems associated with the development of de facto standardised terminology are generic, being related to the methodologies applied and the means of disseminating the final product. The problems relate principally to the quality and consistency of the terminology:

Solutions


The solutions to the problems outlined above are also generic, in that they are related to many other aspects of terminology resources and terminology work covered in this document.

As part of the awareness and education programmes put forward in this document, the emerging terminology services industry, with the backing and support of the proposed European Association For Terminology (EAFT), should be in a position to highlight the problems involved and broadcast the message that solutions are, or soon will be, available to eliminate many problem areas.

Recommendations

The overall conclusions to be drawn from this analysis are:

(duly taking into account aspects of reusability and multiple purposes) under contract to the European Commission.

QUALITY AND VALIDATION OF TERMINOLOGICAL RESOURCES


Present Situation and Problems

Quality


General

Quality is emerging as one of the most important issues to be addressed by the new European terminology infrastructure. Firstly, the quality of the resources which could be made available in a future terminological infrastructure is seen by potential users as a knock-out criterion for participation: only if customers are able to access terminology which is not only accurate but has also been created, documented and stored according to state-of-the-art principles will they be prepared to join and continue as "members". In addition, of course, the quality of the (particularly multilingual) resources made available influences the quality of the products and services to which they contribute. This takes on particular significance given the general and ongoing preoccupation with quality assurance and management, and with conformance to product liability legislation (cf. Chap. 2.3: "The Legal Framework").

In practice, two types of quality are relevant to terminology, both of which need to be addressed if high-quality resources are to be made available:

output (or product) quality
process quality

In classic quality management teaching, the former category is concerned with meeting the "stated or implied needs"(9) of the customer (as opposed to any concept of innate excellence). This is an important and often underestimated point in relation to terminology, which is normally produced for a specific purpose and sought for another (not always identical) one. The second category, on the other hand, is concerned with increasing the reproducibility and repeatability of the production process, i.e. getting the same results twice either sequentially or in parallel.

If these two types of quality are described on a matrix in which both axes vary from high to low, the result is four cells, each of which represents a certain characteristic (Fig. 5).

Figure 5 : The Quality Matrix (cf. [Fry 95])
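
Since the figure itself cannot be reproduced here, the four cells may be sketched as follows; the characterisations used are assumptions, not quoted from [Fry 95].

    # Illustrative reconstruction only: assumed characterisations of the
    # four cells of the quality matrix.
    def quadrant(output_quality_high, process_quality_high):
        if output_quality_high and process_quality_high:
            return "fit for purpose, and reproducibly so"
        if output_quality_high:
            return "good results, but not repeatable"
        if process_quality_high:
            return "consistent process, unsatisfactory results"
        return "neither fit for purpose nor repeatable"

    print(quadrant(False, False))  # the cell in which, as argued below,
                                   # most existing European resources sit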

In practice, the quality of existing terminology resources is, on the whole, very low throughout Europe, and must therefore be allotted a place on the left-hand side of the matrix. The precise position will, of course, vary according to the domain and language competence of the creators in question, and the extent to which they meet the needs and expectations of their users. What is clear, however, is that there is a widespread lack of training in formal methodologies for terminology production and quality. This applies - with notable exceptions, of course - both to monolingual terminology and, to an even greater extent, to multilingual terminology, the main focus of the POINTER Project.

Multilingual Resources

A particular problem is posed by multilingual and bilingual resources, which represent both a pressing need for the emerging European society and a particularly resource-intensive investment: adding another language to a bilingual collection trebles rather than doubles the effort involved, for example, since equivalents need to be researched and added for each language pair. Thus, although there are a huge number of terminological dictionaries, databases and lists of equivalents available for major subject fields (e.g. economics and finance), it is extremely difficult to find first-rate or even high-quality collections, as most of them are incomplete or take a lexicographic approach. In addition, they are generally not reusable, since there are as many approaches to the selection and presentation of data as there are terminologists or authors.
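
The arithmetic behind this claim can be made explicit: if equivalents must be researched for every pair of languages, the number of pairs grows as n(n-1)/2, so moving from two languages (one pair) to three (three pairs) trebles the pairwise work.

    # Number of language pairs requiring equivalence research.
    def pairs(n):
        return n * (n - 1) // 2

    print(pairs(2), pairs(3), pairs(4))  # -> 1 3 6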

The first step towards objectifying any judgement on the quality of terminological resources is to perform a sound analysis of the methodologies employed in creation, the expertise of the authors and the completeness of the information. For this reason, an analysis of more than 200 existing collections in selected subject fields was made as part of the POINTER Project. The results show clearly that a lot of work still needs to be done in order to ensure that resources being produced comply with the minimum quality required in the field of terminology and, more precisely, terminography.

The study made no attempt to evaluate the (domain) expertise of the authors, but rather assumed that they had the competence necessary to perform their task (compiling collections or dictionaries from existing data). The only practicable way to proceed - and one which proved extremely effective in practice - was to analyse the quality of their work indirectly. This complex analysis detected many faults and a general lack of reliability, with not even the most self-evident and basic methods of processing terminology being present or adhered to - a point shown clearly in the Quality Matrix in Appendix 6. In addition, the representative nature and size of the sample allow the results to be extrapolated to other existing resources.

In this context, we should point out that in Canada, the Office de la Langue Française (OLF) in Quebec and the Canadian federal government's Office of Translations are facing the same quality problems in a "merely" bilingual environment, despite the consistently large amounts of money and human resources they have dedicated to this work. Despite these problems, however, they can still be considered as the major producers of high-quality bilingual terminological resources.

Efforts in Europe are greatly complicated by the scale of multilingualism. Despite the huge amount of multilingual data available in EURODICAUTOM, there is room for improvement in quality in some areas (cf. Chap. 3.3: "European Aspects of Terminology Work").

Another aspect that needs to be taken into account concerns the terminology produced by companies and translation offices across Europe. At first sight, one might assume that the terminology produced internally by a company will always be of good quality. Unfortunately, this is rarely the case, because of the lack of a clear methodology and the poor integration of domain experts; in many cases, translators and technical writers merely compile simple lists of equivalents. As a general rule, though, the source language can be taken to be sufficiently accurate. In the case of the majority of translation agencies, the terminology they produce (mostly for their clients) is generally not reliable. In addition, few agencies are willing to create terminology on a systematic basis, since this is seen as an additional cost factor to be avoided in a highly competitive environment.

Where terminology management systems are used, they are too often regarded as an automatic solution, both as regards terminology management itself and as regards whatever data might populate them. In fact, the general lack of good terminologies sold together with such tools is an indication of vendors' unwillingness to distribute unverified data. In addition, a number of these systems are not actually intended for use in terminology creation, while still others allow great freedom in the way they are used, thus relying heavily on users' (often non-existent) methodological expertise.

Validation


The generally low level of both output and process quality in terminology has important implications for the next point that needs to be addressed: validation. This is a formalised process (and sometimes output) control mechanism applied to existing resources. In the case of terminology, validation would mean, for example, checking that resources have been created in accordance with a predefined standard for terminology work, with state-of-the-art methodological principles, or preferably both. Alternatively, their fitness for a specific purpose could be evaluated according to a predefined set of criteria. In present general practice, though, no standard formats are used to create resources, and there is frequently no technical or linguistic information with which to identify sources and authors. In many cases, it is even difficult to determine the source language of a collection! These differing practices and approaches make the validation of resources - and of multilingual resources in particular - more complicated and, in practice, often impossible. Where it is performed, it is very costly - indeed, in some cases, more expensive than the creation of the data itself.
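
A minimal sketch of such criterion-based checking follows; the mandatory fields chosen here are assumptions for the purposes of illustration.

    # Sketch: formal validation of an entry against predefined criteria.
    MANDATORY = ("term", "definition", "source", "author", "sourceLanguage")

    def failed_criteria(entry):
        """Return the mandatory fields an entry fails to supply."""
        return [field for field in MANDATORY if not entry.get(field)]

    entry = {"term": "termbank", "definition": "a database of entries"}
    print(failed_criteria(entry))  # -> ['source', 'author', 'sourceLanguage']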

Solutions

Given the importance of multilingual terminology for Europe, standard practices and strict methodologies for the creation of terminological resources have to be established and widely publicised in order to lower costs and provide a basis for future reusability and compatibility. The aim at this level should be to encourage quality awareness and adherence to standards by terminology creators. In addition, there is a need for the creation and implementation of uniform, EU-wide validation standards and procedures for terminological resources ("European label"), and for the design and implementation of procedures for establishing a European network of certification centres to validate terminology and award the European label. In both cases, these standards and procedures should be based on the initial work on a quality matrix which was developed as part of POINTER, on the INTERVAL project, and on considerations common to all three colleges of ELRA.

Resource Identification and Awareness


Many inventories of existing terminological resources are already available, but these are not widely accessible and do not include any qualitative evaluation. Building on them, it will be possible to identify clearly and to catalogue terminological resources which are potentially of sufficiently high quality for redistribution, and hence to improve the awareness of all users and producers. This work should be user- and domain-oriented, in order to provide a rapid response to market needs.

Adding Value


After potentially high-quality terminological resources have been identified, it should be possible for terminology creators and owners to add value to them, e.g. by merging identical concepts extracted from different sources, by facilitating linguistic and textual evaluation (e.g. by domain and language experts, and by comparison with terms extracted from text corpora), and by enriching the data (e.g. by adding linguistic elements in order to ensure easy reusability).
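
The merging step can be sketched as follows; keying records on a shared definition is a simplifying assumption, since real merging requires conceptual analysis by domain and language experts.

    # Sketch: merging records from two sources that designate the same
    # concept, keyed naively on an identical definition.
    source_1 = [{"definition": "database designed to hold terminological entries",
                 "terms": {"en": ["termbank"]}}]
    source_2 = [{"definition": "database designed to hold terminological entries",
                 "terms": {"fr": ["banque de terminologie"]}}]

    def merge(*sources):
        merged = {}
        for source in sources:
            for record in source:
                entry = merged.setdefault(record["definition"], {})
                for lang, terms in record["terms"].items():
                    entry.setdefault(lang, []).extend(terms)
        return merged

    print(merge(source_1, source_2))
    # -> one concept carrying both the English and the French terms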

This process must be supported by the development or adaptation of tools allowing the automatic, indirect evaluation of terminological resources and their subsequent validation. Furthermore, since the validation of resources obtained from original texts is a key aspect, both for achieving quality and for demonstrating the utilisation and correctness of the concepts extracted, there is a demand for the development and/or further improvement of tools for text analysis and term extraction in all European languages. Although some systems now offer at least limited compound recognition functions, more work is needed in this area. Another requirement is to extend the range of languages covered by such systems, in order to take advantage of the multilingual corpora (e.g. European legislation) already available. In future, such systems must be able to analyse texts in a particular domain in several languages at once.
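
By way of illustration, the following crude extractor proposes recurring word pairs from a text as term candidates - a deliberately naive sketch; production systems rely on far more sophisticated linguistic analysis.

    # Sketch: a crude monolingual term-candidate extractor based on
    # recurring word pairs (potential compounds).
    import re
    from collections import Counter

    STOPWORDS = {"the", "of", "and", "a", "an", "in", "is", "to"}

    def candidates(text, min_freq=2):
        words = re.findall(r"[a-z]+", text.lower())
        bigrams = [" ".join(pair) for pair in zip(words, words[1:])
                   if not STOPWORDS & set(pair)]
        return [b for b, n in Counter(bigrams).items() if n >= min_freq]

    sample = ("The exchange rate mechanism links currencies. "
              "An exchange rate band constrains the exchange rate.")
    print(candidates(sample))  # -> ['exchange rate']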

The quality assurance and evaluation models and methodologies developed must be widely distributed to creators (including private companies, universities, standards organisations and international bodies), in order to ensure their application in practice.

If these conditions are fulfilled, it will be possible, over time, to:

Recommendations

(10)


1. See: [BdM]

2. It should be stated, though, that such "passive" sources are increasingly used for terminology extraction.

3. Available at http://www.cs.columbia.edu/~acl/home.html and http://www.cs.columbia.edu/~radev/cgi-bin/universe.cgi

4. Available at a number of sites, including the University of Saarbrücken at http://www.de.relator.research.ec.org

5. Verbal information provided by DIN

6. E.g. AFNOR (Association Française de Normalisation), BSI (British Standards Institution), DIN (German Institute for Standardisation), ON (Austrian Standards Institute).

7. E.g. [Düs/Toft]

8. See: [STEP]

9. [ISO 8402]

10. See also Section 1: "Terminology Resources" and Chap. 6.2 "The European Terminology Infrastructure - Main Recommendations".