Virtual Corpus Manager

The Virtual Corpus Manager (VCM) is a corpus management tool that implements a novel approach to the storage and retrieval of texts, with efficient navigation of the corpus and strict integrity and security checks.

VCM implements a text typology, whereby texts are categorised on the basis of a set of 'pragmatic attributes' that are divided into seven broad categories: text, authorship, publication, language, domain, copyright status and related texts.

The values of these pragmatic attributes are stored in the VCM database, and the text file itself is stored in the corpus directory. VCM allows you to organise the texts in the available corpus hierarchically, based on a user-defined profile. You can modify this hierarchical organisation as required, which enables re-use of a single corpus for multiple purposes.
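The idea of a profile-driven hierarchy can be illustrated with a minimal Python sketch. It is not VCM's actual implementation: the header records, their values, and the `organise` function are hypothetical, and the only point is that the same set of header records can be regrouped under different profiles without moving any text files.

```python
from collections import defaultdict

# Hypothetical header records. In VCM the attribute values live in the
# database, while "path name" points at the file in the corpus directory.
HEADERS = [
    {"title": "Demo A", "language": "en", "domain": "medicine",
     "text type": "journal", "path name": "corpus/demo_a.txt"},
    {"title": "Demo B", "language": "en", "domain": "law",
     "text type": "newspaper", "path name": "corpus/demo_b.txt"},
    {"title": "Demo C", "language": "de", "domain": "medicine",
     "text type": "journal", "path name": "corpus/demo_c.txt"},
]


def organise(headers, profile):
    """Group header records into nested dicts, one level per profile attribute."""
    if not profile:
        # Bottom of the hierarchy: list the titles.
        return [header["title"] for header in headers]
    first, rest = profile[0], profile[1:]
    groups = defaultdict(list)
    for header in headers:
        groups[header[first]].append(header)
    return {value: organise(members, rest) for value, members in groups.items()}


# Two different "virtual" views of the same three texts.
by_language = organise(HEADERS, ["language", "domain"])
by_domain = organise(HEADERS, ["domain", "text type"])
```

Here `by_language` places languages at the top level, while `by_domain` places domains at the top level; the files themselves are untouched in both cases.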


Virtual Corpus Manager window


Virtual Corpus Manager Tutorial

Open the Virtual Corpus Manager by clicking on the Virtual Corpus icon in System Quirk's main menu. The following window will appear:


Main window Virtual Corpus Manager

In order to view/edit a profile, double click on the ML Terminology folder, which is the default profile.

Virtual Corpus Manager window with default profile

In the default profile, languages are at the top level. When you double click on 'ne', for example, four domains appear.

Virtual Corpus Manager Window

When you double click on one of the domains, six types of text appear.

Virtual Corpus Manager window with default profile

When you select one of them by double clicking, one or several texts will automatically appear under Title.

Virtual Corpus Manager window with default profile

You can view the titles of the texts at any level by pressing the Collect Texts button. The further down you go, the fewer texts are generally listed. If you press Collect Texts before you have clicked on any of the folders, when all you see is a yellow folder labelled ML Terminology or New Profile, you will see all the texts that are in that profile.
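The behaviour of Collect Texts at different levels can be sketched as follows. This is an illustrative Python sketch only: the tree contents and the `collect_texts` function are hypothetical and are not taken from VCM itself.

```python
# Hypothetical profile tree: language -> domain -> text type -> titles.
TREE = {
    "en": {
        "medicine": {"journal": ["Demo A", "Demo D"], "newspaper": ["Demo E"]},
        "law": {"newspaper": ["Demo B"]},
    },
    "de": {"medicine": {"journal": ["Demo C"]}},
}


def collect_texts(tree, path=()):
    """Return every title at or below the folder reached by following path."""
    node = tree
    for value in path:
        node = node[value]
    if isinstance(node, list):
        # A list of titles is the lowest level of the hierarchy.
        return list(node)
    titles = []
    for child in node.values():
        titles.extend(collect_texts(child))
    return titles


all_titles = collect_texts(TREE)                      # nothing opened yet
english = collect_texts(TREE, ("en",))                # one level down
english_journals = collect_texts(TREE, ("en", "medicine", "journal"))
```

With an empty path the function returns every title in the profile; each further level narrows the list, which mirrors the reduction you see in the VCM window.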

You can close a folder by double clicking on it. You can then open another one.

To change the way in which the corpus is organised, press the New Profile button.

Then, New Profile dialog boxes will come up.

First press OK to delete the current profile, then press Yes to use the Profile Wizard.

New Profile dialog boxes

The Profile Wizard helps you to create a new profile by giving you several options. You click on one of the available attributes and press add>>. The attribute you have chosen will then appear under profile. You can enter as many attributes as you want and you can also remove them by pressing <<remove. After you have chosen the attributes you want, you press OK for the profile to take effect.

Profile Wizard dialog box with new profile

The Virtual Corpus Manager window will then come up automatically. By double clicking on the folders you can see how the texts are organised in the new profile.

Virtual Corpus Manager window with new profile

As you can see, the new profile now contains the gender and nationality of the author and the domains. You can now try to enter other profiles yourself.

You can view the profile by clicking on Profile (next to File in the main VCM window) and selecting View. The Corpus Profile dialog box will appear. Press Save As and the Save As window will appear. Type the name of your profile under File Name. The example here is called "author.vcp" (Virtual Corpus Profile) because the profile contains the gender and nationality of the author (and the domain).

Save As dialog box

If you want to use a saved profile at a later stage, press the Open Profile button. A window saying 'delete current profile' appears. Press OK. A dialog box similar to the one above will appear. Double click on the profile you want to use and then press OK. The profile you selected will then be the current profile.

In order to enter a new text, press the New Text button in the main VCM window.

Select Text dialog box

Select the demo.txt and press OK. You can also enter your own texts.

VCM will then display a Text Header dialog box. You can switch from the General page to the Admin and Author pages by clicking on these words at the top of the window.

General Text Header dialog box

Admin Text Header dialog box

In Admin, enter the title of the text in Text and enter the date, then press OK.

Author Text Header dialog box

First, you type in all the information about the text you want to enter. The text is then entered into the virtual corpus.

In order to view it, click Collect Texts and select the one you want to see by highlighting it.

Virtual Corpus Manager window with 'collect texts' output

Then click on View Text in the main VCM window and the text you selected will appear.

File Viewer dialog box with demo text


Edit Map

The Corpus Profile dialog box allows you to edit the allowed values of attributes used by the Virtual Corpus Manager.

Attributes shows the attributes used by the Virtual Corpus Manager profiles when you click in the profile hierarchy.

Values shows the allowed values for the selected attribute. You can add new values using the Add button.

Corpus Profile dialog box


Profiles

Profiles are the description of attributes by which the texts are arranged, e.g., language, domain, etc. The default profile supplied with the Virtual Corpus Manager is organised by language, domain, definition, etc.

Open Profile uses the standard Windows Open File window for the selection and loading of a pre-defined profile.

You can use the Profile Wizard to create a New Profile. If you decline the wizard, an empty new profile is created and you are taken to the Edit Profile dialog.


Profile Wizard

The Profile Wizard allows you to add available attributes to the New Profile.

Available Attributes shows the attributes which can be added to the current profile. These are the fields in the headers of the texts currently available in the corpus.

The Add button adds the selected attribute in the Available Attributes list to the current profile, and the Remove button removes the selected attribute from the Profile section.

The Profile section shows the attributes selected for the current profile.
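The Add and Remove operations can be pictured as simple list operations. The Python sketch below is hypothetical and not taken from VCM: the function names are invented, and it assumes that an attribute already placed in the profile is no longer offered under Available Attributes, which the wizard may or may not do.

```python
def available_attributes(headers, profile):
    """Header fields found in the corpus that are not yet in the profile."""
    found = []
    for header in headers:
        for field_name in header:
            # Assumption: attributes already in the profile are not offered again.
            if field_name not in found and field_name not in profile:
                found.append(field_name)
    return found


def add_attribute(profile, attribute):
    """Mimic add>>: append the attribute as the next (deeper) profile level."""
    if attribute in profile:
        return list(profile)
    return profile + [attribute]


def remove_attribute(profile, attribute):
    """Mimic <<remove: drop the attribute from the profile."""
    return [name for name in profile if name != attribute]


# Hypothetical header fields of the texts currently in the corpus.
headers = [{"author gender": "f", "author nationality": "Dutch", "domain": "law"}]

profile = []
for name in ["author gender", "author nationality", "domain"]:
    profile = add_attribute(profile, name)

remaining = available_attributes(headers, profile)
shorter = remove_attribute(profile, "domain")
```

The order in which attributes are added determines the order of the levels in the resulting hierarchy, so `profile` here corresponds to the tutorial's example of gender, then nationality, then domain.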

New Profile Wizard dialog box

Profile Wizard dialog box


View/Edit Profile


The View/Edit Profile dialog allows you to edit the attributes and values associated with each level of a profile.

The Profile Hierarchy section shows the levels (or branches) of the profile which have been defined. You can add branches below the selected level of the hierarchy by pressing the Add button, which asks for a name for the new branch. The Delete button removes the selected branch from the Profile Hierarchy. Copy and Paste can be used to insert selected branches.

The attributes and attribute values associated with each branch are displayed in the Filters section, to the right of the Profile Hierarchy section. You can change the filters using the Edit button, which opens a window listing the attributes so that you can select their values for the named branch.
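One way to think about branches and their filters is as a tree in which each branch carries a set of attribute values that a text must match. The following Python sketch is only an illustration: the `Branch` class, the `matching_titles` function and the sample headers are hypothetical, and it assumes that the filters of a branch are combined with those of all the branches above it.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """One level of a profile hierarchy with its own filters."""
    name: str
    filters: dict = field(default_factory=dict)   # attribute -> required value
    children: list = field(default_factory=list)

    def add(self, name, filters):
        """Mimic the Add button: create a named branch below this one."""
        child = Branch(name, dict(filters))
        self.children.append(child)
        return child


def matching_titles(headers, path):
    """Titles of texts that satisfy the filters of every branch on the path."""
    required = {}
    for branch in path:
        # Assumption: filters accumulate from the root down to the branch.
        required.update(branch.filters)
    return [
        header["title"]
        for header in headers
        if all(header.get(attr) == value for attr, value in required.items())
    ]


# Hypothetical header records and a small hierarchy.
HEADERS = [
    {"title": "Demo A", "language": "en", "domain": "medicine"},
    {"title": "Demo B", "language": "en", "domain": "law"},
    {"title": "Demo C", "language": "de", "domain": "medicine"},
]

root = Branch("New Profile")
english = root.add("English", {"language": "en"})
english_law = english.add("Law", {"domain": "law"})
```

Under these assumptions, selecting the root matches every text, and each branch further down narrows the selection according to its filters.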

You can save the profile using the Save button, or you can store it as a new profile with the Save As button. The Set as Default button makes the current profile the default for further sessions.

Corpus Profile dialog box


Corpus

Collect Texts lists all text titles and their unique identifiers, matching the selected level of the current profile.

Select Fields allows you to choose which fields should be displayed in the main Virtual Corpus Manager window.

All selects all fields from the list, None deselects all fields.

The following fields may be selected for display:
 
Field               Description
language            Language of the text
domain              Domain of the text
text type           Type of text (e.g., newspaper, journal)
word count          Number of words in the text
char count          Number of characters in the text
path name           Full path name of the file containing the text
entry date          Date the text was entered
copyright status    Whether copyright has been granted
terminologist       Name of the terminologist who added the text
author name         Name of the author
author gender       Gender of the author
author nationality  Nationality of the author

Select Fields dialog box
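The effect of Select Fields can be sketched as picking columns from each header record. This Python sketch is hypothetical: the function names and the sample header are invented, and it assumes that selected fields are always shown in the order of the list above, regardless of the order in which they were ticked.

```python
# The selectable fields, in the order listed in the table above.
ALL_FIELDS = [
    "language", "domain", "text type", "word count", "char count",
    "path name", "entry date", "copyright status", "terminologist",
    "author name", "author gender", "author nationality",
]


def select_all():
    """Mimic the All button."""
    return list(ALL_FIELDS)


def select_none():
    """Mimic the None button."""
    return []


def display_row(header, selected):
    """Values of the selected fields for one text, in list order."""
    # Fields missing from the header are shown as empty strings.
    return [str(header.get(name, "")) for name in ALL_FIELDS if name in selected]


# Hypothetical header record.
header = {"title": "Demo", "language": "en", "domain": "law", "word count": 1200}

row = display_row(header, ["domain", "language"])
empty_row = display_row(header, select_none())
full_row = display_row(header, select_all())
```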


Texts

A Text is any text file. These can range from newspaper articles to journals to advertisements.

New Text allows you to add a new text to the corpus via a standard Windows Open File dialog box.

Edit Header allows you to edit the header information for the text which is stored in the database. The editing is via a multiple-page dialog box with pages for Author, Admin and General.

Remove Text removes the selected text from the corpus.

View Text allows you to view the selected text.



 

Text Header Fields

Allows you to edit the header information for the text which is stored in the corpus. The editing is via a multiple-page dialog box with pages for Author, Admin and General.

The Author fields are:
 
Field               Description
author name         Name of the author
author gender       Gender of the author
author nationality  Nationality of the author

Author Text Header dialog box

The Admin fields are:
 
Field               Description
entry date          Date the text was entered
copyright status    Whether copyright has been granted
terminologist       Name of the terminologist who added the text

Admin Text Header dialog box

The General fields are:
 
Field               Description
title               The title of the text
language            Language of the text
domain              Domain of the text
text type           Type of text (e.g., newspaper, journal)
word count          Number of words in the text
char count          Number of characters in the text
path name           Full path name of the file containing the text

General Text Header dialog box
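The three pages of the text header can be pictured as one record with three groups of fields. The Python sketch below is an illustration only: the class names and the `new_header` function are hypothetical, all values are stored as plain strings or integers, and the word and character counts use a simple whitespace split that may differ from the way VCM counts.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorPage:
    author_name: str = ""
    author_gender: str = ""
    author_nationality: str = ""


@dataclass
class AdminPage:
    entry_date: str = ""
    copyright_status: str = ""
    terminologist: str = ""


@dataclass
class GeneralPage:
    title: str = ""
    language: str = ""
    domain: str = ""
    text_type: str = ""
    word_count: int = 0
    char_count: int = 0
    path_name: str = ""


@dataclass
class TextHeader:
    """One header record, grouped like the pages of the dialog box."""
    general: GeneralPage = field(default_factory=GeneralPage)
    admin: AdminPage = field(default_factory=AdminPage)
    author: AuthorPage = field(default_factory=AuthorPage)


def new_header(title, path_name, text):
    """Create a header for a text, filling in the counts from its content."""
    general = GeneralPage(
        title=title,
        path_name=path_name,
        word_count=len(text.split()),   # assumption: words are whitespace-separated
        char_count=len(text),
    )
    return TextHeader(general=general)


# Hypothetical text; the remaining fields would be typed in by the user.
header = new_header("Demo text", "corpus/demo.txt", "A short demo text.")
header.admin.terminologist = "A. Editor"
```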