KonText (Knowledge on Text) is a text analysis program that lets you select one or more text files from a corpus and view, edit, print or analyse them by performing tasks such as Key Word In Context (KWIC), Weirdness, wordlists and indexes within user-defined constraints. KonText can scan texts for specified patterns, consolidate and collate the results, and display them in a variety of formats. KonText can operate on texts in a variety of languages and is extendible through add-on services.
Commands available from the KonText window are:
Source Documents - Selection allows you to select files for subsequent text processing.
Search Constraints allows you to include or exclude certain words from the search tasks.
Options allows you to change the output of the search tasks.
Task allows you to choose any one of the search tasks.
Qclip allows you to save certain (parts of) texts.
Start starts the selected search task.


When you have pressed OK you will return automatically to the main KonText window.
If you press the Search Constraints button in the main KonText window, the Set Constraints window will appear. You can now enter text patterns to search for. You can type them in the Include dialog box, each followed by [RETURN].

From this dialog, press the lowermost Lists button. Another dialog will appear showing the list of available, pre-defined text patterns for English; select the 'closed.en' patterns by clicking on 'closed.en' and then pressing OK or [ENTER].

The Exclude dialog box should now have a list of text patterns, each pattern on a separate line. Press OK for these patterns to take effect.

The Set Constraints dialog box with the selected words from the list
Now, one or more text files need to be selected for scanning. The easiest way to do this is to press Selection in the main KonText window, which opens a file dialog. Select the file named 'demo.txt' by clicking on 'demo.txt' and then pressing the OK button.

The Select Source Texts window will then appear. Click on the text you want to select and press OK.

All that remains is to select the task to perform. You can do this by selecting the required task from the pull down list of Task. Then press the Start button in the main KonText window.
Concordance output includes the frequency of the matched patterns, the context in which they were found (KWIC list) and line reference. Before starting this task, a dialog box will appear, asking for the "width". When you enter, for example, 5, five words will appear on either side of the matched pattern. These words form the context.
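The role of the "width" setting can be illustrated with a short Python sketch. This is not KonText's own code: the function name `kwic`, the simple letters-only tokenisation and the decision to let context run across line breaks are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=5):
    """Return (line_number, left_context, match, right_context) tuples.

    `width` is the number of words kept on either side of the match.
    """
    # Record each word together with the 1-based line it came from.
    tokens = [(word, line_no)
              for line_no, line in enumerate(text.splitlines(), start=1)
              for word in re.findall(r"[A-Za-z']+", line)]
    rows = []
    for i, (word, line_no) in enumerate(tokens):
        if word.lower() == keyword.lower():
            left = [w for w, _ in tokens[max(0, i - width):i]]
            right = [w for w, _ in tokens[i + 1:i + 1 + width]]
            rows.append((line_no, " ".join(left), word, " ".join(right)))
    return rows

sample = "The cat sat on the mat.\nA dog chased the cat away."
for row in kwic(sample, "cat", width=2):
    print(row)
```

With a width of 2, each match is reported with at most two words of context on either side, together with the line on which the match occurs.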

Index output includes the matched patterns, their frequency and line references. If Index was the selected task, these would be the results:

Wordlist output includes the matched patterns and their frequency.
Weirdness output includes the matched patterns, their frequency and the relative frequencies of those words compared to general language corpora.

Selecting Index is a good first step. After that, you can try the others yourself and see what happens.
The results of the tasks are shown in a results window:

You can save the results as a whole by pressing Save to file, or copy the results to a QClip (= clipboard). You can also see where a word occurs in the source text by highlighting the line reference you want to see and then pressing Show source.

The source documents (i.e. texts) to be analysed by KonText may be collected in two ways:

When you have selected Virtual Corpus from the pull-down menu of Source Documents in the main KonText window and pressed Selection, the Virtual Corpus Browser will appear.

NOTE: Virtual Corpus Browser is different from the Virtual Corpus Manager. Virtual Corpus Browser allows text selection and viewing only.
Search Constraints allows you to include or exclude certain words from the output of KWIC, wordlists, indexes, weirdness and collocations by using pre-stored lexica directly in search patterns. The lexica are created with Save List from the Set Constraints window.
Fuzzy searches are possible through the use of Wildcards.
If the Collocation check-box is checked, the items in the Include list can be used to identify collocation patterns. In other words, if the Collocation check-box is checked, the result window will show the output. If you don't check it, there will be no output whatsoever. If you check the box it will also find capital letters, otherwise it will not. The items should include the following symbols for use in collocations:
| Symbol | Meaning | Example Use | Example Output |
| ^ | Any number of words | the ^ of | the book of; the fastest car of; the first and the last of |
| ^~n^ | Maximum n words | in ^~2^ of | in the event of; in the first of |
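One way to picture how these two symbols behave is to translate a collocation pattern into a regular expression. The Python sketch below is only an illustration under stated assumptions: `collocation_regex` is a hypothetical helper, not part of KonText, and it assumes that "any number of words" includes zero words and that matching ignores letter case.

```python
import re

def collocation_regex(pattern):
    """Translate a collocation pattern such as 'in ^~2^ of' into a regex."""
    parts = []
    for token in pattern.split():
        bounded = re.fullmatch(r"\^~(\d+)\^", token)
        if token == "^":
            # ^ : any number of intervening words (assumed to include none)
            parts.append(r"(?:\s+\w+)*?")
        elif bounded:
            # ^~n^ : at most n intervening words
            parts.append(r"(?:\s+\w+){0,%s}?" % bounded.group(1))
        else:
            # A literal word; every word after the first needs a separator.
            separator = r"\s+" if parts else ""
            parts.append(separator + r"\b" + re.escape(token) + r"\b")
    return re.compile("".join(parts), re.IGNORECASE)

print(collocation_regex("the ^ of").search("see the fastest car of all").group(0))
print(collocation_regex("in ^~2^ of").search("in the very first case of rain"))
```

The second call finds nothing, because four words separate "in" and "of" while the pattern allows at most two.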
The Include List allows you to specify words, phrases or other strings to be searched for in the selected text(s). It is possible to specify any number of words manually by typing them into the list (each followed by [RETURN] ), copying and pasting from another program, or by inclusion of previously created lexica. The resulting lexica may be stored in the KonText library. Wildcards may be used to broaden the search.
The Lists button allows you to add a pre-defined lexicon (generated by Save List) to the current list. The Save List button saves the words and patterns in the current list as a new lexicon for future inclusion by the Lists button. You are prompted for a name to identify the lexicon before saving. Pressing the Reset button removes the current Include lexicon.
The Exclude List provides complementary functionality.
The Exclude List allows you to specify words, phrases or other strings to be excluded from output when scanning the selected text(s). You can specify any number of words manually by typing them into the list, copying and pasting from another program, or by inclusion of previously created lexica. The resulting lexica may be stored in the KonText library. Wildcards may be used to broaden the search.

This facility can be used in conjunction with the Include List: for example, the include patterns can select a general pattern, with specific exclusions given in the exclude patterns.
The Lists button provides a file selection dialog for inclusion of existing lexica.
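The interplay of the Include and Exclude lists can be sketched in Python as follows. The function names are hypothetical and only the `*` wildcard is handled; the sketch assumes that an empty Include list means "keep every word" and that exclusions are applied after inclusions.

```python
import re

def _to_regex(pattern):
    # Only '*' (any number of letters) is supported in this sketch.
    body = "".join("[A-Za-z]*" if c == "*" else re.escape(c) for c in pattern)
    return re.compile(body, re.IGNORECASE)

def apply_constraints(words, include=(), exclude=()):
    """Keep words matching an include pattern and no exclude pattern."""
    inc = [_to_regex(p) for p in include]
    exc = [_to_regex(p) for p in exclude]
    kept = []
    for word in words:
        if inc and not any(r.fullmatch(word) for r in inc):
            continue  # an Include list is set and the word is not on it
        if any(r.fullmatch(word) for r in exc):
            continue  # the word is explicitly excluded
        kept.append(word)
    return kept

words = ["compute", "computer", "computers", "computing", "the"]
print(apply_constraints(words, include=["comput*"], exclude=["computers"]))
```

Here the general pattern `comput*` selects the word family, and the single exclusion removes one specific form from the output.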

Collocation
You can specify word patterns that will be searched for in the text by checking the Collocation check box in the Set Constraints window. A pattern may consist of any number of words, including single words. You add words to a pattern by entering them in the text field in the Include or Exclude window (separating each word by ^ which stands for "any number of words"). You can use compounds in a pattern by entering the constituent words, separated by space or hyphens.
For some collocation analyses it is important to be able to limit the number of words between elements in a collocation pattern. You can do this by using the tilde (~) and a number that expresses the maximum allowable number of words between the matched elements. The asterisk includes go, going, gone etc., whereas without the * it would only find 'go'. For example:
will find "to boldly go", but not "to think before you go"
The dialog box below gives an example of the use of collocation patterns.

This is the result of the KWIC search with the collocation patterns included:

Normally, matched collocations are reported together, ignoring the number of matched words between the actual matches. You can report the numbers of words found in a match by checking the keep collocation gaps check box in the Options dialog box.
Punctuation marks may also be used as components of word patterns. Each line is taken to be a different text pattern. The patterns are active until they are Reset. Lists allows you to use a set of pre-created patterns. By pressing Save List you can store the current patterns in a file of your choice. The OK button must be pressed for the patterns to take effect.
When you specify strings for the Include and Exclude lists in the Search Constraints, you can use the following wildcards.
| Wildcard | Description | Example | Matches |
| * | Any number of letters | comput* | compute, computes, computer, computers, computing |
| % | Any one letter | comput%%% | computers, computing |
| [ab] | Optional letters a,b or neither | compute[rs] | compute, computer, computes |
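The three wildcards in the table can be mimicked with regular expressions. The following Python sketch is an approximation for illustration only: `wildcard_regex` and `matches` are hypothetical names, "letter" is assumed to mean A to Z in either case, and a bracket group is assumed to stand for at most one of the listed letters.

```python
import re

def wildcard_regex(pattern):
    """Translate the *, % and [..] wildcards into a regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            out.append("[A-Za-z]*")       # any number of letters
        elif ch == "%":
            out.append("[A-Za-z]")        # exactly one letter
        elif ch == "[" and "]" in pattern[i + 2:]:
            end = pattern.index("]", i + 2)
            # one of the listed letters, or none at all
            out.append("[" + re.escape(pattern[i + 1:end]) + "]?")
            i = end
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.IGNORECASE)

def matches(pattern, words):
    regex = wildcard_regex(pattern)
    return [w for w in words if regex.fullmatch(w)]

vocabulary = ["compute", "computes", "computer", "computers", "computing"]
for pattern in ("comput*", "comput%%%", "compute[rs]"):
    print(pattern, matches(pattern, vocabulary))
```

Run against the five example words, the sketch reproduces the "Matches" column of the table.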
KonText can also work with compound words. Wherever a single word may be specified as a constraint, so may a compound. Compounds are taken to be any number of words that are entered with blank spaces between them, or words that are hyphenated. The words that make up a compound may also use wildcards. You can use compounds such as those below.
catalytic converter matches 'catalytic converter' or 'catalytic-converter'
cat* con* matches any two words together that start 'cat' and 'con'
cat*-con* has exactly the same effect as 'cat* con*'
catalytic %* finds two word compounds with 'catalytic'
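The compound examples above can be approximated in Python by treating a blank space and a hyphen as interchangeable separators. `compound_regex` is a hypothetical helper written for this illustration; it handles only the `*` and `%` wildcards and assumes a single space or hyphen between the words of a compound.

```python
import re

def _word_regex(word):
    # '*' = any number of letters, '%' = exactly one letter
    return "".join("[A-Za-z]*" if c == "*" else
                   "[A-Za-z]" if c == "%" else re.escape(c)
                   for c in word)

def compound_regex(pattern):
    """Build a regex in which words may be joined by a space or a hyphen."""
    words = re.split(r"[ -]+", pattern.strip())
    return re.compile("[ -]".join(_word_regex(w) for w in words), re.IGNORECASE)

for candidate in ("catalytic converter", "catalytic-converter"):
    print(bool(compound_regex("catalytic converter").fullmatch(candidate)))
```

Because the separator is normalised before translation, `cat*-con*` and `cat* con*` produce the same expression in this sketch.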
You can also use wildcards within patterns in lexica. For further use, see Options.
The predefined tasks supplied with KonText are:
Index
KWIC
Wordlist
Weirdness

The task Index creates a list of all words in the vocabulary of the source documents (unless specific include or exclude lists are used) along with their frequency of occurrence and a line reference to each occurrence in the source document. If you check the Character XRef box in the Options window, you get a character reference instead of a line reference.
The diagram below shows the word count and the vocabulary at the top, followed by the source, which is the file name of the source text.
The output of the task shows a frequency match on the left and the line references on the right.
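As a rough illustration of what an index contains, the Python sketch below collects each word's frequency and the lines on which it occurs. The function name `build_index`, the lower-casing of words and the letters-only tokenisation are assumptions of this sketch, not a description of KonText's internals.

```python
import re
from collections import defaultdict

def build_index(text, exclude=()):
    """Map each word to (frequency, [line references])."""
    excluded = {w.lower() for w in exclude}
    refs = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z]+", line.lower()):
            if word not in excluded:
                refs[word].append(line_no)
    # Sort alphabetically; the frequency is the number of references.
    return {word: (len(lines), lines) for word, lines in sorted(refs.items())}

text = "the cat sat\nthe dog sat\nthe end"
for word, (freq, lines) in build_index(text, exclude=["the"]).items():
    print(freq, word, lines)
```

Dropping the list of line references from each entry leaves the kind of word and frequency pairs that a wordlist reports.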

The task KWIC creates a list of all tokens - exemplars of the word, phrase or other string you selected - in the source document (unless specific include or exclude lists are used) and the token's surrounding text (the width of which you can specify). This gives the Key Word In Context (KWIC). You will also see a line reference to each occurrence in the source document, unless the Character XRef is checked; it is then a character reference.
The diagram below shows the word count and the vocabulary at the top, followed by the source, which is the file name of the source text.
The output of the task shows the line reference on the left and the context of the words in order of appearance.

The task Wordlist creates a list of all words in the vocabulary of the source document (unless specific include or exclude lists are used) along with their frequency of occurrence. The output is the same as that for the predefined Index task without any references to the occurrences in the text.
The diagram below shows the word count and the vocabulary at the top, followed by the source, which is the file name of the source text.
The output of the task shows the frequency match on the left and the words of the text on the right.

The task Weirdness applies a statistical approach to terminology extraction from text: it compares the relative frequencies of words that occur in specialist texts with their relative frequencies in general language corpora.
The diagram below shows the word count and the vocabulary, followed by the source, which is the file name of the source text.
The output of the task shows, from left to right: the frequency of each match, the word itself, the relative frequency of the word in this text and the Weirdness of the word compared with its frequency in the general language corpus.
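A common way to express such a comparison is as a ratio of the two relative frequencies. The Python sketch below assumes that definition; the exact formula KonText uses is not stated here, and the function name and the treatment of words absent from the general corpus (reported as infinity) are choices made for this illustration.

```python
from collections import Counter

def weirdness(special_tokens, general_tokens):
    """Ratio of a word's relative frequency in a specialist text to its
    relative frequency in a general language corpus (assumed definition)."""
    special = Counter(special_tokens)
    general = Counter(general_tokens)
    n_special = len(special_tokens)
    n_general = len(general_tokens)
    scores = {}
    for word, count in special.items():
        rel_special = count / n_special
        rel_general = general[word] / n_general if n_general else 0.0
        # Words unseen in the general corpus get an infinite score here.
        scores[word] = rel_special / rel_general if rel_general else float("inf")
    return scores

special = ["laser", "beam", "the", "laser"]
general = ["the", "the", "a", "beam", "the", "of", "the", "a"]
for word, score in weirdness(special, general).items():
    print(word, score)
```

Under this definition a score above 1 marks a word that is relatively more frequent in the specialist text than in general language, which is what makes it a candidate term.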

Start Task
If you press Start Task, it runs the selected task on the source document(s). A progress bar shows how much of the task has been completed. When the task is complete, the results are displayed in the Results Window.
User File Selection
If User Files is selected from the pull-down menu under Source Documents in the main KonText window, an Open File dialog box is displayed. This allows you to create a list of files to be used as source documents by KonText.

You select a text by highlighting it and pressing the OK button. You then see the following window:

The following commands are available:
Add displays the standard Windows Open File dialog box (see below) enabling the user to browse local or networked drives for files to be added to the list of source documents.
Drop removes the selected file(s) from the list of source documents.
Drop all removes all files from the list of documents.
View allows you to view the selected text.
Details displays file properties for the selected file(s) such as name, read/write status and size.
Help displays the Windows help file for the Select Source Text window.
When Virtual Corpus is selected from the pull-down menu under Source Documents (see main KonText window), Virtual Corpus Browser is started, allowing you to select files to be used as source documents by KonText.
The Virtual Corpus Browser has restricted functionality as compared to Virtual Corpus Manager. It allows text selection and viewing only, whereas in VCM you can also edit the organisation of the texts.
Virtual Corpus Browser dialog box
KonText has the following groups of options: Input; Keep; Sort; Output.
Input: you may specify whether KonText should consider source document input based on:
paragraph: (for texts generated from a text editor or word processors saved with line breaks) or
words: (a block of).
language: may be set to English or German.
Keep: you may specify whether KonText should keep:
Punctuation,
Numbers,
Hyphens at end of line,
Hyphens in text,
Collocation gaps or
Letter case in the selected text(s).
Sort: enables the results to be sorted according to the:
Ending (alphabetically from the end of the match backwards) or
Left Context (alphabetically on the first word to the left of the match for each occurrence of the match in KWIC output).
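The two sort orders can be pictured with a short Python sketch. Both function names are hypothetical, and the row layout `(left_context, match, right_context)` is an assumption made for illustration rather than KonText's actual data format.

```python
def sort_by_ending(words):
    """Sort alphabetically from the end of each word backwards."""
    return sorted(words, key=lambda w: w.lower()[::-1])

def sort_by_left_context(rows):
    """Sort KWIC rows on the first word to the left of the match.

    Each row is assumed to be (left_context, match, right_context).
    """
    def first_word_to_left(row):
        left_words = row[0].split()
        return left_words[-1].lower() if left_words else ""
    return sorted(rows, key=first_word_to_left)

print(sort_by_ending(["running", "walked", "jumping", "talked"]))
rows = [("on the", "mat", ""), ("a big", "cat", "sat"), ("", "The", "end")]
print(sort_by_left_context(rows))
```

Sorting by ending groups words that share a suffix, such as "-ed" or "-ing", while sorting on left context groups matches that follow the same word.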
Output: if you use multiple input files, these may be merged for output by checking the Merge Files option. The Use Tabs option should be checked when you want to use the results in a table or spreadsheet; spaces between columns of output are then replaced with tabs.
Character Xref: when checked, shows character numbers rather than line references in the index output.
Often it can be useful to specify punctuation as constraints to be processed. Simply include the punctuation mark as if it were a word and check the Punctuation check box in the Keep group of the Options dialog box.
Comments may be placed anywhere in a pattern and are completely ignored by KonText. Comments must be contained between the starting characters /* and the terminating characters */.
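The effect of ignoring comments can be sketched in Python with a single regular expression. `strip_comments` is a hypothetical helper; collapsing the leftover whitespace is a choice of this sketch, and nested comments are not considered.

```python
import re

def strip_comments(pattern):
    """Remove every /* ... */ comment and tidy the remaining whitespace."""
    without_comments = re.sub(r"/\*.*?\*/", "", pattern, flags=re.DOTALL)
    return " ".join(without_comments.split())

print(strip_comments("the /* article */ ^ of /* preposition */"))
```

The pattern that remains after stripping is the one that would actually be matched against the text.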

Set Up allows you to change how System Quirk is configured.
OK when pressed allows the selections you made to take effect.
Help displays Windows Help File for System Quirk.
The QClip button allows you to copy from the Results window to a Clipboard viewer. Whereas with the Windows Clipboard you would only be able to copy single items at a time, you can use QClip for copying multiple items to external editors or word processors.

The Results window displays the result of the executed task in a scrollable window.
Close closes the window.
Show Source opens the source document in a window and highlights the line referenced, if the results include references to the source document.
Save to File saves the contents of the Results window to a file named by the user.
If you don't enter a full path name, the current working directory is assumed.
Copy to QClip copies the selected text from the Results window into the Clipboard (QClip).
For each file processed you will see a header in the output giving the filename, word count and vocabulary for words that matched the processing constraints:
Vocabulary: indicates the number of discrete words matched in the text.
You can also use KonText to process texts that have been marked up with HTML/SGML. You can use HTML/SGML markers within the Include Words and Exclude Words constraints. Text between HTML/SGML markers can be included or excluded in the same way as words, using wildcards in the HTML/SGML markers as well. KonText will include or exclude text for a given HTML/SGML marker until its end marker is found. The following are some examples:
<h1> includes/excludes main header text
<h*> includes/excludes all header text (* = wildcard)
With this wildcard you can remove almost every HTML marker. When there are several words between the brackets you type as many asterisks (followed by a space) as there are words. For instance: <* * *>.
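Selecting the text enclosed by a marker can be illustrated with the Python sketch below. It is a simplification under stated assumptions: `marked_text` is a hypothetical function, the `*` wildcard is taken to match any run of letters or digits inside the marker name, and nested or unclosed markers are not handled.

```python
import re

def marked_text(document, marker):
    """Return the text enclosed by every marker matching `marker`,
    for example '<h1>' or '<h*>'."""
    name = marker.strip("<>")
    name_regex = re.escape(name).replace(r"\*", r"\w*")
    regex = re.compile(r"<(%s)\b[^>]*>(.*?)</\1>" % name_regex,
                       re.IGNORECASE | re.DOTALL)
    return [m.group(2).strip() for m in regex.finditer(document)]

page = "<h1>Title</h1><p>Body text</p><h2>Sub</h2>"
print(marked_text(page, "<h1>"))
print(marked_text(page, "<h*>"))
```

The returned text could then be kept or discarded, depending on whether the marker was placed in the Include or the Exclude list. Note that in this sketch a broad pattern such as `<h*>` would also match markers like `<html>` or `<head>`.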