|
School of EEITM University of Surrey Guildford, Surrey GU2 5XH, UK |
Tel: +44 (0)1483 259823 Fax: +44 (0)1483 876051 |
Definitions
Grammar formalisms
Phrase Structure Grammar
Generative Grammars
X-bar Grammars
Grammar as knowledge representation
Generalised Phrase Structure Grammars
References
Definitions
Let us recall these definitions:
Language: Formally, a language is a set of sentences, where each sentence is a string of one or more symbols (words) from the vocabulary of the language.
Grammar: A grammar is a finite, formal specification of the set of sentences in a language. Most interesting languages have an infinite number of sentences.
Recognizer program: A language can be specified by writing a program which reads the series of words in a sentence and then declares whether or not the input is a sentence of the language.
Production Grammar or Phrase Structure Grammar: A grammar which enables a human (or a machine) to rewrite one sequence of symbols into another.
Grammatical categories known to most competent speakers of English:
| Category | Examples |
| Adjective | big, little |
| Adverb | slowly, beautifully |
| Complementizer | that, which |
| Determiner | the, this, a, an |
| Noun | Johnny, Mary, book |
| Preposition | to, in |
| Pronoun | he, she, him, her, it |
| Quantifier | all, every |
| Verb | return, give |
| Verb (Auxiliary) | will, have |
More Definitions:
Phrases of many different types:
| Phrase | A term used in grammatical analysis to refer to a single element of structure typically containing one or more words, and lacking the subject-predicate structure of typical clauses; a part of a structural hierarchy falling between a word and a clause. |
| Phrase | Definition | Examples |
| Adjective Phrase (AdjP or AP) | A phrase exhibiting a distribution similar to that of a lexical adjective and semantically acting as a modifier of a noun or a noun phrase. | very big; proud of her achievements; more expensive than the previous one |
| Adverb Phrase (AdvP) | A phrase whose lexical head is an adverb. | very lightly; right here |
| Noun Phrase (NP, aka nominal group) | Regarded as one of the most important syntactic categories, and one which appears to be present in all languages. An NP may be defined as any category which can bear some grammatical relation within a sentence, for instance, as subject or object (direct, indirect or oblique). Noun phrases contain nouns as their head. | the hotel in the city; Johnny, Sarah; the book |
| Prepositional Phrase (PP) | A phrase consisting of a preposition and a noun phrase acting as its object. | in the garden; to the library; in front of the hotel |
| Verb Phrase (VP) | There are two senses in which VP is used. First (I), a syntactic category consisting of a verb and its complements and, usually, its adjuncts; VPs generally function as a predicate. Second (II), a group of verbs: one main verb (or lexical verb) and others that are subordinate to it (auxiliary verbs). | (I): bought her car; (II): is coming; may be coming; get up to |
Criteria for the design and choice of grammar formalisms for NLP:
1. Linguistic Naturalness
2. Mathematical Power
3. Computational Effectiveness
1. Linguistic Naturalness
The notation for the formalism should allow and encourage linguists to encode their linguistic descriptions in a manner that is easy to understand and modify. The notation should have some empathy with the predominant paradigms in syntax analysis.
2. Mathematical Power
Notational restrictions on grammar formalisms can seriously limit the class of grammars that can be expressed. Conversely, minor changes in notation may render the formalism more powerful - capable of representing a larger class of grammars.
A Phrase Structure Grammar is a grammar which enables a human (or a machine) to rewrite one sequence of symbols into another.
A Phrase Structure Grammar has four components:
| T | the terminal vocabulary | the words (or symbols) of the language being defined |
| N | the non-terminal vocabulary | the symbols which are used in specifying the grammar |
| V | the vocabulary of the language | the union of sets T and N |
| P | a set of productions | Each production is of the form α → β, where α is a sequence of one or more symbols from V and β is a sequence of zero or more symbols from V |
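The four components can be written down directly as data. The following is a minimal Python sketch, assuming a toy vocabulary and production set invented purely for illustration (the words, the non-terminals and the helper name `rewrite` are not taken from these notes). It shows how a production rewrites one sequence of symbols into another.

```python
# Toy phrase structure grammar; every symbol below is an illustrative assumption.
T = {"Sarah", "returned", "books"}          # terminal vocabulary
N = {"S", "NP", "VP", "V"}                  # non-terminal vocabulary
V = T | N                                   # vocabulary: the union of T and N
P = [                                       # productions as (alpha, beta) pairs
    (("S",), ("NP", "VP")),
    (("VP",), ("V", "NP")),
    (("NP",), ("Sarah",)),
    (("NP",), ("books",)),
    (("V",), ("returned",)),
]

# Each alpha has one or more symbols from V; each beta has zero or more.
assert all(len(a) >= 1 and set(a) <= V and set(b) <= V for a, b in P)


def rewrite(sequence, production, position):
    """Rewrite the occurrence of alpha at `position` in `sequence` as beta."""
    alpha, beta = production
    if sequence[position:position + len(alpha)] != alpha:
        raise ValueError("alpha does not occur at this position")
    return sequence[:position] + beta + sequence[position + len(alpha):]


# Derive a sentence from the start symbol S, one production at a time.
steps = [(P[0], 0), (P[2], 0), (P[1], 1), (P[4], 1), (P[3], 2)]
sequence = ("S",)
for production, position in steps:
    sequence = rewrite(sequence, production, position)

print(" ".join(sequence))
```

The derivation stops when the sequence contains only symbols from T, at which point no production in this toy grammar applies any longer.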
Inadequacy of Phrase Structure Grammar
Observation: English is neither regular nor context free.
Some common constructions in English cannot be generated by PSG. Even if more powerful grammars could be written, the following "problems" would still persist:
Problem categories: conjunctions, auxiliary verbs, passives
Meaning and derivation trees:
Apparently similar structures have different meanings.
Apparently dissimilar structures have the same meaning.
Thesis: An utterance is characterized as the surface manifestation of a "deeper" structure representing the "meaning" of the sentence.
Generative Grammar: An adequate theory of a language like English must be a statement of finite length which can
(a) account for the infinite number of possible sentences, and
(b) assign to each a structural description which captures the underlying knowledge of an idealized native speaker.
A formal system of rules is just such a statement: a device for producing the sentences of a language.
A formal statement is a model of abstract knowledge and not a model of human behaviour.
A Generative Grammar is said to be concerned with competence, as opposed to performance.
A comparison with phrase-structure grammar
Consider the following description of the verb phrase:
return her book to the library in the afternoon

X-bar Grammars
The above structure suggests, perhaps, that it is unique to a verb phrase. A number of linguists argue that the same verb phrase can be represented with the help of a much richer structure: a structure that encompasses at least all four major phrasal categories.
The top node is VP, whose subordinate nodes are all and only the words that constitute the entire verb phrase.
In between the top node, VP (the maximal projection), and the head node, V, there is a series of intermediate nodes, each labelled V-bar. Each V-bar node connects exactly one noun phrase or prepositional phrase to the main line between the head node and the verb's maximal projection. Thus, the tree is a binary tree in which no node has more than two branches.
Prepositional Phrases
Most prepositional phrases exhibit the same structure as verb phrases: the prepositional phrase's maximal projection acts as the superordinate node, and a head node is connected to the instance word. The intermediate connector ties in a phrase from the right. Therefore, one can use the same structural pattern as was used above for parsing a verb phrase:

Noun Phrases
The case of noun phrases, at least superficially, appears different from that of VP and PP, in that many noun phrases have nothing to their right and are preceded on the left by a determiner or a pronoun, like the or his or her. Words that appear on the left are called specifiers.
Consider the phrase the library in the city:

Prepositional and Verb Phrases
However, for the structure of the NP to be similar to that of PP or VP, NPs with branches to the right should exist and, correspondingly, verb phrases and prepositional phrases with branches to the left should exist.
Consider the following prepositional and the verb phrases:
PP = precisely at Noon;

and VP = all return their books.

Some theoretical notes
X-bar grammar has been developed as an alternative to the traditional accounts of phrase-structure and lexical categories.
The argument here is that the rules of phrase structure grammar need to be more constrained and that more phrasal categories need to be recognised. This is particularly relevant to the identification of the so-called intermediate categories that are larger than the noun but smaller than the phrase, like very fast or very fast car in the phrase the very fast car. These categories are easily identified formally in X-bar syntax by a system of bars: each bar, a superscript on the phrasal category, identifies a level of phrasal expansion.
For example, the following binary tree shows two levels of phrasal expansion (double-bar and single-bar) above the zero-bar level:
Inflection Phrase
| Phrase | Definition | Examples |
| Inflection Phrase (IP) | Inflection is the variation in form of a single lexical item as required by its various grammatical roles. Inflection (I), the head of an inflection phrase, is like the lexical categories N, V, A and P in that it is a zero-level category. | Sarah will return ...; Sarah returned .... |
Consider the following inflection phrases: Sarah will return her book

and Sarah returned her book
Complementizer Phrase
| Phrase | Definition | Examples |
| Complementizer Phrase (CP) | A complementizer is a grammatical formative which serves to mark a complement clause. A complementizer phrase contains a complementizer as its head. | In the NP the report that John Smith died, the clause that John Smith died is a complement of the noun report. |
Consider the CP, that Sarah will return her book:
X-bar Grammars
The binary tree hypothesis states that a binary tree is the best kind
of tree for representing sentence structure at all levels. It can be shown
that a number of (English) verb phrases, noun phrases and prepositional
phrases, conform to the general pattern:

XP stands for the four principal phrases: NP, VP, AP and PP.
X stands for the four major heads: N, V, A and P.
Note that not all phrases need to exhibit all of these properties. However, all phrases are instances of the X-bar schema. Words or phrases brought in from the right are called complements and those brought in from the left are called specifiers. The superordinate node of an X-bar tree is called the maximal projection, and intermediate nodes are denoted with a superscript bar on them. The leaves are instances of the phrasal class.
Maximal Projection, Head and a note on the binary tree representation
Maximal Projection: In the X-bar system, the largest syntactic category which is formally related to some lexical category by the relation of projection. It is represented by the largest available value of the feature bar, and is identified with the traditional (full) phrasal category associated with that lexical category.
Head: That element of a constituent which is syntactically central in that it is primarily responsible for the syntactic character of the constituent. Currently almost all constituents are generally regarded as projections of lexical heads.
A binary X-bar tree is a representation that is a tree, in which there are three general types of nodes: X, X-bar, and XP. Each node has, at most, two branches, and leaves are words.
In English, specifiers enter XP nodes from one side; complements enter X-bar nodes from the other. The direction from which specifiers and complements enter is language specific. The kind of phrase that can serve as a specifier for a particular kind of XP, or as a complement for a particular kind of X-bar, depends on the occupant of the head position, X.
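The binary X-bar tree can be represented as a small data structure. The sketch below is only an illustration, assuming a hypothetical `Node` class and a `leaves` helper; the internal structure of the NPs and PPs is flattened for brevity rather than given full X-bar projections. It builds the verb phrase discussed earlier, with each V-bar node tying in exactly one phrase from the right.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Node:
    """A node in a binary X-bar tree; leaves are plain word strings."""
    label: str                                  # e.g. "V", "V-bar", "VP"
    children: Tuple[Union["Node", str], ...] = ()

    def __post_init__(self):
        if len(self.children) > 2:
            raise ValueError("an X-bar tree node has at most two branches")


def leaves(node):
    """Return the words under `node`, read left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node.children for word in leaves(child)]


# Complements enter each V-bar from the right, as in English.
head = Node("V", ("return",))
bar1 = Node("V-bar", (head, Node("NP", ("her", "book"))))
bar2 = Node("V-bar", (bar1, Node("PP", ("to", Node("NP", ("the", "library"))))))
bar3 = Node("V-bar", (bar2, Node("PP", ("in", Node("NP", ("the", "afternoon"))))))
vp = Node("VP", (bar3,))

print(" ".join(leaves(vp)))
```

Reading the leaves from left to right recovers the original phrase, and any attempt to give a node three branches is rejected when the node is constructed.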
Sentences as complementizer phrases
Sentences can be viewed as complementizer phrases, but most have empty
specifier and head positions:

Noun Phrases, Noun Cases and Governors
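Consider how the form of English pronouns varies with case, person and number:
| Number | Person | Nominative | Accusative | Genitive |
| Singular | First | I | me | my |
| Singular | Second | you | you | your |
| Singular | Third | he, she, it | him, her, it | his, her, its |
| Plural | First | we | us | our |
| Plural | Second | you | you | your |
| Plural | Third | they | them | their |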
The substitution of one pronoun by another is permissible in the following two sentences in that both are grammatically acceptable:
I gave them your book
Why is one substitution grammatically acceptable whilst the other is not? The previous table shows that, whilst one pronoun can generally be substituted by another pronoun, it is the head of the noun phrase that actually determines the case of each pronoun. The case is determined by the governing head, and government is a property determined by the structure of the X-bar tree.
A binary X-bar representation.
A binary X-bar tree is a representation that is a tree in which:
there are three general types of nodes: X, X-bar and XP;
each node has at most two branches;
leaves are words.
In English, specifiers enter XP nodes from one side; complements enter X-bar nodes from the other. The direction from which specifiers and complements enter is language specific. The kind of phrase that can serve as a specifier for a particular kind of XP, or as a complement for a particular kind of X-bar, depends on the occupant of the head position, X.
The case-determining constraint, stated in terms of X-bar schemas, for a pronoun's case assignment:
To determine a pronoun's case assignment,
Move upward from the pronoun until you arrive at either an IP node for which the head has tense information, or at an NP node, a PP node or a CP node.
Announce that the pronoun is nominative if the head belongs to an IP node and carries tense information.
Announce that the pronoun is accusative if the head is a preposition, verb or complementiser.
Announce that the pronoun is genitive if the head is a noun.
Otherwise, the pronoun cannot be assigned case, and the sentence is unacceptable.
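The procedure above can be sketched in a few lines of Python. This is an illustrative reading rather than a definitive implementation: the `Node` class, its field names and the function `pronoun_case` are hypothetical, and the tree fragments are built by hand. Two assumptions are made. First, the walk starts at the node immediately above the pronoun, so that the pronoun's own projection is not mistaken for its governor. Second, VP is treated as a stopping node alongside NP, PP and CP, since the accusative rule mentions verbs as possible heads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    category: str                      # "IP", "NP", "VP", "PP", "CP", or a bar-level label
    head: Optional[str] = None         # lexical category of the head: "I", "N", "V", "P", "C"
    tensed: bool = False               # only meaningful for IP nodes
    parent: Optional["Node"] = None


def pronoun_case(start: Optional[Node]) -> Optional[str]:
    """Walk upward from `start` (the node just above the pronoun) and report a case."""
    node = start
    while node is not None:
        if node.category == "IP" and node.tensed:
            return "nominative"
        # VP is included here as an assumption; see the lead-in.
        if node.category in {"NP", "VP", "PP", "CP"}:
            if node.head in {"P", "V", "C"}:
                return "accusative"
            if node.head == "N":
                return "genitive"
            return None                # e.g. an empty head position
        node = node.parent
    return None                        # no governor found: no case can be assigned


# Hand-built fragments for "I gave them your book" (structure simplified).
ip = Node("IP", head="I", tensed=True)
vp = Node("VP", head="V", parent=ip)
v_bar = Node("V-bar", parent=vp)
np_book = Node("NP", head="N", parent=v_bar)

print(pronoun_case(ip), pronoun_case(v_bar), pronoun_case(np_book))
```

Here the subject pronoun sits directly under the tensed IP, the object pronoun under a V-bar inside the VP, and the possessive pronoun under the NP headed by book. A result of `None` corresponds to the final step of the procedure, where no case can be assigned.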
Pronouns
Consider the following sentence:

Grammar as knowledge representation
| The study of grammar can be viewed as a branch of knowledge representation:
A grammar is a way of representing certain aspects of what we know about
a language that is explicit and formal enough to be understood
by a machine.
A language can be regarded as a set whose membership is precisely specifiable by rules. The set of compound linguistic expressions in a natural language is not finite and, hence these expressions cannot be listed. (As far as is known, no natural language is finite). |
Formal systems are required such that they define the membership of the infinite sets of linguistic expressions and assign a structure to each member of these sets.
(a) Formalisms that contain information about which (linguistic) objects combine together and what the properties of the resultant objects are; and
(b) Formalisms that transparently provide each legal string with an implicit structural description.
[Transition networks use extraneous procedural information about (a), and ATNs necessitate the use of explicit structure-building annotations for (b).]
Declarative grammar formalisms satisfy criteria (a) and (b) without recourse to prescriptive and ad hoc devices.
Generalised Phrase Structure Grammars
Generalised Phrase Structure Grammar (GPSG) is a theory of grammar developed in the 1980s.
The class of grammars permitted by GPSG is strongly equivalent to the context-free grammars.
GPSG attempts to characterise natural languages in terms of context-free grammars, while making no claims to psychological reality.
GPSG is characterised by the separation of grammatical statements
into a metagrammar and an object grammar. The
metagrammar contains metarules and other generalisations
about the rules of grammar. The object grammar directly licenses local
subtrees.
Generalised Phrase Structure Grammars: (CF-PSG, Type 2)
Grammars should characterize the order of elements in a string as opposed to attempting to reconstruct some hypothesized underlying order.
Grammars discussed are declarative and generally based on a decomposition of syntactic categories into components known as features.
An RHS element may then be a category or a particular symbol of the language. When a CF-PSG rule specifies that an LHS category can be realised as a particular RHS, this realisation is deemed to be possible regardless of the context in which the LHS category appears. The rule does not place any restriction on the context in which this can happen, hence the term 'context free' (Gazdar and Mellish 1989:106).
The most popular framework for formalising CF-PSG, and indeed most other NL grammars, is the directed acyclic graph, specifically the tree structure.
Grammars used in computational linguistics employ the following:
(i) A representation for syntactic categories or parts of speech
(ii) A Data type for:
a. words [and hence a lexicon, wordlist etc.].
b. syntactic rules
c. syntactic structures
Three data types------->parser-------> Instantiated syntactic structures
[words/lexicon/wordlist] [syntactic rules]
- a language for representing lexical entries
- a language in which to write rules
- a language for exhibiting syntactic structures
Definitions
Category (categorisation, categorical):
Categorisation in the field of grammar refers to the establishment of a set of classificatory units or properties used in the description of language. These units or properties have the same basic distribution and occur as structural units throughout the language.
In some approaches, category refers to the classes themselves, e.g. NOUN, VERB, SUBJECT, PREDICATE, NP, VP, etc. Grammatical categories include N, V, VP, NP. Grammatical functions or functional categories include SUBJECT, OBJECT, COMPLEMENT. A categorical rule is a rule which expands a category into other categories.
Feature
A term used in linguistics (and phonetics) to refer to any typical
or noticeable property of spoken or written language.
In Generative Grammatical Analysis the term is associated with the way in which words are classified in the lexicon in terms of their grammatical properties, such as [countable].
In Generalised Phrase Structure Grammar grammatical categories are
defined in terms of feature specifications - ordered pairs containing
a feature and feature value which rules can access.
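A feature specification, read as an ordered pair of a feature and its value, maps naturally onto a tuple. The following is a minimal sketch under that reading; the helper names `value` and `agree` are hypothetical, and the feature names `cat` and `arg1` are borrowed from the exemplar grammars in these notes.

```python
# A category as a set of feature specifications: (feature, value) pairs.
employed = frozenset({("cat", "V"), ("arg1", "NP")})
flourished = frozenset({("cat", "V"), ("arg1", "0")})


def value(category, feature):
    """Return the value paired with `feature`, or None if it is unspecified."""
    return dict(category).get(feature)


def agree(category_a, category_b, feature):
    """True if both categories carry the same value for `feature` (or both lack it)."""
    return value(category_a, feature) == value(category_b, feature)


print(value(employed, "arg1"), agree(employed, flourished, "cat"))
```

Rules can then access individual features, for example to check that two categories share a `cat` value while differing in `arg1`.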
Generalised Phrase Structure Grammars: Exemplar grammars
Grammar:
Name;
Syntactic Rule-base
Lexical Data-base
EXAMPLE
Rule sentence formation
S-->NP, VP
Rule verb
VP-->V, NP
Rule verb
VP-->V
word Ken Clarke:
<cat(egory)>=NP
word StLukes:
<cat>=NP
word clients:
<cat>=NP
word died:
<cat>=V
word employed:
<cat>=V
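To see how a rule-base and a lexical data-base work together, the exemplar grammar above can be turned into a small recognizer. The sketch below is one possible rendering, not the notation's official semantics: the names `RULES`, `LEXICON`, `derives` and `matches` are hypothetical, and a multi-word entry such as Ken Clarke is assumed to arrive as a single token.

```python
# Rule-base: (left-hand side, right-hand side) pairs from the exemplar grammar.
RULES = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
]

# Lexical data-base: word -> category.
LEXICON = {
    "Ken Clarke": "NP",
    "StLukes": "NP",
    "clients": "NP",
    "died": "V",
    "employed": "V",
}


def derives(category, tokens):
    """True if `category` can be rewritten into exactly the sequence `tokens`."""
    tokens = tuple(tokens)
    if len(tokens) == 1 and LEXICON.get(tokens[0]) == category:
        return True
    return any(lhs == category and matches(rhs, tokens) for lhs, rhs in RULES)


def matches(rhs, tokens):
    """True if the categories in `rhs` can cover `tokens` from left to right."""
    if len(rhs) == 1:
        return derives(rhs[0], tokens)
    # Try every way of giving the first category a non-empty prefix.
    return any(
        derives(rhs[0], tokens[:i]) and matches(rhs[1:], tokens[i:])
        for i in range(1, len(tokens))
    )


print(derives("S", ["Ken Clarke", "employed", "clients"]))
print(derives("S", ["Ken Clarke", "died", "StLukes"]))
```

Both calls succeed, which illustrates a weakness of this grammar: nothing prevents died from taking an object, because the lexicon records only that it is a verb.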
Consider the following sentences:
(ii). Michael employed car-workers and car-dealers flourished.
The following schema introduces the co-ordinate construction and suggests
that a given category can consist of two further instances of the same
category separated by an item of category 'C', which will turn out
to be realized as 'and' or 'or'.
An example grammar is given below:
GRAMMAR 2 (CF-PSG)
Rule {simple sentence formation}
S --> NP VP
Rule {intransitive verb}
VP --> V:
<V arg1> = 0
Rule {single complement verb}
VP --> V X:
<V arg1> = <X cat>
Rule {co-ordination of identical categories}
X0 --> X1 C X2:
<X0 cat> = <X1 cat>
<X0 cat> = <X2 cat>
<X0 arg1> = <X1 arg1>
<X0 arg1> = <X2 arg1>
word Michael:
<cat(egory)>=NP
word car-workers:
<cat>=NP
word miners:
<cat>=NP
word car-dealers:
<cat>=NP
word flourished:
<cat>=V
<arg1>= 0
word employed:
<cat>=V
<arg1> = NP
word and:
<cat> = C
GRAMMAR 2 (CF-PSG)
The important points to be highlighted here are: that there are major grammatical categories, like S(entence), NP (Noun Phrase) and V(erb); that subcategorisation has been used to classify transitive and intransitive verbs (e.g. employed and flourished) not in an ad hoc manner but through the use of rules; and that the relationship between categories and features can help in expressing complex co-ordination relationships, like the use of the conjunction and.
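The interplay of subcategorisation and co-ordination in Grammar 2 can be made concrete with a small chart-style recognizer. The sketch below rests on stated assumptions: categories are represented as hypothetical `(cat, arg1)` tuples, the single complement rule is read as VP --> V X with the verb's arg1 equal to the category of X, and the co-ordination rule is read as requiring both conjuncts to share cat and arg1. The function name `analyses` is likewise an invention for illustration.

```python
from functools import lru_cache

# Lexical data-base of Grammar 2 as (cat, arg1) pairs; None means "no arg1".
LEXICON = {
    "Michael": ("NP", None),
    "car-workers": ("NP", None),
    "miners": ("NP", None),
    "car-dealers": ("NP", None),
    "flourished": ("V", "0"),
    "employed": ("V", "NP"),
    "and": ("C", None),
}


def analyses(words):
    """Return every (cat, arg1) pair that can span the whole of `words`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def span(i, j):
        found = set()
        if j - i == 1 and words[i] in LEXICON:
            found.add(LEXICON[words[i]])
        for k in range(i + 1, j):                       # two-daughter rules
            for left_cat, left_arg in span(i, k):
                for right_cat, _ in span(k, j):
                    if left_cat == "NP" and right_cat == "VP":
                        found.add(("S", None))          # S --> NP VP
                    if left_cat == "V" and left_arg == right_cat:
                        found.add(("VP", None))         # VP --> V X
        for k in range(i + 1, j - 1):                   # X0 --> X1 C X2
            for m in range(k + 1, j):
                if ("C", None) in span(k, m):
                    found |= span(i, k) & span(m, j)    # same cat and arg1
        if ("V", "0") in found:
            found.add(("VP", None))                     # VP --> V, arg1 = 0
        return frozenset(found)

    return span(0, len(words))


sentence = "Michael employed car-workers and car-dealers flourished".split()
print(("S", None) in analyses(sentence))
```

Unlike a grammar with a bare V category, this reading rejects a string such as Michael flourished car-workers, because the arg1 value of flourished does not match the category of the following noun phrase.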
GRAMMAR 3 (CF-PSG)
Rule {simple sentence formation}
S --> NP VP
Rule {intransitive verb}
VP --> V:
<V arg1> = 0
Rule {single complement verb}
VP --> V X:
<V arg1> = <X cat>
Rule {prepositional attachment}
PP --> P X:
<P arg1> = <X cat>
Rule {co-ordination of identical categories}
X0 --> X1 C X2:
<X0 cat> = <X1 cat>
<X0 cat> = <X2 cat>
<X0 arg1> = <X1 arg1>
<X0 arg1> = <X2 arg1>
word Trevor:
<cat(egory)>=NP
word Celia:
<cat>=NP
word Larry:
<cat>=NP
word flourished:
<cat>=V
<arg1>= 0
word hated:
<cat>=V
<arg1>= NP
word disapproved:
<cat>=V
<arg1>= PP
word thought:
<cat>=V
<arg1>= S
word looked:
<cat>=V
<arg1>= AP
word and:
<cat> = C
word handsome:
<cat> = AP
References
Gazdar, Gerald, and Mellish, Chris (1989). Natural Language Processing in PROLOG - An Introduction to Computational Linguistics. Wokingham (England): Addison Wesley Publishing Co.
Introduces computational linguistics and discusses generalised phrase structure grammars in considerable detail (Chapter 1 for introduction and Chapters 4 and 5 on grammars).
Winston, Patrick Henry (1992). Artificial Intelligence (Third Edition). Reading (Mass., USA): Addison Wesley Publishing Co.
See the chapter on natural language processing, especially on X-bar grammars, entitled 'Expressing Language Constraints' (Chapter 28).