University of Surrey
School of EEITM
University of Surrey
Guildford, Surrey
GU2 5XH, UK

Tel: +44 (0)1483 259823
Fax: +44 (0)1483 876051

 
 
 

Definitions

Grammar formalisms

Phrase Structure Grammar

Generative Grammars

X-bar Grammars

Grammar as knowledge representation

Generalised Phrase Structure Grammars

References










Definitions
Let us recall these definitions:
 
 

Formally, a language is a set of sentences, where each sentence is a string of one or more symbols (words) from the vocabulary of the language.

A grammar is a finite, formal specification of the set of sentences in a language. A finite specification is needed because most interesting languages have an infinite number of sentences, which therefore cannot simply be listed.

Recognizer program: A language can also be specified by writing a program that takes as input the series of words in a candidate sentence and declares whether or not the input is a sentence of the language.
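As a minimal sketch of such a recognizer, assume a hypothetical toy language whose sentences have the shape NP V or NP V NP, where an NP is either a name or a determiner followed by a noun. The lexicon, the category names and the function names below are illustrative assumptions, not part of these notes; the program simply reads a list of words and answers yes or no.

```python
# Hypothetical toy lexicon (not taken from these notes): word -> category.
LEXICON = {
    "the": "Det", "a": "Det",
    "book": "N", "library": "N",
    "Johnny": "Name", "Mary": "Name",
    "returned": "V", "read": "V",
}


def parse_np(tags, i):
    """Return the position just after a noun phrase starting at i, or None."""
    if i < len(tags) and tags[i] == "Name":
        return i + 1                       # NP -> Name
    if i + 1 < len(tags) and tags[i] == "Det" and tags[i + 1] == "N":
        return i + 2                       # NP -> Det N
    return None


def is_sentence(words):
    """Declare whether `words` is a sentence of the toy language (NP V or NP V NP)."""
    if any(w not in LEXICON for w in words):
        return False                       # unknown word: cannot be in the language
    tags = [LEXICON[w] for w in words]
    i = parse_np(tags, 0)
    if i is None or i >= len(tags) or tags[i] != "V":
        return False                       # no subject NP followed by a verb
    i += 1
    if i == len(tags):
        return True                        # NP V
    return parse_np(tags, i) == len(tags)  # NP V NP, with nothing left over


print(is_sentence("Johnny returned the book".split()))  # True
print(is_sentence("returned Johnny".split()))           # False
```

The recognizer says nothing about the structure of a sentence; it only decides membership, which is exactly what this definition of a language requires.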

Production Grammar or Phrase Structure Grammar: A grammar which enables a human (or a machine) to rewrite one sequence of symbols into another.
 

Grammatical categories known to most competent speakers of English:
 

Category            Examples
Adjective           big, little
Adverb              slowly, beautifully
Complementizer      that, which
Determiner          the, this, a, an
Noun                Johnny, Mary, book
Preposition         to, in
Pronoun             he, she, him, her, it
Quantifier          all, every
Verb                return, give
Verb (Auxiliary)    will, have

 


More Definitions:

• Phrases of many different types:
 
 

Phrase: A term used in grammatical analysis to refer to a single element of structure, typically containing one or more words and lacking the subject-predicate structure of typical clauses; a unit in the structural hierarchy falling between a word and a clause.

 
 
Adjective Phrase (AdjP or AP)
    Definition: A phrase exhibiting a distribution similar to that of a lexical adjective and semantically acting as a modifier of a noun or a noun phrase.
    Examples: very big; proud of her achievements; more expensive than the previous one

Adverb Phrase (AdvP)
    Definition: A phrase whose lexical head is an adverb.
    Examples: very lightly; right here

Noun Phrase (NP, also known as nominal group)
    Definition: Regarded as one of the most important syntactic categories, and one which appears to be present in all languages. An NP may be defined as any category which can bear some grammatical relation within a sentence, for instance subject or object (direct, indirect or oblique). Noun phrases have nouns as their heads.
    Examples: the hotel in the city; Johnny; Sarah; the book

Prepositional Phrase (PP)
    Definition: A phrase consisting of a preposition and a noun phrase acting as its object.
    Examples: in the garden; to the library; in front of the hotel

Verb Phrase (VP)
    Definition: The term VP is used in two senses. (I) A syntactic category consisting of a verb and its complements and, usually, its adjuncts; VPs generally function as predicates. (II) A group of verbs: one main (lexical) verb together with the verbs subordinate to it (auxiliary verbs).
    Examples: (I) bought her car; (II) is coming; may be coming; get up to


Grammar formalisms
 

Criteria for the design and choice of grammar formalisms for NLP:
        1. Linguistic Naturalness
        2. Mathematical Power
        3. Computational Effectiveness

1. Linguistic Naturalness

The notation of the formalism should allow and encourage grammar writers to encode their linguistic descriptions in a manner that is easy to understand and modify. The notation should also be in sympathy with the predominant paradigms in syntactic analysis.
 

2. Mathematical Power

Notational restrictions on grammar formalisms can seriously limit the class of grammars that can be expressed. Conversely, minor changes in notation may make the formalism more powerful, that is, capable of representing a larger class of grammars.

Phrase Structure Grammar
 

A grammar which enables a human (or a machine) to rewrite one sequence of symbols into another.

A Phrase Structure Grammar has four components:

T    the terminal vocabulary: the words (or symbols) of the language being defined
N    the non-terminal vocabulary: the symbols used in specifying the grammar
V    the vocabulary of the language: the union of the sets T and N
P    a set of productions. Each production has the form α → β, where α is a sequence of one or more symbols from V and β is a sequence of zero or more symbols from V
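The four components can be written down directly as data, and a derivation is then just a sequence of rewrites. This is a minimal sketch: the toy grammar, the helper names `rewrite` and `derive`, and the choice to rewrite the leftmost occurrence of α are illustrative assumptions, not part of the original notes.

```python
# Hypothetical toy grammar G = (T, N, V, P); the symbols are illustrative only.
T = {"Mary", "returned", "the", "book"}          # terminal vocabulary
N = {"S", "NP", "VP", "Det", "Noun", "Verb"}     # non-terminal vocabulary
V = T | N                                        # V is the union of T and N
P = [                                            # productions alpha -> beta
    (("S",), ("NP", "VP")),
    (("VP",), ("Verb", "NP")),
    (("NP",), ("Mary",)),
    (("NP",), ("Det", "Noun")),
    (("Det",), ("the",)),
    (("Noun",), ("book",)),
    (("Verb",), ("returned",)),
]
# alpha: one or more symbols from V; beta: zero or more symbols from V
assert all(len(a) >= 1 and set(a) <= V and set(b) <= V for a, b in P)


def rewrite(form, production):
    """Rewrite the leftmost occurrence of alpha in `form` as beta; None if alpha is absent."""
    alpha, beta = production
    n = len(alpha)
    for i in range(len(form) - n + 1):
        if form[i:i + n] == alpha:
            return form[:i] + beta + form[i + n:]
    return None


def derive(start, steps):
    """Apply P[k] for each k in `steps`, returning every sentential form produced."""
    forms = [start]
    for k in steps:
        nxt = rewrite(forms[-1], P[k])
        if nxt is None:
            raise ValueError(f"production {k} does not apply to {forms[-1]}")
        forms.append(nxt)
    return forms


forms = derive(("S",), [0, 2, 1, 6, 3, 4, 5])
for f in forms:
    print(" ".join(f))
# The final form contains terminals only, so it is a sentence of the language.
print(set(forms[-1]) <= T)  # True
```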



 

Inadequacy of Phrase Structure Grammar
 

Observation: English is neither regular nor context-free. Some common constructions in English cannot be generated by a PSG, and even if more powerful grammars could be written, the following "problems" would still persist:

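Problem categories: conjunctions, auxiliary verbs, passives.

Meaning and derivation trees:
Apparently similar structures can have different meanings (e.g. John is eager to please / John is easy to please).
Apparently dissimilar structures can have the same meaning (e.g. the active The dog bit the man and the passive The man was bitten by the dog).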
 

Generative Grammars

Thesis: An utterance is characterized as the surface manifestation of a "deeper" structure representing the "meaning" of the sentence.

Generative Grammar: An adequate theory of a language like English must be a statement of finite length which can
(a) account for the infinite number of possible sentences, and
(b) assign to each sentence a structural description which captures the underlying knowledge of an idealized native speaker.
 
 

A formal system of rules is just such a statement: a device for producing the sentences of a language.
A formal statement is a model of abstract knowledge, not a model of human behaviour.


A Generative Grammar is said to be concerned with competence, as opposed to performance.



 

X-bar Grammars

A comparison with phrase-structure grammar








Consider the following description of the verb phrase:
return her book to the library in the afternoon










• The structure above perhaps suggests that this kind of structure is unique to the verb phrase. A number of linguists argue, however, that the same verb phrase can be represented by a much richer structure, one that encompasses at least all four major phrasal categories.

• The top node, VP, dominates all and only the words that constitute the entire verb phrase.

• Between the top node VP (the maximal projection) and the head node V there is a series of V' ("V-bar") nodes. Each V' node connects exactly one noun phrase or prepositional phrase to the main line between the head node and the verb's maximal projection. Thus the tree is a binary tree in which no node has more than two branches.
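The V' spine just described can be written down as a nested tuple and checked mechanically. This is a minimal sketch: the tuple encoding (label first, then children; strings as leaves) and the helper names are illustrative assumptions, and as a simplification the NP and the two PPs are kept as single unanalysed leaves so that attention stays on the V' nodes.

```python
# A tree is a tuple (label, child, ...); a leaf is a string. The NP and PPs are left
# unanalysed (a single leaf each) to keep attention on the V' spine.
VP = ("VP",
      ("V'",
       ("V'",
        ("V'", ("V", "return"), ("NP", "her book")),
        ("PP", "to the library")),
       ("PP", "in the afternoon")))


def leaves(tree):
    """Words (or unanalysed phrases) in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]


def max_branching(tree):
    """Largest number of children of any node in the tree."""
    if isinstance(tree, str):
        return 0
    children = tree[1:]
    return max([len(children)] + [max_branching(c) for c in children])


def count(tree, label):
    """Number of nodes carrying `label`."""
    if isinstance(tree, str):
        return 0
    return int(tree[0] == label) + sum(count(c, label) for c in tree[1:])


print(" ".join(leaves(VP)))   # return her book to the library in the afternoon
print(max_branching(VP))      # 2: the tree is binary
print(count(VP, "V'"))        # 3: one V' per attached NP or PP
```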


X-bar Grammars

Prepositional Phrases

• Most prepositional phrases exhibit the same structure as verb phrases: the prepositional phrase's maximal projection, PP, acts as the superordinate node; the head node, P, is connected to the preposition itself; and an intermediate P' node ties in a phrase from the right. The same structural pattern used above for the verb phrase can therefore be used to parse a prepositional phrase.
 
 


X-bar Grammars

Noun Phrases

• Noun phrases, at least superficially, appear different from VPs and PPs in that many noun phrases have nothing to their right and are preceded on the left by a determiner or a pronoun, such as the, his or her. Words that appear on the left in this way are called specifiers.

Consider the phrase the library in the city:
 
 









X-bar Grammars

Prepositional and Verb Phrases

• However, for the structure of the NP to be similar to that of the PP or VP, NPs with branches to the right should exist and, correspondingly, VPs and PPs with branches to the left should exist. Consider the following prepositional and verb phrases:
 

PP = precisely at noon;









and VP = all return their books.



 
 








X-bar Grammars

Some theoretical notes

• X-bar grammar has been developed as an alternative to the traditional accounts of phrase-structure and lexical categories.

• The argument here is that the rules of phrase structure grammar need to be more constrained and that intermediate phrasal categories need to be recognised. This is particularly relevant to the identification of the so-called intermediate categories that are larger than the noun but smaller than the full noun phrase, like very fast car in the phrase the very fast car. These categories are easily identified formally in X-bar syntax by a system of bars: each bar, written as a superscript on the category, identifies a level of phrasal expansion.

For example, the following binary tree shows the levels of phrasal expansion: double bar (XP), single bar (X') and zero bar (X).


 


X-bar Grammars

Inflection Phrase
 
 

Inflection Phrase (IP)
    Definition: Inflection is the variation in form of a single lexical item as required by its various grammatical roles. The inflection head, I, is like the lexical categories N, V, A and P in that it is a zero-level category; IP is its maximal projection.
    Examples: Sarah will return ...; Sarah returned ...

Consider the following inflection phrases: Sarah will return her book








and Sarah returned her book


 


X-bar Grammars

Complementizer Phrase
 

Complementizer Phrase (CP)
    Definition: A complementizer is a grammatical formative which serves to mark a complement clause. A complementizer phrase contains a complementizer as its head.

In the NP, the report that John Smith died, the clause that John Smith died is a complement of the noun report.

Consider the CP, that Sarah will return her book:


 


X-bar Grammars

The binary tree hypothesis states that a binary tree is the best kind of tree for representing sentence structure at all levels. It can be shown that a number of (English) verb phrases, noun phrases and prepositional phrases conform to the general pattern:
 
 

• XP stands for the four principal phrases:

XP = NP; AdjP; VP; PP.

• X stands for the four major heads:

X = N; V; P; Adj.

• Note that not every phrase needs to exhibit all of these properties; nevertheless, all phrases are instances of the X-bar schema. Words or phrases attached from the right are called complements, and those attached from the left are called specifiers. The superordinate node of an X-bar tree is called the maximal projection, and intermediate nodes are marked with a bar (X', read "X-bar"). The leaves are words.
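The schema can be turned into a small tree builder: the head is the zero-bar node, each complement is attached from the right by its own X' node, and an optional specifier is attached from the left at XP. This is a sketch under stated assumptions: `xbar` is a hypothetical helper, the tuple encoding is illustrative, complements and adjuncts are not distinguished, and labels such as Det and Adv for specifiers are our own choice.

```python
def xbar(cat, head, complements=(), specifier=None):
    """Build a binary X-bar tree for category `cat` (one of N, V, P, Adj).

    Illustrative encoding: a tree is (label, child, ...); leaves are strings.
    """
    node = (cat, head)                       # zero-bar level: the head X
    if not complements:
        node = (cat + "'", node)             # X' dominating the bare head
    for comp in complements:                 # each complement enters from the right
        node = (cat + "'", node, comp)       # via its own X' node
    if specifier is not None:                # the specifier enters XP from the left
        return (cat + "P", specifier, node)
    return (cat + "P", node)                 # maximal projection XP


# "the library in the city": specifier on the left, complement on the right
print(xbar("N", "library", [("PP", "in the city")], specifier=("Det", "the")))
# "precisely at noon": a PP with a branch to the left
print(xbar("P", "at", [("NP", "noon")], specifier=("Adv", "precisely")))
```

Because the same function serves N, V, P and Adj, the sketch reflects the claim that all phrases are instances of one schema.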
 


X-bar Grammars

Maximal Projection, Head and a note on the binary tree representation

• Maximal Projection: In the X-bar system, the largest syntactic category that is formally related to some lexical category by the relation of projection. It is represented by the largest available value of the bar feature and identified with the traditional (full) phrasal category associated with that lexical category.

• Head: That element of a constituent which is syntactically central, in that it is primarily responsible for the syntactic character of the constituent. In current theory, almost all constituents are regarded as projections of lexical heads.

• A binary X-bar tree is a representation that is a tree in which there are three general types of nodes: X, X' and XP. Each node has at most two branches, and leaves are words.

• In English, specifiers enter XP nodes from the left; complements enter X' nodes from the right. The direction from which specifiers and complements enter is language specific. The kind of phrase that can serve as a specifier for a particular kind of XP, or as a complement for a particular kind of X', depends on the occupant of the head position, X.
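The definition of a binary X-bar tree can be checked mechanically. This is a minimal sketch, assuming the illustrative tuple encoding (label, child, ...) with strings as words and X' written with an apostrophe; `is_binary_xbar` is a hypothetical helper. As a simplification it does not check that each X' and XP belongs to the same category as its head.

```python
CATEGORIES = {"N", "V", "P", "Adj"}          # the four major heads


def node_type(label):
    """Classify a label as ('X', cat), ("X'", cat) or ('XP', cat); None if it is none of these."""
    if label in CATEGORIES:
        return "X", label
    if label.endswith("'") and label[:-1] in CATEGORIES:
        return "X'", label[:-1]
    if label.endswith("P") and label[:-1] in CATEGORIES:
        return "XP", label[:-1]
    return None


def is_binary_xbar(tree):
    """Check: node types X, X', XP only; at most two branches; leaves are words."""
    if isinstance(tree, str):
        return True                          # a leaf must be a word
    label, *children = tree
    kind = node_type(label)
    if kind is None or not 1 <= len(children) <= 2:
        return False                         # unknown node type, or too many branches
    if kind[0] == "X":                       # a head node dominates exactly one word
        return len(children) == 1 and isinstance(children[0], str)
    return all(is_binary_xbar(child) for child in children)


binary = ("VP", ("V'", ("V'", ("V", "return"), ("NP", "her book")), ("PP", "to the library")))
flat = ("VP", ("V", "return"), ("NP", "her book"), ("PP", "to the library"))
print(is_binary_xbar(binary))  # True
print(is_binary_xbar(flat))    # False: VP has three branches
```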
 


X-bar Grammars

Sentences as complementizer phrases

• Sentences can be viewed as complementizer phrases, but most have empty specifier and head positions:
 
 


X-bar Grammars

Noun Phrases, Noun Cases and Governors

Consider how the form of English pronouns varies with case, person and number:
 
 

Number     Person    Nominative      Accusative      Genitive
Singular   First     I               me              my
           Second    you             you             your
           Third     he, she, it     him, her, it    his, her/hers, its
Plural     First     we              us              our
           Second    you             you             your
           Third     they            them            their

In the following two sentences, substituting one pronoun for another is permissible, in that both sentences are grammatically acceptable:

She gave me his book

I gave them your book

but the following substitution is not permissible: Me gave his she book

 
 


Why is one substitution grammatically acceptable whilst the other is not?

The previous table shows that, whilst one pronoun can generally be substituted by another pronoun, the case of each pronoun is not free: it is determined by the head that governs the pronoun's position, and government is a property determined by the structure of the X-bar tree.
 



A binary X-bar representation.
 

A binary X-bar tree is a representation that is a tree in which:

• There are three general types of nodes: X, X' and XP.

• Each node has at most two branches.

• Leaves are words.

• In English, specifiers enter XP nodes from the left; complements enter X' nodes from the right. The direction from which specifiers and complements enter is language specific.

• The kind of phrase that can serve as a specifier for a particular kind of XP, or as a complement for a particular kind of X', depends on the occupant of the head position, X.



The case-determining constraint, expressed in terms of X-bar schemas, for determining a pronoun's case assignment:
 
 

To determine a pronoun's case assignment:

• Move upward from the pronoun until you arrive either at an IP node whose head has tense information, or at an NP, VP, PP or CP node.

• Announce that the pronoun is nominative if the head belongs to an IP node and carries tense information.

• Announce that the pronoun is accusative if the head is a preposition, verb or complementizer.

• Announce that the pronoun is genitive if the head is a noun.

• Otherwise, the pronoun cannot be assigned case, and the sentence is unacceptable.
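The constraint is a simple upward walk through the tree, so it can be sketched in a few lines. This is a sketch under stated assumptions: the `Node` class, the simplified tree for "She gave me his book" (with tense carried by an I node labelled +past) and all helper names are hypothetical. The walk also stops at VP nodes, since the accusative clause needs a verbal head to be found. The forms table follows the pronoun table above, with "her" listed as both accusative (saw her) and genitive (her book).

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Case(s) each pronoun form can carry ("you", "it" and "her" are ambiguous).
FORMS = {
    "i": {"nominative"}, "me": {"accusative"}, "my": {"genitive"},
    "you": {"nominative", "accusative"}, "your": {"genitive"},
    "he": {"nominative"}, "him": {"accusative"}, "his": {"genitive"},
    "she": {"nominative"}, "her": {"accusative", "genitive"}, "hers": {"genitive"},
    "it": {"nominative", "accusative"}, "its": {"genitive"},
    "we": {"nominative"}, "us": {"accusative"}, "our": {"genitive"},
    "they": {"nominative"}, "them": {"accusative"}, "their": {"genitive"},
}


@dataclass
class Node:
    label: str                                  # e.g. "IP", "I'", "VP", "V'", "NP", "Pron"
    head: str = ""                              # head category: "I", "V", "P", "C" or "N"
    tensed: bool = False                        # IP only: does the head carry tense?
    word: str = ""                              # leaves only
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)


def link(node, parent=None):
    """Fill in parent pointers so that we can move upward from a pronoun."""
    node.parent = parent
    for child in node.children:
        link(child, node)
    return node


def required_case(pronoun):
    """Move upward from the pronoun until a case-assigning node is found."""
    node = pronoun.parent
    while node is not None:
        if node.label == "IP" and node.tensed:
            return "nominative"
        if node.label in {"NP", "VP", "PP", "CP"}:
            if node.head == "N":
                return "genitive"
            if node.head in {"V", "P", "C"}:
                return "accusative"
            return None
        node = node.parent
    return None                                 # no case assigner: unacceptable


def acceptable(pronouns):
    """Each pronoun's form must carry the case its position requires."""
    return all(required_case(p) in FORMS[p.word.lower()] for p in pronouns)


def pron(word):
    return Node("Pron", word=word)


def gave_sentence(subj, obj, poss):
    """Hypothetical, simplified X-bar tree for '<subj> gave <obj> <poss> book'."""
    np = Node("NP", head="N", children=[poss, Node("N'", head="N", children=[Node("N", word="book")])])
    inner = Node("V'", head="V", children=[Node("V", word="gave"), obj])
    vp = Node("VP", head="V", children=[Node("V'", head="V", children=[inner, np])])
    ip = Node("IP", head="I", tensed=True,
              children=[subj, Node("I'", head="I", children=[Node("I", word="+past"), vp])])
    return link(ip)


good = [pron("She"), pron("me"), pron("his")]
gave_sentence(*good)
print([required_case(p) for p in good])   # ['nominative', 'accusative', 'genitive']
print(acceptable(good))                   # True

bad = [pron("Me"), pron("his"), pron("she")]
gave_sentence(*bad)
print(acceptable(bad))                    # False
```

The same positions that accept She, me and his reject Me, his and she, which is the contrast between the two substitutions discussed earlier.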



Pronouns

Consider the following sentence:

He presented me her book

The case of each pronoun is determined by its governing head, which can be found by drawing an X-bar schema into which the sentence fits.



 
 

Grammar as knowledge representation
 
 

The study of grammar can be viewed as a branch of knowledge representation: A grammar is a way of representing certain aspects of what we know about a language that is explicit and formal enough to be understood by a machine.

A language can be regarded as a set whose membership is precisely specifiable by rules. The set of compound linguistic expressions in a natural language is not finite, and hence these expressions cannot simply be listed (as far as is known, no natural language is finite).


 
 
• Formal systems are required that define the membership of these infinite sets of linguistic expressions and assign a structure to each member of the sets. The requirements can be stated as two criteria:

(a) Formalisms that contain information about which (linguistic) objects combine together and what the properties of the resulting objects are;

and

(b) Formalisms that transparently provide each legal string with an implicit structural description.

[Transition networks use extraneous procedural information for (a), and ATNs necessitate the use of explicit structure-building annotations for (b).]

DECLARATIVE GRAMMAR FORMALISMS satisfy criteria (a) and (b) without recourse to prescriptive and ad hoc devices.


Generalised Phrase Structure Grammars
 

• Generalised Phrase Structure Grammar (GPSG) is a theory of grammar developed in the 1980s.

• The class of grammars permitted by GPSG is strongly equivalent to the context-free grammars.

• GPSG attempts to characterise natural languages in terms of context-free grammars, while making no claims about psychological reality.

• GPSG is characterised by the separation of grammatical statements into a metagrammar and an object grammar. The metagrammar contains metarules and other generalisations about the rules of grammar. The object grammar directly licenses local subtrees.
 
 


Generalised Phrase Structure Grammars: (CF-PSG, Type 2)

Grammars should characterize the order of elements in a string as opposed to attempting to reconstruct some hypothesized underlying order.

The grammars discussed here are declarative and are generally based on a decomposition of syntactic categories into components known as features.

[This can support a compositional approach to meaning: each well-formed expression has a meaning of its own, composed from the meanings of the subexpressions that make it up.] The key features of a CF-PSG are the use of a finite set of grammatical categories and a finite set of rules specifying how LHS categories can be realised as sequences of RHS elements.

An RHS element may be a category or a particular symbol of the language. When a CF-PSG rule specifies that an LHS category can be realised as a particular RHS, this realisation is deemed possible regardless of the context in which the LHS category appears. The rule places no restriction on the context in which this can happen; hence the term 'context free' (Gazdar and Mellish 1989:106).

The most popular framework for formalising CF-PSG, and indeed most other NL grammars, is the directed acyclic graph, specifically the tree structure.
 



Grammars used in computational linguistics employ the following:

(i) A representation for syntactic categories or ‘parts of speech’

(ii) Data types for:

        a. words [and hence a lexicon, wordlist etc.]
        b. syntactic rules
        c. syntactic structures

[words/lexicon/wordlist] + [syntactic rules] ------> parser ------> instantiated syntactic structures

• A complete grammar formalism provides, at least:
- a language for specifying syntactic categories

- a language for representing lexical entries

- a language in which to write rules

- a language for exhibiting syntactic structures



Generalised Phrase Structure Grammars: Categories, subcategories, features

Definitions

Category (categorisation, categorical):

Categorisation in the field of grammar refers to the establishment of a set of classificatory units or properties used in the description of language. These units or properties have the same basic distribution and occur as structural units throughout the language.

In some approaches, category refers to the classes themselves, e.g. NOUN, VERB, SUBJECT, PREDICATE, NP, VP. Grammatical categories include N, V, VP and NP; grammatical functions (or functional categories) include SUBJECT, OBJECT and COMPLEMENT. A categorical rule is a rule which expands a category into other categories.

Feature
A term used in linguistics (and phonetics) to refer to any typical or noticeable property of spoken or written language.

• In generative grammatical analysis, the term is associated with the way in which words are classified in the lexicon in terms of their grammatical properties, such as [animate], [common], [intransitive] and [countable]. Such features are binary and are analysed as, e.g., [+animate]/[-animate].

• In Generalised Phrase Structure Grammar, grammatical categories are defined in terms of feature specifications: ordered pairs consisting of a feature and a feature value, which rules can access.
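A minimal sketch of feature specifications as (feature, value) pairs, stored in a Python dict, together with a compatibility check in the spirit of the path equations (such as <V arg1> = <X cat>) used in the exemplar grammars. The helper `unify` is hypothetical and assumes flat specifications; the feature system of GPSG proper is richer (for example category-valued features and feature co-occurrence restrictions).

```python
def unify(a, b):
    """Combine two feature specifications; None if they give a feature different values."""
    result = dict(a)
    for feature, value in b.items():
        if feature in result and result[feature] != value:
            return None                      # clash: the categories are incompatible
        result[feature] = value
    return result


# Feature specifications as (feature, value) pairs, stored in a dict.
employed = {"cat": "V", "arg1": "NP"}
flourished = {"cat": "V", "arg1": "0"}
transitive_verb = {"cat": "V", "arg1": "NP"}   # what a rule might require of its V

print(sorted(employed.items()))              # [('arg1', 'NP'), ('cat', 'V')]
print(unify(employed, transitive_verb))      # {'cat': 'V', 'arg1': 'NP'}
print(unify(flourished, transitive_verb))    # None
print(unify({"cat": "N"}, {"animate": "+"})) # {'cat': 'N', 'animate': '+'}
```

A rule that needs a transitive verb can thus accept employed and reject flourished purely by inspecting feature values.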
 


Generalised Phrase Structure Grammars: Exemplar grammars

Grammar:                     Name;
                                    Syntactic Rule-base
                                    Lexical Data-base
 
 

EXAMPLE

GRAMMAR 1 (CF-PSG)

Rule {sentence formation}
S-->NP VP

Rule {transitive verb}
VP-->V NP

Rule {intransitive verb}
VP-->V

word Ken Clarke:
<cat(egory)>=NP
word StLukes:
<cat>=NP
word clients:
<cat>=NP
word died:
<cat>=V
word employed:
<cat>=V
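Grammar 1 can be run as a small top-down recognizer. This is a sketch under stated assumptions: the rules and lexicon are transcribed from Grammar 1, but the Python encoding, the function names and the pre-segmented input (so that "Ken Clarke" is one token) are illustrative choices. Since no rule in Grammar 1 is left-recursive, a plain top-down search terminates.

```python
# Grammar 1 encoded as Python data: category -> list of right-hand sides.
RULES = {
    "S": [["NP", "VP"]],          # sentence formation
    "VP": [["V", "NP"], ["V"]],   # verb with and without an object
}
LEXICON = {
    "Ken Clarke": "NP", "StLukes": "NP", "clients": "NP",
    "died": "V", "employed": "V",
}


def ends(cat, tokens, i):
    """Yield every j such that tokens[i:j] can be analysed as `cat`."""
    if i < len(tokens) and LEXICON.get(tokens[i]) == cat:
        yield i + 1                          # a word of this category
    for rhs in RULES.get(cat, []):
        yield from ends_seq(rhs, tokens, i)  # or a rule expanding `cat`


def ends_seq(cats, tokens, i):
    """Yield every j such that tokens[i:j] is the sequence of categories `cats`."""
    if not cats:
        yield i
        return
    for j in ends(cats[0], tokens, i):
        yield from ends_seq(cats[1:], tokens, j)


def recognise(tokens):
    return len(tokens) in ends("S", tokens, 0)


print(recognise(["Ken Clarke", "employed", "clients"]))  # True
print(recognise(["clients", "died", "StLukes"]))         # True: Grammar 1 over-generates
print(recognise(["died", "clients"]))                    # False
```

The second result shows the weakness of Grammar 1: nothing stops died from taking an object, which is what the arg1 feature in the later grammars addresses.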



Consider the following sentences:

  •  The 'correct' parser for the above sentences should contain a description of a rule-base, e.g. syntactic rules, and a lexical data-base, i.e. the words/phrases together with the appropriate grammatical categories and features.


    • The following schema introduces the co-ordinate construction and suggests that a given category can consist of two further instances of the same category separated by an item of category 'C', which will turn out to be realized as 'and' or 'or'.
     

    Rule {co-ordination of identical categories}
    X0-->X1 C X2:
    <X0 cat> = <X1 cat>
    <X0 cat> = <X2 cat>
    <X0 arg1> = <X1 arg1>
    <X0 arg1> = <X2 arg1>
    • This aspect of the grammar provides an example of recursion in syntactic rules. A grammar with the co-ordination rule admits infinitely many strings of words as grammatical instances of sentences, whereas a grammar without it (and with no other recursive rule) admits only a finite number of strings.
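The effect of the recursive co-ordination rule can be seen by enumerating every string a grammar derives with trees of bounded depth. This is a sketch under stated assumptions: the small grammar fragment (with words borrowed from Grammar 2 and C realised as "and" or "or") and the helpers `with_coordination` and `strings` are hypothetical. Without co-ordination the set of sentences stops growing; with X0 --> X1 C X2 added, every extra level of depth admits new sentences.

```python
from itertools import product

# A hypothetical fragment with no recursive rule (words taken from Grammar 2).
BASE = {
    "S": [["NP", "VP"]],
    "VP": [["V"]],
    "NP": [["Michael"], ["miners"]],
    "V": [["flourished"]],
    "C": [["and"], ["or"]],
}


def with_coordination(rules, cats=("S", "NP", "VP")):
    """Add X0 --> X1 C X2 for each listed category X (all three of the same category)."""
    new = {k: [list(rhs) for rhs in v] for k, v in rules.items()}
    for x in cats:
        new[x].append([x, "C", x])
    return new


def strings(rules, cat, depth):
    """All word strings derivable from `cat` by trees at most `depth` rules deep."""
    if cat not in rules:
        return {(cat,)}                      # a word derives itself
    if depth == 0:
        return set()
    out = set()
    for rhs in rules[cat]:
        for parts in product(*(strings(rules, c, depth - 1) for c in rhs)):
            out.add(tuple(w for part in parts for w in part))
    return out


coord = with_coordination(BASE)
for depth in (3, 4):
    print(depth, len(strings(BASE, "S", depth)), len(strings(coord, "S", depth)))
```

The depth bound is only there to keep the enumeration finite; the grammar with co-ordination itself has no bound.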

    An example grammar is given below:





    GRAMMAR 2 (CF-PSG)

    Rule {simple sentence formation}
    S-->NP VP
    Rule {intransitive verb}
    VP-->V:
        <V arg1> = 0
    Rule {single complement verb}
    VP-->V X:
        <V arg1> = <X cat>
    Rule {co-ordination of identical categories}
    X0-->X1 C X2:
        <X0 cat> = <X1 cat>
        <X0 cat> = <X2 cat>
        <X0 arg1> = <X1 arg1>
        <X0 arg1> = <X2 arg1>

    word Michael:
        <cat(egory)>=NP
    word car-workers:
        <cat>=NP
    word miners:
        <cat>=NP
    word car-dealers:
        <cat>=NP
    word flourished:
        <cat>=V
        <arg1>= 0
    word employed:
        <cat>=V
        <arg1> = NP
    word and:
        <cat> = C
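Grammar 2 can be tested with a span-based recognizer in which each lexical entry is a (cat, arg1) pair. This is a sketch under stated assumptions: the encoding and the function names are illustrative, the co-ordination rule is implemented by requiring both conjuncts to share cat and arg1, and memoisation on word spans (via `functools.lru_cache`) is used so that the left-recursive co-ordination rule always works on strictly shorter spans and terminates.

```python
from functools import lru_cache

# Grammar 2's lexicon: word -> (cat, arg1); arg1 is None where the entry gives none.
LEXICON = {
    "Michael": ("NP", None), "car-workers": ("NP", None),
    "miners": ("NP", None), "car-dealers": ("NP", None),
    "flourished": ("V", "0"), "employed": ("V", "NP"),
    "and": ("C", None),
}
# Categories that some verb asks for as its complement (here just NP).
COMPLEMENTS = {a for c, a in LEXICON.values() if c == "V" and a != "0"}


def recognise(words):
    words = tuple(words)

    @lru_cache(maxsize=None)
    def can(cat, arg1, i, j):
        """Can words[i:j] be analysed as category `cat` with feature arg1 = `arg1`?"""
        if j - i == 1 and LEXICON.get(words[i]) == (cat, arg1):
            return True                                   # lexical entry
        splits = range(i + 1, j)
        if cat == "S" and any(can("NP", None, i, k) and can("VP", None, k, j) for k in splits):
            return True                                   # S --> NP VP
        if cat == "VP" and arg1 is None:
            if can("V", "0", i, j):
                return True                               # VP --> V, <V arg1> = 0
            if any(can("V", x, i, k) and can(x, None, k, j) for x in COMPLEMENTS for k in splits):
                return True                               # VP --> V X, <V arg1> = <X cat>
        # X0 --> X1 C X2: all three share cat and arg1
        return any(LEXICON.get(words[k]) == ("C", None)
                   and can(cat, arg1, i, k) and can(cat, arg1, k + 1, j)
                   for k in range(i + 1, j - 1))

    return can("S", None, 0, len(words))


print(recognise("miners employed car-workers".split()))            # True
print(recognise("Michael flourished miners".split()))               # False
print(recognise("Michael employed and flourished miners".split()))  # False: arg1 values differ
```

The last example is rejected because the two conjoined verbs have different arg1 values, which is exactly what the arg1 equations in the co-ordination rule enforce.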



    The important points to highlight here are that there are major grammatical categories, like S (sentence), NP (noun phrase) and V (verb); that subcategorisation has been used to classify transitive and intransitive verbs (e.g. employed and flourished), not in an ad hoc manner but through the use of rules and the arg1 feature; and that the relationship between categories and features can help in expressing complex co-ordination relationships, such as those involving the conjunction and.




    GRAMMAR 3 (CF-PSG)

    Rule {simple sentence formation}
    S-->NP VP
    Rule {intransitive verb}
    VP-->V:
        <V arg1> = 0
    Rule {single complement verb}
    VP-->V X:
        <V arg1> = <X cat>
    Rule {prepositional attachment}
    PP-->P X:
        <P arg1> = <X cat>
    Rule {co-ordination of identical categories}
    X0-->X1 C X2:
        <X0 cat> = <X1 cat>
        <X0 cat> = <X2 cat>
        <X0 arg1> = <X1 arg1>
        <X0 arg1> = <X2 arg1>

    word Trevor:
        <cat(egory)>=NP
    word Celia:
        <cat>=NP
    word Larry:
        <cat>=NP
    word flourished:
        <cat>=V
        <arg1>= 0
    word hated:
        <cat>=V
        <arg1>= NP
    word disapproved:
        <cat>=V
        <arg1>= PP
    word thought:
        <cat>=V
        <arg1>= S
    word looked:
        <cat>=V
        <arg1>= AP
    word and:
        <cat> = C
    word handsome:
        <cat> = AP
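Grammar 3 generalises the single-complement rule: a head's arg1 names the category of its complement, for verbs (NP, PP, S, AP) and for prepositions alike. This is a sketch under stated assumptions: the encoding and function names are illustrative, and the preposition "of" (with arg1 = NP) is a hypothetical entry that does not appear in Grammar 3's lexicon; it is added only so that disapproved, which requires a PP, has something to combine with.

```python
from functools import lru_cache

# Grammar 3's lexicon: word -> (cat, arg1). "of" is NOT in the grammar above; it is a
# hypothetical preposition added so that "disapproved" (arg1 = PP) can be tested.
LEXICON = {
    "Trevor": ("NP", None), "Celia": ("NP", None), "Larry": ("NP", None),
    "flourished": ("V", "0"), "hated": ("V", "NP"), "disapproved": ("V", "PP"),
    "thought": ("V", "S"), "looked": ("V", "AP"), "handsome": ("AP", None),
    "of": ("P", "NP"), "and": ("C", None),
}
HEADS = {"VP": "V", "PP": "P"}      # phrase --> head X, with <head arg1> = <X cat>
COMPLEMENTS = {a for _, a in LEXICON.values() if a not in (None, "0")}


def recognise(words):
    words = tuple(words)

    @lru_cache(maxsize=None)
    def can(cat, arg1, i, j):
        """Can words[i:j] be analysed as category `cat` with feature arg1 = `arg1`?"""
        if j - i == 1 and LEXICON.get(words[i]) == (cat, arg1):
            return True                                     # lexical entry
        splits = range(i + 1, j)
        if cat == "S" and any(can("NP", None, i, k) and can("VP", None, k, j) for k in splits):
            return True                                     # S --> NP VP
        if cat == "VP" and can("V", "0", i, j):
            return True                                     # VP --> V, <V arg1> = 0
        head = HEADS.get(cat)
        if head and arg1 is None and any(
                can(head, x, i, k) and can(x, None, k, j) for x in COMPLEMENTS for k in splits):
            return True                                     # VP --> V X and PP --> P X
        return any(LEXICON.get(words[k]) == ("C", None)     # X0 --> X1 C X2
                   and can(cat, arg1, i, k) and can(cat, arg1, k + 1, j)
                   for k in range(i + 1, j - 1))

    return can("S", None, 0, len(words))


print(recognise("Larry thought Trevor looked handsome".split()))  # True: S and AP complements
print(recognise("Celia disapproved of Larry".split()))            # True: PP complement
print(recognise("Trevor hated of Larry".split()))                 # False: hated wants an NP
```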

     



     


    References

    Gazdar, Gerald, and Mellish, Chris (1989). Natural Language Processing in PROLOG: An Introduction to Computational Linguistics. Wokingham (England): Addison-Wesley Publishing Co.

    Introduces computational linguistics and discusses the generalised phrase structure grammars in some considerable detail (Chapter 1 for introduction and Chapters 4 and 5 on grammars)

    Winston, Patrick Henry (1992). Artificial Intelligence (Third Edition). Reading (Mass., USA): Addison-Wesley Publishing Co.

    See chapter on natural language processing, especially on X-bar grammars, entitled 'Expressing Language Constraints' (Chapter 28).