Ling 001     Syntax

Syntax is the study of how words are organized into sentences. It has certain parallels to morphology, which you'll remember is the study of how morphemes are organized into words. The difference is that some languages have very little morphology -- their words are mostly one morpheme in size -- but all languages have a great deal of syntax.


Let's start with a question: why bother with syntax at all?

We can communicate a lot without words, by the expressive use of eyes, face, hands, posture. We can draw pictures and diagrams, we can imitate sounds and shapes, and we can reason pretty acutely about what somebody probably meant by something they did (or didn't do).

Despite this, we spend a lot of time talking. Much of the reason for this love of palaver is no doubt the advantage of sharing words; using the right word often short-cuts a lot of gesticulating and guessing, and keeps life from being more like the game of charades than it is.

Given words, it's natural enough to want to put them together. Multiple "keywords" in a library catalog listing can tell us more about the contents of a book than a single keyword could.

We can see this effect by calculating the words whose frequency in a particular book is greatest, relative to their frequency in lots of books. Here are a few sample computer-calculated lists of the top-10 locally-frequent words, for each of a range of books on various topics:

College: the Undergraduate Experience

undergraduate faculty campus student college academic curriculum freshman classroom professor

Earth and other Ethics

moral considerateness bison whale governance utilitarianism ethic entity preference utilitarian

When Your Parents Grow Old

diabetes elderly appendix geriatric directory hospice arthritis parent dental rehabilitation

Madhur Jaffrey's Indian Cooking

peel teaspoon tablespoon fry finely salt pepper cumin freshly ginger

In understanding such lists, we are making a kind of semantic stew in which the meanings of all the words are stirred up together in a mental cauldron. We get a clear impression of what the book is about, but there is a certain lack of structure.

For example, the last word list gives us a pretty good clue that we are dealing with a cookbook, and maybe even what kind of cuisine is at issue, but it doesn't tell us how to make any particular dish.

Just adding more words doesn't help: the next ten in order from Jaffrey's cookbook are:  

stir lemon chicken juice sesame garlic broth slice sauce chili

This gives us some more information about ingredients and kitchen techniques, but it doesn't tell us how to make a vindaloo. To understand a recipe, we need more exact information about how the words (and ingredients!) combine.
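The kind of calculation behind such keyword lists can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name locally_frequent, the add-one smoothing, and the toy word lists are all invented here, and the lists above may well have been computed with a different formula.

```python
from collections import Counter

def locally_frequent(book_words, background_words, top_n=10):
    """Rank words by their rate in the book relative to their rate in a
    background collection (an assumed scoring rule, for illustration)."""
    book = Counter(book_words)
    background = Counter(background_words)
    book_total = sum(book.values())
    # Add-one smoothing (an assumption) so that words missing from the
    # background collection do not cause a division by zero.
    background_total = sum(background.values()) + len(book)

    def score(word):
        book_rate = book[word] / book_total
        background_rate = (background[word] + 1) / background_total
        return book_rate / background_rate

    # Highest score first; ties are broken alphabetically.
    return sorted(book, key=lambda w: (-score(w), w))[:top_n]

# Toy data standing in for one cookbook and for "lots of books".
cookbook = "fry the cumin and the ginger then fry the salt".split()
everything = "the cat and the dog saw the bird and then the fish".split()
print(locally_frequent(cookbook, everything, top_n=3))
```

Notice that a word like "the" is the most frequent word in the toy cookbook, but it drops out of the list because it is just as common everywhere else.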


The principle of (recursive) compositionality

We don't normally communicate by trading lists of keywords. Children at a certain age (perhaps a year and a half or so) often create discourses by stringing together one-word sentences:

 "Juice. Shirt. Off!"

(rough translation: "I spilled juice on my shirt. I need a new shirt")

However, when adults (and older children) communicate with words, they just about always put the words together in a hierarchical or recursive way, making bigger units repeatedly out of smaller ones. The meanings combine in a way that is not like the way that ingredients combine in a stew, but more like the combination of ingredients in an elaborate multilayered pastry, where things must go together in a very precise and specific way, or we get not a sachertorte but a funny sort of pudding.

This is the principle of compositionality: language is intricately structured, and linguistic messages are interpreted, layer by layer, in a way that depends on their structure.


This strict sort of compositionality permits what is called "syntax-directed translation" in the terminology that computer scientists use to talk about compilers for computer languages. It means, for instance, that  

(1 + 2)  x 3

can be understood by first adding 1 and 2 and then multiplying the result by 3, whereas to understand  

1 + (2 x 3)

we first multiply 2 and 3, and then add 1 to the result. In this way the interpretation of arbitrarily complex expressions can be computed layer by layer, combining the interpretations of simpler parts. A simple recursive calculation will determine the value of an arbitrarily complex arithmetic expression as a function of the values of its parts.
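A recursive calculation of this kind might look like the following Python sketch. The representation is an assumption made for the example: an expression is either a number or a tuple of an operator and two sub-expressions, and the function name evaluate is invented here.

```python
def evaluate(expr):
    """Compute the value of an expression from the values of its parts.

    An expression is either a number or a tuple (operator, left, right),
    where left and right are themselves expressions.
    """
    if isinstance(expr, (int, float)):
        return expr                      # a bare number is its own value
    operator, left, right = expr
    left_value = evaluate(left)          # interpret each part first ...
    right_value = evaluate(right)
    if operator == "+":                  # ... then combine the results
        return left_value + right_value
    if operator == "x":
        return left_value * right_value
    raise ValueError(f"unknown operator: {operator!r}")

# (1 + 2) x 3: add first, then multiply
print(evaluate(("x", ("+", 1, 2), 3)))   # 9
# 1 + (2 x 3): multiply first, then add
print(evaluate(("+", 1, ("x", 2, 3))))   # 7
```

The same three numbers and two operators give different values because the structure is different, and the function never needs to know how deeply the parts are nested.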

This line of thinking gives us a first answer to the question "why bother with syntax?" The layered (recursive) structures of syntax allow us to communicate an infinity of complex and specific meanings, using a few general methods for building phrases with more complex meanings out of phrases with simpler ones.

This system is based on a marvelous foundation -- the tens of thousands of basic morphemes, words, and idiomatic phrases of each human language. These are the "atoms" of meaning that syntactic combination starts with. Using syntax, we can specify not only the ingredients of a recipe, but also the exact order and method of their combination.


Let's suppose that we know what words mean, and a lot about how to put meanings together, but we have no particular constraints on syntactic structure. In this imaginary condition, we don't care about the order of adjectives and nouns, nor where verbs should go relative to their subjects and objects. There are some general principles that might help us, such as that semantically-related words will tend to be closer together than semantically-unrelated words are, but otherwise, we are back with the "word stew" we imagined earlier.

Under these circumstances, we could still probably understand a lot of everyday language, because some ways of putting words together make more sense than others do.

We are in something like this condition when we try to read a word-for-word gloss of a passage in a language we don't know. Often such glosses can be understood: thus consider the following sentence in Kashaya, an American Indian language of northern California.

tíiqa mito taqhma c'ishkan shaqac'qash
I wish you dress pretty might wear

"I wish you might wear a pretty dress"

In this case the meaning of the parts makes it reasonable to figure out the whole. But in other cases, the interrelations among the words are less obvious from their individual meanings, and knowledge of the syntax is essential.

muukín' tito 'ama dút'a' dihqa'khe' dúucic'iphi t'o daqaac'i'ba
he him job will give if know would like

A speaker of English might be inclined to interpret this as:

"He will give him a job if he knows that he'd like it."

That makes sense based on English syntax: notice that the order of the verbs is the same in the translation.

But in fact it means:

"He would like it if he knew someone was going to give him a job"

because in Kashaya, the main verb occurs at the right, and the more subordinate verbs precede that final main verb. The only way to be sure about this is to know Kashaya syntax.


Neurological agrammatism

The task of trying to interpret glossed passages in a language we don't know may give us some appreciation for the situation of people whose ability to process syntactic structure is neurologically impaired, even though other aspects of their mental functioning, including their understanding of complex propositions, may be intact.

As we saw previously, when there is a lesion in the frontal lobe of the left cerebral hemisphere, the result is often a syndrome called Broca's aphasia.

The most important symptom is an output problem: people with Broca's aphasia cannot speak fluently, tend to omit grammatical morphemes such as articles and verbal auxiliaries, and sometimes can hardly speak at all. Their comprehension, by comparison, seems relatively intact.

However, under careful study their ability to understand sentences turns out to be deficient in systematic ways. They always do well when the nouns and verbs in the sentence go together in a way that is much more plausible than any alternative:

It was the mice that the cat chased

The postman was bitten by the dog

If more than one semantic arrangement is equally plausible:

It was the baker that the butcher insulted

or if a syntactically wrong arrangement is more plausible than the syntactically correct one:

The dog was bitten by the policemen

then they do not do so well. Clearly Broca's aphasia has a negative impact on the processing of syntactic structure.


Syntactic knowledge

What is it that we know when we construct or interpret a grammatical sentence? There are two fundamental aspects of sentence organization.

Linear order (= "precedence")

The dog chased the cat

The cat chased the dog

The cat the dog chased (i.e. the cat that the dog chased)

*Chased the cat the dog

Constituency

[ The dog ] chased the cat

[ The brown dog ] chased the cat

[ The dog with the rubber toy ] chased the cat

[ My neighbor's dog that's always barking at squirrels when they're in the yard ] chased the cat

We'll talk about both of these today.


Word order

A basic observation about English word order is that it generally follows the pattern subject + verb + object (or "SVO").

  The dog chased the cat
  (subject) (verb) (object)

If we change the word order, it changes the relation of the nouns to the verb.

  The cat chased the dog
  (subject) (verb) (object)

This is because English uses word order to mark the role of nouns in the sentence: normally the subject precedes the verb, and the object (if any) follows the verb.


In many languages, a morpheme (such as a suffix or preposition) is used to perform the same function. Latin is such an example: the case marker is a suffix that indicates whether the noun is functioning as a subject or object, or in some other role. ("Case" refers to the noun's relationship to the verb or some other element, such as a preposition.)

  Canis      fêlem     vîdit
  dog        cat       saw
  (subject)  (object)  (verb)

We can change the relations to the verb while keeping the nouns in place, simply by modifying the case endings. As you can see, the suffix -em marks objects (the "accusative case"), while -is, for this class of nouns at least, is used for subjects (the "nominative case").

  Canem     fêlis      vîdit
  dog       cat        saw
  (object)  (subject)  (verb)

Given the existence of case-marking suffixes, word order in Latin is not used the way it is in English: instead, it typically functions to provide information about what the speaker is focusing on, and whether the participants described are already known to the listener. This is the sort of thing that English does by more complicated phrasings.

The general rule is that putting anything but the subject first foregrounds it as a topic of the discourse:

Fêlem canis vîdit

"(Remember,) the dog saw a cat"

And putting anything after the verb emphasizes it very heavily:

Canis vîdit fêlem

"(No,) it was a cat that the dog saw (not a bird)"

Fêlem vîdit canis

"(No,) it was the dog that saw a cat (not the night watchman)"

It's important to remember that the relation of each noun to the verb is unchanged by the new word order: only the conversational emphasis differs.
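To make the point concrete, here is a small Python sketch that reads the roles off the case endings and ignores word order entirely. Everything in it is a toy assumption: the tables cover only the two nouns in these examples, the names ROLE_BY_ENDING, NOUN_STEMS and roles are invented, and real Latin has several noun classes with different endings.

```python
# Toy case table for the two nouns in the examples only.
# The ending-to-role mapping is a deliberate simplification:
# other classes of Latin nouns use different endings.
ROLE_BY_ENDING = {"is": "subject", "em": "object"}
NOUN_STEMS = {"can": "dog", "fêl": "cat"}

def roles(sentence):
    """Return {role: English gloss} for each noun, ignoring word order."""
    result = {}
    for word in sentence.lower().split():
        for stem, gloss in NOUN_STEMS.items():
            ending = word[len(stem):]
            if word.startswith(stem) and ending in ROLE_BY_ENDING:
                result[ROLE_BY_ENDING[ending]] = gloss
    return result

# Same endings, three different orders: the roles come out the same.
for sentence in ["Canis fêlem vîdit", "Fêlem canis vîdit", "Fêlem vîdit canis"]:
    print(sentence, "->", roles(sentence))

# Same order as the first sentence, different endings: the roles swap.
print("Canem fêlis vîdit", "->", roles("Canem fêlis vîdit"))
```

What the sketch leaves out is exactly what Latin word order is used for: the differences in topic and emphasis among the first three sentences.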


It's not just dead languages like Latin (and Old English) that do this. Russian works in a very similar way. Here, since these are both feminine nouns, the nominative is -a and the accusative is -u.

  Sobaka     uvidela  koshku
  dog        saw      cat
  (subject)  (verb)   (object)

  Sobaku    uvidela  koshka
  dog       saw      cat
  (object)  (verb)   (subject)

Though the word order is the same, the case marking in these two sentences shows that they refer to different situations (according to who is seeing, and who is being seen).

As in Latin, word order in Russian serves other purposes, including what noun serves as the topic of the discourse (the main item under discussion).

The most typical word order in Russian is the same as English, SVO.

  Sobaka     uvidela  koshku
  dog        saw      cat
  (subject)  (verb)   (object)

If you change the order, the relations of the nouns to the verb are unchanged, but it's most natural to use another order if you're trying to emphasize one of the nouns (which one is determined by which word you stress).

"The dog saw the cat"; "It's the cat that the dog saw"

  Koshku    uvidela  sobaka
  cat       saw      dog
  (object)  (verb)   (subject)

"The dog saw the cat"; "It's the dog that saw the cat"

  Koshku    uvidela  sobaka
  cat       saw      dog
  (object)  (verb)   (subject)

The thing to notice for all these languages is that word order matters. Not very surprising, of course, but just remember that linear order is one of the basic aspects of syntax, regardless of what purpose it gets put to by a particular language.


Even a language with case markers can have a relatively fixed order. In Japanese, for example, the normal word order is subject + indirect object + direct object + verb.

  Tarô ga    Hanako ni          sono hon o       yatta
  Taroo      Hanako             that book        gave
  (subject)  (indirect object)  (direct object)  (verb)

"Taroo gave that book to Hanako."

Thanks to the case markers (which here are postpositions: like prepositions, but they occur after the noun), the nouns can be reordered without changing meaning. Essentially, any one of the nouns can be moved to the front.

  Hanako ni          Tarô ga    sono hon o       yatta
  Hanako             Taroo      that book        gave
  (indirect object)  (subject)  (direct object)  (verb)

  sono hon o       Tarô ga    Hanako ni          yatta
  that book        Taroo      Hanako             gave
  (direct object)  (subject)  (indirect object)  (verb)

Since English has very minimal case marking -- just in pronouns such as we and us -- this kind of reordering would lead to confusion of meaning. Some reordering is possible, however, generally with clear intonation to make the unusual situation obvious.

Pat I like, but Chris I can't stand.

In this contrastive situation, the stress on the fronted nouns (objects of the verbs) helps the listener identify the special construction.


Constituency

A simple (and inadequate) rule for creating a question from an English sentence is that you take the auxiliary verb and move it to the front of the sentence. (The notion "auxiliary verb" is discussed below.)

the dog is in the yard

is the dog __ in the yard?

But what if there's more than one such verb in the sentence? You can't just take the first one you find.

the dog that is in the yard is named Rex

*is the dog that __ in the yard is named Rex?

Rather, we need to identify the verb which is serving as the head of the main clause of the sentence:

the dog that is in the yard is named Rex

is the dog that is in the yard __ named Rex?

We can't understand this distinction without knowing how the pieces of the sentence fit together, i.e. their constituency.

In this case, we need to understand that the subject of the sentence might be complex, potentially containing one or more verbs of its own (in relative clauses that modify a noun).

[ my dog ] is named Rex

[ that dog ] is named Rex

[ the dog you just saw ] is named Rex

[ the dog that is in the yard ] is named Rex

[ the dog whose owner was arrested yesterday by the police for using him in a drug-running scheme ] is named Rex

All these sentences have the same structure except for the contents of the subject; for operations that ignore the internal structure of the subject, such as inversion of the subject and the auxiliary, they all behave the same.
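The difference between the two rules can be sketched in Python. The function names and the tiny auxiliary list are assumptions for this example only; the point is just that the first rule sees a flat string of words, while the second is handed the subject as a single unit.

```python
AUXILIARIES = {"is"}   # just enough for the examples above

def naive_question(words):
    """Front the first auxiliary found in the flat word string
    (the inadequate rule). Assumes the sentence contains an auxiliary."""
    i = next(i for i, w in enumerate(words) if w in AUXILIARIES)
    return [words[i]] + words[:i] + words[i + 1:]

def structural_question(subject, auxiliary, rest):
    """Front the main-clause auxiliary, treating the subject as one unit
    whose internal structure is ignored."""
    return [auxiliary] + subject + rest

flat = "the dog that is in the yard is named Rex".split()
print(" ".join(naive_question(flat)))
# is the dog that in the yard is named Rex   (ungrammatical)

subject = "the dog that is in the yard".split()
print(" ".join(structural_question(subject, "is", ["named", "Rex"])))
# is the dog that is in the yard named Rex
```

The second function works unchanged for any of the bracketed subjects above, however long, because it never looks inside them.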


Lexical categories and phrases

A noun is a single word (or compound).

dog

dog food

A noun phrase normally contains at least one noun (the head of the phrase), possibly with other elements such as determiners and adjectives, or a relative clause or other modifier.

the dog

a big dog

the dog food that you bought in the store

Determiners fall into several subtypes.

  articles        a, the
  demonstratives  this, that, these, those
  quantifiers     some, many, few, all
  possessives     my, your, his, her, our, their
                  Pat's, my sister's, ...

Notice that at most one determiner can occur in a given noun phrase. Quantifiers are actually quite a bit more complicated than this classification implies, and they can be treated as determiners only in simple cases: notice co-occurring combinations such as a few and all the. They're best treated as a separate class, or as some other type such as adjective or adverb; which analysis is best depends on the individual item.

a dog

that dog

my dog

Pat's dog

*the my dog

*Pat's this dog

In the plural, the indefinite article is often null: a dog, (some) dogs.


An adjective is a word that modifies a noun, and can occur as a comparative and superlative.

a big dog

a bigger dog

the biggest dog

An adjective phrase is not a prominent category (not nearly as important as a noun phrase or verb phrase), but we can use the term to describe an adjective that itself is modified or takes a complement.

a [ very big ] dog

she's [ proud of herself ]

I'm [ happy to meet you ]

The adverb class is more of a hodgepodge. These words modify a verb, an adjective, or another adverb.

talk loudly

very big

quite loudly

Adverbs tend to function on the periphery of most of the processes we'll be examining -- often they have no necessary role, and don't much affect the outcome.


A preposition normally takes a noun phrase as a complement. The result is a prepositional phrase.

she gave the book [ to me ]

the dog is [ in the yard ]

a book [ with a red cover ]

Some prepositions are complex.

I saw her [ in front of the store ]

the cat pushed the toy [ out of the box ]

Prepositions in English can also occur without a complement; in that use they're often called adverbs, or "particles" when associated with a verb.

the dog ran in the house
the dog ran in

I'd seen him before that encounter
I'd seen him before

There is a parallel between these intransitive prepositions (i.e. without an object) and the intransitive use of a verb, as in I already ate (something).


A verb is (in most languages) the locus of such distinctions as tense (past, present, future) and aspect (progressive, perfect).

they see

they saw

seeing

Very often it's necessary to talk about the verb phrase, which includes the complements of the verb such as a direct object, indirect object, and even a sentence.

they [ saw me ]

she [ gave the book to me ]

you [ said that you would arrive on time ]

Much of the interest in syntax is centered on the analysis of verbs and their complements.


Tests for constituency

There are several ways to determine whether a string of words is a constituent, i.e. a coherent grouping of words in a syntactic unit.

All the subjects in the sentences about Rex can stand alone as answers to the question, Who is Rex?

Who's Rex? My dog!

That dog!

The dog you just saw!

The dog that's in the yard!

The dog whose owner was arrested yesterday by the police for using him in a drug-running scheme!

In contrast, many strings of words are not able to stand alone, and they are therefore not constituents.

*The dog you just!

*The dog that's in the!

*The dog whose owner!

The names of books and such are generally constituents as well, since they stand alone. (Examples are from the latest New York Times bestseller list.) Very often they're noun phrases.

A painted house

The bonesetter's daughter

A darkness more than night

The cat who smelled a rat

Other types are found, such as prepositional phrase or full sentence.

From the corner of his eye

What if God were the sun?

These are all constituents. Very occasionally, however, one finds a book title that is not a constituent. One example is the following, a 1978 book by Andrew Holleran.

Dancer from the dance

It sounds like a noun phrase ("the dancer who is from the dance"), but actually it's a subpart of the verb phrase in this quote from Yeats:

How can we know the dancer from the dance?

Such stand-alone use of a non-constituent is permitted by poetic license.


A good test for constituency is whether a pro-form -- that is, a pronoun such as it or them, or the pro-verb do, do so, do it -- can replace the string of words.

[ the dog you just saw ]NP is named Rex

[ he ] is named Rex

I gave [ the book ]NP to Pat

I gave [ it ] to Pat

I [ gave the book to Pat ]VP

Yes, I [ did ]

Yes, I [ did it ] already

Yes, I [ did so ] yesterday

The pro-form replaces an entire constituent.


Another test for a constituent is whether it can move as a unit. An example is the construction called a cleft sentence.

It's [ my dog ] that's named Rex

It's [ the dog that is in the yard ] that's named Rex

Examples for the sentence I gave the book to Pat:

It's [ the book ] that I gave __ to Pat

It's [ to Pat ] that I gave the book __

It's [ Pat ] that I gave the book to __

*It's [ the book to Pat ] that I gave __

*It's [ gave the book ] that I __ to Pat

*It's [ gave to Pat ] that I __ the book

It's [ gave the book to Pat ] that I did

In the last example, we see the use of a pro-form to substitute for the entire verb phrase, which of course is another test.
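If we write the structure of a sentence down as nested data, the question "is this string of words a constituent?" becomes a simple lookup. The Python sketch below assumes one plausible bracketing of I gave the book to Pat; the tuple representation and the function names are invented for the example, and other analyses of the verb phrase are possible.

```python
def words(tree):
    """Return the list of words a (sub)tree covers, left to right."""
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    return [w for child in children for w in words(child)]

def constituents(tree):
    """Collect the word string covered by every node of the tree."""
    if isinstance(tree, str):
        return {tree}
    label, *children = tree
    found = {" ".join(words(tree))}
    for child in children:
        found |= constituents(child)
    return found

# One plausible bracketing (an assumed analysis, not the only one).
sentence = ("S",
            ("NP", "I"),
            ("VP",
             ("V", "gave"),
             ("NP", "the", "book"),
             ("PP", ("P", "to"), ("NP", "Pat"))))

units = constituents(sentence)
for candidate in ["the book", "to Pat", "gave the book to Pat",
                  "the book to Pat", "gave the book"]:
    print(candidate, "->", candidate in units)
```

Under this bracketing, the strings that come out as constituents are the ones that can be clefted in the examples above, and the strings that fail are the starred ones.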


Structural ambiguities

Some sentences are ambiguous because they contain words with more than one meaning, and either one makes sense in the context. Here is an example from Groucho Marx in Animal Crackers.


-- One morning I was sittin' in front of the cabin smoking some meat, when...

-- Smoking some meat?

-- Yes, there wasn't a cigar store in the neighborhood.


These are usually called lexical ambiguities, since they depend on the individual words (i.e. the polysemy of smoke), rather than how they fit into sentence structure.

Newspaper headlines are a popular source of such examples, particularly since the "telegraphic" syntax (omission of many function words) increases ambiguity.

Doctor testifies in horse suit

Defendant's speech ends in long sentence

Caribbean islands drift to left

Queen Mary having bottom scraped

20-year friendship ends at altar

Iraqi head seeks arms

The relevance of these examples in the present context is that they depend solely on lexical meaning, and not on syntactic structure, which is the same for either meaning. Thus Queen Mary refers to a ship or a person, but either way it's a noun serving as subject of the verb.


Other sentences, however, are ambiguous because they can be analyzed according to more than one syntactic structure, so they're called structural ambiguities.

Here is another example from Groucho Marx in Animal Crackers, though this one is structural.


One morning I shot an elephant in my pajamas.

How he got into my pajamas I dunno.


The ambiguity here centers on the prepositional phrase in my pajamas: does it modify the noun elephant, or the entire verb phrase? The simple linear order is consistent with either:

  I        shot  an elephant  in my pajamas
  subject  verb  noun phrase  prep phrase

The more reasonable meaning is "I shot an elephant while (I was) in my pajamas." This is parallel to a sentence like I [bought a book [with my credit card]].

  I        shot  an elephant  in my pajamas
  subject  verb  noun phrase  prep phrase
           verb phrase

Or, as a tree:

     S
    / \
  NP   VP
  /   /  \
 I   VP   PP
    /  \   \
   V    NP  in my pajamas
   |    |
  shot  an elephant

Also possible, though, is "I shot an elephant that was in my pajamas." This is parallel to a sentence like I bought [a book [with a red cover]].

  I        shot  an elephant  in my pajamas
  subject  verb  noun phrase  prep phrase
                 complex noun phrase
           verb phrase

And as a tree:

     S
    / \
  NP   VP
  /   /  \
 I   V     NP  
    /     /  \
  shot   NP     PP
         /         \
       an elephant  in my pajamas

Later on we'll see more about how these verb phrases are constructed.
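The two trees can also be written as nested data, which makes it easy to check that they really are two structures for one and the same string of words. This Python sketch copies the bracketing from the trees above; the tuple representation and the helper names leaves and parent_of are assumptions made for the example.

```python
def leaves(tree):
    """Return the words of a tree in linear order."""
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    return [w for child in children for w in leaves(child)]

def parent_of(tree, target_label, parent=None):
    """Return the label of the node directly above the first node
    labeled target_label, or None if there is no such node."""
    if isinstance(tree, str):
        return None
    label, *children = tree
    if label == target_label:
        return parent
    for child in children:
        found = parent_of(child, target_label, label)
        if found is not None:
            return found
    return None

# Reading 1: the PP attaches to the verb phrase ("while I was in my pajamas").
vp_attachment = ("S", ("NP", "I"),
                 ("VP",
                  ("VP", ("V", "shot"), ("NP", "an", "elephant")),
                  ("PP", "in", "my", "pajamas")))

# Reading 2: the PP attaches to the noun phrase
# ("an elephant that was in my pajamas").
np_attachment = ("S", ("NP", "I"),
                 ("VP",
                  ("V", "shot"),
                  ("NP", ("NP", "an", "elephant"),
                         ("PP", "in", "my", "pajamas"))))

print(" ".join(leaves(vp_attachment)))
print(leaves(vp_attachment) == leaves(np_attachment))   # same linear order
print(parent_of(vp_attachment, "PP"), parent_of(np_attachment, "PP"))
```

The word strings are identical; the only difference is which node the prepositional phrase hangs from, and that is where the two meanings come from.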


As with lexical ambiguity, newspaper headlines are a good source of amusing structural ambiguities.

Dr. Ruth to talk about sex with newspaper editors

Enraged cow injures farmer with ax

Killer sentenced to die for second time in 10 years

Police discover crack in Australia

British left waffles on Falkland Islands

Lawyers give poor free legal advice

Think about the source of these ambiguities, in general terms: which syntactic relation is the source of the problem?


gene@unagi.cis.upenn.edu