Occasional Papers in Autolexical Grammar #2
 

© by Eric Schiller. All Rights Reserved. This document may be freely distributed, if unaltered and complete, in HTML format.

The preferred reference for this document is:

Schiller, Eric (1997). The Lexicon in Autolexical Grammar. Published on the Internet by Linguistics Unlimited, Moss Beach CA.

The Lexicon in Autolexical Grammar

This document describes the formal organization of the lexicon. The description of each item is brief and is not intended as a full explanation, though the document may serve as a handy reference.

Version: 0.1 (1-Jul-97) Initial document. Very much subject to change!
Version: 0.2 (7-Jul-97) Changes to semcase listing. 0.1 had the wrong information.
Version: 0.8 (10-Feb-99) Substantial revisions to semantic component.


Formal description of the Lexicon

A lexeme may be specified for a large number of attributes. For the purposes of exposition, most properties are associated with particular dimensions, which form the first three letters of the variable name:

 dsc = discourse

 lex = lexicon

 mph = morphophonology

 msx = morphosyntax

 phn = phonology

 sem = semantics

 syn = syntax

 No particular significance should be attached to this organization. It is simply easier to discuss the model in this way.

The remainder of the variable name, either three or four letters, describes the variable itself. It is certainly possible to standardize on either a three-letter or four-letter tag, but the meaning of the variable seems clearer with the tags chosen below, and these are the ones used in the ALGAE project.

In the February 1999 revision I standardized all attributes to 6 characters, for ease in building computational tools.
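The naming convention can be checked mechanically. The following is a minimal sketch, assuming the six-character form (a three-letter dimension prefix followed by a three-letter tag). The names `DIMENSIONS` and `split_attribute` are illustrative only and are not taken from the ALGAE project.

```python
# Dimension prefixes as listed above.
DIMENSIONS = {
    "dsc": "discourse",
    "lex": "lexicon",
    "mph": "morphophonology",
    "msx": "morphosyntax",
    "phn": "phonology",
    "sem": "semantics",
    "syn": "syntax",
}


def split_attribute(name):
    """Split a six-character attribute name into (dimension, tag)."""
    if len(name) != 6:
        raise ValueError(f"expected six characters, got {name!r}")
    prefix, tag = name[:3], name[3:]
    if prefix not in DIMENSIONS:
        raise ValueError(f"unknown dimension prefix in {name!r}")
    return DIMENSIONS[prefix], tag


print(split_attribute("synbar"))  # ('syntax', 'bar')
```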

Lexicon

The Lexicon is the collection of lexical items. Each lexical item is specified for the categories and features applicable at each dimension. Items may be underspecified, or even unspecified, for attributes, receiving values for them by default from the prototype for categories at other levels. The following are attributes which are directly specified in the lexicon. Some of these are external attributes which are not inherently part of the language, for example the record number in our database.
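Underspecification with defaults can be pictured as a two-step lookup: first the entry itself, then a prototype. This is only a sketch under stated assumptions. The prototype, its default values, and the function `lookup` are hypothetical, and the attribute names used are the ones described in the sections that follow.

```python
# Hypothetical prototype: defaults shared by entries whose syntactic head is N.
# The particular default values are invented for illustration.
NOUN_PROTOTYPE = {"synhdf": "N", "synbar": 0, "msxhdf": "n", "semcnt": 1}


def lookup(entry, prototype, attribute):
    """Return the entry's own value if specified, else the prototype default."""
    if attribute in entry:
        return entry[attribute]
    return prototype.get(attribute)  # None means unspecified everywhere


# An underspecified entry: only what is idiosyncratic is listed.
water = {"lexeme": "water", "semcnt": 0}

print(lookup(water, NOUN_PROTOTYPE, "semcnt"))  # 0, specified on the entry
print(lookup(water, NOUN_PROTOTYPE, "synhdf"))  # N, supplied by the prototype
print(lookup(water, NOUN_PROTOTYPE, "phnfrm"))  # None, unspecified
```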

lexeme

The citation form of the lexical item.

lexidx       

This is an integer used as the primary key to the lexemes in the database.

lexgls

For the purposes of translation or paraphrase, a string is used. In most cases this will be the same as the citation form.

lexprb

This is the lexical probability used by stochastic parsers. It indicates, without taking context into account, how likely it is that this entry applies when ambiguity is present.
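As a rough illustration of how a parser might consult lexprb, the sketch below picks the most probable of several competing entries for one form. It assumes lexprb is stored as a number between 0 and 1. The entries, their probabilities, and the function `most_likely` are invented for the example.

```python
# Two competing entries for the same form; all values are invented.
candidates = [
    {"lexidx": 101, "lexeme": "saw", "synhdf": "V", "lexprb": 0.7},
    {"lexidx": 102, "lexeme": "saw", "synhdf": "N", "lexprb": 0.3},
]


def most_likely(entries):
    """Pick the entry with the highest context-free lexical probability."""
    return max(entries, key=lambda entry: entry["lexprb"])


best = most_likely(candidates)
print(best["lexidx"], best["synhdf"])  # 101 V
```

A real stochastic parser would combine this figure with contextual evidence; the lookup here covers only the context-free part.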

lexfxd

This is a scale indicating how flexible or inflexible a phrasal lexeme or idiom is. +2 indicates that the phrase must be exactly as entered, for example "so to speak". +1 indicates that some flexibility exists, for example "The cat has been let out of the bag". 0 allows substitution rather freely, but in accordance with a template. One can include the Miss America idiom here: "I believe in equality for everyone, be they black, white, yellow, red or green." Negative numbers can be used when even the template is flexible.

lexsub

This is a string which is used when a lexical item subcategorizes for (demands) a complement in a certain case or a specific lexeme. I have not tied this to any particular dimension for the simple reason that I am not sure where it belongs, or whether case subcategorization and lexical complements even belong together.

Example:

 I walked toward him.

 toward [DIR]

  I ducked so that the flying saucer would not hit me.

 so [(that) S]

The second example raises an interesting question: does `so' subcategorize for a semantic proposition or a syntactic sentence, given that the two are usually associated? I choose syntax here for no particular reason except that syntax is generally more concrete and specific than semantics.

Syntax

synhdf

This specifies the syntactic head feature of the lexeme, if any. The choices are {V,N,P,A,M,0}. Note that we do not decompose heads into + or - V and N, as the four-way distinction N,V,A,P does not include headless items (adverbs) or items with no syntax at all.

Examples:

 N mouse

 V eat

 A smelly

 P in

 M probably

 nil Oh!

synbar

An integer indicating the Syntactic bar-level {-1...2}. I am not sure that -1 is needed, but if there is an item which demands a complement to make a zero-level category, it might be useful.

Examples with various heads (phrasal and lexical):

 N2 the tall ape; he

 N1 tall ape; one

 N0 ape

 V2 The tall ape loved the actress;

V1 loved the actress; do so

 V0 loved

 P1 in the autumn; then

 P0 in

 A1 fond of the actress; ?

 A0 big

  There are two gaps, A2 and P2. There are some reasons to consider complement constructions as P2, but this remains an open question.
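Category labels such as N2 or V0 pair a synhdf value with a synbar value. A minimal sketch of reading such a label back into its two parts follows. The function name `parse_category` is hypothetical, and the label syntax (head letter followed by the bar level) is inferred from the examples above.

```python
import re

BAR_LEVELS = range(-1, 3)  # the stated range {-1...2}


def parse_category(label):
    """Split a label such as 'N2' into (synhdf, synbar)."""
    match = re.fullmatch(r"([VNPAM])(-?\d)", label)
    if match is None:
        raise ValueError(f"not a category label: {label!r}")
    head, bar = match.group(1), int(match.group(2))
    if bar not in BAR_LEVELS:
        raise ValueError(f"bar level out of range in {label!r}")
    return head, bar


print(parse_category("N2"))  # ('N', 2)
print(parse_category("V0"))  # ('V', 0)
```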

synmat

This is used when a lexical item demands a particular syntactic category as a complement.

Example: He is such a nice guy.

In this case, `such' demands an N2 complement. The syntactic specification is: M [N2>>N2]. The formula in square brackets has the synmat on the left and the synres on the right, separated by >>. The synmat is N2 here, as is the synres (see next).

synres

This is a string which represents the syntactic result of concatenation with a complement. See synmat for details.
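The bracket notation lends itself to simple processing. The sketch below parses a specification such as M [N2>>N2] and returns the result category when the complement matches. The functions `parse_spec` and `combine` are hypothetical helpers, and matching by exact string equality is a simplifying assumption.

```python
def parse_spec(spec):
    """Parse 'M [N2>>N2]' into (head, synmat, synres)."""
    head, _, rest = spec.partition(" ")
    rest = rest.strip()
    if not (rest.startswith("[") and rest.endswith("]")):
        raise ValueError(f"missing square brackets in {spec!r}")
    synmat, separator, synres = rest[1:-1].partition(">>")
    if not separator:
        raise ValueError(f"missing '>>' in {spec!r}")
    return head, synmat, synres


def combine(spec, complement_category):
    """Return the synres if the complement matches the synmat, else None."""
    _, synmat, synres = parse_spec(spec)
    return synres if complement_category == synmat else None


print(parse_spec("M [N2>>N2]"))     # ('M', 'N2', 'N2')
print(combine("M [N2>>N2]", "N2"))  # N2
print(combine("M [N2>>N2]", "V1"))  # None
```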

Morphosyntax

msxfrm

The type of morpheme is represented by a character {a,r,s,i} representing affix, root, stem and inert form.  

msxhdf

A character is used to represent the morphosyntactic head. This is a language specific list. For some isolating languages this is always nil. Often the list reflects the syntactic heads of a language. The morphosyntactic head determines the applicability of affixes.  

Examples:

a slow

 v walk

 n dog

msxnum

Lexemes can be morphologically specified for number (singular, plural, dual) independently of their semantic number. Most of the time morphological and semantic number will agree, but not always. For example, media and data are morphologically plural in some dialects of English and singular in others. Group is morphologically singular but triggers plural (semantically based) agreement for many speakers.

Morphophonology

mphafx

The type of affix is a closed set {p,s,i,r}, representing prefix, suffix, infix or reduplicator. It is not clear to me whether partial reduplication and full reduplication should be specified distinctly here.

mphpdm

A string used to indicate morphological paradigmatic class, drawn from a language-specific list.

mphgen

Morphological gender is limited to {m,f,n} for masculine, feminine and neuter. In English it is robust only in the pronouns.

Phonology

phnfrm

A string representing the phonological form. We'll use the international phonetic alphabet for this when a web-friendly version exists.

phnsyl

The syllable structure is represented here. I leave the representation to phonologists, and adopt a sort of autosegmental view in my own work. But this is a task for more qualified scholars.

phnssg

The suprasegmental aspects of the phonology are represented here. I leave the representation to phonologists, and adopt a sort of autosegmental view in my own work. But this is a task for more qualified scholars.

Discourse

dscfoc

This is a scale to handle the Focus property. Focus determines word order in English. We have a variety of mechanisms for marking focus. Some of these are lexical, some grammatical and some lie elsewhere, for example intonation or stress. We will use a scale of {-2...+2} where -2 is something which is an anti-focus (rare if even extant in English) to +2 for an item which is obligatorily in focus, such as WH-words. We use the Focus property to account for the fronting of WH-words, question words, and perhaps even liberated items like `such' in He's such an idiot. In the lexicon, items which have values for Focus are rare and part of a closed set. Focus plays a great role in English grammar, but is more position-based, i.e., grammatical, than lexical.

Examples (italicized element is lexically +foc):

 What are you doing?

 He is such a fool!

 Boy is he stupid.

 Gee, that's brilliant!

dscinf

This is a scale used to indicate the degree of Information Passing, arbitrarily set at {-2...+2}. It is a closed set. Words used to elicit information are marked -2, words used to indicate information is being conveyed are +2. Default is 0. In English, there are many non-lexical means of expressing information passing, but there are lexical items at both ends of the scale, and we might include some items as 0.  

Examples:

 Why is Rush so popular? -2

 Y'know, I think I'll take a bath today. +2

 Suppose Al really does build a clipper chip? 0

dscpsn

This is a scale of integers used for anaphoric reference to person within the pronominal system. Typically it is 1, 2 or 3 for first, second or third person, i.e., speaker, addressee, others. If we are going to integrate inclusivity (`we' includes or excludes addressee) it may be necessary to make this more elaborate. English does not distinguish these, but many other languages do. It is a closed set, including pronouns and agreement markers.

Examples:

 I am going out. 1

 You are going out. 2

dscref

This is a scale used to indicate the degree of anaphora. Not only does it include garden variety pronouns which demand a referent in the Context Register, but also generic property terms. Languages vary with regard to the degree of lexicalization here. +2 indicates an anaphor which must be filled from the Context Register. +1 indicates a more general reference. 0 is the default. A negative value would indicate information that is obligatorily new, and I haven't found a clear lexical form in English yet.

Examples:

 They won yesterday. +2

 Teach ordered us to get the homework done on time or else. +1

 I wanted to do so, but the system was password-protected. +2

Someone is coming. 0

How about somebody else? -2 (?)

dscsoc

This scale indicates the social acceptability of a word. I have started with a scale of {-2...+2} but the gradations are probably finer and it is a hierarchy. This may be highly idiolectal.

am not > ain't

 excrement > feces > doody = poop = number two > shit

dscspk

This is a scale which indicates the degree of speaker responsibility, with a closed set marked {-2...+2}.

I hereby sentence you to two years at MIT. +2

 I suppose I will go to bed now. +1

 Should I go to bed now? -1

dsctop

A scale which marks an item as topic or comment. Topic markers are +1. Anti-topics (rare) are -1.
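Since the discourse attributes are small integer scales, a computational tool can verify entries against the ranges given in this section. The sketch below does so. The table `SCALES` and the function `check_scales` are illustrative names; the range for dscref is an assumption based on the examples, and a neutral 0 for dsctop is likewise assumed.

```python
# Permitted values per discourse attribute, following the text above.
SCALES = {
    "dscfoc": range(-2, 3),
    "dscinf": range(-2, 3),
    "dscref": range(-2, 3),  # assumed from the examples
    "dscsoc": range(-2, 3),
    "dscspk": range(-2, 3),
    "dsctop": range(-1, 2),  # -1 anti-topic, +1 topic; 0 assumed neutral
    "dscpsn": range(1, 4),   # first, second, third person
}


def check_scales(entry):
    """Return the (attribute, value) pairs that fall outside their scale."""
    return [
        (name, value)
        for name, value in entry.items()
        if name in SCALES and value not in SCALES[name]
    ]


what = {"lexeme": "what", "dscfoc": 2, "dscinf": -2}
print(check_scales(what))           # []
print(check_scales({"dscfoc": 3}))  # [('dscfoc', 3)]
```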

Semantics

semcat

The logico-semantic function of a lexeme is represented by a category and a valence (see semval). The notation scheme is not yet standardized, but most of us use operators, formulas (propositions) and arguments. Note that this section was substantially revised in February 1999.

F formula (The boxer bit the ear)

f property (red)

k entity (dog)

Q argument (he)

O operator (probably, not)

x variable, usually bound to context register

semval

Semantic valence is a scale from {-3...0}, where the negative number reflects the number of "missing" elements needed to complete the category. A predicate is F-1, that is, a predicate requires an argument (usually a subject) to form a complete proposition or formula F.  
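The arithmetic of valence can be sketched as follows: each element supplied raises the valence by one until the category is complete at 0. The functions `saturate` and `label` are hypothetical, and treating a two-place predicate as F-2 is an assumption extrapolated from the F-1 case above.

```python
def saturate(semcat, semval):
    """Supply one missing element and return the new (semcat, semval)."""
    if not -3 <= semval <= 0:
        raise ValueError(f"semval out of range: {semval}")
    if semval == 0:
        raise ValueError("category is already complete")
    return semcat, semval + 1


def label(semcat, semval):
    """Render a category with its valence, e.g. 'F-1'; bare when complete."""
    return semcat if semval == 0 else f"{semcat}{semval}"


print(label("F", -1))                           # F-1
print(label(*saturate("F", -1)))                # F
print(label(*saturate(*saturate("F", -2))))     # F
```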

semtns

Tense is represented on the semantic dimension by a character. The situation is perhaps not as clear as {p, n, f} for past, present (now) and future.  

aspect

Semantic aspect is represented by a character. This is language specific, though most languages will encode such aspects as progressive.

semasp

Animacy plays a grammatical role in many languages, and is usually scalar. At present we have no specific implementation ready.

semcas

Semantic case (thematic roles, semantic roles, lexicase etc.) is the relationship borne by an item (usually an argument) to other arguments and predicates in a semantic structure. Not all of these relations are grammaticized in any specific language. We use three-letter abbreviations to mark the features. Among some more common examples of cases which are often marked grammatically are:

AGT agent
BEN benefactive
CAU causative
COM comitative
DIR directional
HAS possessive
INS instrumental
LOC locational

POS possessive
PUR purpose
SRC source
THM theme
VOC vocative

The semantic boundaries are sometimes a bit fuzzy. For example, in "Give me the lid for the jar", what is the relationship between ‘lid’ and ‘jar’? I think that the case roles depend to a large degree on prototypes, as argued by Croft, inter alia.
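The abbreviations listed above can be held in a lookup table and used, for instance, to recognize a case demanded in a lexsub string such as toward [DIR]. This is a sketch only. It assumes that a bracketed three-letter tag in lexsub names a semantic case, and the names `SEMCAS` and `subcat_case` are invented here.

```python
# Abbreviations and glosses exactly as listed above.
SEMCAS = {
    "AGT": "agent",
    "BEN": "benefactive",
    "CAU": "causative",
    "COM": "comitative",
    "DIR": "directional",
    "HAS": "possessive",
    "INS": "instrumental",
    "LOC": "locational",
    "POS": "possessive",
    "PUR": "purpose",
    "SRC": "source",
    "THM": "theme",
    "VOC": "vocative",
}


def subcat_case(lexsub):
    """Return the case abbreviation inside the brackets, or None."""
    start, end = lexsub.find("["), lexsub.find("]")
    if start == -1 or end < start:
        return None
    tag = lexsub[start + 1:end]
    return tag if tag in SEMCAS else None


print(subcat_case("toward [DIR]"))   # DIR
print(subcat_case("so [(that) S]"))  # None
```

The second call returns None because that specification demands a lexical and syntactic complement rather than a case.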

semcnt

A boolean which indicates whether a lexeme is countable (i.e., count vs. mass nouns). Countable is the true (1) value.

semdef

Definiteness is a scale that plays a role in lexical selection and in some grammatical phenomena. We will use the {-2...+2} scale though perhaps finer gradations may be needed when enough languages have been explored.

semdeg

This is a scale used for degree terms, {-2...+2}. Examples of the positive values are very and extremely while hardly lies at the other end.

semgen

Semantic gender is the usual {m,f,n} for masculine, feminine and neuter, though perhaps we will need others (x for Dennis Rodman?)

semneg

A boolean used to indicate semantic negation {0,1}.

semnpi

A boolean used to indicate a Negative Polarity Item {0,1}. Relevant for agreement and tags. Example: deny

semqnt

Semantic quantity is a scale. This is most easily handled as a floating point number, though we are mixing math and linguistics here.