Occasional Papers in Autolexical Grammar #2
© by Eric Schiller. All Rights Reserved. This document may be freely distributed, if unaltered and complete, in HTML format.

The preferred reference for this document is: Schiller, Eric (1997). The Lexicon in Autolexical Grammar. Published on the Internet by Linguistics Unlimited, Moss Beach CA.

The Lexicon in Autolexical Grammar

This document describes the formal organization of the lexicon. The description of each item is brief and is not intended as a full explanation, though the document may serve as a handy reference.

Version 0.1 (1-Jul-97): Initial document. Very much subject to change!
Version 0.8 (10-Feb-99): Substantial revisions to semantic component.

Formal description of the Lexicon

A lexeme may be specified for a large number of attributes. For the purposes of exposition, most properties are associated with particular dimensions, which form the first three letters of the variable name:

dsc = discourse
lex = lexicon
mph = morphophonology
msx = morphosyntax
phn = phonology
sem = semantics
syn = syntax

No particular significance should be attached to this organization. It is simply easier to discuss the model in this way. The remainder of the variable name, either three or four letters, describes the variable itself. It is certainly possible to standardize on either a three-letter or four-letter tag, but the meaning of the variable seems clearer with the tags chosen below, and these are the ones used in the ALGAE project. In the February 1999 revision I standardized all attributes to 6 characters, for ease in building computational tools.

Lexicon

The Lexicon is the collection of lexical items. Each lexical item is specified for the categories and features applicable at each dimension. Items may be underspecified, or even unspecified, for attributes, receiving values for them by default from the prototype for categories at other levels. The following are attributes which are directly specified in the lexicon. Some of these are external attributes which are not inherently part of the language, for example the record number in our database.

lexeme
The citation form of the lexical item.

lexidx
An integer used as the primary key to the lexemes in the database.

lexgls
A string used for the purposes of translation or paraphrase. In most cases this will be the same as the citation form.

lexprb
The lexical probability used by stochastic parsers. It indicates, without taking context into account, how likely this entry is to apply when ambiguity is present.

lexfxd
This is a scale indicating how flexible or inflexible a phrasal lexeme or idiom is.
+2 indicates that the phrase must be exactly as entered, for example "so to speak".
+1 indicates that some flexibility exists, for example "The cat has been let out of the bag."
0 allows substitution rather freely, but in accordance with a template. One can include the Miss America idiom here: "I believe in equality for everyone, be they black, white, yellow, red or green."
Negative numbers can be used when even the template is flexible.

lexsub
This is a string which is used when a lexical item subcategorizes for (demands) a complement in a certain case or a specific lexeme. I have not tied this to any particular dimension for the simple reason that I am not sure where it belongs, or whether case subcategorization and lexical complements even belong together.
Examples:
I walked toward him. toward [DIR]
I ducked so that the flying saucer would not hit me. so [(that) S]
The second example raises an interesting question: does "so" subcategorize for a semantic proposition or for a syntactic sentence, given that the two are usually associated? I choose syntax here for no particular reason except that syntax is generally more concrete and specific than semantics.

Syntax

synhdf
This specifies the syntactic head feature of the lexeme, if any. The choices are {V,N,P,A,M,0}. Note that we do not decompose heads into + or - V and N, as the four-way distinction N,V,A,P does not include headless items (adverbs) or items with no syntax at all.
Examples:
N mouse
V eat
A smelly
P in
M probably
nil Oh!

synbar
An integer indicating the syntactic bar-level {-1...2}. I am not sure that -1 is needed, but if there is an item which demands a complement to make a zero-level category, it might be useful.
Examples with various heads (phrasal and lexical):
N2 the tall ape; he
N1 tall ape; one
N0 ape
V2 The tall ape loved the actress;
V1 loved the actress; do so
V0 loved
P1 in the autumn; then
P0 in
A1 fond of the actress; ?
A0 big
There are two gaps, A2 and P2.
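The category labels in the synbar examples pair a head feature with a bar level. As a minimal sketch of how such labels might be handled in a computational tool, the following parses a label into its two parts and lists the unattested combinations. The names `Category`, `parse_category`, `ATTESTED` and `gaps` are hypothetical and are not part of the ALGAE project; the list of attested labels is copied from the examples, reading the lexical preposition "in" as P0.

```python
import re
from typing import Iterable, List, NamedTuple

HEADS = {"V", "N", "P", "A", "M", "0"}  # choices listed under synhdf
BAR_LEVELS = range(-1, 3)                # {-1...2}, as listed under synbar


class Category(NamedTuple):
    """A syntactic category: head feature plus bar level (hypothetical type)."""
    head: str
    bar: int


_LABEL = re.compile(r"([A-Z0])(-?\d)")


def parse_category(label: str) -> Category:
    """Split a label such as 'N2' into head 'N' and bar level 2."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a category label: {label!r}")
    head, bar = match.group(1), int(match.group(2))
    if head not in HEADS or bar not in BAR_LEVELS:
        raise ValueError(f"head or bar level out of range: {label!r}")
    return Category(head, bar)


# Labels attested in the synbar examples (assumption: bare "P in" is P0).
ATTESTED = ["N2", "N1", "N0", "V2", "V1", "V0", "P1", "P0", "A1", "A0"]


def gaps(attested: Iterable[str]) -> List[str]:
    """Return the N/V/P/A labels at bar levels 0 to 2 that are not attested."""
    seen = {parse_category(label) for label in attested}
    return sorted(
        f"{head}{bar}"
        for head in "NVPA"
        for bar in (0, 1, 2)
        if Category(head, bar) not in seen
    )


print(gaps(ATTESTED))  # ['A2', 'P2']
```

Keeping head and bar level as separate fields mirrors the separation of synhdf and synbar in the lexicon; whether a real implementation stores them together or apart is left open here.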
There are some reasons to consider complement constructions as P2, but this remains an open question.

synmat
This is used when a lexical item demands a particular syntactic category as a complement.
Example: He is such a nice guy.
In this case, "such" demands an N2 complement. The syntactic specification is: M [N2>>N2]. The formula in square brackets has the synmat on the left and the synres on the right, separated by >>. The synmat is N2 here, as is the synres (see next).

synres
This is a string which represents the syntactic result of concatenation with the complement. See synmat for details.

Morphosyntax

msxfrm
The type of morpheme is represented by a character {a,r,s,i}, representing affix, root, stem and inert form.

msxhdf
A character is used to represent the morphosyntactic head. This is a language-specific list. For some isolating languages this is always nil. Often the list reflects the syntactic heads of a language. The morphosyntactic head determines the applicability of affixes.
Examples:
a slow
v walk
n dog

msxnum
Lexemes can be morphologically specified for number (singular, plural, dual) independently of their semantic number. Most of the time morphological and semantic number will agree, but not always. For example, "media" and "data" are morphologically plural in some dialects of English and singular in others. "Group" is morphologically singular but triggers plural (semantically based) agreement for many speakers.

Morphophonology

mphafx
The type of affix, a closed set {p,s,i,r} representing prefix, suffix, infix or reduplicator. It is not clear to me whether partial reduplication and full reduplication should be specified distinctly here.

mphpdm
A string used to indicate morphological paradigmatic class, drawn from a language-specific list.

mphgen
Morphological gender is limited to {m,f,n} for masculine, feminine and neuter, and in English is robust only in the pronouns.

Phonology

phnfrm
A string representing the phonological form.
We'll use the International Phonetic Alphabet for this when a web-friendly version exists.

phnsyl
The syllable structure is represented here. I leave the representation to phonologists, and adopt a sort of autosegmental view in my own work. But this is a task for more qualified scholars.

phnssg
The suprasegmental aspects of the phonology are represented here. I leave the representation to phonologists, and adopt a sort of autosegmental view in my own work. But this is a task for more qualified scholars.

Discourse

dscfoc
This is a scale to handle the Focus property. Focus determines word order in English. We have a variety of mechanisms for marking focus. Some of these are lexical, some grammatical and some lie elsewhere, for example intonation or stress. We will use a scale of {-2...+2}, where -2 marks an anti-focus (rare if even extant in English) and +2 marks an item which is obligatorily in focus, such as WH-words. We use the Focus property to account for the fronting of WH-words, question words, and perhaps even liberated items like `such' in He's such an idiot. In the lexicon, items which have values for Focus are rare and part of a closed set. Focus plays a great role in English grammar, but is more position-based, i.e., grammatical, than lexical.
Examples (italicized element is lexically +foc):
What are you doing?
He is such a fool!
Boy is he stupid.
Gee, that's brilliant!

dscinf
This is a scale used to indicate the degree of Information Passing, arbitrarily set at {-2...+2}. It is a closed set. Words used to elicit information are marked -2; words used to indicate that information is being conveyed are +2. The default is 0. In English, there are many non-lexical means of expressing information passing, but there are lexical items at both ends of the scale, and we might include some items as 0.
Examples:
Why is Rush so popular? -2
Y'know, I think I'll take a bath today. +2
Suppose Al really does build a clipper chip? 0

dscpsn
This is a scale of integers used for anaphoric reference to person within the pronominal system. Typically it is 1, 2 or 3 for first, second or third person, i.e., speaker, addressee, others. If we are going to integrate inclusivity (`we' includes or excludes the addressee), it may be necessary to make this more elaborate. English does not distinguish these, but many other languages do. It is a closed set, including pronouns and agreement markers.
Examples:
I am going out. 1
You are going out. 2

dscref
This is a scale used to indicate the degree of anaphora. It includes not only garden-variety pronouns, which demand a referent in the Context Register, but also generic property terms. Languages vary with regard to the degree of lexicalization here.
+2 indicates an anaphor which must be filled from the Context Register.
+1 indicates a more general reference.
0 is the default.
A negative value would indicate information that is obligatorily new; I haven't found a lexical form for this in English yet.
Examples:
They won yesterday. +2
Teach ordered us to get the homework done on time or else. +1
I wanted to do so, but the system was password-protected. +2
Someone is coming. 0
How about somebody else? -2 (?)

dscsoc
This scale indicates the social acceptability of a word. I have started with a scale of {-2...+2}, but the gradations are probably finer and it is a hierarchy. This may be highly idiolectal.
am not > ain't
excrement > feces > doody = poop = number two > shit

dscspk
This is a scale which indicates the degree of speaker responsibility, with a closed set marked {-2...+2}.
I hereby sentence you to two years at MIT. +2
I suppose I will go to bed now. +1
Should I go to bed now? -1

dsctop
A scale which marks an item as topic or comment. Topic markers are +1. Anti-topics (rare) are -1.

Semantics

semcat
The logico-semantic function of a lexeme is represented by a category and a valence (see semval).
The notation scheme is not yet standardized, but most of us use operators, formulas (propositions) and arguments. Note that this section was substantially revised in February 1999.
F formula (The boxer bit the ear)
f property (red)
k entity (dog)
Q argument (he)
O operator (probably, not)
x variable, usually bound to context register

semval
Semantic valence is a scale from {-3...0}, where the negative number reflects the number of "missing" elements needed to complete the category. A predicate is F-1, that is, a predicate requires an argument (usually a subject) to form a complete proposition or formula F.

semtns
Tense is represented on the semantic dimension by a character. The situation is perhaps not as clear as {p, n, f} for past, present (now) and future.

aspect
Semantic aspect is represented by a character. This is language-specific, though most languages will encode such aspects as progressive.

semasp
Animacy plays a grammatical role in many languages, and is usually scalar. At present we have no specific implementation ready.

semcas
Semantic case (thematic roles, semantic roles, lexicase etc.) is the relationship borne by an item (usually an argument) to other arguments and predicates in a semantic structure. Not all of these relations are grammaticized in any specific language. We use three-letter abbreviations to mark the features. Some of the more common cases which are often marked grammatically are:
AGT agent
POS possessive
The semantic boundaries are sometimes a bit fuzzy. For example, in "Give me the lid for the jar", what is the relationship between lid and jar? I think that the case roles depend to a large degree on prototypes, as argued by Croft, inter alia.

semcnt
A boolean which indicates whether a lexeme is countable (i.e., count vs. mass nouns). Countable is the true (1) value.

semdef
Definiteness is a scale that plays a role in lexical selection and in some grammatical phenomena.
We will use the {-2...+2} scale, though perhaps finer gradations may be needed when enough languages have been explored.

semdeg
This is a scale used for degree terms, {-2...+2}. Examples of the positive values are "very" and "extremely", while "hardly" lies at the other end.

semgen
Semantic gender is the usual {m,f,n} for masculine, feminine and neuter, though perhaps we will need others (x for Dennis Rodman?).

semneg
A boolean used to indicate semantic negation {0,1}.

semnpi
A boolean used to indicate a Negative Polarity Item {0,1}. Relevant for agreement and tags.
Example: deny

semqnt
Semantic quantity is a scale. This is most easily handled as a floating point number, though we are mixing math and linguistics here.
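To make the overall organization concrete, here is a minimal sketch of how a lexical entry with six-character attribute names might be stored and checked. It is an illustration only, not the ALGAE implementation: the class `LexicalEntry`, its methods, the helper `dimension_of` and the `prototype` dictionary are hypothetical names, and the record number 1 is made up. The dimension prefixes and the set of {-2...+2} scales are taken from the descriptions above, as are the values for "why" (a WH-word, so +2 for focus, and -2 for information passing).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

Value = Union[int, float, str, bool]

# Three-letter dimension prefixes, as listed in the formal description.
DIMENSIONS = {
    "dsc": "discourse", "lex": "lexicon", "mph": "morphophonology",
    "msx": "morphosyntax", "phn": "phonology", "sem": "semantics",
    "syn": "syntax",
}

# Attributes explicitly described as scales running from -2 to +2.
FIVE_POINT_SCALES = {"dscfoc", "dscinf", "dscsoc", "dscspk", "semdef", "semdeg"}


def dimension_of(name: str) -> str:
    """Return the dimension named by the first three letters of an attribute."""
    return DIMENSIONS[name[:3]]


@dataclass
class LexicalEntry:
    lexeme: str  # citation form
    lexidx: int  # primary key in the database
    attrs: Dict[str, Value] = field(default_factory=dict)

    def assign(self, name: str, value: Value) -> None:
        """Store a value after checking the name and, for scales, the range."""
        if len(name) != 6 or name[:3] not in DIMENSIONS:
            raise ValueError(f"unknown attribute name: {name!r}")
        if name in FIVE_POINT_SCALES:
            if not (isinstance(value, int) and -2 <= value <= 2):
                raise ValueError(f"{name} must be an integer in -2..+2")
        self.attrs[name] = value

    def get(self, name: str,
            prototype: Optional[Dict[str, Value]] = None) -> Optional[Value]:
        """Return the entry's own value, else the prototype's default, else None."""
        if name in self.attrs:
            return self.attrs[name]
        if prototype is not None and name in prototype:
            return prototype[name]
        return None


# Hypothetical entry for "why"; the record number 1 is invented.
why = LexicalEntry(lexeme="why", lexidx=1)
why.assign("dscfoc", 2)    # WH-words are obligatorily in focus
why.assign("dscinf", -2)   # used to elicit information

# An underspecified attribute falls back to a prototype default.
prototype = {"dscref": 0}
print(why.get("dscref", prototype))
```

Storing attributes in a dictionary keyed by the six-character names keeps underspecified entries cheap, since an unset attribute simply has no key. A fixed record with one field per attribute would work equally well if the attribute list stabilizes.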