Lemma (morphology)

The lemma (Greek λῆμμα, lēmma, literally "that which is taken" or "that which is assumed"; plural: lemmata or lemmas) is, in lexicography and linguistics, the base form of a word, that is, the word form under which a term is listed in a reference work (citation form, base form).

Lemmata is the plural formed on the Greek pattern. The form lemmas has, however, also become widespread.

Lemma, lexeme and citation form

The lemma is the entry or headword in a reference work (dictionary, encyclopedia). The term also denotes both the base form of a word and the citation form or base form of a lexeme. The process of determining the exact lemmas is called lemma selection (German: Lemmaselektion) or lemmatization.

A lexeme, a basic linguistic unit, could in principle be referred to in any manner, because as a linguistic unit it is abstracted from its various forms and has no particular form of its own that sets it apart from the others. By convention, lexemes are named after one particular form, the citation form (also: base form, headword) of the lexeme:

  • In German, the citation form of a noun is usually the nominative singular (e.g. Traum, "dream"), and that of a verb is the present active infinitive (e.g. träumen, "to dream").
  • In Latin, the citation form of a verb is a paradigm, that is, a series of forms in particular moods (infinitive, indicative, subjunctive) and tenses (present, perfect, ...), which is especially helpful for irregular verbs. Most dictionaries use this order: 1st person singular present active indicative, 1st person singular perfect active indicative, supine I or neuter perfect passive participle (PPP), and finally present active infinitive. For example, the paradigm for "to bear, to bring" is: fero, tuli, latum, ferre.
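The Latin ordering described above can be sketched as a small record type. This is only an illustration: the class name LatinVerbEntry and its field names are hypothetical and not taken from any dictionary software. Only the four forms of ferre and their order come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatinVerbEntry:
    """Hypothetical record holding a Latin verb's four principal parts."""

    present_1sg: str  # 1st person singular present active indicative
    perfect_1sg: str  # 1st person singular perfect active indicative
    supine: str       # supine I or neuter perfect passive participle
    infinitive: str   # present active infinitive

    def citation_form(self) -> str:
        # Join the parts in the order most dictionaries use.
        return ", ".join(
            (self.present_1sg, self.perfect_1sg, self.supine, self.infinitive)
        )


ferre = LatinVerbEntry("fero", "tuli", "latum", "ferre")
print(ferre.citation_form())  # fero, tuli, latum, ferre
```

Storing the parts as named fields rather than as one string keeps the dictionary order explicit and lets each form be looked up on its own.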

Word-oriented linguistic reference works (dictionaries, thesauri, etymological works) list every lexeme as a lemma, whereas concept-oriented reference works (specialist dictionaries, glossaries, encyclopedias and the like) prefer the simplest noun as the citation form, especially in German. Thus "the dream", "to dream", "dreaming" and "dreamt" are grouped under the common lemma dream, provided they concern the same subject matter. In this case the lemma is usually called a descriptor.

That the choice of citation form depends on the type of reference work is shown by the following example:

  • The word "mice" is filed under the lemma mouse. This is how an ordinary dictionary proceeds, since "mouse" is the base form of the plural "mice".
  • In biology, the word "mouse" is filed under the lemma mice. In a biology textbook the genus of mice serves as the umbrella term. The taxonomic citation form mice expresses that there are many different species of mice, not simply "the mouse". The perspective of biology differs from that of colloquial language, which calls everything that looks like a mouse a "mouse".
  • For computer mice, the lemma in a textbook is mouse; in a general-purpose encyclopedia the entry may read mouse (computer). Although computer mice can look different and vary in detail, their similarities are felt to be more important than their differences when they are classified in the encyclopedia. The lemma is therefore, unlike in biology, in the singular.

Lemmatization

The lexicographical reduction of the inflected forms of a word to a base form, i.e. fixing the base form of a lexeme and arranging the lemmas, is also called lemmatization. A subset of directly consecutive lemmas forms a Lemmastrecke (a run, or stretch, of lemmas).
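The idea of a run of directly consecutive lemmas can be sketched as a slice of an ordered headword list. The sketch assumes simple case-insensitive alphabetical ordering, which is a simplification: real dictionaries follow their own alphabetization rules. The function name lemma_stretch and the sample headwords are hypothetical.

```python
def lemma_stretch(lemmas, first, last):
    """Return the consecutive lemmas from `first` to `last`, inclusive.

    Assumes plain case-insensitive alphabetical order (a simplification).
    """
    ordered = sorted(set(lemmas), key=str.casefold)
    start = ordered.index(first)
    end = ordered.index(last)
    if start > end:
        raise ValueError("first must not sort after last")
    return ordered[start:end + 1]


# Hypothetical headword list, deliberately unsorted.
headwords = ["mouse", "dream", "lemma", "lexeme", "meet", "dictionary"]
print(lemma_stretch(headwords, "lemma", "meet"))  # ['lemma', 'lexeme', 'meet']
```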

Lemmatization also refers to mapping a full form (an inflected word form) to its lemma. Depending on the application, this step is important in language technology. When statistical models are used, for example, lemmatizing a very small corpus is sometimes used to raise the frequency of individual lexemes and so reduce statistical noise. Before the statistical analysis, the full forms in the corpus are replaced by their lemmas. If, for example, the word forms "met", "meets", "meeting" and "meet" each occurred once in the corpus, then after lemmatization only the lemma "meet" remains, but with a frequency of four. The lexeme "meet" thus potentially carries much more weight in the corpus than the individual full forms did before lemmatization.
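The frequency effect can be illustrated with a minimal sketch. It assumes a hand-written lookup table from full forms to lemmas (LEMMA_OF, hypothetical) and four illustrative English forms of "meet". A real lemmatizer would rely on morphological analysis or a full-form lexicon rather than a four-entry dictionary.

```python
from collections import Counter

# Hypothetical full-form-to-lemma table; for illustration only.
LEMMA_OF = {"met": "meet", "meets": "meet", "meeting": "meet", "meet": "meet"}


def lemmatize(tokens, table=LEMMA_OF):
    """Replace each token by its lemma; unknown tokens are kept, lowercased."""
    return [table.get(token.lower(), token.lower()) for token in tokens]


corpus = ["met", "meets", "meeting", "meet"]

before = Counter(corpus)            # every full form occurs exactly once
after = Counter(lemmatize(corpus))  # all four occurrences fall on one lemma

print(before)
print(after)  # Counter({'meet': 4})
```

Counting after the replacement is what concentrates the four single occurrences into one entry with a frequency of four.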

Lemma selection

Before lemmatization, lemma selection (German: Lemmaselektion) is carried out: it is decided which kinds of lemmas are to be included in the lexicon. Lemma selection is necessary because a complete lemmatization of all words, word parts and phrases of a language is laborious. One criterion for including a lemma in a lexicon is the length of time for which the term has existed in the language concerned.

Lemma selection is closely connected with the indexing of the underlying texts, which is unnecessary for works covering the whole language because the full vocabulary is to be covered, but is quite relevant for lexicons of period-specific and other group-specific language. It is also closely connected with the questions of synonymy, homonymy and polysemy.
