Thesaurus

A thesaurus (ancient Greek θησαυρός thesauros, "treasure, treasure house", then Latin thesaurus, from which the sense "safe" also derives) or word network is, in contrast to a dictionary, a vocabulary whose concepts are linked by relations. The term is sometimes also used for linguistic thesauri or scholarly vocabulary collections.

General

A thesaurus is a model that attempts to describe and represent a subject area precisely. It consists of a systematically arranged collection of terms that are thematically related. The thesaurus is a controlled vocabulary, also called the attribute value range, for each attribute to be described. It primarily manages synonyms, but also broader and narrower terms. Antonyms (opposite terms), however, are often not listed.

Example: portrait (synonyms: likeness, image), carpenter (broader term: craftsman)

History

In the general sense of the word, a thesaurus referred to a "store of knowledge" such as a dictionary or an encyclopedia. In 1572 the five-volume "Thesaurus Linguae Graecae" by Henricus Stephanus (Henri Estienne) appeared, the most comprehensive dictionary of its time, which is also mentioned in the diaries of Samuel Pepys (December 1661). Roget's Thesaurus of English Words and Phrases, published in 1852 by Peter Mark Roget and influential especially in the English-speaking world, shifted the meaning of the term towards the linguistic thesaurus. In the field of information retrieval, the term was first used in 1957 by Hans Peter Luhn, at a time in the 1950s when various systems for indexing were being developed. Among the first thesauri to be put into practical use were the system of DuPont (1959) and the ASTIA Thesaurus of Descriptors (1960). In 1967 the Thesaurus of Engineering and Scientific Terms (TEST) introduced a uniform format for thesauri. The rules for constructing thesauri, developed from the outset, grew over time into general standards that define the form of the classic thesaurus for documentation. These include the UNESCO Guidelines for the Establishment and Development of Monolingual Thesauri, drawn up by Derek Austin and Dale, whose contents were incorporated into ISO 2788 (1986).

Thesaurus for documentation

In documentation science, the thesaurus has proven to be a suitable tool for subject indexing and for retrieving documents. Here the relations between terms are used to find suitable terms during indexing (the assignment of keywords) and during searching. Unlike a linguistic thesaurus, a documentation thesaurus contains a controlled vocabulary, that is, an unambiguous designation (descriptor) for each concept. Different spellings (for example German Photo / Foto), synonyms or quasi-synonyms treated as equivalent, abbreviations, translations and so on are related to one another by equivalence relations. Concepts are also linked by associative and hierarchical relations.
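The three kinds of relation described above can be sketched as a small data structure. This is a minimal illustration, not a format taken from any standard: the class names `Concept` and `Thesaurus`, their methods, and the sample terms (including "wood") are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thesaurus entry, identified by its descriptor (hypothetical model)."""
    descriptor: str
    equivalents: set[str] = field(default_factory=set)  # spellings, synonyms, abbreviations
    broader: set[str] = field(default_factory=set)      # hierarchical relation, upward
    narrower: set[str] = field(default_factory=set)     # hierarchical relation, downward
    related: set[str] = field(default_factory=set)      # associative relation


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}

    def concept(self, descriptor: str) -> Concept:
        # Create the entry on first use so relations can be added in any order.
        return self.concepts.setdefault(descriptor, Concept(descriptor))

    def add_equivalent(self, descriptor: str, term: str) -> None:
        self.concept(descriptor).equivalents.add(term)

    def add_hierarchy(self, broader: str, narrower: str) -> None:
        # A hierarchical relation is recorded in both directions.
        self.concept(broader).narrower.add(narrower)
        self.concept(narrower).broader.add(broader)

    def add_association(self, a: str, b: str) -> None:
        # Associative relations are symmetric.
        self.concept(a).related.add(b)
        self.concept(b).related.add(a)


# Illustrative sample data.
thesaurus = Thesaurus()
thesaurus.add_equivalent("portrait", "likeness")
thesaurus.add_hierarchy("craftsman", "carpenter")
thesaurus.add_association("carpenter", "wood")
```

Storing the hierarchical and associative relations in both directions keeps lookups simple in either direction, at the cost of having to update two entries whenever a relation changes.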

The thesaurus serves as a documentation language for indexing, storing and retrieving documents. The relations make it possible to find suitable designations for indexing and suitable search terms for retrieval. During searching, a thesaurus can help through automatic query expansion to synonyms and narrower terms.
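Query expansion of this kind might look roughly as follows. The sketch assumes two hypothetical lookup tables, `SYNONYMS` and `NARROWER`, filled with invented sample terms; a real retrieval system would read them from its thesaurus store.

```python
# Illustrative data only: descriptor -> synonyms, descriptor -> narrower terms.
SYNONYMS = {
    "craftsman": ["artisan"],
    "carpenter": ["joiner"],
}
NARROWER = {
    "craftsman": ["carpenter", "bricklayer"],
    "carpenter": [],
    "bricklayer": [],
}


def expand_query(term: str) -> list[str]:
    """Return the term plus its synonyms and all narrower terms (transitively)."""
    expanded: list[str] = []
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        expanded.append(current)
        for synonym in SYNONYMS.get(current, []):
            if synonym not in seen:
                seen.add(synonym)
                expanded.append(synonym)
        # Reversed so that pop() visits narrower terms in their listed order.
        stack.extend(reversed(NARROWER.get(current, [])))
    return expanded
```

With the sample data, `expand_query("craftsman")` returns `["craftsman", "artisan", "carpenter", "joiner", "bricklayer"]`, so a search for the broader term also finds documents indexed under its narrower terms and their synonyms.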

A thesaurus can thus also be used for disambiguation and, at best, takes on the role of an authority file. In contrast to a monohierarchical table or database, a thesaurus can have a polyhierarchical structure (i.e. a narrower term may have several broader terms).
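A polyhierarchy can be represented by letting each term point to a list of broader terms rather than a single parent. The following sketch assumes a hypothetical table `BROADER` with invented sample entries and collects every broader term reachable from a given term.

```python
from collections import deque

# Illustrative polyhierarchy: "piano" has two broader terms.
BROADER = {
    "piano": ["keyboard instrument", "string instrument"],
    "keyboard instrument": ["musical instrument"],
    "string instrument": ["musical instrument"],
    "musical instrument": [],
}


def all_broader_terms(term: str) -> set[str]:
    """Collect every direct and indirect broader term of `term`."""
    found: set[str] = set()
    queue = deque(BROADER.get(term, []))
    while queue:
        current = queue.popleft()
        if current in found:
            continue  # already reached via another path in the polyhierarchy
        found.add(current)
        queue.extend(BROADER.get(current, []))
    return found
```

Because "musical instrument" is reachable along two paths, the `found` check is what keeps it from being processed twice; in a strict monohierarchy that check would never trigger.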

The thesaurus standard DIN 1463-1 and its international equivalent ISO 2788 specify the relation types and the corresponding abbreviations.

The most common relations in a thesaurus are equivalence, association and hierarchical relations.

In general, one element of an equivalence relation, that is, one designation, is set as the preferred term. The non-preferred terms receive a reference to the equivalent preferred term.
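The reference from non-preferred to preferred terms can be sketched as a simple mapping. The table `USE`, the function names and the sample entries are assumptions made for this illustration; the names echo the reference labels conventionally written USE and UF ("used for"), which the text above does not itself list.

```python
# Hypothetical entries: non-preferred term -> preferred term.
USE = {
    "likeness": "portrait",
    "image": "portrait",
    "joiner": "carpenter",
}


def preferred_term(term: str) -> str:
    """Follow the reference to the preferred term; preferred terms map to themselves."""
    return USE.get(term, term)


def used_for(preferred: str) -> list[str]:
    """List the non-preferred terms that refer to `preferred`, sorted alphabetically."""
    return sorted(t for t, p in USE.items() if p == preferred)
```

An indexer would call `preferred_term` on every keyword before storing it, so that documents about the same concept end up under a single descriptor.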

See also: semantic network

Thesaurus as a collective work

Various forms of thesauri

In the past, a thesaurus was understood to be a scholarly collective work containing the entire vocabulary of a language. Well-known examples include the Thesaurus Linguae Graecae and the Thesaurus Linguae Latinae. Strictly speaking, these works are dictionaries.

The first thesauri used in electronic data processing (EDP) were likewise simple dictionaries, which compared the words entered with their own entries and could give the user feedback. At first this feedback served only to detect simple spelling errors and required separate check runs; later the check ran in the background, which is the current standard. Originally, the databases had to be built from word collections that were converted into a data format by hand; for commercial programs these were at first maintained by the manufacturer and delivered to customers with updates. With the advent of word entries that users could add individually, it became possible to use large, quasi-collaborative user platforms to collect new entries, and the database held on a server grew strongly for a time as the individual working copies of different users' thesauri were returned. Here too, however, manual review was necessary to keep out entries that were frequently misspelled and had therefore often been wrongly accepted as correct vocabulary. Because the vocabulary of any language is limited, almost complete data sets now exist for most languages and reflect the language exhaustively. The entry of new words today merely corresponds to the natural growth of the respective language.
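The dictionary-style check and the manually reviewed user additions described above can be sketched as follows. The class `WordList`, its method names and the sample words are hypothetical and only illustrate the idea; they do not describe any particular product.

```python
import re


class WordList:
    """Simple lookup dictionary with a review queue for user suggestions."""

    def __init__(self, words):
        self.accepted = {w.lower() for w in words}
        self.pending = set()  # user suggestions awaiting manual review

    def unknown_words(self, text):
        """Return the words in `text` that are not in the accepted list."""
        tokens = re.findall(r"[^\W\d_]+", text)  # runs of letters only
        return [t for t in tokens if t.lower() not in self.accepted]

    def suggest(self, word):
        """Record a user-supplied word without accepting it yet."""
        word = word.lower()
        if word not in self.accepted:
            self.pending.add(word)

    def review(self, word, approve):
        """Manual review step: accept or discard a pending suggestion."""
        word = word.lower()
        self.pending.discard(word)
        if approve:
            self.accepted.add(word)
```

Keeping suggestions in `pending` until they are reviewed is what prevents a misspelled user entry from being treated as correct vocabulary.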

At the same time, electronic thesauri were developed into increasingly complex programs that can also check grammar and style rules and offer synonyms. At their margins, modern thesauri today also overlap with translation tools, and texts can be reviewed automatically, with the user able to select from several options beforehand.

Intercultural thesauri

A special form of thesaurus makes logographic scripts such as Chinese writing accessible from a Western computer keyboard. Because of their sheer number, these characters often cannot be mapped onto a practical keyboard layout, so the thesaurus proposes characters to the user, who can then accept or reject them. For entering Japanese or Chinese characters there are thus numerous methods that convert syllables or abbreviations into characters according to thesaurus-like database entries. None of these methods has so far established itself as a standard, because the Asian written languages are very complex and the meaning of the characters is often context-dependent. The effort needed to learn these thesaurus-based programs is very high even for native speakers, who usually use only one software solution that allows them to reach acceptably high writing speeds, although these remain far behind those achieved with the Latin alphabet. Writers using the Latin alphabet write considerably faster, although the reading speed of practised readers is higher with logographic scripts than with Latin typography. A unified thesaurus for logographic scripts is precluded by traditional, conceptual and syntactic problems.
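The propose-then-accept principle can be sketched in a few lines. The table `CANDIDATES` is a tiny invented sample that maps romanized syllables to a few Chinese characters in arbitrary order; the function names are hypothetical, and real input methods use far larger databases and context-dependent ranking.

```python
# Tiny illustrative sample: romanized syllable -> candidate characters.
CANDIDATES = {
    "ma": ["妈", "马", "吗"],
    "shi": ["是", "十", "时"],
}


def propose(syllable):
    """Return the candidate characters for a typed syllable (empty if unknown)."""
    return list(CANDIDATES.get(syllable, []))


def convert(syllables, choose):
    """Convert syllables to characters, letting `choose` pick from each proposal.

    `choose(syllable, options)` stands in for the user accepting one candidate.
    Syllables without candidates are passed through unchanged.
    """
    result = []
    for syllable in syllables:
        options = propose(syllable)
        result.append(choose(syllable, options) if options else syllable)
    return "".join(result)
```

In an interactive program `choose` would display the options and wait for a keystroke; passing it in as a function keeps the lookup logic separate from the user interface.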

Linguistic thesauri

In a linguistic thesaurus, words of similar and related meaning are linked by references. This type of lexically and semantically organized reference work can be used, among other things, as a writing aid. Such reference works exist in print or in electronic form, the latter mostly as a background resource for word processing programs.

Examples

  • European Thesaurus on International Relations and Area Studies
  • EUROVOC Thesaurus of the European Union
  • Getty Thesaurus of Geographic Names
  • INFODATA Thesaurus
  • Medical Subject Headings (MeSH)
  • OpenThesaurus - Project to create a German linguistic thesaurus
  • Thesaurus Linguae Latinae - Dictionary of the entire Latin language from its beginnings to about 600 AD
  • Thesaurus Linguae Graecae - Project for the digital recording of the entire Greek literature from antiquity to modern times
  • UNESCO Thesaurus