ISO 639

ISO 639 is an international standard of the International Organization for Standardization (ISO) that defines codes for the names of languages (language codes). It consists of six parts. Five of them define codes of two letters (ISO 639-1), three letters (ISO 639-2, ISO 639-3 and ISO 639-5) or four letters (ISO 639-6); the remaining part, ISO 639-4, provides guidelines for applying the standard.

Application

The codes defined in the standard are used in lexicography, linguistics, libraries, information services and data exchange, among other fields. They serve to identify languages unambiguously and to label documents with their language. They were not designed as abbreviations, partly because a code does not always resemble the name of the language it denotes.

The codes may be written in any mix of upper- and lowercase letters, although some related standards prescribe a particular case.

The language codes of this standard cover natural and constructed languages, but not languages created for machine processing, such as programming languages.

Parts of the standard

The officially published parts are:

  • ISO 639-1:2002 - Codes for the representation of names of languages - Part 1: Alpha-2 code
  • ISO 639-2:1998 - Codes for the representation of names of languages - Part 2: Alpha-3 code
  • ISO 639-3:2007 - Codes for the representation of names of languages - Part 3: Alpha-3 code for comprehensive coverage of languages
  • ISO 639-4:2010 - Codes for the representation of names of languages - Part 4: Implementation guidelines and general principles for language coding
  • ISO 639-5:2008 - Codes for the representation of names of languages - Part 5: Alpha-3 code for language families and groups
  • ISO 639-6:2009 - Codes for the representation of names of languages - Part 6: Alpha-4 representation for comprehensive coverage of language variation

ISO 639-1

Part 1 of the standard was prepared for use in terminology, lexicography and linguistics. Until its official adoption in 2002, it was published simply as ISO 639. Its precursors are the Requests for Comments (RFCs) RFC 1766 (March 1995) and RFC 3066 (January 2001). ISO 639-1 is intended to cover not only the languages most widely represented in the world's literature, but also the most "developed" languages, those with a recorded "specialized" vocabulary. It includes not only individual languages but also language families. Each language is represented by a two-letter code (alpha-2 code); for example, de stands for German and fr for French. The 26 Latin letters allow 676 different codes in total, of which 185 were assigned as of January 2007. The standard is maintained by the International Information Centre for Terminology (Infoterm), which was founded by UNESCO.
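The size of the code space follows directly from the alphabet: 26 letters give 26² = 676 two-letter and 26³ = 17,576 three-letter combinations. The following is a minimal Python sketch, assuming plain ASCII letters and ignoring case (as the standard permits). The helper names `code_space` and `has_alpha_form` are hypothetical, and the form check says nothing about whether a code is actually assigned.

```python
import string

LETTERS = string.ascii_lowercase  # the 26 basic Latin letters


def code_space(length: int) -> int:
    """Number of possible codes of the given length (26 ** length)."""
    return len(LETTERS) ** length


def has_alpha_form(code: str, length: int) -> bool:
    """True if `code` consists of exactly `length` Latin letters, in any case.

    This only checks the shape of a code; it does not consult any registry.
    """
    return len(code) == length and all(c in LETTERS for c in code.lower())


print(code_space(2))             # 676 possible alpha-2 codes
print(code_space(3))             # 17576 possible alpha-3 codes
print(has_alpha_form("de", 2))   # True
print(has_alpha_form("FR", 2))   # True, case is not significant
print(has_alpha_form("deu", 2))  # False, that is an alpha-3 form
```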

New codes may be added, but only for languages that are added to ISO 639-2 at the same time. Languages that already have an entry in ISO 639-2 are no longer given a two-letter code. This is intended to ensure compatibility.

ISO 639-2

ISO 639-2, the next part, extends ISO 639-1 to a larger set of languages. Every language code defined in ISO 639-1 reappears in this standard as a three-letter (alpha-3) code.

In this second part of ISO 639, the codes were extended to three letters, so that 17,576 language codes are theoretically possible. As of January 2007, more than 480 codes for individual languages and language families had been recorded. The standard is intended for use in "terminology and bibliography", in particular to meet the needs of libraries and to allow works from all over the world to be labeled as widely as possible. A language was included if a sufficient body of literature had been published in it. Because the focus is on written language, no distinction is made between languages that are largely identical in writing but differ in speech. For example, the standard does not distinguish between Chinese languages such as Mandarin and Cantonese.

The U.S. Library of Congress maintains this part of the standard.

ISO 639-2 extends ISO 639-1 and carries over all of its language codes. The two-letter codes are continued as three-letter codes, in most cases by appending just one letter to the existing code so that the two remain similar (see below for the special case of the ISO 639-2/B codes). The language codes of this standard are based on the MARC Code List for Languages, which has been in use since 1968 and is also maintained by the Library of Congress.
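The link between the two-letter and three-letter codes can be illustrated with a small lookup table. The excerpt below is hand-picked and the helper names are hypothetical; the authoritative correspondence is the code list maintained by the Library of Congress. The sketch also shows why the one-extra-letter rule holds only "in most cases": Spanish (es) maps to spa.

```python
# Hand-picked excerpt of the ISO 639-1 -> ISO 639-2/T correspondence
# (illustrative, not the official list).
ALPHA2_TO_ALPHA3_T = {
    "de": "deu",  # German
    "fr": "fra",  # French
    "en": "eng",  # English
    "nl": "nld",  # Dutch
    "es": "spa",  # Spanish: the alpha-3 code does not extend "es"
}


def to_alpha3(alpha2: str) -> str:
    """Look up the ISO 639-2/T code for an ISO 639-1 code (case-insensitive)."""
    return ALPHA2_TO_ALPHA3_T[alpha2.lower()]


def extends_alpha2(alpha2: str) -> bool:
    """True if the alpha-3 code simply starts with the alpha-2 code."""
    return to_alpha3(alpha2).startswith(alpha2.lower())


for code in ALPHA2_TO_ALPHA3_T:
    print(code, to_alpha3(code), extends_alpha2(code))
```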

The assigned codes also include historical languages such as Middle High German (gmh, for "German, Middle High") and Old High German (goh, for "German, Old High").

Collective language codes

A special feature of ISO 639-2 is its collective language codes, which do not exist in ISO 639-1. They identify groups of languages for which no code per individual language is provided. This is done for small languages in which only a few literary works are available or for which no significant growth in literature is expected. On the one hand, collective codes group whole language families, such as the Iroquoian languages under iro; on the other hand, they provide a collective code for the remaining languages of a family in which some related languages have their own entries. This is the case for the Sami languages (smi for the others), where Northern Sami already has its own code (sme). In the code table, collective codes are marked by appending "languages" to the name for the first kind of group and "(Other)" for the second. If a code exists for an individual language, it should be used in preference to a collective code. This also applies to codes that are newly added to the standard.
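The precedence rule in the last two sentences amounts to a simple lookup order. Below is a minimal sketch in which both dictionaries are a hypothetical excerpt: Northern Sami has the individual code sme, while Pite Sami, assumed here to have no ISO 639-2 code of its own, falls under the collective code smi. The name `pick_code` is illustrative only.

```python
from typing import Optional

# Hypothetical excerpt: individual-language codes take precedence.
INDIVIDUAL_CODES = {"Northern Sami": "sme"}

# Collective code covering each language (used only as a fallback).
COLLECTIVE_CODES = {"Northern Sami": "smi", "Pite Sami": "smi"}


def pick_code(language: str) -> Optional[str]:
    """Prefer the individual code; fall back to the collective code, else None."""
    if language in INDIVIDUAL_CODES:
        return INDIVIDUAL_CODES[language]
    return COLLECTIVE_CODES.get(language)


print(pick_code("Northern Sami"))  # sme, not the collective smi
print(pick_code("Pite Sami"))      # smi
```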

The standard does not describe which individual languages without their own entry belong to which collective ISO 639-2 code. The Library of Congress, however, refers to the MARC Code List for Languages mentioned above, which can serve this purpose.

Terminology and bibliographic language codes (T/B)

Another difference from ISO 639-1 and the other parts is the distinction between terminology codes and bibliographic codes, referred to as ISO 639-2/T and ISO 639-2/B. This distinction is made for 22 entries. It arose mainly because, before the standard was introduced, librarianship already had conventions for three-letter codes that differed considerably from the two-letter codes of the already established ISO 639-1. German is one of these cases: its B code is ger, its T code is deu.

Because the codes were meant to continue ISO 639-1, it was decided to introduce two codes wherever the identifiers differed. The terminology code continues the ISO 639-1 code, while the bibliographic code is retained for compatibility and reflects the earlier, widely used naming. The standard does not permit T and B codes to be mixed and requires the parties to agree on which type they use before exchanging data.
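The no-mixing rule lends itself to a simple pre-exchange check. The sketch below uses seven of the 22 B/T pairs as a hand-picked subset; the function names are hypothetical. Codes outside the excerpt are treated as identical in both variants, which is only an assumption of this sketch.

```python
# Subset of the ISO 639-2 entries whose bibliographic (B) and terminology (T)
# codes differ. The full list has 22 entries; these seven are for illustration.
B_TO_T = {
    "ger": "deu",  # German
    "fre": "fra",  # French
    "chi": "zho",  # Chinese
    "cze": "ces",  # Czech
    "dut": "nld",  # Dutch
    "gre": "ell",  # Modern Greek
    "wel": "cym",  # Welsh
}
T_TO_B = {t: b for b, t in B_TO_T.items()}


def variant_of(code):
    """'B' or 'T' for codes in the excerpt above, 'shared' for everything else."""
    if code in B_TO_T:
        return "B"
    if code in T_TO_B:
        return "T"
    return "shared"


def check_not_mixed(codes):
    """Return the variant used by `codes`; raise ValueError if B and T are mixed."""
    used = {variant_of(c) for c in codes} - {"shared"}
    if len(used) > 1:
        raise ValueError(f"T and B codes mixed: {codes}")
    return used.pop() if used else "shared"


print(check_not_mixed(["deu", "fra", "eng"]))  # T
print(check_not_mixed(["ger", "fre"]))         # B
```

Calling `check_not_mixed(["ger", "fra"])` raises a `ValueError`, which corresponds to the situation the standard asks the parties to rule out in advance.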

Changes

Language codes can be added and changed, and their descriptions revised, subject to the stability rules described in the standard. ISO 639-2/B codes, however, are excluded from changes in order to preserve compatibility. A code that is withdrawn after a change is not to be reused for five years.

ISO 639-3

ISO 639-3 was published on 5 February 2007. Building on the first two parts, it aims to provide comprehensive coverage of all the world's languages. It continues the three-letter codes of ISO 639-2, so ISO 639-3 can theoretically have up to 17,576 different codes (fewer in practice, partly because ISO 639-5 also uses alpha-3 codes, which are disjoint from those of ISO 639-3). All known languages are to be included: living, extinct, historical and constructed. More than 6,900 languages have been added to the standard. The complete list is intended primarily for use in information technology, where an exhaustive listing of all languages is desirable. It also includes entries such as Swiss German (gsw, for "German, Swiss"), Kölsch (ksh) and Bavarian (bar).

It is maintained by SIL International, which had already catalogued living languages, with some exceptions, together with language codes in its Ethnologue. For the 15th edition of the Ethnologue, the codes previously assigned by SIL were aligned with those of ISO 639-2 to ensure conformity. Additional historical and constructed languages come from the Linguist List.

Apart from the bibliographic codes (ISO 639-2/B), all ISO 639-2 codes for individual languages reappear in this standard. Collective codes are not included. Three-letter codes are kept unique across the whole standard, so the bibliographic and collective codes cannot be reassigned in ISO 639-3.
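The uniqueness rule can be expressed as a check that a registry might run before assigning a new ISO 639-3 code. This is a hypothetical sketch: the two sets are small illustrative excerpts of the bibliographic-only and collective ISO 639-2 codes, and `can_assign_in_639_3` is not part of any real registry software.

```python
# Illustrative excerpts (not complete lists) of ISO 639-2 codes that
# ISO 639-3 must not reuse.
BIBLIOGRAPHIC_ONLY = {"ger", "fre", "chi", "cze"}  # B codes whose T code differs
COLLECTIVE = {"smi", "iro"}                        # collective codes


def is_reserved(code):
    """True if `code` is blocked because it is a B-only or collective code."""
    c = code.lower()
    return c in BIBLIOGRAPHIC_ONLY or c in COLLECTIVE


def can_assign_in_639_3(code, assigned):
    """Hypothetical registry check: neither reserved nor already assigned."""
    return not is_reserved(code) and code.lower() not in assigned


assigned = {"deu", "sme"}
print(can_assign_in_639_3("ger", assigned))  # False: bibliographic code
print(can_assign_in_639_3("smi", assigned))  # False: collective code
print(can_assign_in_639_3("deu", assigned))  # False: already in use
```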

Macrolanguages

One extension is the use of so-called macrolanguages (comparable to an umbrella language; not to be confused with macro-families). A single entry groups several individual languages, as with the Chinese languages under zho, which covers, among others, the individual languages Mandarin Chinese, Hakka, Min Nan and Wu. Formally, more than 50 languages that ISO 639-1 (where listed) and ISO 639-2 treat as individual languages thereby became macrolanguages.

Unlike the languages covered by collective codes, macrolanguages group individual languages only where it appears necessary, in certain respects, to treat them as a single language. The registration authority gives these examples:

  • A single, highly developed standard language is used by speakers of related languages, under the influence of a common identity (Arabic);
  • There is a common written form (the Chinese languages with Chinese characters); or
  • Different groups are developing separately, so that distinct codes are needed, but a common identity still exists (Croatian, Serbian and Bosnian).

The macrolanguage concept reconciles the different approaches of parts 2 and 3: a single ISO 639-2 entry that covers several ISO 639-3 entries is thereby fitted into the structure of the third part. Every macrolanguage code has an equivalent in ISO 639-2, with the exception of Serbo-Croatian (as of August 2007), which originally had an entry in ISO 639-1 that is now obsolete.

Some individual languages that belong to a macrolanguage also have their own entries in ISO 639-1 or ISO 639-2. For example, Norwegian (nor) is a macrolanguage, but its member languages Bokmål (nb, nob) and Nynorsk (nn, nno) also have corresponding entries in the other parts.

Grouping languages into macrolanguages can lead to naming conflicts, as with Malay: the code mly denotes the individual language, while msa stands for Malay as a macrolanguage. To avoid errors, the names of these entries carry a qualifying addition in the code list.
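A common use of macrolanguage membership is to fall back from a specific code to its broader macrolanguage. The mapping below is a partial, illustrative excerpt (the full table is published by SIL International), and `tag_fallbacks` is a hypothetical helper, not part of the standard.

```python
# Partial, illustrative excerpt of ISO 639-3 macrolanguage membership.
MACROLANGUAGE_MEMBERS = {
    "zho": {"cmn", "hak", "nan", "wuu"},  # Chinese: Mandarin, Hakka, Min Nan, Wu (not exhaustive)
    "nor": {"nob", "nno"},                # Norwegian: Bokmål, Nynorsk
}


def macrolanguage_of(code):
    """Return the macrolanguage that contains `code`, or None."""
    for macro, members in MACROLANGUAGE_MEMBERS.items():
        if code in members:
            return macro
    return None


def tag_fallbacks(code):
    """List the code followed by its macrolanguage, most specific first."""
    macro = macrolanguage_of(code)
    return [code] if macro is None else [code, macro]


print(tag_fallbacks("nob"))  # ['nob', 'nor']
print(tag_fallbacks("cmn"))  # ['cmn', 'zho']
print(tag_fallbacks("deu"))  # ['deu']
```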

ISO 639-4

Guidance on applying the ISO 639 standards is given in ISO 639-4. This part does not define any language codes. It was published in July 2010.

ISO 639-5

ISO 639-5, published on 15 May 2008, extends the collective codes of ISO 639-2, whose existing codes it adopts. This part shares no codes with ISO 639-3; the two sets of codes are disjoint.

This part defines a hierarchy of language families and groups and thereby allows the codes from parts 1 to 3 to be structured. This makes it possible to tag language data at different levels of generality.
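Walking such a hierarchy upward yields progressively broader tags. Below is a minimal sketch, assuming an excerpt in which gmw (West Germanic) and gmq (North Germanic) sit under gem (Germanic), which in turn sits under ine (Indo-European). The parent relations and the sample ISO 639-3 set are illustrative assumptions, and `broader_groups` is a hypothetical name.

```python
# Assumed excerpt of an ISO 639-5 style hierarchy: child group -> parent group.
PARENT_GROUP = {
    "gmw": "gem",  # West Germanic -> Germanic
    "gmq": "gem",  # North Germanic -> Germanic
    "gem": "ine",  # Germanic -> Indo-European
}


def broader_groups(group):
    """Walk up the hierarchy and return each broader group, nearest first."""
    chain = []
    while group in PARENT_GROUP:
        group = PARENT_GROUP[group]
        chain.append(group)
    return chain


# Group codes (part 5) and individual-language codes (part 3) must not overlap.
GROUP_CODES = set(PARENT_GROUP) | set(PARENT_GROUP.values())
SAMPLE_639_3 = {"deu", "nld", "swe", "cmn"}

print(broader_groups("gmw"))                 # ['gem', 'ine']
print(GROUP_CODES.isdisjoint(SAMPLE_639_3))  # True
```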

ISO 639-6

ISO 639-6, published on 17 November 2009, defines four-letter (alpha-4) codes and extends the language codes of parts 1 to 3.

Integration and relationships of the individual standards

The language codes defined in the various parts work together and allow tagging at different levels of granularity. This integration was to be completed only with the publication of ISO 639-4 and ISO 639-6.

The parts of the ISO 639 series relate to each other in different ways. ISO 639-3 defines the set of all individual languages (supplemented by macrolanguage codes), while Part 5 defines a hierarchy of language families. Elements of these well-defined sets also appear in the two older parts, -1 and -2, where they are listed side by side without any structure. ISO 639-1 is a subset of Part 2, because stricter criteria apply for inclusion as a two-letter code.

Management

The code lists are managed by designated registration authorities, whose task is to maintain the existing entries and to review requests for new codes and for changes. The registration authority for ISO 639-1 is Infoterm and for ISO 639-2 the Library of Congress, while ISO 639-3 is administered by SIL International.

Where possible, codes should be derived from the language's own name for itself. Exceptions may be made when countries in which the language is spoken request a different designation.

Special identifiers

Both ISO 639-2 and ISO 639-3 define special codes that allow texts to be identified flexibly.

The codes qaa to qtz (including all codes that fall alphabetically between them) are reserved for local use and will not be assigned by the registration authority.

The code zxx was introduced later to identify documents without linguistic content. It can be used for documents that contain no text, such as printed music or photographs.

Two further special codes are mul ("multiple languages"), for content in several languages where tagging every individual language is impractical, and und ("undetermined"), for a language that cannot be identified.
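A tagging tool might check for these special codes before treating a value as an ordinary language code. This is a minimal sketch; `classify` and `is_local_use` are hypothetical helpers. It relies on the fact that, for three lowercase letters, the range qaa..qtz corresponds exactly to the string comparison shown, which gives 20 × 26 = 520 local-use codes.

```python
from itertools import product
from string import ascii_lowercase

# Special ISO 639-2/639-3 codes described above.
SPECIAL_CODES = {
    "zxx": "no linguistic content",
    "mul": "multiple languages",
    "und": "undetermined",
}


def is_local_use(code):
    """True for codes in the reserved range qaa..qtz (inclusive)."""
    c = code.lower()
    return len(c) == 3 and all(ch in ascii_lowercase for ch in c) and "qaa" <= c <= "qtz"


def classify(code):
    """Describe a code as special, local-use, or an ordinary (possibly unassigned) code."""
    c = code.lower()
    if c in SPECIAL_CODES:
        return SPECIAL_CODES[c]
    if is_local_use(c):
        return "local use"
    return "regular (or unassigned) code"


# Second letter a..t (20 options) times third letter a..z (26 options).
LOCAL_USE_COUNT = sum(
    is_local_use("".join(letters)) for letters in product(ascii_lowercase, repeat=3)
)

print(LOCAL_USE_COUNT)  # 520
print(classify("qab"))  # local use
print(classify("zxx"))  # no linguistic content
print(classify("deu"))  # regular (or unassigned) code
```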

Language tags according to RFC 4646

RFC 4646 combines the language codes of ISO 639 with other standards for identifying languages and scripts. It describes how language codes (ISO 639), geographic codes (ISO 3166-1) and script codes (ISO 15924) work together.

ISO 3166-1 identifies geographic entities and can be used to describe the languages and dialects of a particular region. Like ISO 639-1, ISO 3166-1 uses two-letter codes; region codes are conventionally written in uppercase. Some language and region codes coincide: de denotes the German language in ISO 639-1 and DE the country Germany in ISO 3166-1, and likewise fr denotes French and FR France. The same letters can, however, also stand for unrelated concepts in the two standards, such as BE for Belgium and be for Belarusian ("Belarusian"), or EU for the European Union and eu for Basque ("Euskara"). These overlaps do not matter in practice, because the language code always comes first, before the hyphen.

ISO 15924 identifies writing systems. Its codes usually have four letters, and the first letter is usually capitalized. For example, Cyrl stands for the Cyrillic alphabet and Latn for the Latin alphabet.

An example of a tag according to RFC 4646 is fr-Latn-CA: French written in the Latin alphabet, as used in Canada.

RFC 4646 specifies that tags are case-insensitive. For example, fr-Latn-CA is equivalent to fr-latn-ca.
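Case-insensitive comparison is easiest after normalizing a tag to its conventional spelling (language in lowercase, script in title case, region in uppercase). The sketch below covers only a simplified subset of the RFC 4646 syntax: language, optional script and optional region. Extended language subtags, variants, extensions and private-use subtags are deliberately rejected. `normalize` and `same_tag` are hypothetical names. Because subtags are identified by position, de-de normalizes to de-DE: the first de is the language, the second is the region.

```python
import re

# Simplified subset of RFC 4646: language, optional script, optional region.
TAG_PATTERN = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def normalize(tag: str) -> str:
    """Return the tag as language (lower), Script (title), REGION (upper)."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"tag not covered by this sketch: {tag!r}")
    parts = [match["language"].lower()]
    if match["script"]:
        parts.append(match["script"].title())
    if match["region"]:
        parts.append(match["region"].upper())
    return "-".join(parts)


def same_tag(a: str, b: str) -> bool:
    """Compare two tags case-insensitively by normalizing both."""
    return normalize(a) == normalize(b)


print(normalize("fr-latn-ca"))               # fr-Latn-CA
print(same_tag("fr-Latn-CA", "FR-LATN-ca"))  # True
print(normalize("de-de"))                    # de-DE
```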

Examples of language codes according to ISO 639

This table (sorted by language code) shows various language entries and how they relate across the parts of ISO 639. Living, historical and constructed languages are listed. Some codes do not exist in the other parts, or exist there in a different form.

Other precursors and related standards

  • In German-speaking countries, DIN 2335, adopted in 1986, was used previously.
  • ISO 15924 (script codes) for identification of writing systems
  • The Library of Congress also maintains the MARC Code List for Languages.
  • The National Information Standards Organization publishes ANSI/NISO Z39.53 (Codes for the Representation of Languages for Information Interchange), a standard for language codes that is also managed by the Library of Congress.