Latin script in Unicode

Latin letters, that is, characters based on the Latin alphabet, are spread across several blocks in Unicode.

The 26 basic letters are located, together with digits, punctuation marks and control characters, in the Basic Latin block. The other blocks contain extensions to the basic alphabet:

  • Modified letter forms such as ð, ə or ŋ
  • Ligatures such as æ, œ or Ƕ
  • Letters borrowed from other scripts but used as additional letters in Latin orthographies, such as þ or ɛ
  • Diacritical marks that can be combined with basic letters
  • A large number of precomposed combinations of a basic letter and diacritical marks, such as ä or ç, included for compatibility with older code pages
  • Individual digraphs such as ĳ, ǌ or ʧ, likewise included for compatibility
  • Representations of the Latin letters for use with CJK fonts (fullwidth and halfwidth)
  • Ornamental and calligraphic variants such as Ⓐ, ⒜, ⒈, ℋ, ℳ, ℕ
  • Symbols built on the Latin alphabet, such as $, ℃, ℅, ™
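
To see how a few of the characters from the list above are identified in Unicode, one can look up their code points and formal names. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` and the choice of sample characters are illustrative assumptions, not part of the source.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# One sample per category from the list: modified letter, ligature,
# borrowed letter, precomposed letter, digraph, fullwidth form,
# ornamental variant, symbol.
samples = ["\u00f0", "\u00e6", "\u00fe", "\u00e4",
           "\u01cc", "\uff21", "\u24b6", "\u2122"]

for ch in samples:
    print(ch, describe(ch))
```

Most of these names begin with or contain the word LATIN, which is one informal way to recognise characters that Unicode treats as part of the Latin script.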

Encoded characters


Up to code point U+00FF, Unicode follows the Latin-1 character encoding (ISO/IEC 8859-1), and thus also ASCII. The basic letters of the Latin alphabet are therefore grouped with other characters in the Basic Latin block. The next block, Latin-1 Supplement, contains, among other characters, letters with diacritics and some special letters, in particular the German ß.

The following block, Latin Extended-A, holds the remaining Latin letters from parts 2, 3, 4 and 9 of ISO/IEC 8859 as well as the characters encoded in ISO 6937. This block also contains the long s (ſ). The Latin Extended-B block contains mainly phonetic and non-European extensions of the Latin alphabet, including most of the characters of the African reference alphabet that were still missing. Since Unicode 3.0, the Romanian letters Ș and Ț (with comma below) have been encoded in this block.

The Latin Extended Additional block contains further Latin characters, including those of the Vietnamese alphabet and the capital ß (ẞ). The Latin Extended-C block covers the Uighur Latin alphabet and an extension of the Latin alphabet introduced by Claudius. Other historical characters are found in the Latin Extended-D block.
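
The block layout described above can be illustrated with a small lookup table. This is a sketch, not an official API: the block ranges are taken from the Unicode Standard, while the table `LATIN_BLOCKS` and the helper `block_of` are hypothetical names introduced for the example. The last lines also check the statement that the first 256 code points coincide with Latin-1.

```python
# Code point ranges of the Latin blocks mentioned in the text.
LATIN_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
    (0x2C60, 0x2C7F, "Latin Extended-C"),
    (0xA720, 0xA7FF, "Latin Extended-D"),
]

def block_of(ch: str) -> str:
    """Return the name of the listed block containing ch (hypothetical helper)."""
    cp = ord(ch)
    for first, last, name in LATIN_BLOCKS:
        if first <= cp <= last:
            return name
    return "(not in one of the listed blocks)"

# Sample letters mentioned in the text: ß, long s, S with comma below, capital ß.
for ch in ["\u00df", "\u017f", "\u0218", "\u1e9e"]:
    print(f"U+{ord(ch):04X}", ch, block_of(ch))

# Every Latin-1 byte value decodes to the code point with the same number.
latin1_text = bytes(range(256)).decode("latin-1")
print(all(ord(c) == i for i, c in enumerate(latin1_text)))
```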

The Alphabetic Presentation Forms block encodes some ligatures of Latin letters for compatibility with other standards.

Letters with diacritics that are not encoded in Unicode as a single character can be written as a combination of a basic letter followed by a combining character. The combining characters are located in the blocks Combining Diacritical Marks, Combining Diacritical Marks Supplement and Combining Half Marks.
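
The relationship between precomposed letters and combining sequences can be tried out with Unicode normalization. The sketch below uses `unicodedata.normalize` from the Python standard library; the variable names and the example of an m with diaeresis (chosen here on the assumption that it has no precomposed form) are illustrative.

```python
import unicodedata

precomposed = "\u00e4"   # ä as a single code point
sequence = "a\u0308"     # a followed by COMBINING DIAERESIS

# The two strings look the same but differ code point by code point.
print(precomposed == sequence)

# NFC composes the sequence; NFD decomposes the precomposed letter.
composed = unicodedata.normalize("NFC", sequence)
decomposed = unicodedata.normalize("NFD", precomposed)
print(composed == precomposed, decomposed == sequence)

# A letter without a precomposed form stays a two-code-point sequence.
m_diaeresis = unicodedata.normalize("NFC", "m\u0308")
print(len(m_diaeresis))
```

Comparing text after normalizing both sides to the same form avoids treating such visually identical strings as different.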

Phonetic notation

Phonetic notations such as the International Phonetic Alphabet and the Uralic Phonetic Alphabet use Latin and Greek letters as well as some specific extensions. These extensions are usually also encoded in Unicode as Latin letters. They can be found in the blocks IPA Extensions, Spacing Modifier Letters, Phonetic Extensions, Phonetic Extensions Supplement and Superscripts and Subscripts.

Fullwidth characters

The Halfwidth and Fullwidth Forms block includes the basic Latin letters in a wide form, intended for use alongside East Asian characters.
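
As an illustration, the fullwidth forms of the printable ASCII characters sit at a fixed offset from their ordinary counterparts (U+FF01 to U+FF5E correspond to U+0021 to U+007E). The following sketch converts in one direction by arithmetic and back via compatibility normalization; `to_fullwidth` is a hypothetical helper written for this example.

```python
import unicodedata

FULLWIDTH_OFFSET = 0xFEE0  # U+FF01 minus U+0021

def to_fullwidth(text: str) -> str:
    """Map printable ASCII (except space) to its fullwidth form."""
    return "".join(
        chr(ord(ch) + FULLWIDTH_OFFSET) if 0x21 <= ord(ch) <= 0x7E else ch
        for ch in text
    )

wide = to_fullwidth("Abc")
print(wide, [f"U+{ord(c):04X}" for c in wide])

# NFKC normalization folds fullwidth forms back to the basic letters.
narrow = unicodedata.normalize("NFKC", wide)
print(narrow)
```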


Unicode also encodes a set of symbols that are derived from the Latin alphabet. These are located in the blocks Letterlike Symbols, Enclosed Alphanumerics and Mathematical Alphanumeric Symbols. The latter in particular are intended for use together with the other mathematical symbols in Unicode. The Roman numeral characters in the Number Forms block are also considered Latin characters.
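
The derivation of these symbols from Latin letters is reflected in their compatibility decompositions. The sketch below, which assumes Python's `unicodedata` module and uses a hypothetical helper `compat`, shows how NFKC normalization reduces a few such symbols to plain letters, and how a Roman numeral character carries a numeric value.

```python
import unicodedata

def compat(ch: str) -> str:
    """Return the NFKC (compatibility) form of ch (hypothetical helper)."""
    return unicodedata.normalize("NFKC", ch)

symbols = [
    "\u210b",      # SCRIPT CAPITAL H (Letterlike Symbols)
    "\u24b6",      # CIRCLED LATIN CAPITAL LETTER A (Enclosed Alphanumerics)
    "\U0001d400",  # MATHEMATICAL BOLD CAPITAL A
    "\u2122",      # TRADE MARK SIGN
    "\u216b",      # ROMAN NUMERAL TWELVE (Number Forms)
]

for ch in symbols:
    print(unicodedata.name(ch), "->", compat(ch))

# Roman numeral characters also have a numeric property.
twelve = unicodedata.numeric("\u216b")
print(twelve)
```

Normalizing in this way discards the stylistic or semantic distinction the symbol was encoded for, so it is suited to searching and comparison rather than to mathematical text.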


  • Julie D. Allen et al.: The Unicode Standard, Version 6.2, Core Specification. The Unicode Consortium, Mountain View, CA, 2012. ISBN 978-1-936213-07-8. Chapter 7.1: Latin.