Numerals in Unicode

In addition to letters and other symbols, Unicode encodes numerals for many different numeral systems. Besides various forms of the decimal digits, these include Chinese numerals and historical numerals such as the Roman numerals. There are also fractions and various symbols derived from numerals.


To support working with numerals, the Unicode Standard provides two properties. Numeric_Type specifies what kind of numeral a character is. Its value Decimal marks a character as a decimal digit, so that programs can easily determine the numeric value of a sequence of such digits. Other numerals may require more complex conversions, as is the case for Roman numerals. The numeric value of a character can be read from the Numeric_Value property. The encoded numerals cover a range of values from −1 (𒑖 and 𒑗, U+12456 and U+12457, cuneiform) to 1,000,000,000,000 (兆, U+5146, Chinese).
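The two properties can be explored with Python's standard unicodedata module. The following is a minimal sketch, not a definitive implementation: the helper names numeric_type and decimal_string_value are made up for this illustration, and the classification only approximates Numeric_Type by probing the module's decimal, digit and numeric lookups in that order.

```python
import unicodedata


def numeric_type(ch):
    """Approximate Numeric_Type via the three lookups in unicodedata.

    Hypothetical helper: the order matters, because a decimal digit
    also has a digit value and a numeric value.
    """
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return "None"


def decimal_string_value(text):
    """Value of a sequence of decimal digits, most significant first."""
    value = 0
    for ch in text:
        # unicodedata.decimal raises ValueError for non-decimal characters.
        value = value * 10 + unicodedata.decimal(ch)
    return value


print(numeric_type("7"))
print(numeric_type("\N{CIRCLED DIGIT TWO}"))
print(numeric_type("\N{VULGAR FRACTION ONE HALF}"))
print(numeric_type("a"))

arabic_indic = (
    "\N{ARABIC-INDIC DIGIT THREE}"
    "\N{ARABIC-INDIC DIGIT FOUR}"
    "\N{ARABIC-INDIC DIGIT FIVE}"
)
print(decimal_string_value(arabic_indic))
```

Because every decimal digit carries its value as a property, the same loop works for the digits of any script without a per-script lookup table.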

Characters that are only occasionally used to represent numbers are not treated as numerals. In an enumerated list such as a) ... b) ... c) ..., the letters stand for the values 1 to 3, but since this is not their primary use, Unicode treats them as letters, not as numerals.

Encoded characters

Decimal digits

The decimal digits of Indian origin are used in many different scripts, in different shapes. Unicode therefore encodes the digits separately for each writing system. The digit shapes that were originally developed in Europe but are now in use worldwide are called "European". In addition, there are the digits of Arabic and of various Indic scripts. N'Ko is an exception, since its numbers are written from right to left.
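The separate encoding per script can be made visible by looking up the digit "one" of several scripts. This sketch uses Python's unicodedata module. The describe helper and the choice of scripts are only illustrative. The bidirectional class is included because it is where the N'Ko digits differ from the others.

```python
import unicodedata

# The digit "one" in five scripts, written with \N{...} escapes so that
# the source code itself contains no right-to-left characters.
ONES = [
    "1",
    "\N{ARABIC-INDIC DIGIT ONE}",
    "\N{DEVANAGARI DIGIT ONE}",
    "\N{BENGALI DIGIT ONE}",
    "\N{NKO DIGIT ONE}",
]


def describe(ch):
    """Return (code point, name, decimal value, bidirectional class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.decimal(ch),
        unicodedata.bidirectional(ch),
    )


for ch in ONES:
    print(*describe(ch))
```

All five characters have the decimal value 1 but distinct code points. The European and Arabic-Indic digits have their own number classes (EN and AN), the Indic digits are left-to-right (L), and the N'Ko digit is strongly right-to-left (R).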

Other blocks contain symbols derived from the European digits, such as circled numbers.

Letter-based numerals

Many numeral systems use the ordinary letters of a script to represent numbers. Unicode does not treat such letters as numerals, and in most cases they are not encoded a second time. There are, however, some numeral systems whose numerals are based on letters but differ from them. The Unicode block Ancient Greek Numbers, for example, contains a set of ancient Greek acrophonic numerals.

Roman numerals are a special case. The numbers from 1 to 12 as well as 50 (L), 100 (C), 500 (D) and 1000 (M) are encoded separately in the Unicode block Number Forms, together with characters for 5,000 and 10,000. These characters are intended mainly for use alongside characters of East Asian scripts, because in vertical (column) layout they are not displayed rotated by 90° as ordinary letters are. In other cases, however, Roman numerals should be composed from the ordinary Latin letters.
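Roman numerals show why some numerals need more than a per-character lookup. The sketch below assumes the common subtractive notation (IV = 4, IX = 9) and first uses NFKC normalization to map the dedicated Roman numeral characters to ordinary Latin letters. The function name roman_to_int is hypothetical, and the characters for 5,000 and 10,000 are not handled because they have no Latin-letter equivalent.

```python
import unicodedata

LETTER_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text):
    """Convert a Roman numeral to an integer (hypothetical helper).

    Raises KeyError for characters that are not Roman numeral letters
    after normalization.
    """
    # NFKC replaces e.g. U+216B ROMAN NUMERAL TWELVE by the letters "XII".
    letters = unicodedata.normalize("NFKC", text).upper()
    total = 0
    for i, ch in enumerate(letters):
        value = LETTER_VALUES[ch]
        next_value = LETTER_VALUES[letters[i + 1]] if i + 1 < len(letters) else 0
        # A smaller value written before a larger one is subtracted.
        total += -value if value < next_value else value
    return total


print(roman_to_int("MCMXCIV"))
print(roman_to_int("\N{ROMAN NUMERAL TWELVE}"))
print(unicodedata.numeric("\N{ROMAN NUMERAL TWELVE}"))
```

For the dedicated characters the result agrees with the character's own numeric value, while a numeral composed of Latin letters can only be evaluated by a conversion of this kind.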

Chinese numerals

The characters of the Chinese numeral system are encoded together with the other CJK characters in the Unicode block CJK Unified Ideographs. As with the European decimal digits, circled forms are also encoded. The older counting rod numerals have their own block, the Unicode block Counting Rod Numerals.
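Chinese numerals are not positional decimal digits, so a sequence of them cannot be evaluated digit by digit. The sketch below is a deliberately small illustration that only handles numbers below 10,000 written with the units 一 to 九 and the multipliers 十, 百 and 千. It assumes that Python's unicodedata.numeric reports numeric values for these ideographs, and the function name chinese_to_int is hypothetical.

```python
import unicodedata


def chinese_to_int(text):
    """Evaluate a simple Chinese numeral below 10,000 (hypothetical helper)."""
    total = 0
    pending = 0  # unit digit still waiting for its multiplier
    for ch in text:
        value = int(unicodedata.numeric(ch))
        if value >= 10:
            # A multiplier (10, 100, 1000) applies to the preceding unit;
            # a bare multiplier, as in 十二 (12), counts once.
            total += (pending or 1) * value
            pending = 0
        else:
            pending = value
    return total + pending


print(chinese_to_int("三百二十五"))
print(chinese_to_int("十二"))
```

Larger units such as 万 (10,000) group the preceding digits into sections and would need an additional level in the loop.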

Other numerals

Other numerals are usually encoded in the same block as the letters of their script. Two blocks are dedicated specifically to numerals: Aegean Numbers and Cuneiform Numbers and Punctuation.

In addition to characters for integers, Unicode also contains a number of fractions from different numeral systems. For the European digits, these are found mainly in the Unicode block Number Forms. North Indian fractions are in the Unicode block Common Indic Number Forms, and ancient Greek fractions are encoded with the other ancient Greek numerals. Here too, there are further numerals that are encoded in the same block as the letters of their script.
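The value of a fraction character can be read in the same way as that of any other numeral. Python's unicodedata.numeric returns it as a floating-point number, so this sketch converts the result back to an exact fraction. The helper name exact_value and the denominator limit of 1000 are assumptions made for this example.

```python
import unicodedata
from fractions import Fraction


def exact_value(ch, max_denominator=1000):
    """Numeric value of a character as an exact fraction (hypothetical helper)."""
    approx = unicodedata.numeric(ch)  # a float, e.g. 0.3333333333333333
    return Fraction(approx).limit_denominator(max_denominator)


FRACTIONS = [
    "\N{VULGAR FRACTION ONE HALF}",
    "\N{VULGAR FRACTION ONE THIRD}",
    "\N{VULGAR FRACTION SEVEN EIGHTHS}",
    "\N{NORTH INDIC FRACTION ONE QUARTER}",
]

for ch in FRACTIONS:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), exact_value(ch))
```

The denominator limit only needs to be at least as large as the largest denominator among the characters being processed.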


  • Julie D. Allen et al.: The Unicode Standard. Version 6.2 - Core Specification. The Unicode Consortium, Mountain View, CA, 2012. ISBN 978-1-936213-07-8. Chapter 15.3: Numerals.