Big5 is a character encoding for traditional Chinese characters. It encodes 13,062 Chinese characters (two of which are double-coded) and is by far the most commonly used character set in the Republic of China (Taiwan). The name Big5 derives from the fact that the standard was developed jointly by the five largest Taiwanese computer manufacturers.
Before Big5 existed, various mutually incompatible character sets such as that of the IBM 5550 were used in Taiwan. Big5 was intended to replace these character sets and was introduced in 1984.
After its introduction, Big5 spread widely and was adopted in modified form as Windows code page 950, among other implementations. CNS 11643 was later meant to replace Big5, but this effort failed. As a result, Big5 itself was declared an official standard in Taiwan in 2003.
Besides Taiwan, Big5 is used in Hong Kong and Macao, which also use traditional characters.
Big5 encodes Chinese characters as byte pairs. The first byte of a pair is called the lead byte and takes values from 0xA1 to 0xC6 or from 0xC9 to 0xF9. The second byte is called the trail byte and takes values from 0x40 to 0x7E or from 0xA1 to 0xFE. Unofficially, bytes whose most significant bit is not set (0x00 to 0x7F) are interpreted as ASCII characters. Characters in Big5 therefore have a variable length of 1 or 2 bytes.
Design and Structure
Big5 is divided into several areas:
- The range 0x8140 to 0xA0FE is reserved for private use.
- The range 0xA140 to 0xA3FF encodes punctuation marks, the Greek alphabet and other symbols.
- The range 0xA440 to 0xC67E encodes Chinese characters, sorted first by stroke count and then by radical.
- The range 0xC6A1 to 0xC8FE is reserved for private use.
- The range 0xC940 to 0xF9D5 encodes further Chinese characters, likewise sorted first by stroke count and then by radical.
- The range 0xF9D6 to 0xFEFE is reserved for private use.
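The area list above can be expressed as a small lookup table. This is an illustrative sketch with hypothetical names: a two-byte code is treated as lead byte × 256 + trail byte, only the coarse ranges from the list are checked (the trail byte is not validated), and vendor extensions that reuse the private-use areas are not distinguished.

```python
# Code-space areas of standard Big5 as listed above: (first, last, description).
BIG5_AREAS = [
    (0x8140, 0xA0FE, "private use"),
    (0xA140, 0xA3FF, "punctuation, Greek letters and symbols"),
    (0xA440, 0xC67E, "Chinese characters"),
    (0xC6A1, 0xC8FE, "private use"),
    (0xC940, 0xF9D5, "Chinese characters"),
    (0xF9D6, 0xFEFE, "private use"),
]


def big5_area(code: int) -> str:
    """Return the area a 16-bit Big5 code (lead * 256 + trail) falls in."""
    for first, last, name in BIG5_AREAS:
        if first <= code <= last:
            return name
    return "outside the listed areas"


if __name__ == "__main__":
    # Python's built-in big5 codec supplies a sample two-byte code.
    code = int.from_bytes("中".encode("big5"), "big")
    print(hex(code), big5_area(code))
```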
Since Big5 lacks many needed characters, both companies and government institutions have developed their own extensions to Big5.
E-Ten added some characters from the IBM 5550 character set for its operating system:
- The range 0xA3C0 to 0xA3E0 contains control characters.
- The range 0xC6A1 to 0xC875 contains circled and parenthesized numbers, radicals, Japanese kana and the Cyrillic alphabet.
- The range 0xF9D6 to 0xF9FE contains seven additional Chinese characters and box-drawing characters.
Microsoft created Windows code page 950, which is virtually identical to Big5 but additionally contains the E-Ten extensions in the range 0xF9D6 to 0xF9FE as well as the euro sign.
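Whether a given text fits plain Big5 or needs code page 950 can be checked by attempting to encode it. The sketch below assumes Python's built-in `big5` and `cp950` codecs; their mapping tables are Python's own and may not match every vendor's Big5 variant, and the helper names are hypothetical. Characters the text attributes to code page 950 only, such as the euro sign, are natural inputs to try, but the result depends on those tables, so the sketch reports it rather than assuming it.

```python
from __future__ import annotations


def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def compare_big5_cp950(text: str) -> dict[str, bool]:
    # Uses Python's built-in codec names; their tables are Python's own.
    return {codec: encodable(text, codec) for codec in ("big5", "cp950")}


if __name__ == "__main__":
    # Sample inputs: common Chinese text, the euro sign, and an emoji
    # that neither code page contains.
    for sample in ["中文", "\u20ac", "\U0001F600"]:
        print(repr(sample), compare_big5_cp950(sample))
```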
Hong Kong also uses Big5. Because Big5 lacks many characters needed for Cantonese, Hong Kong developed the Hong Kong Supplementary Character Set, which is based on Big5 but contains many additional characters.