Extended Unix Code

Extended Unix Code (abbreviated EUC) is an 8-bit character encoding used primarily for Chinese, Japanese and Korean. EUC is a collective term for several encodings, each of which can encode up to four different character sets depending on the country. It was originally developed by the Open Software Foundation (OSF), Unix International (UI) and Unix System Laboratories Pacific (USLP) as the standard encoding for Unix systems. Today it is used less and less, having often been replaced by more widespread local encodings (Shift JIS, Big5, etc.) and/or Unicode (UTF-8).

Similarities

All EUC encodings have some similarities:

  • They support up to four different character sets, called code sets in EUC terminology. Code set 0 is always (7-bit) ASCII; code sets 1 to 3 differ from variant to variant.
  • Code set 0 is always encoded directly as a single byte.
  • Two special single-shift characters switch to code set 2 and code set 3: SS2 (0x8e) and SS3 (0x8f).
  • The non-ASCII range 0xa0 to 0xff is used for multibyte characters.
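These rules are enough to tell, from the first byte alone, which code set a character belongs to. The Python sketch below illustrates this; the function name `euc_code_set` is hypothetical, and it applies only the general rules listed here, not the exact byte ranges of any particular national variant.

```python
def euc_code_set(lead: int) -> int:
    """Classify the first byte of an EUC character into its code set (0-3).

    Uses only the general EUC rules; variant-specific ranges are not checked.
    """
    if lead < 0x80:
        return 0  # code set 0: ASCII, one byte
    if lead == 0x8E:
        return 2  # SS2 introduces a code set 2 character
    if lead == 0x8F:
        return 3  # SS3 introduces a code set 3 character
    if lead >= 0xA0:
        return 1  # code set 1: multibyte, high bit set
    raise ValueError(f"byte 0x{lead:02x} cannot start an EUC character")


# Classify a few sample lead bytes
for b in (0x41, 0xC6, 0x8E, 0x8F):
    print(f"0x{b:02x} -> code set {euc_code_set(b)}")
```

Bytes 0x80 to 0x8d and 0x90 to 0x9f are rejected here because, under the rules above, they cannot begin a character.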

For code sets 1 to 3 there are several possible encoding schemes, depending on the EUC variant. The following encodings are possible:

EUC-JP

EUC-JP is the variant used in Japan.

Code set 0 is ASCII (strictly speaking, JIS-Roman) and is encoded directly as a single byte in the range 0x21 to 0x7e.

Code set 1 is JIS X 0208:1997 and is encoded with two bytes (variant 2 in the table above).
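A JIS X 0208 character is identified by a row/cell ("kuten") pair, each from 1 to 94, and EUC-JP code set 1 stores each value plus 0xa0. The following is a minimal sketch of that mapping; `kuten_to_euc_jp` is a hypothetical helper, and the cross-check relies on Python's standard `euc_jp` codec.

```python
def kuten_to_euc_jp(ku: int, ten: int) -> bytes:
    """Encode a JIS X 0208 row/cell (kuten) pair as EUC-JP code set 1.

    Adding 0xA0 moves each value from 1..94 into the byte range 0xA1..0xFE.
    """
    if not (1 <= ku <= 94 and 1 <= ten <= 94):
        raise ValueError("ku and ten must be in 1..94")
    return bytes([ku + 0xA0, ten + 0xA0])


# 日 is row 38, cell 92 in JIS X 0208; cross-check with Python's codec
encoded = kuten_to_euc_jp(38, 92)
print(encoded.hex(), encoded.decode("euc_jp"))
```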

Code set 2 consists of half-width katakana, which are encoded with two bytes (variant 1 in the table). Here, however, the second byte comes only from the range 0xa1 to 0xdf, since there are only 56 katakana (plus a handful of special characters); these bytes correspond to the single-byte encoding of JIS X 0201:1997, just prefixed with the single-shift byte 0x8e.
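Because code set 2 is just SS2 followed by the JIS X 0201 byte, the conversion is a one-line prefix. The sketch below assumes that Unicode's half-width katakana block U+FF61 to U+FF9F lines up one-to-one with JIS X 0201 bytes 0xa1 to 0xdf; `jisx0201_kana_to_euc_jp` is a hypothetical name, checked against Python's standard `euc_jp` codec.

```python
def jisx0201_kana_to_euc_jp(byte: int) -> bytes:
    """Map a JIS X 0201 half-width katakana byte (0xA1..0xDF) to EUC-JP.

    Code set 2 simply prefixes the single-byte JIS X 0201 code with SS2 (0x8E).
    """
    if not 0xA1 <= byte <= 0xDF:
        raise ValueError("not a JIS X 0201 katakana byte")
    return bytes([0x8E, byte])


ch = "\uff71"  # HALFWIDTH KATAKANA LETTER A
jis_byte = ord(ch) - 0xFF61 + 0xA1  # position in the block -> JIS X 0201 byte
print(jisx0201_kana_to_euc_jp(jis_byte) == ch.encode("euc_jp"))
```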

Code set 3 is JIS X 0212:1990, encoded in the three-byte variant (SS3 followed by two bytes).
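Taken together, the four EUC-JP code sets have fixed lengths of 1, 2, 2 and 3 bytes, so a byte string can be split into characters without a lookup table. The following sketch assumes code set 1 lead bytes lie in 0xa1 to 0xfe; `split_euc_jp` is a hypothetical helper, and the three-byte sample is built by hand purely to show the structure, not to name a specific JIS X 0212 character.

```python
def split_euc_jp(data: bytes) -> list[tuple[int, bytes]]:
    """Split EUC-JP bytes into (code_set, character_bytes) pairs.

    Code set 0 = 1 byte, 1 = 2 bytes, 2 = SS2 + 1 byte, 3 = SS3 + 2 bytes.
    Trailing bytes inside a character are not validated.
    """
    lengths = {0: 1, 1: 2, 2: 2, 3: 3}
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            cs = 0
        elif lead == 0x8E:
            cs = 2
        elif lead == 0x8F:
            cs = 3
        elif 0xA1 <= lead <= 0xFE:
            cs = 1
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02x} at offset {i}")
        n = lengths[cs]
        if i + n > len(data):
            raise ValueError(f"truncated character at offset {i}")
        out.append((cs, data[i:i + n]))
        i += n
    return out


sample = b"A" + "日".encode("euc_jp") + "\uff71".encode("euc_jp") + b"\x8f\xb0\xa1"
for cs, chunk in split_euc_jp(sample):
    print(cs, chunk.hex())
```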

EUC-KR

EUC-KR is the EUC variant used in Korea. It is similar to ISO-2022-KR and, like it, encodes the KS X 1001 character set.
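One way to see the similarity: ISO-2022-KR carries KS X 1001 characters as 7-bit byte pairs between SO (0x0e) and SI (0x0f), whereas EUC-KR sets the high bit of each byte instead. The sketch below compares the two using Python's standard `euc_kr` and `iso2022_kr` codecs; `euc_kr_to_ksx1001_7bit` is a hypothetical helper, and the sample text is assumed to contain only KS X 1001 characters.

```python
def euc_kr_to_ksx1001_7bit(data: bytes) -> bytes:
    """Clear the high bit of every byte of an EUC-KR string of KS X 1001 characters."""
    return bytes(b & 0x7F for b in data)


text = "한국"  # "Korea": two Hangul syllables from KS X 1001
euc = text.encode("euc_kr")
iso = text.encode("iso2022_kr")

# ISO-2022-KR places KS X 1001 bytes after SO (0x0E), up to SI (0x0F) or the end
start = iso.index(0x0E) + 1
end = iso.find(0x0F, start)
shifted = iso[start:end if end != -1 else len(iso)]

print(euc.hex(), shifted.hex(), euc_kr_to_ksx1001_7bit(euc) == shifted)
```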

EUC-CN

EUC-CN is used in China and implements GB 2312. It encodes simplified Chinese characters.
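GB 2312 also numbers its characters by row and cell (1 to 94 each), and EUC-CN stores each number plus 0xa0. This sketch goes the other way, recovering row/cell pairs from encoded bytes; `euc_cn_to_quwei` is a hypothetical name, it assumes only code sets 0 and 1 occur, and it uses Python's standard `gb2312` codec to produce the input.

```python
def euc_cn_to_quwei(data: bytes) -> list[tuple[int, int]]:
    """Recover GB 2312 (row, cell) pairs from EUC-CN bytes; ASCII is skipped."""
    pairs = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:  # code set 0: one ASCII byte, no row/cell number
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError(f"truncated character at offset {i}")
        # code set 1: each of the two bytes is row or cell + 0xA0
        pairs.append((lead - 0xA0, data[i + 1] - 0xA0))
        i += 2
    return pairs


print(euc_cn_to_quwei("GB 2312: 中".encode("gb2312")))
```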

EUC-TW

Although EUC-TW was developed for Taiwan, it is rarely used; Big5 is far more common there. Both encode traditional Chinese characters.
