Extended Unix Code

Extended Unix Code (abbreviated EUC) is an 8-bit, multi-byte character encoding used primarily for Chinese, Japanese and Korean. EUC is a collective term for several encodings, each of which can encode up to four different character sets, depending on the country. It was originally developed by the Open Software Foundation (OSF), Unix International (UI) and Unix System Laboratories Pacific (USLP) as the standard encoding for Unix systems. Today it is used less and less, having often been replaced by more widespread local encodings (Shift_JIS, Big5, etc.) and/or Unicode (UTF-8).

Common features

All EUC encodings share the following features:

They support up to four different character sets, called code sets in EUC terminology. Code set 0 is always (7-bit) ASCII; code sets 1 to 3 differ between the variants.
Code set 0 is always encoded directly as a single byte.
Two special bytes, the single shifts SS2 (0x8e) and SS3 (0x8f), mark the character that follows as belonging to code set 2 or code set 3 respectively.
The range 0xa1 to 0xfe, above the ASCII range, is used for the bytes of multi-byte characters.

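How code sets 1 to 3 are encoded differs between the EUC variants. Characters in code set 1 consist of bytes from the range 0xa1 to 0xfe; characters in code sets 2 and 3 are introduced by SS2 or SS3, followed by bytes from the same range. The number of bytes per character depends on the variant.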
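The byte rules shared by all EUC variants can be sketched as a small splitter. This is a minimal illustration, not a full decoder: the function name `split_euc` and its `widths` parameter (the number of bytes per character after any SS2/SS3 prefix, for each code set) are hypothetical, and bytes 0x80 to 0xa0 other than SS2/SS3, as well as 0xff, are simply rejected.

```python
SS2 = 0x8E  # single shift 2: next character is from code set 2
SS3 = 0x8F  # single shift 3: next character is from code set 3


def split_euc(data: bytes, widths: dict[int, int]) -> list[tuple[int, bytes]]:
    """Split EUC bytes into (code_set, raw_bytes) pairs.

    `widths` is a hypothetical per-variant table giving, for code sets 1-3,
    the number of bytes after the SS2/SS3 prefix (code set 1 has no prefix).
    """
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Code set 0: one byte, encoded directly
            chars.append((0, data[i:i + 1]))
            i += 1
            continue
        if b == SS2:
            code_set, start = 2, i + 1
        elif b == SS3:
            code_set, start = 3, i + 1
        elif 0xA1 <= b <= 0xFE:
            code_set, start = 1, i
        else:
            # 0x80-0xa0 (except SS2/SS3) and 0xff are rejected in this sketch
            raise ValueError(f"unexpected byte 0x{b:02x} at offset {i}")
        if code_set not in widths:
            raise ValueError(f"code set {code_set} is not used by this variant")
        end = start + widths[code_set]
        body = data[start:end]
        # Every byte after the prefix must come from 0xa1-0xfe
        if len(body) != widths[code_set] or not all(0xA1 <= x <= 0xFE for x in body):
            raise ValueError(f"truncated or malformed character at offset {i}")
        chars.append((code_set, data[i:end]))
        i = end
    return chars


# EUC-JP: code set 1 = 2 bytes, code set 2 = SS2 + 1 byte, code set 3 = SS3 + 2 bytes
EUC_JP_WIDTHS = {1: 2, 2: 1, 3: 2}
print(split_euc(b"A\xc6\xfc\x8e\xb1\x8f\xb0\xa1", EUC_JP_WIDTHS))
# [(0, b'A'), (1, b'\xc6\xfc'), (2, b'\x8e\xb1'), (3, b'\x8f\xb0\xa1')]
```

For EUC-KR or EUC-CN, which in practice use only code sets 0 and 1, the table would be just `{1: 2}`.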

EUC-JP

EUC-JP is the variant used in Japan.

Code set 0 is ASCII (strictly speaking, JIS-Roman) and is encoded directly as a single byte in the range 0x21 to 0x7e.

Code set 1 is JIS X 0208:1997, encoded as two bytes, each in the range 0xa1 to 0xfe.
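Because each byte of code set 1 is a JIS X 0208 row (ku) or cell (ten) number plus 0xa0, a row/cell position maps to EUC-JP by simple arithmetic. A minimal sketch, assuming 1-based row and cell numbers from 1 to 94; the function name `jisx0208_to_euc_jp` is hypothetical, and Python's built-in `euc_jp` codec is used only as a cross-check.

```python
def jisx0208_to_euc_jp(row: int, cell: int) -> bytes:
    """Encode a JIS X 0208 row/cell position (each 1-94) as EUC-JP code set 1."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    # Adding 0xa0 moves each value into the range 0xa1-0xfe
    return bytes([row + 0xA0, cell + 0xA0])


# 日 sits at row 38, cell 92 in JIS X 0208
encoded = jisx0208_to_euc_jp(38, 92)
print(encoded.hex(" "))          # c6 fc
print(encoded.decode("euc_jp"))  # 日
```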

Code set 2 contains the half-width katakana, each encoded as two bytes: the single shift SS2 (0x8e) followed by one byte. That second byte only uses the range 0xa1 to 0xdf, because code set 2 holds just 56 katakana and a handful of special characters. These bytes are identical to the single-byte encoding of JIS X 0201:1997, merely prefixed with 0x8e.
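The relationship to the single-byte JIS X 0201 encoding can be checked directly. The sketch below assumes the half-width katakana are taken from Unicode's halfwidth forms range U+FF61 to U+FF9F, which lines up one-to-one with the JIS X 0201 bytes 0xa1 to 0xdf; the helper name `halfwidth_katakana_to_euc_jp` is hypothetical. Python's `shift_jis` codec, which stores these characters as single JIS X 0201 bytes, serves as a reference.

```python
SS2 = 0x8E  # single shift 2


def halfwidth_katakana_to_euc_jp(ch: str) -> bytes:
    """Encode one half-width katakana (U+FF61..U+FF9F) as EUC-JP code set 2."""
    cp = ord(ch)
    if not 0xFF61 <= cp <= 0xFF9F:
        raise ValueError(f"{ch!r} is not a half-width katakana character")
    jis_x0201 = cp - 0xFF61 + 0xA1  # the single-byte JIS X 0201 value, 0xa1-0xdf
    return bytes([SS2, jis_x0201])


ka = "\uff71"  # HALFWIDTH KATAKANA LETTER A
print(halfwidth_katakana_to_euc_jp(ka).hex(" "))  # 8e b1
print(ka.encode("shift_jis").hex(" "))            # b1
```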

Code set 3 holds JIS X 0212:1990, encoded in the three-byte form: SS3 (0x8f) followed by two bytes in the range 0xa1 to 0xfe.
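Code set 3 follows the same arithmetic as code set 1, with SS3 in front. Again a hypothetical helper, assuming 1-based JIS X 0212 row and cell numbers from 1 to 94; the position row 16, cell 1 is chosen only to show the byte pattern.

```python
SS3 = 0x8F  # single shift 3


def jisx0212_to_euc_jp(row: int, cell: int) -> bytes:
    """Encode a JIS X 0212 row/cell position (each 1-94) as EUC-JP code set 3."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    # SS3 prefix, then row and cell moved into the range 0xa1-0xfe
    return bytes([SS3, row + 0xA0, cell + 0xA0])


print(jisx0212_to_euc_jp(16, 1).hex(" "))  # 8f b0 a1
```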

EUC-KR

EUC-KR is the variant of EUC used in Korea. Its code set 1 is KS X 1001, the same character set that ISO-2022-KR encodes in 7-bit form.
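The relationship can be seen by encoding the same syllable with Python's built-in `euc_kr` and `iso2022_kr` codecs: EUC-KR stores each KS X 1001 byte with the high bit set, while ISO-2022-KR stores the same two bytes with the high bit cleared and switches character sets with escape and shift sequences. The choice of the syllable 가 is arbitrary, and the exact escape and shift bytes in the ISO-2022-KR output are left to the codec.

```python
text = "가"  # HANGUL SYLLABLE GA, part of KS X 1001

euc = text.encode("euc_kr")
iso = text.encode("iso2022_kr")

# Clearing the high bit of each EUC-KR byte gives the 7-bit KS X 1001 code
seven_bit = bytes(b & 0x7F for b in euc)

print(euc.hex(" "))      # b0 a1
print(seven_bit)         # b'0!'
print(seven_bit in iso)  # True
```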

EUC-CN

EUC-CN is used in mainland China and corresponds to GB 2312. It encodes simplified Chinese characters.
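In EUC-CN, each GB 2312 character is stored as its row (qu) and cell (wei) numbers plus 0xa0, so the position can be recovered from the two bytes. A minimal sketch using Python's `gb2312` codec, which produces EUC-CN bytes; the helper name `euc_cn_to_row_cell` is hypothetical.

```python
def euc_cn_to_row_cell(encoded: bytes) -> tuple[int, int]:
    """Return the GB 2312 (row, cell) position of one two-byte EUC-CN character."""
    if len(encoded) != 2 or not all(0xA1 <= b <= 0xFE for b in encoded):
        raise ValueError("expected two bytes in the range 0xa1-0xfe")
    # Subtracting 0xa0 undoes the offset EUC adds to each byte
    return encoded[0] - 0xA0, encoded[1] - 0xA0


encoded = "中".encode("gb2312")
print(encoded.hex(" "))             # d6 d0
print(euc_cn_to_row_cell(encoded))  # (54, 48)
```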

EUC-TW

Although developed for Taiwan, EUC-TW is rarely used; Big5 is far more common there. Both encode traditional Chinese characters.
