Han unification

In computer science, Han unification is the merging of Chinese Hanzi, Japanese Kanji and Korean Hanja (CJK), and more rarely also the Vietnamese Chữ Nôm, into a single character set. The term is used mostly in connection with Unicode and the Han unification carried out there.

The idea of uniting the various Han scripts in a single character set is not new: as early as 1980, CCCII existed, a character set that unified Hanzi and Kanji characters. The same idea was pursued in the development of the Unicode standard. In February 1990, a group devoted specifically to Han unification, the CJK-JRG, was founded. This group was renamed IRG a little later.

When China announced that it would develop a new character set, GB 13000, Unicode and China agreed to develop the Han character set jointly.

Han unification in Unicode

Han unification in Unicode is the responsibility of the Ideographic Rapporteur Group (IRG), which reviews all proposals for encoding characters and identifies characters that can be unified. Unification in Unicode follows strict rules:

  • To make the transition from older character sets to Unicode easier, the source separation rule was applied to the 20,902 characters of the first Unicode version. It states that two ideographs that are distinguished in an older character set are also distinguished in Unicode. This rule is no longer applied to CJK ideographs encoded later.
  • If ideographs are not historically related, they are not unified. This is the case, for example, for the characters 土 (earth) and 士 (warrior), which look similar but have completely different meanings and origins.

Next, the ideographs are decomposed into their individual strokes. Then the number and position of the strokes, their structure, the encoding in an older character set, and the radical of each character are determined. If everything matches, the characters are unified; otherwise they are not.
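The decision procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the IRG's actual process or data format: the `Ideograph` record, its field names, the stroke labels and the charset name `LEGACY-A` with its codes are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Ideograph:
    """Hypothetical description of one candidate character."""
    glyph: str
    origin: str                 # label for the historical origin / meaning
    radical: str
    structure: str              # e.g. "single", "left-right"
    strokes: Tuple[str, ...]    # stroke types together with their relative position
    legacy: Tuple[Tuple[str, str], ...] = ()  # (older charset, code) pairs


def separated_in_source(a, b):
    """True if some older charset encodes a and b under different codes."""
    codes_a = dict(a.legacy)
    return any(cs in codes_a and codes_a[cs] != code for cs, code in b.legacy)


def can_unify(a, b, source_separation=False):
    """Apply the rules in order; any mismatch prevents unification."""
    # Source separation rule (only used for the first Unicode version).
    if source_separation and separated_in_source(a, b):
        return False
    # Characters that are not historically related are never unified.
    if a.origin != b.origin:
        return False
    # Radical, structure, and number/position of strokes must all match.
    return (a.radical, a.structure, a.strokes) == (b.radical, b.structure, b.strokes)


# Illustrative values only.
earth = Ideograph("土", "earth", "土", "single",
                  ("short-horizontal", "vertical", "long-horizontal"))
warrior = Ideograph("士", "warrior", "士", "single",
                    ("long-horizontal", "vertical", "short-horizontal"))

# Two hypothetical style variants of one character, distinguished by LEGACY-A.
variant_a = replace(earth, legacy=(("LEGACY-A", "0x01"),))
variant_b = replace(earth, legacy=(("LEGACY-A", "0x02"),))

print(can_unify(earth, warrior))
print(can_unify(variant_a, variant_b))
print(can_unify(variant_a, variant_b, source_separation=True))
```

The `source_separation` flag is off by default to mirror the statement that the rule applies only to the characters of the first Unicode version.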

Most characters are unified if they differ only between the various styles of Chinese writing. For example, the radical 辵 (as radical 辶) is written in print with either one or two upper dots. In regular script and in handwriting, however, this character has only one dot. The situation is similar for the radical 示, which is still written as 示 in the classical reference typeface (Ming), but as 礻 in handwriting and regular script. These differences arise because the script reforms in the People's Republic of China and Japan tried to align the printed form with handwriting, whereas Korea did not and Taiwan did so only to a limited extent.

The following table shows, one character per row, how a character is rendered differently in various CJK fonts (Chinese without further specification, simplified Chinese as used in the PRC, traditional Chinese as used in the Republic of China (Taiwan), Japanese, Korean) because of the characteristics specific to each writing tradition. The differences can result from stroke order, stroke count or stroke direction. For correct display, the appropriate fonts must be installed and the browser must select the right one.

On the other hand, some character variants were nevertheless encoded separately in Unicode, as the following table illustrates:

Criticism

In East Asia, Han unification is criticized mainly for cultural, but also for technical reasons. In Japan in particular, Unicode therefore still has a difficult standing.

Historically, neither Chinese nor Japanese drew an exact distinction between glyph and character. In designing Unicode, the consortium had the choice of either introducing this distinction systematically or dispensing with it entirely and encoding every variant separately. The latter would have produced numerous variants of semantically identical characters, especially variants that cannot be clearly delimited by language area (classical Chinese, simplified Chinese, Japanese, Korean), but only historically.

The current Unicode standard is a compromise: complete unification by purely semantic criteria was not carried out. This had practical reasons. The stated goal was that modern Chinese, Japanese and Korean can be distinguished in the same text without changing fonts. Classical texts, too, can be represented in a semantically unambiguous way in Unicode 3.1. Only the rendering of historical variants, which may be of interest in a linguistic context, is not possible in Unicode 3.1. This must be handled by the font.

Another problem is that there is no way to specify different variants of a character in a text without markup. This causes problems especially in Japan, where some place names and personal names still use the old radicals. For example, the first character of the Kyoto district 祇園 (Gion) is written not with 礻 but with 示, although other words containing 祇 are written with the 礻 radical.
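A short Python sketch may make the limitation concrete. It uses only the standard library; the helper names `code_points` and `with_language` are hypothetical. A plain-text string stores nothing but code points, so the intended language or glyph form is not part of it. Markup such as the HTML `lang` attribute can at most hint at the language, and whether a renderer then shows the desired form still depends on the installed fonts.

```python
import unicodedata


def code_points(text):
    """List the code points that a plain-text string actually stores."""
    return [f"U+{ord(ch):04X}" for ch in text]


def with_language(text, lang):
    """Hypothetical helper: wrap text in an HTML span carrying a lang attribute."""
    return f'<span lang="{lang}">{text}</span>'


place = "祇園"  # Gion, the place name from the example above

# The stored code points are the same whatever language the writer had in mind.
print(code_points(place))
for ch in place:
    print(ch, unicodedata.name(ch))

# Only markup around the text can express the intended language.
print(with_language(place, "ja"))
print(with_language(place, "zh-Hant"))
```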
