Universal Character Set

The Universal Character Set (UCS) is a character encoding defined in the international standard ISO/IEC 10646. For all practical purposes, it is the same as Unicode.

It is developed by the working group ISO/IEC JTC 1/SC 2/WG 2.

Originally, two encoding forms were defined:

  • UCS-2: encoding in 2 bytes; only the Basic Multilingual Plane (BMP) can be encoded. This covers most living languages and the more common special characters. UCS-2 was also the character encoding used by Microsoft Windows NT.
  • UCS-4: encoding in 4 bytes (equivalent to UTF-32)
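For illustration, here is a minimal Python sketch of the two fixed-width forms. The helper names `encode_ucs2` and `encode_ucs4` are hypothetical, and big-endian byte order is simply assumed for this example.

```python
def encode_ucs2(text: str) -> bytes:
    """Encode text as big-endian UCS-2 (2 bytes per character, BMP only)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # UCS-2 has no way to represent code points beyond the BMP
            raise ValueError(f"U+{cp:04X} lies outside the Basic Multilingual Plane")
        out += cp.to_bytes(2, "big")  # most significant byte first (assumed order)
    return bytes(out)


def encode_ucs4(text: str) -> bytes:
    """Encode text as big-endian UCS-4 (4 bytes per character)."""
    return b"".join(ord(ch).to_bytes(4, "big") for ch in text)


if __name__ == "__main__":
    print(encode_ucs2("A\u00e9\u20ac").hex(" "))  # 00 41 00 e9 20 ac
    print(encode_ucs4("\U0001F600").hex(" "))     # 00 01 f6 00
    try:
        encode_ucs2("\U0001F600")
    except ValueError as err:
        print(err)  # U+1F600 lies outside the Basic Multilingual Plane
```

For BMP text without surrogate code points, the UCS-2 output coincides with Python's `utf-16-be` codec; for Unicode scalar values, the UCS-4 output coincides with `utf-32-be`.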

The working group cooperates closely with the Unicode Consortium, and the two standards are kept synchronized in each new version. For interoperability, all encoding forms are limited to the 1,112,064 code points permitted in Unicode (= 2^20 + 2^16 − 2^11, i.e. 17 planes of 65,536 code points minus the 2,048 surrogates of UTF-16): U+0000 to U+D7FF and U+E000 to U+10FFFF.
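The arithmetic behind the figure of 1,112,064 can be checked in a few lines of Python. The helper `is_scalar_value` is a hypothetical name introduced only for this sketch; it tests whether a code point falls in one of the two permitted ranges (what the Unicode Standard calls a Unicode scalar value).

```python
# 17 planes of 2^16 code points = 2^20 + 2^16 = 1,114,112 code points,
# minus the 2^11 = 2,048 surrogate code points U+D800..U+DFFF used by UTF-16.
PLANES_TOTAL = 2**20 + 2**16
SURROGATES = 2**11
ALLOWED = PLANES_TOTAL - SURROGATES


def is_scalar_value(cp: int) -> bool:
    """True if cp lies in U+0000..U+D7FF or U+E000..U+10FFFF."""
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


# Counting the two permitted ranges directly yields the same number.
by_ranges = (0xD7FF - 0x0000 + 1) + (0x10FFFF - 0xE000 + 1)
assert by_ranges == ALLOWED == 1_112_064
assert 0xDFFF - 0xD800 + 1 == SURROGATES
```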

ISO/IEC 10646:2003 describes the same encoding forms UTF-8, UTF-16 and UTF-32 as Unicode 4.0.
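Python's standard codecs implement these three encoding forms, so a short sketch can show how a single character is serialized in each. The function name `show_encodings` is hypothetical, and the big-endian codec variants are chosen so that no byte order mark is emitted.

```python
def show_encodings(ch: str) -> dict[str, str]:
    """Return the bytes of ch in UTF-8, UTF-16 and UTF-32 as hex strings."""
    forms = {"UTF-8": "utf-8", "UTF-16": "utf-16-be", "UTF-32": "utf-32-be"}
    return {form: ch.encode(codec).hex(" ") for form, codec in forms.items()}


if __name__ == "__main__":
    for ch in ("A", "\u20ac", "\U0001F600"):
        print(f"U+{ord(ch):04X}", show_encodings(ch))
```

For U+1F600, UTF-8 needs four bytes (f0 9f 98 80), UTF-16 a surrogate pair (d8 3d de 00), and UTF-32 a single four-byte unit (00 01 f6 00).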

Comparison of versions

  • ISO/IEC 10646-1:1993 ≈ Unicode 1.1
  • ISO/IEC 10646-1:2000 ≈ Unicode 3.0
  • ISO/IEC 10646-2:2001 ≈ Unicode 3.2
  • ISO/IEC 10646:2003 ≈ Unicode 4.0
  • ISO/IEC 10646:2003/Amd 4:2008 ≈ Unicode 5.1
  • ISO/IEC 10646:2012 ≈ Unicode 6.1