Universal Character Set
The Universal Character Set (UCS) is a character encoding defined in the international standard ISO/IEC 10646. For all practical purposes, it is the same as Unicode.
It is developed by ISO/IEC JTC 1/SC 2/WG 2.
Originally, two formats were defined:
- UCS-2, an encoding in 2 bytes; it can encode only the Basic Multilingual Plane. This covers most living languages and the more common special characters. UCS-2 is also the character set of Microsoft Windows NT.
- UCS-4, an encoding in 4 bytes (equivalent to UTF-32)
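The difference between the two fixed-width formats can be illustrated with a minimal Python sketch. The function names `encode_ucs2` and `encode_ucs4` are hypothetical helpers introduced here for illustration, and big-endian byte order is an assumption; the sketch does not handle lone surrogate code points.

```python
import struct

BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def encode_ucs2(text: str) -> bytes:
    """Encode text as big-endian UCS-2: exactly 2 bytes per character."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > BMP_MAX:
            # UCS-2 has no way to represent code points beyond the BMP.
            raise ValueError(f"U+{cp:04X} is outside the BMP")
        out += struct.pack(">H", cp)
    return bytes(out)


def encode_ucs4(text: str) -> bytes:
    """Encode text as big-endian UCS-4: exactly 4 bytes per character."""
    return b"".join(struct.pack(">I", ord(ch)) for ch in text)
```

Under these assumptions, `encode_ucs2("A")` yields two bytes and `encode_ucs4("A")` yields four, while a character outside the Basic Multilingual Plane is rejected by `encode_ucs2` but accepted by `encode_ucs4`.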
The working group cooperates very closely with the Unicode Consortium, and the two standards are constantly synchronized in new versions. For interoperability, all encodings are limited to the 1,112,064 characters permitted in Unicode (= 2^20 + 2^16 − 2^11, that is, all code points less the surrogates of UTF-16), namely U+0000 to U+D7FF and U+E000 to U+10FFFF.
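The figure of 1,112,064 can be checked with a short calculation. The following Python sketch is only an illustration; the constant names and the helper `is_encodable` are chosen here and do not come from the standard.

```python
MAX_CODE_POINT = 0x10FFFF   # highest code point permitted in Unicode
SURROGATE_FIRST = 0xD800    # first UTF-16 surrogate
SURROGATE_LAST = 0xDFFF     # last UTF-16 surrogate


def is_encodable(cp: int) -> bool:
    """True if cp lies in U+0000..U+D7FF or U+E000..U+10FFFF."""
    if not 0 <= cp <= MAX_CODE_POINT:
        return False
    return not (SURROGATE_FIRST <= cp <= SURROGATE_LAST)


total_code_points = MAX_CODE_POINT + 1                    # 2^20 + 2^16
surrogate_count = SURROGATE_LAST - SURROGATE_FIRST + 1    # 2^11
encodable_count = total_code_points - surrogate_count

print(encodable_count)  # 1112064
```

The subtraction mirrors the formula in the text: 2^20 + 2^16 code points in total, minus the 2^11 surrogates reserved for UTF-16.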
The version ISO/IEC 10646-3:2003 describes the same formats, UTF-8, UTF-16 and UTF-32, as Unicode 4.0.
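As a rough illustration of how the three formats differ in size, the sketch below uses Python's built-in codecs to count the bytes needed for a single character. The helper name `encoded_lengths` is hypothetical, and the big-endian codec variants are used so that no byte order mark is added.

```python
def encoded_lengths(ch: str) -> dict:
    """Return the number of bytes one character needs in each format."""
    return {
        "UTF-8": len(ch.encode("utf-8")),
        "UTF-16": len(ch.encode("utf-16-be")),
        "UTF-32": len(ch.encode("utf-32-be")),
    }


for ch in ("A", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

UTF-32 always uses 4 bytes, whereas UTF-8 and UTF-16 vary with the code point: a character outside the Basic Multilingual Plane takes 4 bytes in all three formats.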
Comparison of versions
- ISO/IEC 10646-1:1993 ≈ Unicode 1.1
- ISO/IEC 10646-1:2000 ≈ Unicode 3.0
- ISO/IEC 10646-2:2001 ≈ Unicode 3.2
- ISO/IEC 10646-3:2003 ≈ Unicode 4.0
- ISO/IEC 10646-4:2008 ≈ Unicode 5.1
- ISO/IEC 10646:2012 ≈ Unicode 6.1