CESU-8

CESU-8 (Compatibility Encoding Scheme for UTF-16: 8-Bit) is a variant of UTF-8 described in Unicode Technical Report #26. A code point is first expressed in UTF-16, and each resulting 16-bit code unit is then encoded in UTF-8 as if it were a UCS-2 character. A code point outside the Basic Multilingual Plane therefore becomes two three-byte sequences (one per surrogate) instead of a single four-byte sequence. The method is similar to Java's Modified UTF-8, except that NUL (U+0000) is not specially encoded. As with Modified UTF-8, CESU-8 decodes into individual UTF-16 code units.
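The two-step procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `cesu8_encode` is hypothetical and not part of any library.

```python
def cesu8_encode(text: str) -> bytes:
    """Sketch: encode text as CESU-8 (hypothetical helper, not a library API)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP code point: exactly one UTF-16 code unit.
            units = [cp]
        else:
            # Supplementary code point: split into a UTF-16 surrogate pair.
            v = cp - 0x10000
            units = [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
        for u in units:
            # Encode each 16-bit unit with the UTF-8 bit layout,
            # as if it were a UCS-2 character.
            if u < 0x80:
                out.append(u)
            elif u < 0x800:
                out += bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
            else:
                out += bytes([
                    0xE0 | (u >> 12),
                    0x80 | ((u >> 6) & 0x3F),
                    0x80 | (u & 0x3F),
                ])
    return bytes(out)


# U+10400 takes six bytes in CESU-8 but four bytes in UTF-8.
print(cesu8_encode("\U00010400").hex(" "))    # ed a0 81 ed b0 80
print("\U00010400".encode("utf-8").hex(" "))  # f0 90 90 80
```

For text that contains only BMP characters, the output of this sketch equals the ordinary UTF-8 encoding.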

CESU-8-encoded text arises when a UCS-2 → UTF-8 converter (often dating from the time when Unicode was only a 16-bit character set) is erroneously used to convert UTF-16. Within the Basic Multilingual Plane (code points up to 65,535, i.e. U+FFFF), UTF-8 and CESU-8 are identical.
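Text produced this way can be read back by reversing the steps: decode each three-byte sequence to a 16-bit code unit, then combine surrogate pairs. The following Python sketch does this with the standard `surrogatepass` error handler; the function name `cesu8_decode` is hypothetical. It is lenient by design: it also accepts genuine four-byte UTF-8 sequences, which a strict CESU-8 decoder would reject.

```python
def cesu8_decode(data: bytes) -> str:
    """Sketch: decode CESU-8 bytes to a Python string (hypothetical helper)."""
    # Step 1: decode the bytes as UTF-8, letting surrogate code units
    # (U+D800..U+DFFF) through instead of raising an error.
    units = data.decode("utf-8", "surrogatepass")
    # Step 2: write the code units out as UTF-16 and decode them again,
    # which merges each surrogate pair into one supplementary code point.
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


print(cesu8_decode(bytes.fromhex("eda081edb080")) == "\U00010400")  # True
```

An unpaired surrogate in the input makes the final UTF-16 decoding step raise `UnicodeDecodeError`, which is one simple way to notice malformed data.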

CESU-8 is used by the Oracle database software. Oracle's UTF8 character set, which has a misleadingly chosen name and has been available since version 8.0 of the database, corresponds to the CESU-8 encoding. The AL32UTF8 character set, introduced in version 9.0, corresponds to the standard UTF-8 encoding.

Example

Same example with binary representation

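Byte prefixes: 1110 (lead byte of a three-byte sequence), 11110 (lead byte of a four-byte sequence), 10 (continuation byte)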
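The bit patterns can be printed with a short Python sketch. The code point U+10400 is chosen here purely for illustration, and the helper name `bits` is hypothetical. The CESU-8 bytes are built by encoding each UTF-16 code unit separately with Python's `surrogatepass` error handler.

```python
def bits(data: bytes) -> str:
    """Render each byte as eight binary digits (hypothetical helper)."""
    return " ".join(f"{b:08b}" for b in data)


ch = "\U00010400"  # illustrative supplementary code point

# Standard UTF-8: one four-byte sequence (lead byte prefix 11110).
utf8 = ch.encode("utf-8")

# CESU-8: encode each UTF-16 code unit on its own, giving two
# three-byte sequences (lead byte prefix 1110).
utf16 = ch.encode("utf-16-be")
cesu8 = b"".join(
    chr(int.from_bytes(utf16[i:i + 2], "big")).encode("utf-8", "surrogatepass")
    for i in range(0, len(utf16), 2)
)

print("UTF-8: ", bits(utf8))
# UTF-8:  11110000 10010000 10010000 10000000
print("CESU-8:", bits(cesu8))
# CESU-8: 11101101 10100000 10000001 11101101 10110000 10000000
```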
