UTF-32

UTF-32 is a method for encoding Unicode characters in which every character is encoded with exactly four bytes (32 bits). It can therefore be described as the simplest encoding, since all other UTF encodings use variable-length byte sequences. In the Unicode 5.1 standard, current at the time of writing, UTF-32 is a subset of UCS-4: it is restricted to the code points U+0000 to U+10FFFF.
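The fixed-width scheme can be sketched in a few lines of Python. This is a minimal illustration, assuming big-endian byte order (UTF-32BE) and no byte order mark. The function name `encode_utf32_be` is hypothetical. Python's built-in `"utf-32-be"` codec is used only to cross-check the result.

```python
def encode_utf32_be(text: str) -> bytes:
    """Hypothetical helper: encode text as UTF-32BE, one 4-byte unit per code point."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        # UTF-32 permits only Unicode scalar values: U+0000..U+10FFFF, excluding surrogates
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
        # Every code point becomes exactly four bytes, most significant byte first
        out += cp.to_bytes(4, "big")
    return bytes(out)


sample = "A\u00e9\u20ac\U0001F600"  # 'A', 'é', '€', and an emoji outside the BMP
encoded = encode_utf32_be(sample)
print(encoded.hex(" "))  # 00 00 00 41 00 00 00 e9 00 00 20 ac 00 01 f6 00
print(encoded == sample.encode("utf-32-be"))  # True
```

Every code point, whether ASCII or outside the Basic Multilingual Plane, yields the same four-byte unit, which is why no length prefixes or continuation bytes are needed.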

Advantages

The particular advantage of UTF-32 is random access to a given character: the address of the n-th character can be determined by simple pointer arithmetic. Likewise, the number of characters can be calculated immediately from the size of a document in bytes, simply by dividing by 4. This advantage is put into perspective, however, by the fact that a single Unicode character often does not correspond to a single written character as the reader perceives it (e.g. ligatures).
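The constant-time lookup and the byte-count division can be illustrated as follows. This is a sketch under stated assumptions: the data is UTF-32BE without a byte order mark, and `char_count` and `char_at` are hypothetical helper names. The final lines show the caveat: a base letter plus a combining accent counts as two code points although it is displayed as one character, while the ligature "ﬁ" (U+FB01) is one code point standing for two letters.

```python
def char_count(data: bytes) -> int:
    # Without a byte order mark, every code point occupies exactly 4 bytes
    if len(data) % 4:
        raise ValueError("length is not a multiple of 4")
    return len(data) // 4


def char_at(data: bytes, n: int) -> str:
    # Reject indices outside the text instead of silently reading an empty slice
    if not 0 <= n < char_count(data):
        raise IndexError("character index out of range")
    # The n-th code point (0-based) starts at byte offset 4 * n: no scanning needed
    offset = 4 * n
    return chr(int.from_bytes(data[offset:offset + 4], "big"))


data = "A\u00e9\u20ac\U0001F600".encode("utf-32-be")
print(char_count(data))  # 4
print(char_at(data, 2))  # €

# Caveat: code points are not the same as perceived characters
combining = "e\u0301".encode("utf-32-be")  # 'e' + COMBINING ACUTE ACCENT, shown as 'é'
print(char_count(combining))  # 2
ligature = "\ufb01".encode("utf-32-be")  # LATIN SMALL LIGATURE FI, shown as 'fi'
print(char_count(ligature))  # 1
```

In a variable-length encoding such as UTF-8, by contrast, reaching the n-th code point requires scanning the bytes that precede it.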

Disadvantages

The major disadvantage of UTF-32 is its high memory requirement: for texts consisting mainly of Latin characters, it occupies about four times as much space as the common UTF-8 or ISO 8859 encodings. For this reason, it is hardly ever used for external storage. Another disadvantage is the lack of backward compatibility with ASCII, which UTF-8, for example, provides.
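Both disadvantages can be checked with Python's standard codecs. This is a small sketch assuming a Latin-only sample string; `"latin-1"` is Python's name for ISO 8859-1, and `"utf-32-be"` is chosen so that no byte order mark is added (the plain `"utf-32"` codec prepends four extra bytes).

```python
text = "Hello, World"  # Latin-only sample text, 12 characters

# Storage size of the same text in three encodings
sizes = {
    "ISO 8859-1": len(text.encode("latin-1")),
    "UTF-8": len(text.encode("utf-8")),
    "UTF-32": len(text.encode("utf-32-be")),
}
print(sizes)  # {'ISO 8859-1': 12, 'UTF-8': 12, 'UTF-32': 48}

# ASCII compatibility: the UTF-8 bytes of ASCII text are exactly the ASCII bytes
print(text.encode("utf-8") == text.encode("ascii"))  # True

# UTF-32 pads each ASCII character with three zero bytes, so byte-oriented
# tools that expect ASCII (or treat 0x00 as end of string) cannot read it
print(text.encode("utf-32-be")[:8])  # b'\x00\x00\x00H\x00\x00\x00e'
```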
