The name Base85 covers several mutually incompatible encoding methods that convert 8-bit binary data into a sequence of printable ASCII characters. What they have in common is that they encode blocks of four bytes as five ASCII characters. At least 85 different characters are required for this, which gives the method its name. The advantage is a slightly lower coding overhead of 25%, compared to the 33% of the standardized Base64 encoding.
The most widespread use of this encoding is in the PostScript file format created by Adobe; this variant is also called ASCII85.
The basic idea
Four bytes can assume $256^4$ = 4,294,967,296 different states. To encode them with minimal overhead, one chooses a subset of the printable ASCII characters that is large enough to make do with 5 characters. This requires an alphabet of at least 85 characters, because $85^5$ = 4,437,053,125 ≥ 4,294,967,296. (84 characters are not enough, since $84^5$ = 4,182,119,424 < 4,294,967,296.)
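The arithmetic behind the choice of 85 can be checked with a few lines of Python. This is only an illustrative sketch; the names `block_states` and `smallest` are made up for this example.

```python
# Number of distinct states of a 4-byte block.
block_states = 256 ** 4

# Number of distinct 5-character strings over alphabets of 85 and 84 symbols.
print(block_states)
print(85 ** 5, 85 ** 5 >= block_states)
print(84 ** 5, 84 ** 5 >= block_states)

# Smallest alphabet size n for which 5 characters cover every 4-byte block.
smallest = next(n for n in range(2, 257) if n ** 5 >= block_states)
print(smallest)
```

The last line searches upward from 2 and stops at the first alphabet size whose fifth power reaches $256^4$.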
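If the four bytes are denoted $b_1, b_2, b_3, b_4$ and the five coded values $c_1, c_2, c_3, c_4, c_5$, then the following conversion formula holds:
$$b_1 \cdot 256^3 + b_2 \cdot 256^2 + b_3 \cdot 256 + b_4 = c_1 \cdot 85^4 + c_2 \cdot 85^3 + c_3 \cdot 85^2 + c_4 \cdot 85 + c_5$$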
In other words, the four bytes are interpreted as a four-digit number in base 256 and converted into a five-digit number in base 85.
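The base conversion described here can be sketched in Python as follows. The function names `block_to_digits` and `digits_to_block` are hypothetical and chosen only for this illustration; the sketch produces the five numeric values in the range 0 to 84, not yet any ASCII characters.

```python
def block_to_digits(block: bytes) -> list[int]:
    """Convert exactly four bytes into five base-85 digits, most significant first."""
    if len(block) != 4:
        raise ValueError("expected exactly 4 bytes")
    # Interpret the four bytes as one four-digit number in base 256.
    value = int.from_bytes(block, "big")
    digits = []
    for _ in range(5):
        # Repeated division by 85 yields the digits, least significant first.
        value, remainder = divmod(value, 85)
        digits.append(remainder)
    return digits[::-1]


def digits_to_block(digits: list[int]) -> bytes:
    """Inverse conversion: five base-85 digits back into four bytes."""
    value = 0
    for digit in digits:
        value = value * 85 + digit
    # Raises OverflowError if the digits encode a value above 256**4 - 1.
    return value.to_bytes(4, "big")


print(block_to_digits(b"\xff\xff\xff\xff"))  # [82, 23, 54, 12, 0]
```

Because $85^5$ is larger than $256^4$, some five-digit combinations do not correspond to any four-byte block; the sketch rejects those in `digits_to_block`.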
The five resulting values, each in the range 0 to 84, are then mapped to printable ASCII characters.
The Base85 encoding in PostScript adds 33 to each of these values and thus uses the ASCII codes 33 to 117, which correspond to the characters "!" to "u". There is one exception: four consecutive zero bytes are not encoded as "!!!!!" but as a single "z". Depending on the data, this simple form of compression reduces or even offsets the coding overhead of Base85, since longer runs of zero bytes occur quite frequently, particularly in raster graphics embedded in PostScript. During encoding, spaces and line breaks can be inserted as desired, for example to keep lines below a certain maximum length. These characters are ignored during decoding. All other characters constitute an error, at which decoding stops.
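A minimal sketch of this variant, assuming input whose length is a multiple of four bytes, might look as follows. The function name `ascii85_encode` is hypothetical; handling of a shorter final block, start and end markers, and line wrapping are deliberately left out because they are not described above. The result is compared with `base64.a85encode` from the Python standard library, which implements the same character mapping.

```python
import base64


def ascii85_encode(data: bytes) -> str:
    """Encode full 4-byte blocks with offset 33 and the 'z' shortcut (sketch only)."""
    if len(data) % 4 != 0:
        raise ValueError("this sketch only handles complete 4-byte blocks")
    out = []
    for i in range(0, len(data), 4):
        value = int.from_bytes(data[i:i + 4], "big")
        if value == 0:
            # Four zero bytes are abbreviated to a single 'z'.
            out.append("z")
            continue
        chars = []
        for _ in range(5):
            value, remainder = divmod(value, 85)
            chars.append(chr(remainder + 33))  # shift into the range '!'..'u'
        out.append("".join(reversed(chars)))
    return "".join(out)


sample = b"Man \x00\x00\x00\x00\xff\xff\xff\xff"
print(ascii85_encode(sample))
print(ascii85_encode(sample) == base64.a85encode(sample).decode("ascii"))
```

In this example the zero block shrinks from five characters to one, which is the compression effect described above.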
IPv6 address encoding according to RFC 1924
A slightly different encoding was proposed in RFC 1924 for IPv6 addresses (note the publication date of this RFC, 1 April 1996). The 128-bit IPv6 address to be encoded is not split into four blocks of 32 bits but treated as a single 128-bit number. This number is repeatedly divided by 85; the remainders are the "digits" of the Base85 encoding.
Each IPv6 address can thus be encoded as 20 digits in the range 0 to 84. These digits are assigned to ASCII characters via a look-up table, because the intention was to avoid ASCII characters that could be problematic in certain environments. The look-up table is as follows:
0 to 9: the characters 0 to 9
10 to 35: the characters A to Z
36 to 61: the characters a to z
62 to 84: the characters ! # $ % & ( ) * + - ; < = > ? @ ^ _ ` { | } ~
The following ASCII characters are not used: " ' , . / : [ \ ] as well as the space and the 33 control characters.
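The procedure can be sketched in Python as shown below. The alphabet string is the character set defined in RFC 1924; the function names `ipv6_to_base85` and `base85_to_ipv6` are hypothetical and used only for this illustration.

```python
import ipaddress

# Character set from RFC 1924: digits, upper case, lower case, then 23 symbols.
RFC1924_ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "!#$%&()*+-;<=>?@^_`{|}~"
)


def ipv6_to_base85(address: str) -> str:
    """Treat the address as one 128-bit number and emit 20 base-85 digits."""
    value = int(ipaddress.IPv6Address(address))
    chars = []
    for _ in range(20):
        # Each remainder is one digit, least significant first.
        value, remainder = divmod(value, 85)
        chars.append(RFC1924_ALPHABET[remainder])
    return "".join(reversed(chars))


def base85_to_ipv6(text: str) -> str:
    """Inverse conversion; raises if the text encodes more than 128 bits."""
    value = 0
    for ch in text:
        value = value * 85 + RFC1924_ALPHABET.index(ch)
    return str(ipaddress.IPv6Address(value))


encoded = ipv6_to_base85("1080::8:800:200c:417a")
print(encoded, base85_to_ipv6(encoded))
```

Unlike the block-wise variants, the whole address goes through a single big-integer division loop, so the output differs from what a 4-byte block encoder with the same alphabet would produce.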
Z85
Because the ASCII85 encoding used in PostScript and PDF employs characters that cannot be used unescaped in XML, JSON and the string literals of many programming languages (", ' and \), another encoding format called Z85 was developed. It uses its own coding table and likewise encodes binary data only in complete 4-byte blocks. If binary data whose length is not an integer multiple of 4 has to be processed, an application-specific padding must be used.
However, it also uses the
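A Z85 encoder and decoder for complete 4-byte blocks can be sketched as follows. The alphabet string is the one defined in the Z85 specification; the function names `z85_encode` and `z85_decode` are hypothetical, and padding is intentionally not handled because it is left to the application.

```python
# Z85 alphabet: digits, lower case, upper case, then 23 symbols.
Z85_ALPHABET = (
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ".-:+=^!/*?&<>()[]{}@%$#"
)


def z85_encode(data: bytes) -> str:
    """Encode data whose length is a multiple of 4 (no padding in this sketch)."""
    if len(data) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = []
    for i in range(0, len(data), 4):
        value = int.from_bytes(data[i:i + 4], "big")
        chars = []
        for _ in range(5):
            value, remainder = divmod(value, 85)
            chars.append(Z85_ALPHABET[remainder])
        out.append("".join(reversed(chars)))
    return "".join(out)


def z85_decode(text: str) -> bytes:
    """Decode text whose length is a multiple of 5."""
    if len(text) % 5 != 0:
        raise ValueError("length must be a multiple of 5")
    out = bytearray()
    for i in range(0, len(text), 5):
        value = 0
        for ch in text[i:i + 5]:
            value = value * 85 + Z85_ALPHABET.index(ch)
        out += value.to_bytes(4, "big")
    return bytes(out)


frame = bytes([0x86, 0x4F, 0xD2, 0x6F, 0xB5, 0x59, 0xF7, 0x5B])
print(z85_encode(frame))  # HelloWorld
```

The eight test bytes are the example from the Z85 specification, which encode to the string "HelloWorld".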
Despite its slightly lower overhead, Base85 encoding has not been able to establish itself outside of special areas. In the meantime, basE91 offers an even more efficient method. For the ASCII encoding of binary data in e-mail and Usenet articles, the MIME standard provides only Base64 encoding.