Endianness

Byte order (also called endianness) denotes, in computer technology, the memory organization for simple numerical values, primarily the storage of integer values in main memory. A storage format has to be fixed whenever encoding the number requires more bits than the smallest addressable unit provides. As a rule, the smallest addressable unit is a byte, and the rest of this article assumes so. A number that requires more than one byte is stored in several bytes whose memory addresses directly follow one another.

As with many other forms of memory organization, cross-vendor conventions have emerged; for byte order, two variants have persisted.

In big-endian (literally "big-ender", see also Etymology), the byte with the most significant bits (i.e., the most significant digits) is stored first, that is, at the lowest memory address. More generally, the term means that the most significant element of the data is named first, as in the German notation for the time of day: hour:minute:second.
In little-endian (literally "little-ender"), by contrast, the byte with the least significant bits (i.e., the least significant digits) is stored at the lowest memory address, or the least significant element is named first, as in the conventional German date notation: day.month.year.

The terms big-endian and little-endian thus designate the end of the value that comes first in the ordering, that is, the end stored at the lowest address.

In computing jargon, the two variants are often named after the microprocessor manufacturers that use or have used the respective variant in several processor families: "Motorola format" stands for big-endian, "Intel format" for little-endian.

When data is transmitted serially bit by bit, the bit order has to be defined as well. Big-endian byte order appears consistent when the most significant bit of a byte is transmitted first (as in I²C), and correspondingly the reverse for little-endian (for example, RS-232). Reverse mappings also occur, for example in frame buffers.


Example: storing an integer value

In the following example, the integer 439,041,101 (four hundred thirty-nine million ...) is stored as a 32-bit integer value (binary: 00011010 00101011 00111100 01001101, hexadecimal: 1A 2B 3C 4D). The data is stored in four bytes starting at the hypothetical memory address 10000.

If the bytes are stored in the order 1A 2B 3C 4D, this corresponds to big-endian. Storing them in the reverse order (4D 3C 2B 1A), with the least significant byte at the lowest memory address, corresponds to little-endian. Some older systems (e.g., the PDP-11) store the data in the order 3C 4D 1A 2B or 2B 1A 4D 3C. This is referred to as middle-endian.
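The layouts can be reproduced with a short Python sketch. It relies on the built-in `int.to_bytes`; the helper name `pdp_endian` is made up for illustration, and it assumes the PDP-11 style layout in which the high 16-bit word comes first and each word is stored low byte first.

```python
value = 0x1A2B3C4D  # 439,041,101

big = value.to_bytes(4, "big")        # most significant byte at the lowest address
little = value.to_bytes(4, "little")  # least significant byte at the lowest address


def pdp_endian(n: int) -> bytes:
    """Hypothetical helper: high 16-bit word first, each word little-endian."""
    high, low = n >> 16, n & 0xFFFF
    return high.to_bytes(2, "little") + low.to_bytes(2, "little")


print(big.hex(" ").upper())                # 1A 2B 3C 4D
print(little.hex(" ").upper())             # 4D 3C 2B 1A
print(pdp_endian(value).hex(" ").upper())  # 2B 1A 4D 3C
```

Reading the bytes back with `int.from_bytes` and the matching order argument recovers the original value in each case.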

Diagram: mapping register values to memory addresses

With this diagram, register values can be mapped to memory addresses and memory addresses to register values. For a better understanding, one can imagine that the coordinate system grows to the right for big-endian, while it grows to the left for little-endian.

Hardware examples

The little-endian format was originally used in the 6502 processor, the NEC V800 series, PICmicro, and the Intel x86 processors. In contrast, the big-endian format was used, for example, in the Motorola 6800 and Motorola 68000 and ColdFire families, in the processors of System z, in Sun SPARC CPUs, and in PowerPC. Some models of the latter can also be switched to little-endian. The IA-64 architecture, developed jointly by Hewlett-Packard and Intel, also handles both byte orders, which eases the porting of operating systems to this architecture, notably HP-UX (big-endian) and Windows (little-endian).

Byte order of numbers in language

The usual way of writing (decimal) numbers is big-endian with respect to the left-to-right reading direction of most European languages. This comes from the fact that the digit order of the Hindu-Arabic numerals was retained in the scripts of Central Europe. In Arabic, which is read from right to left, the numbers are written the same way, so numbers below 100 are read little-endian (numbers from 100 upward are read big-endian). In German, too, the numbers 13 to 99 are pronounced little-endian: "ein-und-zwanzig" ("one-and-twenty"). The ones digit, as the less significant place, is spoken first (this order also exists in other languages).

An example with decimal numbers: in the most common representation (big-endian), the decimal number one thousand two hundred thirty is written as "1230", where the "1" has the place value 1000, the "2" the place value 100, and the "3" the place value 10. In a little-endian representation the situation is reversed, so the number would be written "0321" (pronounced perhaps "thirty, two hundred, thousand").

Contexts of the byte-order problem

The byte-order problem concerns those data types that are composed of several bytes and are supported directly by the processor, mainly integer and floating-point types, as well as data types that the processor effectively handles as such internal types, for example UTF-16. To work around this problem with Unicode text, a byte order mark (BOM) is often used. In a hex editor, a text looks as follows:

44 00 69 00 65 00 | D i e | = UTF-16LE / UCS-2LE; BOM at the beginning of the file = FF FE
00 44 00 69 00 65 | D i e | = UTF-16BE / UCS-2BE; BOM at the beginning of the file = FE FF

Cross-platform representation of numbers

To achieve error-free data exchange between computers of different platforms, network protocols always fix the byte order. This is referred to as "network byte order". The natural byte order of the system, in contrast, is referred to as "host byte order". If the system does not work with the network byte order, the data must be converted in the network driver or, in part, in the application program.

In the Internet protocol suite, the most widely used today, network byte order is big-endian. However, there are still protocols that use a different byte order.

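The BSD IP socket API, offered on most operating systems, provides four functions for converting the byte order: htons and htonl (host to network, 16 and 32 bits) and ntohs and ntohl (network to host, 16 and 32 bits).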

On big-endian machines these functions have no effect for the Internet protocols, since host and network byte order are the same. Using them is nevertheless always advisable, because the source code can then be carried over to other systems. This API does not, however, contain standardized functions for converting 64-bit numbers, as these were not yet widespread when the standards were developed.
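As a hedged illustration, Python's `socket` module exposes the conversion functions of the socket API under their usual names. The sketch below shows a round trip and one possible way to handle 64-bit values with the `struct` module; the 64-bit part is a workaround chosen for this example, not part of the socket API.

```python
import socket
import struct

value = 0x1A2B3C4D

# htonl/ntohl convert 32-bit values, htons/ntohs convert 16-bit values.
net32 = socket.htonl(value)
assert socket.ntohl(net32) == value
assert socket.ntohs(socket.htons(0x1A2B)) == 0x1A2B

# Writing the converted value in host order ("=") yields big-endian bytes
# on any host, which is exactly what network byte order requires.
wire = struct.pack("=I", net32)
print(wire.hex(" ").upper())  # 1A 2B 3C 4D

# Assumption for this sketch: 64-bit values are packed with struct's "!"
# (network order) prefix, since the socket API has no 64-bit counterpart.
wire64 = struct.pack("!Q", 0x0102030405060708)
(back64,) = struct.unpack("!Q", wire64)
```

Calling the conversion functions unconditionally keeps the code identical on little-endian and big-endian hosts; on the latter they simply return their argument.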

Byte-order problems can also occur when files are exchanged between different platforms and, to some extent, when data carriers are exchanged. This has to be remedied either by an unambiguous definition of the file format or file system in question, or by a compatibility mode that detects the byte order during loading and converts the data if necessary.

Jokingly, the problem of differing endianness between architectures is often called the NUXI problem: if the word UNIX is stored in two two-byte words (two 16-bit registers for "UN" and "IX"), it lies in memory as "UNIX" on a big-endian system, but on a little-endian system, because the bytes within each word are swapped, as "NUXI" (on 32-bit systems, by contrast, a single 32-bit register would yield "XINU").
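The effect is easy to reproduce. The following Python sketch treats the four ASCII characters as numbers and stores them the way a little-endian system would; the variable names are chosen for this example only.

```python
text = b"UNIX"

# Two 16-bit words, "UN" and "IX", each interpreted as one number.
words = [int.from_bytes(text[i:i + 2], "big") for i in (0, 2)]

# A little-endian system stores each word low byte first.
nuxi = b"".join(w.to_bytes(2, "little") for w in words)
print(nuxi.decode("ascii"))  # NUXI

# A single 32-bit value stored little-endian reverses all four bytes.
xinu = int.from_bytes(text, "big").to_bytes(4, "little")
print(xinu.decode("ascii"))  # XINU
```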

Important properties of the representations

In principle, there are only a few sound arguments for or against a particular byte order. Moreover, ever wider data words and the ability to process them in a single step push the significance of byte order into the background. Nevertheless, byte orders have some interesting consequences.

In CPUs, the register width is generally the same as the width of the data bus, or at most twice as wide. The first microprocessors had data buses of only 4 bits (later, for a long time, 8 bits). The address, however, is much wider in these CPUs. This created the need to load or store, with a single instruction, data that was distributed over at least two coupled registers. To reduce the complexity of the CPU (every transistor function was still expensive), it was simpler to load the low-order part of the data first in every operation; while this memory access was in progress, the instruction could be decoded further and, if necessary, the remaining data processed in the next cycle. In mainframes this problem was less pressing, because they already worked with data bus widths of 16 to 48 bits and could load such data in a single memory cycle, so the (byte) order did not matter.

Little-endian format

In this representation of whole numbers, one can see immediately where the units digit is, namely at the beginning. Large numbers do not first have to be grouped from the rear in order to grasp their magnitude, as in the conventional big-endian representation. Additions would be easier.

Example:

71
1402
39
007
----
1582
====

Read with the units digit first, this is 17 + 2041 + 93 + 700 = 2851; the columns line up at the left without any padding.

To turn a two-byte number into a four-byte number on a little-endian machine, only two zero-filled bytes have to be appended at the end, without changing the memory address. On a big-endian machine, the value must first be shifted by two bytes in memory. The reverse conversion is simpler as well: on a little-endian machine, the high-order bytes are simply discarded, without changing the memory address.
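The widening and narrowing argument can be checked with a few lines of Python. This is a sketch on byte strings, where index 0 stands in for the lowest memory address; it does not touch real memory.

```python
value = 0x1234

le16 = value.to_bytes(2, "little")  # bytes 34 12
be16 = value.to_bytes(2, "big")     # bytes 12 34

# Little-endian: widening appends zero bytes; the existing bytes stay put.
le32 = le16 + b"\x00\x00"
assert le32 == value.to_bytes(4, "little")

# Big-endian: the zero bytes go in front, so the value moves two bytes up.
be32 = b"\x00\x00" + be16
assert be32 == value.to_bytes(4, "big")

# Narrowing on little-endian: read fewer bytes from the same start address.
assert int.from_bytes(le32[:2], "little") == value

# On big-endian the low half starts two bytes later.
assert int.from_bytes(be32[2:], "big") == value
```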

Big-endian format

In hex dumps, numbers in big-endian format are easier to read, since the order of the digits is the same as in the usual notation of the place-value system.

Use

Little-endian

Today's PC systems (x86-compatible) use little-endian. Others are Alpha, Altera Nios, Atmel AVR, some SH3/SH4 systems, and VAX.

These are so-called true little-endian systems. The term distinguishes them from architectures such as some PowerPC variants (among others 603, 740, 750) that can be configured as little-endian systems (see bi-endian below) and then use little-endian from the viewpoint of the running program, but still store values in memory in big-endian format. The representation is converted implicitly on load and store operations. These systems are not true little-endian systems. Software development for them may have to take this into account, for example in driver programming.

Big-endian

Big-endian is used by mainframe systems (e.g., IBM mainframes) and by MIPS, SPARC, PowerPC, Motorola 6800/68k, Atmel AVR32, and TMS9900 processors. Alpha processors can also be operated in this mode, but this is uncommon.

Mixed variants (bi-endian)

There are processors, for example certain MIPS and PowerPC variants as well as all Alpha processors, that can be switched between little-endian and big-endian. ARM processors (including Intel XScale) can likewise be operated in either little-endian or big-endian mode.

File Formats

The byte order that a processor architecture typically uses for storing values in main memory influences the byte order of values in secondary storage (often hard disks). When new file formats were created, the byte order of numerical values was chosen so that they could be saved to and reloaded from secondary storage without conversion. With memory virtualization, data on secondary storage can even be addressed directly by the program.

This is significant for container formats with a general structure definition. For example, the Interchange File Format (IFF) was designed for Amiga programs, and in keeping with the Motorola 68000 processor used there, the four-byte chunk lengths were stored in Motorola format (big-endian). On Macintosh computers, which also used Motorola processors, this was adopted for, among others, the audio format AIFF.

When the format was carried over to the Windows platform on Intel processors, the chunk lengths were redefined as four-byte Intel format (little-endian), and the new general container format was named Resource Interchange File Format (RIFF). RIFF is the basis of popular file formats such as RIFF WAVE (*.wav files) for audio and Audio Video Interleave (*.avi files) for video.
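The difference between the two container families comes down to one format character when a chunk header is decoded. The sketch below assumes the common header layout of a four-byte chunk ID followed by a four-byte length; `read_chunk_header` is a hypothetical helper written for this example, not part of any library.

```python
import struct


def read_chunk_header(header: bytes, container: str) -> tuple:
    """Hypothetical helper: split an 8-byte chunk header into (id, length)."""
    # IFF stores the length big-endian (">"), RIFF little-endian ("<").
    fmt = ">4sI" if container == "IFF" else "<4sI"
    return struct.unpack(fmt, header[:8])


# The same length, 36 bytes, encoded for each container family.
iff_header = b"FORM" + (36).to_bytes(4, "big")
riff_header = b"RIFF" + (36).to_bytes(4, "little")

print(read_chunk_header(iff_header, "IFF"))    # (b'FORM', 36)
print(read_chunk_header(riff_header, "RIFF"))  # (b'RIFF', 36)
```

Decoding a header with the wrong byte order does not fail; it silently yields a wrong length (here 603979776 instead of 36), which is why the container type has to be known beforehand.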

File formats can also be defined so that they accommodate the byte orders of both processor architectures. In TIFF files (Tagged Image File Format), the first two bytes of the file are either II or MM, referring to the typical names of the byte orders: II for Intel format (little-endian) and MM for Motorola format (big-endian). Subsequent length and offset values in the file are then encoded accordingly.
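A reader can use the marker to pick the byte order for everything that follows. The following Python sketch is a minimal illustration; the function name `tiff_byte_order` is made up, and the check of the 16-bit value 42 and the 32-bit offset after the marker follows the TIFF 6.0 header layout rather than anything stated above.

```python
import struct


def tiff_byte_order(header: bytes) -> str:
    """Return "little" or "big" for the first eight bytes of a TIFF file."""
    marker = header[:2]
    if marker == b"II":
        order, prefix = "little", "<"
    elif marker == b"MM":
        order, prefix = "big", ">"
    else:
        raise ValueError("not a TIFF byte-order marker")
    # Assumption from the TIFF 6.0 layout: a 16-bit 42, then a 32-bit offset.
    magic, _first_ifd_offset = struct.unpack(prefix + "HI", header[2:8])
    if magic != 42:
        raise ValueError("unexpected TIFF magic number")
    return order


print(tiff_byte_order(b"II" + struct.pack("<HI", 42, 8)))  # little
print(tiff_byte_order(b"MM" + struct.pack(">HI", 42, 8)))  # big
```

Because both marker bytes are identical, the marker itself reads the same regardless of the byte order in which it is interpreted.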

Etymology

The names go back to the satirical novel Gulliver's Travels by Jonathan Swift, in which the inhabitants of the land of Lilliput live in two feuding groups: those who crack open their eggs at the thick, "big" end and are therefore called Big-Enders, and the Little-Enders, who open their eggs at the pointed, "little" end. Swift was alluding to the separation of the English Church (Little-Enders) from the Catholic Church (Big-Enders). The terms were first applied to byte order in 1980 by Danny Cohen in the April Fools' paper On Holy Wars and a Plea for Peace.