Intermediate language

Intermediate code is code generated during the translation process at an abstraction level between the higher-level source language and the usually low-level target language. It is primarily a long-established conceptual intermediate step inside a compiler, and it is not always materialized as a separate output.
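As an illustration of what sits "between" source and target, the following minimal sketch lowers a nested source expression into three-address code, a common style of intermediate code in which every instruction applies one operator and stores its result in a fresh temporary. The AST class `BinOp`, the function `lower`, and the temporary names `t1`, `t2`, ... are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical source-level AST: integers, variable names, and binary operations.
@dataclass
class BinOp:
    op: str          # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"

Expr = Union[int, str, BinOp]

def lower(expr: Expr) -> list[str]:
    """Lower an expression tree to three-address code (one operator per instruction)."""
    code: list[str] = []
    counter = 0

    def visit(e: Expr) -> str:
        nonlocal counter
        if isinstance(e, (int, str)):
            return str(e)            # constants and variables are used directly as operands
        a = visit(e.left)
        b = visit(e.right)
        counter += 1
        temp = f"t{counter}"         # fresh temporary holds the intermediate result
        code.append(f"{temp} = {a} {e.op} {b}")
        return temp

    result = visit(expr)
    code.append(f"return {result}")
    return code

# a * b + c * 2
for line in lower(BinOp("+", BinOp("*", "a", "b"), BinOp("*", "c", 2))):
    print(line)
# t1 = a * b
# t2 = c * 2
# t3 = t1 + t2
# return t3
```

The output no longer contains nested expressions, yet it is still independent of any particular processor's registers or instruction set.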

History

In the late 1960s, Martin Richards developed an intermediate code called O-code (O for object code) for his programming language BCPL, a precursor of C and C++. O-code made the actual compiler machine-independent, which allowed the compiler to be ported easily to different processors. The O-code could then be interpreted or translated into machine-specific code.

From the late 1970s, the UCSD Pascal environments used p-code. They attempted to make programs fully portable on the basis of interpreted bytecode, but largely failed because the computer systems of the time were slow: at that point, nobody could or would accept the slowdown caused by the additional level of indirection.

Benefits

It may be preferable not to generate code directly for the processor of the runtime system, but at first only intermediate code for an idealized (or virtual) processor, which is often simulated purely in software. Reasons for this include:

  • Portability and platform independence (see also Java VM),
  • Simplification of the translation process (see also p-code),
  • General optimizations (efficiency-enhancing code transformations) can already be carried out on the intermediate code,
  • The target processor is not yet convenient enough to program for directly, for example because floating-point instructions are desired but the processor has no FPU; a further translation step then inserts code that emulates these instructions using the available integer instructions.
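The idea of a virtual processor that is simulated in software can be sketched as a tiny stack-machine interpreter. The instruction set below (`PUSH`, `LOAD`, `ADD`, `MUL`) and the function `run` are invented for illustration; they are not p-code or Java bytecode. Under this design, porting to a new host only requires a new interpreter, while programs already compiled to the intermediate code stay unchanged.

```python
from typing import Union

# A hypothetical instruction: (opcode, operand or None).
Instr = tuple[str, Union[int, str, None]]

def run(program: list[Instr], env: dict[str, int]) -> int:
    """Interpret a program for a tiny virtual stack machine and return the top of the stack."""
    stack: list[int] = []
    for opcode, arg in program:
        if opcode == "PUSH":        # push an integer constant
            stack.append(arg)
        elif opcode == "LOAD":      # push the value of a named variable
            stack.append(env[arg])
        elif opcode == "ADD":       # pop two operands, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == "MUL":       # pop two operands, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return stack.pop()

# (x + 3) * y, compiled by hand for the virtual machine
program = [("LOAD", "x"), ("PUSH", 3), ("ADD", None), ("LOAD", "y"), ("MUL", None)]
print(run(program, {"x": 4, "y": 5}))  # 35
```

Each instruction costs a dispatch in the interpreter loop, which is exactly the extra level of indirection that made such systems slow on the hardware of the 1970s.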

Static single assignment

A special class of intermediate code is the static single assignment representation (also static single assignment form, abbreviated SSA). Its defining property is that every variable in the intermediate code is assigned exactly once in the program text, that is, statically. This makes data dependencies between instructions explicit, which benefits many optimizations. In general, SSA form can only be achieved with the help of phi functions, which select a value at a control-flow join depending on the path along which the join was reached. The source programs of many programming languages can be transformed into SSA form with little effort. Many modern compilers, including those of the GNU Compiler Collection, therefore use SSA-based intermediate code.
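A minimal sketch of the renaming step, assuming straight-line code (a single basic block) so that no phi functions are needed: each assignment gets a new numbered name, and every use refers to the most recent definition. The function `to_ssa` and the convention that unassigned inputs get version `0` are hypothetical. With branches, for example `x1 = 1` on one path and `x2 = 2` on the other, the join point would additionally need `x3 = φ(x1, x2)`, which this sketch does not handle.

```python
import re

def to_ssa(block: list[tuple[str, str]]) -> list[str]:
    """Rename a straight-line block of (target, expression) assignments into SSA form.

    Assumes a single basic block (so no phi functions are required)
    and that variable names are simple identifiers.
    """
    version: dict[str, int] = {}

    def current(match: re.Match) -> str:
        name = match.group(0)
        # Variables never assigned in this block are treated as inputs (version 0).
        return f"{name}{version.get(name, 0)}"

    result = []
    for target, expr in block:
        # Uses on the right-hand side refer to the latest definition so far.
        renamed = re.sub(r"[A-Za-z_]\w*", current, expr)
        # Every assignment creates a new, never-reassigned name.
        version[target] = version.get(target, 0) + 1
        result.append(f"{target}{version[target]} = {renamed}")
    return result

block = [("x", "a + 1"), ("x", "x * 2"), ("y", "x + a")]
for line in to_ssa(block):
    print(line)
# x1 = a0 + 1
# x2 = x1 * 2
# y1 = x2 + a0
```

After renaming, the fact that `y1` depends on `x2` and not on `x1` can be read directly from the names, which is the explicit data dependency described above.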

Languages

Although C was not designed as an intermediate code, it became a popular intermediate language because it is an abstraction of assembly language and, as the de facto systems language of Unix-like and other operating systems, is available almost everywhere. Eiffel, Sather, Esterel, some Lisp dialects (Lush, Gambit), Haskell (Glasgow Haskell Compiler), Squeak Smalltalk's subset Slang, Cython, Seed7, Vala and others use C as intermediate code. Some variants of C were designed to make C a better portable assembly language: C-- and C Intermediate Language.
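Using C as intermediate code means that a front end prints C source text and leaves register allocation and machine-code generation to an existing C compiler. The sketch below assumes a hypothetical toy language whose expressions are nested tuples over integer variables; the helper names `expr_to_c` and `emit_c_function` are invented for this example. The generated text could then be passed to any C compiler.

```python
def expr_to_c(expr) -> str:
    """Translate a nested-tuple expression such as ("+", "x", ("*", 2, "y")) into C."""
    if isinstance(expr, (int, str)):
        return str(expr)                     # integer literals and variable names map directly
    op, left, right = expr
    # Parenthesize every operation so C's precedence rules cannot change the meaning.
    return f"({expr_to_c(left)} {op} {expr_to_c(right)})"

def emit_c_function(name: str, params: list[str], body) -> str:
    """Emit a complete C function that returns the value of the expression."""
    args = ", ".join(f"long {p}" for p in params) or "void"
    return f"long {name}({args}) {{\n    return {expr_to_c(body)};\n}}\n"

print(emit_c_function("f", ["x", "y"], ("+", "x", ("*", 2, "y"))))
# long f(long x, long y) {
#     return (x + (2 * y));
# }
```

Full parenthesization is a simple design choice that keeps the generator independent of C's operator precedence.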

Microsoft's Common Intermediate Language is an intermediate code used by all .NET compilers; it is then compiled further into machine code, either statically or dynamically.

The GNU Compiler Collection (GCC) uses several intermediate codes internally to support portability and cross-compilation. These include:

  • The long-standing Register Transfer Language (RTL),
  • The language-independent tree format GENERIC, and
  • The SSA-based GIMPLE.
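GIMPLE is a three-address representation, and working at that level is what allows general optimizations to be carried out on intermediate code. The toy pass below illustrates one such optimization, constant folding with propagation, on straight-line three-address code. The instruction format `(target, op, left, right)` and the function `fold_constants` are invented for illustration and are not GCC's data structures or implementation.

```python
import operator
from typing import Union

Operand = Union[int, str]   # integer literal or variable name
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code: list[tuple[str, str, Operand, Operand]]) -> list[str]:
    """Fold and propagate constants in straight-line three-address code."""
    known: dict[str, int] = {}   # variables whose value is known at compile time
    out = []
    for target, op, left, right in code:
        # Replace operands whose value is already known by that value.
        a = known.get(left, left)
        b = known.get(right, right)
        if isinstance(a, int) and isinstance(b, int):
            known[target] = OPS[op](a, b)    # compute now instead of at run time
            out.append(f"{target} = {known[target]}")
        else:
            known.pop(target, None)          # target now holds a value unknown at compile time
            out.append(f"{target} = {a} {op} {b}")
    return out

code = [("t1", "*", 4, 8), ("t2", "+", "t1", 2), ("t3", "*", "x", "t2")]
for line in fold_constants(code):
    print(line)
# t1 = 32
# t2 = 34
# t3 = x * 34
```

After this pass, `t1` and `t2` are no longer used by `t3`; a subsequent dead-code elimination pass could remove them.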

Most intermediate languages have been developed for statically typed languages. Parrot, by contrast, was developed to support dynamically typed languages such as Perl and Python.
