GNU Compiler Collection

GCC is the name of the compiler suite of the GNU project. GCC originally stood for GNU C Compiler. Because GCC today also translates several programming languages besides C, the name has come to mean GNU Compiler Collection. The command gcc (in lowercase) still refers to the C compiler.

Overview

The collection contains compilers for the programming languages C, C++, Java, Objective-C, Fortran, Ada and Go. The compiler collection is distributed under the terms of the GNU General Public License.

GCC is used as the default compiler by a number of systems, including many Linux distributions, BSD variants, NeXTSTEP, BeOS and ZETA. It also provides support for the Cygwin runtime environment and the MinGW development tools. It has been ported to more systems and computer architectures than any other compiler and is particularly well suited to operating systems that run on different hardware platforms. GCC can also be installed as a cross-compiler.

History

The first public version (0.9) of GCC was released by Richard Stallman on 22 March 1987 for the GNU project (version 1.0 followed on 23 May of the same year). It is now developed further by programmers around the world. The extension of the C compiler package into a compiler collection took place within the EGCS project, which existed in parallel to GCC for a while and eventually became the official GCC.

EGCS

In 1997, the Experimental/Enhanced GNU Compiler System (EGCS) project split off from GCC, and in 1999 it merged with GCC again.

By 1991 GCC 1.x had reached a certain level of stability. However, architecture-related limitations prevented many improvements, so the Free Software Foundation (FSF) began developing GCC 2.x. Until the mid-1990s the FSF controlled exactly what could and could not be added to GCC 2.x. As a result, GCC served as an example of the "cathedral" development model that Eric S. Raymond describes in his book The Cathedral and the Bazaar.

Because GCC is free software, programmers who wanted to take it in a different direction were able to develop their own forks. However, many of these forks proved inefficient and confusing. Many developers were frustrated that the official GCC project accepted their work only with difficulty, or not at all.

In 1997, a group of developers therefore founded EGCS to combine several experimental forks into a single project. These included g77 (Fortran), PGCC (Pentium-optimized GCC), many improvements to C++, and compiler versions for additional processor architectures and operating systems.

Development of EGCS proved to be faster, livelier and better overall than that of the GCC project. In 1999 the FSF therefore officially stopped development of GCC 2.x and adopted EGCS as the official version of GCC. The EGCS developers became the maintainers of GCC. From then on, the project was explicitly run according to the "bazaar" model rather than the "cathedral" model. With the release of GCC 2.95 in July 1999, the two projects were reunited.

Target systems

The GCC project officially designates some platforms as primary evaluation platforms and others as secondary evaluation platforms. These two groups in particular are tested before every new release. GCC can generate programs for the following processors (primary and secondary evaluation platforms are marked):

  • Alpha
  • ARM architecture (secondary, under Linux)
  • H8/300
  • S/370, S/390
  • i386 and AMD64
  • IA-64 "Itanium"
  • Motorola 68000 and Motorola Coldfire
  • Motorola 88000
  • MIPS architecture (primary under IRIX)
  • PA-RISC (primary under HP-UX)
  • PDP-11
  • PowerPC
  • SuperH
  • Sun SPARC (primary under Solaris, secondary under Linux)
  • VAX

There are also a number of processors for embedded systems, such as

  • Motorola 68HC11
  • A29K
  • Adapteva Epiphany
  • ARC
  • Atmel AVR and AVR32
  • Blackfin
  • C4x
  • CRIS
  • D30V
  • DSP16xx
  • FR-30
  • FR-V
  • Infineon TriCore
  • Intel i960
  • IP2000
  • M32R
  • MCORE
  • MicroBlaze
  • Microchip PIC24, dsPIC and PIC32 (in C)
  • MMIX
  • MN10200, MN10300
  • NS32K
  • ROMP
  • Stormy16
  • Synopsys DesignWare ARC
  • Texas Instruments MSP430
  • V850
  • Xtensa

Overall, GCC supports more than 60 platforms.

Structure

The external interface of gcc corresponds to that of a standard Unix compiler.

Each language compiler is a separate program that reads source code and produces assembly language. The diagram on the right shows examples for C and assembler. Both must first pass through preprocessing, which expands macros, included header files and similar constructs to yield pure C code or assembler. The language-specific frontend parses the respective language and produces an abstract syntax tree. This tree is passed to a backend, which converts it into GCC's Register Transfer Language (RTL) (not shown in the diagram), performs various code optimizations and produces the final assembly language.
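The following minimal Python sketch illustrates the preprocessing step with object-like macro expansion. The function `preprocess` is hypothetical and handles only `#define NAME value` lines. It ignores `#include`, function-like macros, conditionals and rescanning, all of which the real C preprocessor supports, and unlike the real preprocessor it would also rewrite identifiers inside string literals.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")

def preprocess(source: str) -> str:
    """Toy expansion of object-like macros; not a real C preprocessor."""
    macros = {}
    output = []

    def substitute(match):
        # Replace a whole identifier if it names a macro, otherwise keep it unchanged
        return macros.get(match.group(0), match.group(0))

    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#define"):
            # "#define NAME replacement" records a macro; the directive itself is dropped
            parts = stripped.split(maxsplit=2)
            macros[parts[1]] = parts[2] if len(parts) > 2 else ""
            continue
        output.append(IDENTIFIER.sub(substitute, line))
    return "\n".join(output)

# Whole-identifier matching leaves SIZE_MAX alone while expanding SIZE
print(preprocess("#define SIZE 10\nint buf[SIZE];\nlong m = SIZE_MAX;"))
```

Matching whole identifiers, rather than doing plain substring replacement, keeps `SIZE_MAX` intact.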

Originally, most components of GCC were written in C. Under the "GCC in Cxx" project, converting the GCC sources to C++ was planned and begun in 2010. The aim of this change is to keep GCC understandable and maintainable. In a follow-up project, stage 1 of the GCC build process, which had still been missing, was also converted to C++ code. The exceptions are the backends, substantial parts of which are expressed in RTL, and the Ada frontend, which is mostly written in Ada.

Frontends

Frontends must produce trees that the backend can process. How they accomplish this is up to them. Some parsers use a yacc-like grammar, while others use hand-written recursive descent parsers.
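The sketch below shows the hand-written recursive descent approach. The `Parser` class and its small arithmetic grammar are illustrative assumptions. Each grammar rule becomes one method, and the result is an abstract syntax tree built from nested tuples.

```python
import re

class Parser:
    """Recursive descent parser for a toy grammar:
         expr   := term (('+' | '-') term)*
         term   := factor (('*' | '/') factor)*
         factor := NUMBER | NAME | '(' expr ')'
    """

    def __init__(self, text):
        # Tokens: integers, identifiers, or any single non-space character
        self.tokens = re.findall(r"\d+|[A-Za-z_]\w*|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # loop makes the operators left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isdigit():
            return int(token)
        if token.isidentifier():
            return token
        raise SyntaxError(f"unexpected token {token!r}")

print(Parser("a + b * 2").parse())  # ('+', 'a', ('*', 'b', 2))
```

Operator precedence falls out of the call structure: `term` binds `*` and `/` more tightly than `expr` binds `+` and `-`.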

Until recently, the tree representation of a program was not entirely independent of the target processor. The meaning of a tree could differ between language frontends, and frontends could define their own tree codes.

The Tree-SSA project, integrated into GCC in version 4.0, introduced two new forms of language-independent trees. These new formats were named GENERIC and GIMPLE. Parsing now produces a temporary language-specific tree, which is then converted into GENERIC. The so-called "gimplifier" lowers this complex form into the simpler GIMPLE form. A new set of language- and architecture-independent optimizations can then be performed on its SSA-based representation.
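The following sketch shows the idea behind gimplification under simplifying assumptions. Nested expression trees are flattened into three-address statements with at most one operator each, and new temporaries hold intermediate results. Every assignment also creates a new SSA version of its target. The `gimplify` function and its naming scheme (`x_1`, `_t1`) are hypothetical and only loosely resemble GCC's dumps. Real GIMPLE also covers control flow, memory accesses and calls.

```python
from itertools import count

def gimplify(assignments):
    """Lower (target, tree) pairs into three-address statements in SSA form.

    Trees are ints, variable names, or nested tuples (op, left, right).
    """
    temp_ids = count(1)
    versions = {}  # variable name -> latest SSA version (0 = value on entry)
    code = []

    def use(name):
        # A use refers to the most recent SSA version of the variable
        return f"{name}_{versions.get(name, 0)}"

    def lower(tree):
        # Returns an operand string; nested operations go into fresh temporaries
        if isinstance(tree, int):
            return str(tree)
        if isinstance(tree, str):
            return use(tree)
        op, left, right = tree
        a, b = lower(left), lower(right)
        temp = f"_t{next(temp_ids)}"
        code.append(f"{temp} = {a} {op} {b}")
        return temp

    for target, tree in assignments:
        if isinstance(tree, tuple):
            # The outermost operation is assigned directly to the target
            op, left, right = tree
            rhs = f"{lower(left)} {op} {lower(right)}"
        else:
            rhs = lower(tree)
        versions[target] = versions.get(target, 0) + 1  # each assignment defines a new version
        code.append(f"{target}_{versions[target]} = {rhs}")
    return code

# x = a * b + c;  x = x * 2;
for stmt in gimplify([("x", ("+", ("*", "a", "b"), "c")), ("x", ("*", "x", 2))]):
    print(stmt)
# _t1 = a_0 * b_0
# x_1 = _t1 + c_0
# x_2 = x_1 * 2
```

Because each SSA name is assigned exactly once, later passes can treat every name as a single, immutable value.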

Middle end

Optimizations on trees do not quite fit into the scheme of "frontend" and "backend", because they are not language-dependent and do not involve parsing. The GCC developers have therefore named this part of the compiler the "middle end". The SSA tree optimizations currently performed include dead code elimination, partial redundancy elimination, global value numbering, sparse conditional constant propagation and scalar replacement of aggregates. Array-based optimizations such as automatic vectorization, as offered by the Intel compiler, are currently under development.
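Two of the optimizations listed above can be sketched on straight-line SSA code. This is a simplified stand-in. `propagate_constants` performs plain constant propagation and folding, not the sparse conditional variant GCC implements, which also reasons about branches. `eliminate_dead_code` walks backwards and keeps only statements whose results are still needed. The statement format `(dest, op, a, b)` is an assumption made for illustration, and the sketch assumes that no statement has side effects.

```python
import operator

FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def propagate_constants(stmts):
    """Constant propagation and folding over straight-line SSA statements.

    Each statement is (dest, op, a, b); operands are ints or SSA names.
    """
    known = {}  # SSA name -> constant value
    result = []
    for dest, op, a, b in stmts:
        # Replace operands whose value is already known to be constant
        a, b = known.get(a, a), known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = FOLD[op](a, b)  # fold at compile time
            result.append((dest, "const", known[dest], None))
        else:
            result.append((dest, op, a, b))
    return result

def eliminate_dead_code(stmts, live_out):
    """Drop statements whose results are never used.

    Walks backwards; in straight-line SSA code every use follows its definition.
    """
    live = set(live_out)
    kept = []
    for stmt in reversed(stmts):
        dest, _op, a, b = stmt
        if dest in live:
            kept.append(stmt)
            live.update(x for x in (a, b) if isinstance(x, str))
    kept.reverse()
    return kept

code = [
    ("x_1", "+", 2, 3),          # folds to 5
    ("y_1", "*", "x_1", "a_0"),  # becomes 5 * a_0
    ("z_1", "-", "a_0", 1),      # never used: dead
    ("r_1", "+", "y_1", "x_1"),  # becomes y_1 + 5
]
print(eliminate_dead_code(propagate_constants(code), {"r_1"}))
# [('y_1', '*', 5, 'a_0'), ('r_1', '+', 'y_1', 5)]
```

Running propagation first is what makes `x_1` dead: once its constant is substituted into every use, nothing refers to the name any more.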

Backend

The behavior of the GCC backend is partly determined by preprocessor macros and architecture-specific functions. These describe, for example, the endianness, word size, calling conventions and register structure of the target machine. Using the machine description, written in a Lisp-like description language, GCC converts the internal tree structure into the RTL representation. Although this representation is nominally processor-independent, the sequence of abstract instructions is already adapted to the target at this point.
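The following Python sketch illustrates the kind of information a target description supplies by modeling a few target properties as data. The `TargetDescription` class, the two toy targets and the simplified calling convention are all hypothetical. GCC itself expresses this information through C macros, target hooks and machine description files, not through anything like this class.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TargetDescription:
    """Hypothetical, heavily simplified stand-in for a target description."""
    name: str
    word_bytes: int        # size of a machine word in bytes
    byte_order: str        # "little" or "big", as accepted by int.to_bytes
    arg_registers: tuple   # registers that carry the first integer arguments

    def encode_word(self, value: int) -> bytes:
        # Lay out a non-negative word-sized integer in memory, honoring endianness
        return value.to_bytes(self.word_bytes, self.byte_order)

    def assign_arguments(self, n_args: int) -> list:
        # Simplified calling convention: registers first, remaining arguments on the stack
        regs = list(self.arg_registers[:n_args])
        stack = [f"stack[{i}]" for i in range(n_args - len(regs))]
        return regs + stack

TOY_LE = TargetDescription("toy-le", 4, "little", ("r0", "r1"))
TOY_BE = TargetDescription("toy-be", 4, "big", ("a0", "a1", "a2"))

print(TOY_LE.encode_word(0x12345678).hex())  # 78563412
print(TOY_BE.encode_word(0x12345678).hex())  # 12345678
print(TOY_LE.assign_arguments(3))            # ['r0', 'r1', 'stack[0]']
```

The same constant produces different byte sequences on the two toy targets. This is why such properties must be known to the backend rather than to the language-independent middle end.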

The type and number of RTL optimizations performed by GCC change with each compiler version. They include, for example, (global) common subexpression elimination, various loop and jump optimizations (if-conversion, branch probability estimation, sibling calls, constant propagation, ...) and the combine pass, which can merge several instructions into a single one.
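Common subexpression elimination can be sketched with local value numbering: each computation is keyed by its operator and operands, and a repeated key reuses the earlier result. The `local_cse` function is a hypothetical, block-local sketch. It assumes SSA form, so no value is ever overwritten, and it treats `+` and `*` as commutative. GCC's global CSE, by contrast, works across basic blocks and on RTL.

```python
COMMUTATIVE = {"+", "*"}

def local_cse(stmts):
    """Local common subexpression elimination via value numbering.

    stmts: list of (dest, op, a, b) in SSA form (each dest assigned once).
    Returns the reduced statement list and a map of eliminated names.
    """
    available = {}  # (op, a, b) -> name that already holds this value
    replace = {}    # eliminated name -> equivalent earlier name
    result = []
    for dest, op, a, b in stmts:
        # Rewrite operands that refer to already-eliminated names
        a, b = replace.get(a, a), replace.get(b, b)
        key = (op, a, b)
        if op in COMMUTATIVE:
            # Normalize operand order so that a+b and b+a share one key
            key = (op, *sorted((a, b), key=repr))
        if key in available:
            replace[dest] = available[key]  # reuse the earlier result; drop this statement
        else:
            available[key] = dest
            result.append((dest, op, a, b))
    return result, replace

stmts = [("t1", "+", "a", "b"), ("t2", "+", "b", "a"), ("t3", "*", "t1", "t2")]
print(local_cse(stmts))
# ([('t1', '+', 'a', 'b'), ('t3', '*', 't1', 't1')], {'t2': 't1'})
```

The returned `replace` map matters: any later use of `t2` outside the sketched block must be redirected to `t1`.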

Since the recent introduction of global SSA-based optimizations on GIMPLE trees, RTL optimizations have lost some of their importance. The RTL representation of a program contains far less of the high-level information that many optimizations depend on. Machine-dependent optimizations, however, remain very important, because many optimizations need information about the machine: which instructions it supports, how expensive they are, and how the pipeline of the target architecture is structured.

In the "reload" phase, the in-principle unlimited number of abstract pseudo-registers is replaced by the limited number of real machine registers. Under certain circumstances, new instructions must be inserted into the code, for example to store pseudo-registers temporarily on the function's stack. This register allocation is quite complex, because the particular characteristics of the target architecture have to be taken into account.
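The core problem of mapping unlimited pseudo-registers onto a few machine registers can be shown with linear scan allocation. This is a swapped-in textbook technique, not GCC's reload algorithm. The `linear_scan` function, the live-interval format and the spill heuristic (spill whichever interval ends last) are assumptions for illustration. A spilled pseudo-register is simply marked as living on the stack, without the load and store instructions a real allocator would insert.

```python
def linear_scan(intervals, registers):
    """Assign machine registers to pseudo-registers using linear scan.

    intervals: {pseudo: (start, end)} live ranges; registers: list of names.
    Returns {pseudo: register name or "stack"}.
    """
    assignment = {}
    free = list(registers)
    active = []  # (end, pseudo) for intervals currently holding a register
    for pseudo, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts, freeing their registers
        for item in sorted(active):
            if item[0] >= start:
                break
            active.remove(item)
            free.append(assignment[item[1]])
        if free:
            assignment[pseudo] = free.pop(0)
            active.append((end, pseudo))
        else:
            # No register free: spill the interval that stays live the longest
            last_end, last = max(active)
            if last_end > end:
                assignment[pseudo] = assignment[last]  # take over its register
                assignment[last] = "stack"
                active.remove((last_end, last))
                active.append((end, pseudo))
            else:
                assignment[pseudo] = "stack"
    return assignment

live_ranges = {"p1": (0, 10), "p2": (1, 3), "p3": (2, 5), "p4": (4, 6)}
print(linear_scan(live_ranges, ["r0", "r1"]))
# {'p1': 'stack', 'p2': 'r1', 'p3': 'r0', 'p4': 'r1'}
```

Spilling the longest-lived interval frees a register for the largest stretch of code, which is the usual rationale for this heuristic.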

In the last phase, optimizations such as peephole optimization and delay slot scheduling are performed. After that, the final low-level form of the RTL is mapped to assembler code by replacing the names of registers and addresses with the strings that specify the instructions.
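A peephole optimizer looks at a small window of adjacent instructions and replaces recognized patterns with cheaper ones. The sketch below uses a hypothetical tuple-based instruction format and two made-up rules. GCC's real peephole patterns are defined per target in the machine description.

```python
def peephole(instructions):
    """Toy peephole pass over instruction tuples such as ("mov", dst, src)."""
    out = []
    for insn in instructions:
        # Rule 1: "mov r, r" has no effect
        if insn[0] == "mov" and insn[1] == insn[2]:
            continue
        # Rule 2: reloading a register right after storing it to the same address is redundant
        if out and insn[0] == "load" and out[-1] == ("store", insn[1], insn[2]):
            continue
        out.append(insn)
    return out

code = [
    ("store", "r1", "[sp+8]"),
    ("mov", "r1", "r1"),
    ("load", "r1", "[sp+8]"),
    ("add", "r2", "r1", "r1"),
]
print(peephole(code))
# [('store', 'r1', '[sp+8]'), ('add', 'r2', 'r1', 'r1')]
```

Comparing against the already-optimized output (`out[-1]`) rather than the input lets rules cascade. In this example, removing the `mov` places the `load` directly after the `store`, so the second rule then applies as well.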
