Polytope model

The Polytopmodell (or more generally polyhedral ) is a mathematical model that can be used by compilers for optimization of loop sets. The loops are described in the source program by polytopes on which then a correctness -preserving transformation is applied. In the final step, the resulting polytopes are again translated into (target) code.

  • 2.1 from source program to the polytope
  • 2.2 dependency analysis
  • 2.3 Setting up the transformation matrix
  • 2.4 Transformed polytope
  • 2.5 Generation of the target program

Setting up the transformation matrix

A transformation consists of two parts, the schedule and the allocation. The Schedule determines when a calculation is to take place, while the allocation defines where the calculation is performed (that is, on which processor it is running).

Calculation of a valid schedules

A schedule is valid if it receives all the data dependencies.

If one iteration to the loop indices, the results of calculation required, must apply to the Schedule. That is, all required values ​​have to be calculated at an earlier stage.

Calculation of a valid allocation

In contrast to the Schedule, there are no restrictions on the allocation.

Basically, there is always the opportunity to have the calculation performed only by a processor. However, one loses all parallelism. Therefore it makes sense to distribute the computation on as many processors to maximize the parallelism. However, it must be borne in mind that this is more data to be sent between the processors. This additional communication costs can easily exceed the profit by the parallel computation.

Example ( Automatic parallelization )

From the source program to the polytope

Consider the following program, which consists of a perfectly nested loop set. The body of the loop contains a statement S.

For i: = 0 to n do         for j: = 0 to i 2 do S: A (i, j ) = A (i -1, j ) A (i, j-1)         end     end To illustrate the loop as a polytope, it suffices to write the upper and lower bounds, and inequalities:

Or as a linear system of equations in matrix representation (one line corresponds to an inequality columns: i, j, n, 1)

Dependency analysis

In each loop the array cell A (i, j) will be overwritten. For the calculation of A ( i, j) is required, the values ​​of A (i -1, j ) and A ( i, j -1). This results in two data dependencies: Each iteration ( i, j) depends on both the iteration (i-1, j ) and (i, j -1). Both dependencies must be taken into account in the next step in the calculation of the schedule.

Algorithmically, let all the dependencies using a method for dependency analysis calculated.

Setting up the transformation matrix

A correct schedule, which receives both dependencies, for example.

Interpretation:

  • In the first step A ( 0,0) is calculated
  • In the second step A (1,0) and A (0,1) is calculated
  • The third step A (2,0), A ( 1,1) and A (0,2)
  • Etc.

To allow maximum parallelism in each calculation step, we choose as an allocation

This results in the following transformation matrix:

(Explanation: First line = Schedule ( i j), second line = allocation ( i ) First column: i, Second column: j)

Transformed polytope

The loop indices are also transformed by: and

Generation of the target program

The final step is to generate code that enumerates exactly the points from the Zielpolyeder and in the proper order (more precisely, the lexicographical order ) comply. This is calculated from so-called scanning algorithms algorithmically (eg, Fourier - Motzkin elimination or the method of Quillerè ).

This gives the following ( synchronous ) target program:

For t: = 0 to 2n 2 do      parfor p: = max (0, ceil ( t / 2) -1 ) to min ( t, n ) do            A ( p, t p ) = A ( p- 1, t -p) A ( p, t, p-1)      end end The outer loop specifies global time steps, while the second loop is a parallel calculation. Since missing in this program the code for the communication between the processors, it is on systems not directly executable with distributed memory. However, it can be combined with shared memory transpose directly, eg as OpenMP program.

Application

Among the most important applications include:

  • Cache optimization
  • Loop Tiling
  • Schleifenparallelisierung
  • Compiler
655607
de