Polytope model
The Polytopmodell (or more generally polyhedral ) is a mathematical model that can be used by compilers for optimization of loop sets. The loops are described in the source program by polytopes on which then a correctness -preserving transformation is applied. In the final step, the resulting polytopes are again translated into (target) code.
- 2.1 from source program to the polytope
- 2.2 dependency analysis
- 2.3 Setting up the transformation matrix
- 2.4 Transformed polytope
- 2.5 Generation of the target program
Setting up the transformation matrix
A transformation consists of two parts, the schedule and the allocation. The Schedule determines when a calculation is to take place, while the allocation defines where the calculation is performed (that is, on which processor it is running).
Calculation of a valid schedules
A schedule is valid if it receives all the data dependencies.
If one iteration to the loop indices, the results of calculation required, must apply to the Schedule. That is, all required values have to be calculated at an earlier stage.
Calculation of a valid allocation
In contrast to the Schedule, there are no restrictions on the allocation.
Basically, there is always the opportunity to have the calculation performed only by a processor. However, one loses all parallelism. Therefore it makes sense to distribute the computation on as many processors to maximize the parallelism. However, it must be borne in mind that this is more data to be sent between the processors. This additional communication costs can easily exceed the profit by the parallel computation.
Example ( Automatic parallelization )
From the source program to the polytope
Consider the following program, which consists of a perfectly nested loop set. The body of the loop contains a statement S.
For i: = 0 to n do for j: = 0 to i 2 do S: A (i, j ) = A (i -1, j ) A (i, j-1) end end To illustrate the loop as a polytope, it suffices to write the upper and lower bounds, and inequalities:
Or as a linear system of equations in matrix representation (one line corresponds to an inequality columns: i, j, n, 1)
Dependency analysis
In each loop the array cell A (i, j) will be overwritten. For the calculation of A ( i, j) is required, the values of A (i -1, j ) and A ( i, j -1). This results in two data dependencies: Each iteration ( i, j) depends on both the iteration (i-1, j ) and (i, j -1). Both dependencies must be taken into account in the next step in the calculation of the schedule.
Algorithmically, let all the dependencies using a method for dependency analysis calculated.
Setting up the transformation matrix
A correct schedule, which receives both dependencies, for example.
Interpretation:
- In the first step A ( 0,0) is calculated
- In the second step A (1,0) and A (0,1) is calculated
- The third step A (2,0), A ( 1,1) and A (0,2)
- Etc.
To allow maximum parallelism in each calculation step, we choose as an allocation
This results in the following transformation matrix:
(Explanation: First line = Schedule ( i j), second line = allocation ( i ) First column: i, Second column: j)
Transformed polytope
The loop indices are also transformed by: and
Generation of the target program
The final step is to generate code that enumerates exactly the points from the Zielpolyeder and in the proper order (more precisely, the lexicographical order ) comply. This is calculated from so-called scanning algorithms algorithmically (eg, Fourier - Motzkin elimination or the method of Quillerè ).
This gives the following ( synchronous ) target program:
For t: = 0 to 2n 2 do parfor p: = max (0, ceil ( t / 2) -1 ) to min ( t, n ) do A ( p, t p ) = A ( p- 1, t -p) A ( p, t, p-1) end end The outer loop specifies global time steps, while the second loop is a parallel calculation. Since missing in this program the code for the communication between the processors, it is on systems not directly executable with distributed memory. However, it can be combined with shared memory transpose directly, eg as OpenMP program.
Application
Among the most important applications include:
- Cache optimization
- Loop Tiling
- Schleifenparallelisierung
- Compiler