Duff's device

Duff's Device (on German about: Duff's agent) is a named after its inventor Tom Duff programming method for skillful unrolling of loops (english loop unrolling ) in the programming language C. It dissolves code - saving way, the problem that the number of loop iterations to hire. not a multiple of the n -fold of the loop unrolling or the number of iterations is variable and is only obtained at runtime. Duff's Device is applicable in any programming language that Einsprünge allowed in the loop body.

Operation

Duff worked the 1983 film production company Lucasfilm, and had found that the following function for outputting image data on special hardware to slowly ran:

Send ( to, from, count) register short * to, * from; register count; {      do          * to = * from ;      while ( - count> 0); } Duff found out that this implementation spent half the time with testing the loop condition. The standard method to increase the execution speed of grinding is to unroll it. However, a partial flow remains. From assembly language programming Duff knew the usual technique, jump in directly into the loop. In the C programming language this can use the switch statement to be solved:

Send ( to, from, count) register short * to, * from; register count; {      register n = (count 7 ) / 8;      switch ( count% 8) {          case 0: do { * to = * from ;          case 7: * to = * from ;          case 6: * to = * from ;          case 5: * to = * from ;          case 4: * to = * from ;          case 3: * to = * from ;          case 2: * to = * from ;          case 1: * to = * from ;          } While ( - n> 0);      } } explanation

The function gets the addresses of a call, the output register ( * to) and an array in memory ( * from ) and the number of data to be transmitted (short A., stands for two bytes) transferred. It uses a loop unrolling, in which eight elements to be rolled.

The counter variable n contains the number of loop iterations (count / 8, rounded up). Count is not a multiple of eight, then skipped on the first pass through the switch statement is a part of the loop and just rest (count div 8 ) copies elements. From the second pass, the number of remaining elements to be copied is then a multiple of eight. From then on, the loop body is then always run through completely and processed eight elements at a time.

Disadvantages

In conventional processors, today it is not to say that an application of this method leads to an increase in efficiency, as it can adversely affect the cache behavior. Modern compilers themselves have procedures for unwinding of program loops, taking into account the particular target platform. The code can be extremely enlarged by frequent applications of the method. In addition, the method reduces the expressiveness of the code.

248894
de