Since instructions inside loops can be executed repeatedly, it is frequently not possible to give a bound on the number of instruction executions that will be impacted by a loop optimization. This presents challenges when reasoning about the correctness and benefits of a loop optimization, in particular in choosing how to represent the computation being optimized and the optimization(s) being performed.[1]
Loop optimization can be viewed as the application of a sequence of specific loop transformations (listed below or in Compiler transformations for high-performance computing[2]) to the source code or intermediate representation, with each transformation having an associated test for legality. A transformation (or sequence of transformations) generally must preserve the temporal sequence of all dependencies if it is to preserve the result of the program (i.e., be a legal transformation). Evaluating the benefit of a transformation or sequence of transformations can be quite difficult within this approach, as the application of one beneficial transformation may require the prior use of one or more other transformations that, by themselves, would result in reduced performance.
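For example, the following hypothetical C sketch (not from the source; the array names and bounds are illustrative) shows one such transformation, loop interchange, together with the dependence reasoning that makes it legal in this case:

    /* A hypothetical sketch (not from the source; array names and sizes are
       illustrative) of one loop transformation and its legality test: loop
       interchange.  The only dependence is a[i][j] on a[i][j-1], with distance
       vector (0, 1).  Interchange permutes it to (1, 0), which is still
       lexicographically positive, so the temporal order of the dependence is
       preserved and the transformation is legal. */
    void original(int a[4][5], int b[4][5]) {
        for (int i = 0; i < 4; i++)
            for (int j = 1; j < 5; j++)
                a[i][j] = a[i][j-1] + b[i][j];
    }

    void interchanged(int a[4][5], int b[4][5]) {
        for (int j = 1; j < 5; j++)        /* j is now the outer loop */
            for (int i = 0; i < 4; i++)
                a[i][j] = a[i][j-1] + b[i][j];   /* same statement, same results */
    }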
Common loop transformations include:

- fission (distribution): splitting one loop into several loops over the same index range
- fusion (combining): merging adjacent loops that iterate over the same range
- interchange: exchanging an inner loop with an outer loop
- inversion: converting a while loop into a do-while loop wrapped in an initial conditional
- loop-invariant code motion: moving computations that do not change between iterations out of the loop
- parallelization: executing independent iterations concurrently
- reversal: running the iterations in the opposite order
- skewing: offsetting the inner loop bounds by a function of the outer loop index
- software pipelining: overlapping operations from several consecutive iterations
- splitting (peeling): dividing a loop into loops over disjoint parts of the index range
- tiling (blocking): partitioning the iteration space into blocks to improve data locality
- unrolling: replicating the loop body to reduce per-iteration overhead
- unswitching: moving a loop-invariant conditional out of the loop, duplicating the loop in each branch
- vectorization: performing the work of several iterations at once with SIMD instructions
- sectioning (strip-mining): breaking a loop into chunks of at most a fixed size[4][5]
The unimodular transformation approach[6] uses a single unimodular matrix to describe the combined result of a sequence of many of the above transformations. Central to this approach is the view of the set of all executions of a statement within n loops as a set of integer points in an n-dimensional space, with the points being executed in lexicographical order. For example, the executions of a statement nested inside an outer loop with index i and an inner loop with index j can be associated with the pairs of integers (i, j). The application of a unimodular transformation corresponds to the multiplication of the points within this space by the matrix. For example, the interchange of two loops corresponds to the matrix

    [ 0 1 ]
    [ 1 0 ]
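As a concrete illustration (a minimal sketch, not from the source; the chosen iteration point is arbitrary), the following C program multiplies the interchange matrix above with a single iteration point, mapping (i, j) to (j, i):

    /* A minimal sketch (not from the source; the point (3, 5) is arbitrary)
       of applying a unimodular transformation: the interchange matrix is
       multiplied with an iteration point. */
    #include <stdio.h>

    int main(void) {
        int m[2][2] = { {0, 1}, {1, 0} };   /* loop-interchange matrix */
        int point[2] = { 3, 5 };            /* iteration (i, j) = (3, 5) */
        int image[2];

        for (int r = 0; r < 2; r++)
            image[r] = m[r][0] * point[0] + m[r][1] * point[1];

        printf("(%d, %d)\n", image[0], image[1]);   /* prints (5, 3) */
        return 0;
    }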
A unimodular transformation is legal if it preserves the temporal sequence of all dependencies; measuring the performance impact of a unimodular transformation is more difficult. Imperfectly nested loops and some transformations (such as tiling) do not fit easily into this framework.
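Tiling a single loop, for instance, produces a loop nest that iterates over a two-dimensional (tile, element) space, so no single unimodular matrix on the original one-dimensional index can express it. A minimal C sketch (not from the source; the bound 10, the tile size 4, and the printed output are illustrative):

    /* A minimal sketch (not from the source) of tiling (strip-mining) a
       single loop.  The tiled version visits the same i values in the same
       order, but over a two-dimensional iteration space. */
    #include <stdio.h>

    int main(void) {
        /* Original loop: visits i = 0 .. 9. */
        for (int i = 0; i < 10; i++)
            printf("original %d\n", i);

        /* Tiled loop: the outer loop walks tiles of size 4, the inner loop
           walks the elements of one tile. */
        for (int ii = 0; ii < 10; ii += 4)
            for (int i = ii; i < 10 && i < ii + 4; i++)
                printf("tiled %d\n", i);
        return 0;
    }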
The polyhedral model[7] handles a wider class of programs and transformations than the unimodular framework. The set of executions of a set of statements within a possibly imperfectly nested set of loops is seen as the union of a set of polytopes representing the executions of the statements. Affine transformations are applied to these polytopes, producing a description of a new execution order. The boundaries of the polytopes, the data dependencies, and the transformations are often described using systems of constraints, and this approach is often referred to as a constraint-based approach to loop optimization. For example, a single statement within an outer loop 'for i := 0 to n' and an inner loop 'for j := 0 to i+2' is executed once for each (i, j) pair such that 0 <= i <= n and 0 <= j <= i+2.
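As a small illustration (a sketch, not from the source; n = 3 is an arbitrary choice), the following C program enumerates exactly those integer points in lexicographic order, one per execution of the statement:

    /* A small sketch (not from the source; n = 3 is arbitrary) that
       enumerates the integer points of the iteration domain described above,
       { (i, j) : 0 <= i <= n, 0 <= j <= i + 2 }, in lexicographic order. */
    #include <stdio.h>

    int main(void) {
        int n = 3;
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= i + 2; j++)
                printf("(%d, %d)\n", i, j);   /* one point per statement execution */
        return 0;
    }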
Once again, a transformation is legal if it preserves the temporal sequence of all dependencies. Estimating the benefits of a transformation, or finding the best transformation for a given code on a given computer, remains the subject of ongoing research as of the time of this writing (2010).
[1] In the book Reasoning About Program Transformations, Jean-François Collard discusses in depth the general question of representing executions of programs rather than program text in the context of static optimization. https://books.google.com/books?id=8FbTZc4pa-cC&q=%22loop+optimization%22

[2] David F. Bacon, Susan L. Graham, and Oliver J. Sharp. Compiler transformations for high-performance computing. Report No. UCB/CSD 93/781, Computer Science Division-EECS, University of California, Berkeley, Berkeley, California 94720, November 1993 (available at CiteSeer). Introduces compiler analyses such as data dependence analysis and interprocedural analysis, as well as a very complete list of loop transformations.

[3] "8051 Instruction Set". www.win.tue.nl. Retrieved 2019-12-09. https://www.win.tue.nl/~aeb/comp/8051/set8051.html-orig#51djnz

[4] "Intel Developer Zone". http://software.intel.com/en-us/articles/strip-mining-to-optimize-memory-use-on-32-bit-intel-architecture/

[5] "7.6.3.1 Strip-Mining (Sun Studio 12: Fortran Programming Guide)". http://download.oracle.com/docs/cd/E19205-01/819-5262/aeugr/index.html

[6] Steven S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, 1997. Section 20.4.2 discusses loop optimization. https://books.google.com/books?id=Pq7pHwG1_OkC&q=%22unimodular+transformation%22

[7] R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann, 2002.