Superblock-Based High-Level WCET Optimization

Superblocks are a well-known structure from instruction scheduling theory. A superblock is a list of successive basic blocks and can only be entered at the first block. The superblocks can then be used to perform optimizations across basic block borders. WCC is the first compiler which applies this structure for high-level WCET minimization. As listed in the following, superblock optimizations have several beneficial properties for WCET optimization:

  • They explicitly optimize a given path (trace) through the program, in our case on the Worst-Case Execution Path (WCEP).
  • They eliminate control flow merge points in the CFG which may
    • decrease the precision of static timing analyses, and
    • lead to additional jump instructions on the WCEP.

  • They resolve dependencies in the code and thus enable other optimizations in the optimization chain.

The drawback of superblock optimization is, like with loop unrolling or function inlining, that it may increase the code size. As example on how the superblock optimizations work, consider the following code:

  int func( ... ) {
    ...
    if ( cond ) {
      x = y;
    } else {
      x = z;
    }

    return foo( x );
  }

The WCEP is assumed to pass through the then-part of the if-statement. The superblock optimizations will eliminate all side entrances in a given trace, which in our case is the WCEP, consisting of the if block, the x = y block and the return block. This results in

  int func( ... ) {
    ...
    if ( cond ) {
      x = y;
      return foo( x );
    } else {
      x = z;
      return foo( x );
    }
  }

This way, in the assembler representation, the jump from the end of the then-part is removed and also value propagation and dead code elimination will be able to transform x = y; return foo( x ); to return foo( y ); which would not have been possible previously. If y is constant, procedure cloning can be applied additionally.

Optimization Framework

The general procedure for superblock optimization at the high level is divided into several substeps. Before the optimization starts, an initial run of the timing analyzer (aiT) is required to generate WCET data that will guide our optimization decisions. The WCET data is then available at the low-level and propagated to the high-level IR via the back-annotation.

  1. Trace Selection
    In this step, we select a trace, which denotes a list of successive basic blocks that do not span across a loop border. The traces are steering our optimizations, as we will transform them to superblocks and further optimize them in the following. We select traces along the WCEP using a novel trace selection algorithm which finds the longest path using the IPET approach. This way, we make sure that we select traces that maximally contribute to the WCET.
  2. Superblock formation
    The selected traces are then transformed into superblocks. This is achieved by iterating over the trace from the last to the first block and perform high-level tail-duplication at each block that has incoming control flow edges, like, e.g., the block that contains foo( x ); in the example above. The tail (foo( x ); in the example) is copied to every control flow predecessor (x = y; and x = z;). To limit the code size expansion during this step, the trace selection pre-computes the code size increase and stops the trace growth if the resulting superblock would exceed user-defined code size growth restrictions.
    We also implemented some enhancements to this basic method that are able to further reduce the code size increase, like switch conversion and else-if refolding. See the documents listed at the bottom of the page to get more information on this.
  3. Superblock optimization
    In addition to the superblock formation itself, we have implemented a superblock-specific version of two well-known compiler optimizations, namely common subexpression elimination (CSE) and dead code elimination (DCE).
    • Superblock Common Subexpression Elimination: Whereas standard approaches to common subexpression elimination operate on basic block or on global scope, the superblock version is restricted to the scope of a superblock. Compared to the basic block variant, this yields more optimization potential on the WCEP (as our superblocks are only constructed on the WCEP). Compared to the global variant, we avoid the introduction of new variables for common subexpressions that are not useful for WCET reduction. These new variables may produce spill code in the register allocation phase and thus even have a negative effect on the WCET.
    • Superblock Dead Code Elimination: In this optimization, we move instructions which compute results that are not used on the superblock, out of the superblock. If the result of an instruction is used nowhere, then the instruction is called dead in classical compiler terminology. In addition, an instruction can be superblock-dead if its results may be used in the later program execution, but not inside the superblock. If the operands of the instruction are not overwritten in the superblock, we are able to move the superblock-dead instruction out of the superblock into the path, where the computed results are used.

  4. WCEP update
    After a trace was selected, transformed to a superblock and optimized, we recompute the WCEP / WCET because of WCEP switches that may occur after each modification to the code. To limit the computational demands of this step, we have implemented a tree-based heuristic that operates on the syntax tree and gives a rough estimation of the effect that a code change may have on the WCET. Performing a full WCET analysis after each superblock formation step is not feasible because of the analysis duration. To limit the effect of misestimations of our tree-based heuristic, we perform a full WCET recomputation using aiT each time a pre-defined number of superblocks was optimized. After the WCEP update, the next trace is selected, or the optimization terminates when all blocks are already part of a trace.

Results

The above figures show the WCET effect of superblock formation followed by either superblock CSE or superblock DCE. The base line of the diagrams is the WCET for the highest optimization level O3 without the standard basic block CSE / standard DCE. The first bar shows the impact of the standard basic block CSE / standard DCE relative to that baseline, whereas the second and third bars show the effect of superblock formation and of superblock formation including superblock CSE / DCE, respectively.

On average for 55 selected real-world benchmarks, we achieve a WCET reduction of 10.2% for the combination of superblock formation and superblock CSE (standard CSE: 3.4% reduction) and of 8.8% for the combination of superblock formation and superblock DCE (standard DCE: 2.0%). Individual improvements vary between 42.1% and 0%. A WCET degradation was not observed due to WCC's rollback mechanism: if the optimized code yields a worse WCET than the original code, the code modifications are undone. The ACET improvements are considerably lower, namely 4.9% for superblock formation and superblock CSE, and 2.1% for superblock formation and superblock DCE. This shows that the optimizations are successful in explicitly focusing on the WCET. The code size increase amounts to 23% for the superblock CSE and 28% for the superblock DCE.