WCET-Aware Static I-Cache Lockdown
Modern caches can be controlled by software. The software can load parts of its code or of its data into the cache and lock the cache afterwards. Cache locking prevents the cache's contents from being flushed by deactivating the replacement. A locked cache is highly predictable and leads to very precise WCET estimates, because the uncertainty caused by the replacement strategy is eliminated completely. This page presents techniques exploring the lockdown of instruction caches at compile-time to minimize WCETs.
WCET-Aware I-Cache Contents Selection
For real-time systems with hard timing constraints, many cache architectures are problematic due to their unpredictability. Since they are hardware controlled, it is virtually impossible in many cases to determine the latency of a memory access. Thus, WCET estimates may be heavily overestimated in the presence of caches. Modern caches allow to lock their contents, i.e. to protect it from being flushed by disabling the replacement. This way, it is possible to predict access times of data or instructions that have been locked in the cache, and to make precise statements about the cache's worst-case timing.
This page presents techniques for compile-time decided I-cache lockdown to minimize WCETs. A set of functions of a program is determined that is locked into the cache at system startup time. During the whole execution time of the program exploiting cache locking, the locked cache's contents remains invariant ("static I-cache locking"). Cache contents selection is done such that the set of selected functions leads to the highest WCET reductions, under simultaneous consideration of switching worst-case execution paths (WCEPs).
The entire approach for WCET-aware static I-cache locking works in the following three stages:
- Execution Flow Graph Generation
The Execution Flow Graph (EFG) is an extended function call graph of a program. A single EFG node corresponds to a function of a program in a certain execution context, where the execution context contains information about all different ways a function can be invoked in a normal call graph. The node weights within the EFG represent a function's context-specific for a single execution of the function. Edge weights denote an upper bound of how many times execution flow passes this edge in an execution context. The edges of the EFG are created such that all paths from the source to the sinks in the EFG correspond to all possible ways of passing the control flow between the functions.
- Worst-Case Execution Path Construction
Since the EFG reflects all possible execution paths within a program, the WCEP can be determined by finding the longest path from a program's start to its end nodes. To determine the longest path within the EFG, a modified variant of Dijkstra's algorithm is employed.
- Cache Contents Selection
The main algorithm for cache contents selection fills a set L of functions to be locked in the I-cache. For each unlocked function along the WCEP fitting into the remaining I-cache capacity, that function with the highest gain is selected for lockdown. After this decision, node weights within the EFG are adjusted and the new WCEP after lockdown of that function is computed.
The above figure depicts the effect of static WCET-aware I-cache lockdown on the resulting WCETs. It shows the WCETs of the benchmarks after I-cache locking as a percentage of the WCETs without cache locking. The 100% base line thus reflects the WCETs of the benchmarks for a system with I-cache in normal operation mode.
As can be seen from this figure, the proposed techniques achieve significant WCET reductions for all cache sizes. Already for a very small I-cache of 64 bytes, lockdown leads to improvements between 2% (ADPCM) and 38.3% (G723). The large WCET reduction for G723 is caused by locking two very small but WCET-critical functions for absolute value computation and quantization.
With increasing I-cache sizes, even higher WCET reductions were achieved since more functions were locked in the I-cache, thus leading to a proportionally higher predictability of instruction fetches. Per benchmark, a monotonic decrease of WCETs was observed with increasing I-cache sizes. For some benchmarks (G723, MPEG2), smoothly decreasing WCET curves were obtained. This smooth decrease shows that many small functions lie on the WCEP of these benchmarks. By increasing the I-cache size by only a few bytes, these benchmarks benefit since the additional cache space is used for lockdown of functions on the WCEP.
In contrast, ADPCM exhibits a stepwise WCET decrease with increasing cache sizes. Here, the steep WCET reduction down to 34% when increasing the cache size from 128 bytes to 256 bytes stems from the lockdown of a system library function for integer division. This system function is heavily used within ADPCM during sine and cosine computations. When increasing the I-cache size further to 512 bytes, another reduction of WCET down to 27% was observed. Hereafter, a saturation is reached so that caches larger than 512 bytes do not lead to more reductions of the WCET of this benchmark.
The overall largest WCET reductions were achieved for an I-cache size of 16 kB which is the size of the ARM920T's native configuration. For this scenario, WCET reductions between 54.6% (MPEG2) and 73.1% (ADPCM) were measured. On average over all five benchmarks, a WCET reduction of 60.1% was achieved for a 16 kB I-cache.