WCET Timing Model Integration
Accurate WCET timing models are available within static WCET analyzers like, e.g., aiT, as discussed in the Overview on Static Timing Analysis. To make such models available within the WCC compiler, it is not advisable to re-implement them once again inside the compiler. Instead, we decided to tightly couple the static WCET analyzer aiT with WCC's compiler back-end:
Conversion from LLIR to CRL2
As already stated, CRL2 is used by aiT as exchange format storing the application under WCET analysis as well as analysis results generated by the individual substeps of aiT. Since both LLIR and CRL2 are low-level IRs representing a program as a control flow graph, it is straightforward to translate these IRs into each other. At the top level, the CFG of both IRs consists of functions. Each function contains a list of basic blocks connected via edges. Basic blocks in turn consist of a sequence of instructions. In both IRs, an instruction can consist of several operations in order to express the implicit parallelism present in, e.g., a VLIW machine.
Due to the analogy of both IRs, the conversion step basically needs to traverse the LLIR CFG recursively and needs to generate equivalent CRL2 CFG components.
However, aiT inherently performs a some kind of loop transformation during WCET analysis. During this transformation, aiT removes each loop within the CFG from its function and moves it into an own dedicated function. The jump back to the loop header is realized using a recursive function call after loop transformation. This transformation is not applied to the actual code to be analyzed but only to the CFG. Hence, it does not affect program execution but improves the precision of WCET analysis since it allows the distinction of different loop iterations that are internally represented by function-like calling contexts. WCC needs to be aware of this loop transformation in order to keep track of how and where the loops of the LLIR CFG are represented in CRL2.
In addition to a pure CFG conversion, WCC also makes sure that the statically known content of data sections, like, e.g., pre-initialized arrays, is made available to aiT by generating data object dumps in CRL2.
Transparent Invocation of aiT
Using the conversion step from LLIR to CRL2, WCC is able to produce a CRL2 file representing the program for which WCET timing data is actually required. Fully transparent to the user of WCC, the compiler is able to apply aiT on this CRL2 file. WCC takes over full control of the WCET analyzer and invokes value, loop bound, cache, pipeline and path analysis in the background.
As a consequence, the timing analyzer is completely encapsulated within WCC. The compiler user is unaware of the fact that timing analysis is performed in the background. Furthermore, the user does not get in touch with the lots of configuration parameters mandatory to run aiT. The burden of setting up a valid runtime environment for aiT, where reasonable values for dozens of configuration parameters need to be defined manually, is taken away from the user and is completely managed by WCC.
Import of Worst-Case Execution Data
The result of static WCET analysis when invoking aiT within the WCC compiler consists of a final CRL2 file representing a program's CFG which is enriched with all the data about a program's worst-case behavior computed by aiT. The last step for fully integrating aiT's timing models into WCC thus consists of traversing this final CRL2 file, extracting all worst-case execution data it contains and importing this data into WCC's back-end. Technically, all the worst-case execution data computed by aiT is attached to LLIR objects within WCC. The following list gives an overview about the data made available within WCC this way:
- Worst-Case Execution Time of the entire program
- Global Worst-Case Execution Time per function
- Context-specific Worst-Case Execution Time per function
- Worst-Case Call Frequency per function
- Global Worst-Case Execution Time per basic block
- Context-specific Worst-Case Execution Time per basic block
- Context-specific Worst-Case Execution Frequency per CFG edge
- Context-specific Feasibility of each CFG edge
- Encountered I-cache misses per basic block