Function inlining is a well-known transformation that replaces a function call by the body of the callee, storing the arguments in variables that correspond to the function's parameters. Function inlining was originally intended to reduce average-case execution times (ACETs) for the following reasons:
- By copying the callee's code into the caller, the calling overhead is reduced, since the function call and return instructions as well as the parameter handling are removed.
- Removing calls and returns potentially yields smoother pipeline behavior.
- Inlining potentially enables other compiler optimizations that are unable to perform global code analysis and transformation across function boundaries.
One of the main drawbacks of this optimization is increased register pressure. By inserting the inlined function's additional variables into the caller, more registers may be required, which can result in additional spill code that degrades a program's performance. Furthermore, I-cache performance may be degraded due to the increased code size and reduced spatial locality of memory accesses.
Evaluating the impact of inlining on ACETs is challenging due to the side effects of the register allocator and the caches, which complicates the decision of whether a function should be inlined. Up to now, more or less simple compiler heuristics have tried to predict whether inlining a given function will be beneficial. The WCC compiler includes mechanisms for function inlining that, on the one hand, focus on WCETs instead of ACETs; on the other hand, they replace simple inlining heuristics with sophisticated machine learning (ML) techniques.
Machine learning techniques provide a flexible and adaptive way to automatically generate compiler heuristics for complex search spaces, and such machine learning based approaches often outperform hand-crafted heuristics. Deciding whether to apply inlining poses a typical classification problem, i.e., one seeks a classification rule that decides, for a particular call site, whether a particular function should be inlined.
WCC extracts numerous features from the program to be optimized to drive the classification process for inlining. These include, among others:
- size of caller / callee,
- WCET of the caller / callee,
- worst-case execution frequency per call site,
- WCET of a callee for a particular call site, i.e., the product of the callee's WCET and the worst-case execution frequency of the call site,
- number of call sites of a callee lying on the current WCEP,
- whether a callee is called from only one call site,
- number of registers whose lifetimes span a call (inlining calls that cross many lifetimes increases the probability of spilling), and
- maximal number of registers that are simultaneously live within a function.
Extracting these features makes heavy use of WCC's infrastructure. Thanks to the tight integration of a timing analyzer into the compiler back-end, all WCET- and WCEP-related features are collected there. Features related to the liveness of registers are computed by a dedicated register pressure analyzer in the compiler back-end. These assembly code-specific features are then propagated to the ICD-C level via WCC's back-annotation. All other features dealing with the number and positions of call sites are computed directly at the C code level within ICD-C.
Subsequently, a random forests classifier integrated into the machine learning tool RapidMiner uses these features at the ICD-C level. The knowledge base of this classifier was generated off-line by means of a training set of 41 realistic benchmarks. If the classifier decided to inline a function at a particular call site, this optimization was applied to WCC's intermediate code representation, followed by a WCET analysis of the resulting optimized program. This way, all WCET-related data used by the classifier's features is updated, and possible WCEP changes are automatically captured.
The above figure shows the WCETs for different inlining approaches applied to 26 real-life benchmarks. The first two bars per benchmark denote standard, WCET-unaware heuristics that inline only functions with fewer than 50 and 100 ANSI-C expressions, respectively. The third bar denotes the WCETs achieved by WCC's novel machine learning based inlining, which does not use any hard-coded size threshold. All results are given as a percentage, with 100% corresponding to the benchmarks' WCETs if no function inlining is applied. All results were generated using WCC's highest optimization level -O3.
As can be seen, inlining functions with fewer than 50 C expressions has only a marginal impact on the benchmarks' WCETs. Significant WCET reductions were achieved for only very few benchmarks; for most, the WCETs did not change after inlining. On average over all considered benchmarks, a WCET of 100.3% of the WCET without inlining was obtained. When inlining functions with fewer than 100 C expressions, the WCETs of 11 benchmarks increased significantly; on average, this strategy increased WCETs by 5.5%. Unlike these standard inlining approaches, machine learning based inlining never significantly increased WCETs in any experiment. Instead, it outperforms both WCET-unaware strategies by up to 11.4% on average over the entire benchmark set.