BDDStab - In-depth example
Here, we discuss the
demo2.exe case study, taken from Kinder’s paper about alternating control flow reconstruction . The corresponding assembly code, with added edges to visualize control flow originating from jump instructions, is depicted in Figure 1. Note that the edges represent exactly the jump information that Jakstab, using our domain, will determine. The control flow of the assembly code can be divided into two parts; a loop part from
0x00401033 that is terminated by a counter; and a loop-free part from
0x00401058. Looking only at the code, without information about possible jump targets, it is not obvious that the jump at
0x00401041 will not cause a loop. The method in the original paper found that only 35 of the 37 instructions are reachable that
objdump -d1 disassembles. Although not specified in the paper, the code that was found dead is likely at the addresses
0x00401003, as indicated in the original assembly. Using our BDD-based domain to resolve the targets of jumps, Jakstab is able to additionally determine that the code at the addresses
0x0040104c is dead. There are three crucial pieces of information our domain must provide to identify the dead code:
The VD of
%ebxfor the indirect jump at
0x00401033must not contain the addresses of dead code. Note that in practice, any approximation of VDs for indirect jumps causes approximation at subsequent program locations, which may then cause dead code to be identified as reachable.
Similarly, the VD of
%eaxfor the indirect jump at
0x00401041may not be approximated.
It must be shown that
%ebxcannot be equal at
0x0040103d, to show that the jump at
0x0040103fwill not be taken.
In the following in-depth discussion, we will take a look at the contents of the
%ebx registers as well as that of the memory location
0x0040103000, which stores a counter value that is 0 initially.
Table 1 lists a selection of flow data after stabilization as determined by our BDD-based abstract domain. We denote the variation domains of
%ebx in bits, where
* indicates that the value of a bit is not known. Because nothing is known about
%ebx initially, both are set to ⊤ in
0x00401000 and no subsequent join operation will improve precision. After the shift instruction at
0x00401018, which shifts 30 times to the right and fills with zeros from the left, only the least significant two bits of
%ebx remain unknown. The subsequent shift instruction will shift these bits to the left by 3 positions, and the following
addl instruction does not interfere with the unknown bits. The code from
0x00401031 increments the counter, which terminates the loop that results from the jump instruction at
0x00401033 after the second iteration. A look at the flow data shows that the indirect jump at
0x00401033 can only have the targets
0x00401018, thereby solving challenge 1. When control reaches
%eax gets shifted by 31 positions to the right and filled with zeros from the left. After the shift, all but the least significant bit of
%eax are known to be 0. As a result of the addition in
0x00401038, our domain determines that
%eax can only have the values
0x00401044, thereby generating precise jump targets for the jump at
0x00401041 (challenge 2). Because the intersection between the variation domains of
0x0040103d is empty, the corresponding jump at
0x0040103f cannot be taken (challenge 3). Therefore, our domain can determine that the code at
0x0040104c is dead as well.
Steps for Reproduction
To reproduce the results from above, download the source code for our domain2 and follow the installation instructions provided in the “Readme” file. Within Jakstab’s installation directory, execute
“./jakstab –cpa sfz -m input/bin/vmcai12/demo2.exe –cpa2 z”,
which will write a
dot file to “input/bin/vmcai12/demo2_cfa.dot” that contains the flow data from our domain. This dot file can either be compiled using the dot program as provided by graphviz3, or visualized using xdot4. Note that registers and memory locations that are set to ⊤ or ⊥ will not be listed in the flow data.