BDDStab - In-depth example


Here, we discuss the demo2.exe case study, taken from Kinder’s paper about alternating control flow reconstruction [1]. The corresponding assembly code, with added edges to visualize control flow originating from jump instructions, is depicted in Figure 1. Note that the edges represent exactly the jump information that Jakstab, using our domain, will determine. The control flow of the assembly code can be divided into two parts; a loop part from 0x00401000 to 0x00401033 that is terminated by a counter; and a loop-free part from 0x00401035 to 0x00401058. Looking only at the code, without information about possible jump targets, it is not obvious that the jump at 0x00401041 will not cause a loop. The method in the original paper found that only 35 of the 37 instructions are reachable that objdump -d1 disassembles. Although not specified in the paper, the code that was found dead is likely at the addresses 0x00401002 and 0x00401003, as indicated in the original assembly. Using our BDD-based domain to resolve the targets of jumps, Jakstab is able to additionally determine that the code at the addresses 0x0040104a and 0x0040104c is dead. There are three crucial pieces of information our domain must provide to identify the dead code:

  1. The VD of %ebxfor the indirect jump at 0x00401033 must not contain the addresses of dead code. Note that in practice, any approximation of VDs for indirect jumps causes approximation at subsequent program locations, which may then cause dead code to be identified as reachable.

  2. Similarly, the VD of %eax for the indirect jump at 0x00401041 may not be approximated.

  3. It must be shown that %eax and %ebx cannot be equal at 0x0040103d, to show that the jump at 0x0040103f will not be taken.

In the following in-depth discussion, we will take a look at the contents of the %eax and %ebx registers as well as that of the memory location 0x0040103000, which stores a counter value that is 0 initially.

Table 1 lists a selection of flow data after stabilization as determined by our BDD-based abstract domain. We denote the variation domains of %eax and %ebx in bits, where * indicates that the value of a bit is not known. Because nothing is known about %eax and %ebx initially, both are set to ⊤ in 0x00401000 and no subsequent join operation will improve precision. After the shift instruction at 0x00401018, which shifts 30 times to the right and fills with zeros from the left, only the least significant two bits of %ebx remain unknown. The subsequent shift instruction will shift these bits to the left by 3 positions, and the following addl instruction does not interfere with the unknown bits. The code from 0x00401024 to 0x00401031 increments the counter, which terminates the loop that results from the jump instruction at 0x00401033 after the second iteration. A look at the flow data shows that the indirect jump at 0x00401033 can only have the targets 0x00401000, 0x00401008, 0x00401010, and 0x00401018, thereby solving challenge 1. When control reaches 0x00401035, %eax gets shifted by 31 positions to the right and filled with zeros from the left. After the shift, all but the least significant bit of %eax are known to be 0. As a result of the addition in 0x00401038, our domain determines that %eax can only have the values 0x00401043 and 0x00401044, thereby generating precise jump targets for the jump at 0x00401041 (challenge 2). Because the intersection between the variation domains of %eax and %ebx at 0x0040103d is empty, the corresponding jump at 0x0040103f cannot be taken (challenge 3). Therefore, our domain can determine that the code at 0x0040104a and 0x0040104c is dead as well.



Steps for Reproduction

To reproduce the results from above, download the source code for our domain2 and follow the installation instructions provided in the “Readme” file. Within Jakstab’s installation directory, execute
“./jakstab –cpa sfz -m input/bin/vmcai12/demo2.exe –cpa2 z”,
which will write a dot file to “input/bin/vmcai12/” that contains the flow data from our domain. This dot file can either be compiled using the dot program as provided by graphviz3, or visualized using xdot4. Note that registers and memory locations that are set to ⊤ or ⊥ will not be listed in the flow data.