BDDStab - In-depth example
Here, we discuss the demo2.exe
case study, taken from Kinder’s paper about alternating control flow reconstruction [1]. The corresponding assembly code, with added edges to visualize control flow originating from jump instructions, is depicted in Figure 1. Note that the edges represent exactly the jump information that Jakstab, using our domain, will determine. The control flow of the assembly code can be divided into two parts; a loop part from 0x00401000
to 0x00401033
that is terminated by a counter; and a loop-free part from 0x00401035
to 0x00401058
. Looking only at the code, without information about possible jump targets, it is not obvious that the jump at 0x00401041
will not cause a loop. The method in the original paper found that only 35 of the 37 instructions are reachable that objdump -d
1 disassembles. Although not specified in the paper, the code that was found dead is likely at the addresses 0x00401002
and 0x00401003
, as indicated in the original assembly. Using our BDD-based domain to resolve the targets of jumps, Jakstab is able to additionally determine that the code at the addresses 0x0040104a
and 0x0040104c
is dead. There are three crucial pieces of information our domain must provide to identify the dead code:
-
The VD of
%ebx
for the indirect jump at0x00401033
must not contain the addresses of dead code. Note that in practice, any approximation of VDs for indirect jumps causes approximation at subsequent program locations, which may then cause dead code to be identified as reachable. -
Similarly, the VD of
%eax
for the indirect jump at0x00401041
may not be approximated. -
It must be shown that
%eax
and%ebx
cannot be equal at0x0040103d
, to show that the jump at0x0040103f
will not be taken.
In the following in-depth discussion, we will take a look at the contents of the %eax
and %ebx
registers as well as that of the memory location 0x0040103000
, which stores a counter value that is 0 initially.
Table 1 lists a selection of flow data after stabilization as determined by our BDD-based abstract domain. We denote the variation domains of %eax
and %ebx
in bits, where *
indicates that the value of a bit is not known. Because nothing is known about %eax
and %ebx
initially, both are set to ⊤ in 0x00401000
and no subsequent join operation will improve precision. After the shift instruction at 0x00401018
, which shifts 30 times to the right and fills with zeros from the left, only the least significant two bits of %ebx
remain unknown. The subsequent shift instruction will shift these bits to the left by 3 positions, and the following addl
instruction does not interfere with the unknown bits. The code from 0x00401024
to 0x00401031
increments the counter, which terminates the loop that results from the jump instruction at 0x00401033
after the second iteration. A look at the flow data shows that the indirect jump at 0x00401033
can only have the targets 0x00401000
, 0x00401008
, 0x00401010
, and 0x00401018
, thereby solving challenge 1. When control reaches 0x00401035
, %eax
gets shifted by 31 positions to the right and filled with zeros from the left. After the shift, all but the least significant bit of %eax
are known to be 0. As a result of the addition in 0x00401038
, our domain determines that %eax
can only have the values 0x00401043
and 0x00401044
, thereby generating precise jump targets for the jump at 0x00401041
(challenge 2). Because the intersection between the variation domains of %eax
and %ebx
at 0x0040103d
is empty, the corresponding jump at 0x0040103f
cannot be taken (challenge 3). Therefore, our domain can determine that the code at 0x0040104a
and 0x0040104c
is dead as well.
Steps for Reproduction
To reproduce the results from above, download the source code for our domain2 and follow the installation instructions provided in the “Readme” file. Within Jakstab’s installation directory, execute
“./jakstab –cpa sfz -m input/bin/vmcai12/demo2.exe –cpa2 z”,
which will write a dot
file to “input/bin/vmcai12/demo2_cfa.dot” that contains the flow data from our domain. This dot file can either be compiled using the dot program as provided by graphviz3, or visualized using xdot4. Note that registers and memory locations that are set to ⊤ or ⊥ will not be listed in the flow data.