x86 反汇编/优化示例

x86 反汇编

示例：优化与非优化代码

以下示例改编自 Knuth（卷 1，第 1 章）中介绍的一种算法，用于查找两个整数的最大公约数。比较该函数的清单文件，开启和关闭编译器优化时。

 /*line 1*/
 int EuclidsGCD(int m, int n) /*we want to find the GCD of m and n*/
 {
 	int q, r; /*q is the quotient, r is the remainder*/
 	while(1)
 	{
 		q = m / n; /*find q and r*/
 		r = m % n;
 		if(r == 0) /*if r is 0, return our n value*/
 		{
 			return n;
 		}
 		m = n; /*set m to the current n value*/
 		n = r; /*set n to our current remainder value*/
 	} /*repeat*/
 }

使用 Microsoft C 编译器编译，我们使用无优化生成一个清单文件

 PUBLIC	_EuclidsGCD
 _TEXT	SEGMENT
 _r$ = -8	; size = 4
 _q$ = -4	; size = 4
 _m$ = 8	; size = 4
 _n$ = 12	; size = 4
 _EuclidsGCD PROC NEAR
 ; Line 2
 	push	ebp
 	mov	ebp, esp
 	sub	esp, 8
 $L477:
 ; Line 4
 	mov	eax, 1
 	test	eax, eax
 	je	SHORT $L473
 ; Line 6
 	mov	eax, DWORD PTR _m$[ebp]
 	cdq
 	idiv	DWORD PTR _n$[ebp]
 	mov	DWORD PTR _q$[ebp], eax
 ; Line 7
 	mov	eax, DWORD PTR _m$[ebp]
 	cdq
 	idiv	DWORD PTR _n$[ebp]
 	mov	DWORD PTR _r$[ebp], edx
 ; Line 8
 	cmp	DWORD PTR _r$[ebp], 0
 	jne	SHORT $L479
 ; Line 10
 	mov	eax, DWORD PTR _n$[ebp]
 	jmp	SHORT $L473
 $L479:
 ; Line 12
 	mov	ecx, DWORD PTR _n$[ebp]
 	mov	DWORD PTR _m$[ebp], ecx
 ; Line 13
 	mov	edx, DWORD PTR _r$[ebp]
 	mov	DWORD PTR _n$[ebp], edx
 ; Line 14
 	jmp	SHORT $L477
 $L473:
 ; Line 15
 	mov	esp, ebp
 	pop	ebp
 	ret	0
 _EuclidsGCD ENDP
 _TEXT	ENDS
 END

请注意 C 代码行与 ASM 代码行之间存在非常清晰的对应关系。"; 行 x" 指令在这方面非常有帮助。

接下来，我们使用一系列优化编译相同函数，以强调速度而非大小

cl.exe /Tceuclids.c /Fa /Ogt2

我们生成以下清单

 PUBLIC	_EuclidsGCD
 _TEXT	SEGMENT
 _m$ = 8	; size = 4
 _n$ = 12       ; size = 4
 _EuclidsGCD PROC NEAR	
 ; Line 7
 	mov	eax, DWORD PTR _m$[esp-4]
 	push	esi
 	mov	esi, DWORD PTR _n$[esp]
 	cdq
 	idiv	esi
 	mov	ecx, edx
 ; Line 8
 	test	ecx, ecx
 	je	SHORT $L563
 $L547:
 ; Line 12
 	mov	eax, esi
 	cdq
 	idiv	ecx
 ; Line 13
 	mov	esi, ecx
 	mov	ecx, edx
 	test	ecx, ecx
 	jne	SHORT $L547
 $L563:
 ; Line 10
 	mov	eax, esi
 	pop	esi
 ; Line 15
 	ret	0
 _EuclidsGCD ENDP
 _TEXT	ENDS
 END

如您所见，优化版本明显比非优化版本短。一些关键差异包括

优化版本不准备标准堆栈帧。这很重要，因为许多新的逆向工程人员都认为函数总是以正确的堆栈帧开始和结束，而事实并非如此。EBP 未被使用，ESP 未被更改（因为局部变量保存在寄存器中，而不是放在堆栈上），并且没有调用子函数。这减少了 5 条指令。
非优化输出中 "; 行 4" 下的 "test EAX, EAX" 指令序列都是不必要的。while 循环由 "while(1)" 定义，因此循环总是继续。这段多余的代码可以安全地删去。还要注意，循环中没有像预期的那样出现无条件跳转： "if(r == 0) return n;" 指令已成为新的循环条件。
函数的结构发生了很大改变：m 和 n 的除法以生成 q 和 r 在此函数中执行了两次：一次在函数开始时进行初始化，一次在循环结束时进行。此外，r 的值在相同的位置进行了两次测试。编译器在分配函数中的存储空间方面非常自由，并且会轻易丢弃不需要的值。

示例：手动优化

以下汇编代码行未经优化，但可以非常轻松地进行优化。你能找到优化这些行的方法吗？

mov	eax, 1
test	eax, eax
je	SHORT $L473

此行中的代码是为 "while( 1 )" C 代码生成的代码，确切地说，它表示循环退出条件。因为这是一个无限循环，所以我们可以假设这些行是不必要的。

"mov eax, 1" 初始化 eax。

之后的测试立即测试 eax 的值，以确保它不为零。因为 eax 在此时始终不为零（eax = 1），所以条件跳转可以与 "mov" 和 "test" 一起删除。

汇编代码实际上检查的是 1 是否等于 1。另一个事实是，无限 **FOR** 循环的 C 代码

 for( ; ; )
 {
    ...
 }

一开始就不会创建这种毫无意义的汇编代码，并且在逻辑上与 "while( 1 )" 相同。

示例：跟踪变量

以下是来自上面示例的 EuclidGCD 函数的 C 代码和优化后的汇编清单。你能确定哪个寄存器包含变量 **r** 和 **q** 吗？

 /*line 1*/
 int EuclidsGCD(int m, int n) /*we want to find the GCD of m and n*/
 {
 	int q, r; /*q is the quotient, r is the remainder*/
 	while(1)
 	{
 		q = m / n; /*find q and r*/
 		r = m % n;
 		if(r == 0) /*if r is 0, return our n value*/
 		{
 			return n;
 		}
 		m = n; /*set m to the current n value*/
 		n = r; /*set n to our current remainder value*/
 	} /*repeat*/
 }

 PUBLIC	_EuclidsGCD
 _TEXT	SEGMENT
 _m$ = 8	; size = 4
 _n$ = 12       ; size = 4
 _EuclidsGCD PROC NEAR	
 ; Line 7
 	mov	eax, DWORD PTR _m$[esp-4]
 	push	esi
 	mov	esi, DWORD PTR _n$[esp]
 	cdq
 	idiv	esi
 	mov	ecx, edx
 ; Line 8
 	test	ecx, ecx
 	je	SHORT $L563
 $L547:
 ; Line 12
 	mov	eax, esi
 	cdq
 	idiv	ecx
 ; Line 13
 	mov	esi, ecx
 	mov	ecx, edx
 	test	ecx, ecx
 	jne	SHORT $L547
 $L563:
 ; Line 10
 	mov	eax, esi
 	pop	esi
 ; Line 15
 	ret	0
 _EuclidsGCD ENDP
 _TEXT	ENDS
 END

在函数开始时，**eax** 包含 m，**esi** 包含 n。当执行 "idiv esi" 指令时，eax 包含商（q），edx 包含余数（r）。"mov ecx, edx" 指令将 r 移动到 **ecx** 中，而 q 在循环的剩余部分没有使用，因此被丢弃。

示例：反编译优化代码

以下是上面示例中介绍的 EuclidGCD 函数的优化后的清单文件。你能将这段汇编代码清单反编译成等效的 "优化" C 代码吗？优化版本在结构上与非优化版本有何不同？

 PUBLIC	_EuclidsGCD
 _TEXT	SEGMENT
 _m$ = 8	; size = 4
 _n$ = 12       ; size = 4
 _EuclidsGCD PROC NEAR	
 ; Line 7
 	mov	eax, DWORD PTR _m$[esp-4]
 	push	esi
 	mov	esi, DWORD PTR _n$[esp]
 	cdq
 	idiv	esi
 	mov	ecx, edx
 ; Line 8
 	test	ecx, ecx
 	je	SHORT $L563
 $L547:
 ; Line 12
 	mov	eax, esi
 	cdq
 	idiv	ecx
 ; Line 13
 	mov	esi, ecx
 	mov	ecx, edx
 	test	ecx, ecx
 	jne	SHORT $L547
 $L563:
 ; Line 10
 	mov	eax, esi
 	pop	esi
 ; Line 15
 	ret	0
 _EuclidsGCD ENDP
 _TEXT	ENDS
 END

更改条件以保持相同结构，我们得到

 int EuclidsGCD(int m, int n)
 {
     int r;
     r = m % n;
     if(r != 0) 
     {
         do
         {
             m = n;
             r = m % r;
             n = r;
         }while(r != 0)
     }
     return n;
 }

由读者自行编译这段新的 "优化" C 代码，并确定是否存在任何性能提升。首先尝试在没有优化的情况下编译这段新代码，然后在开启优化的情况下编译。将新的汇编清单与之前的清单进行比较。

示例：指令配对

问: 为什么 **dec** / **jne** 组合比等效的 **loopnz** 运行得更快？
答: **dec** / **jnz** 对比 **loopnz** 运行得更快的原因有很多。首先，**dec** 和 **jnz** 在 netburst 管道的不同模块中配对，因此可以同时执行。除此之外，**dec** 和 **jnz** 都需要很少的周期来执行，而 **loopnz**（以及所有 loop 指令，实际上）指令需要更多周期才能完成。好的编译器很少看到循环指令的输出。

示例：避免分支

以下是以汇编代码形式呈现的表达式 c ? d : 0。代码中没有分支，那么它是如何工作的？

; ecx = c and edx = d
; eax will contain c ? d : 0 (eax = d if c is not zero, otherwise eax = 0)
neg	ecx
sbb	eax, eax
and	eax, edx
ret

这是一个使用各种算术指令来避免分支的示例。**neg** 指令如果 c 不为零，则设置进位标志；否则，它会清除进位标志。下一行取决于此。如果进位标志被设置，那么 **sbb** 将导致 eax = eax - eax - 1 = 0xffffffff。否则，eax = eax - eax = 0。最后，对此结果执行 **and** 操作可以确保，如果 **ecx** 最初不为零，则 **eax** 将包含 **edx**，否则为零。

示例：Duff's Device

以下 C 代码函数的功能是什么？它有用吗？为什么或为什么不？

void MyFunction(int *arrayA, int *arrayB, int cnt)
{
  switch(cnt % 6) 
  {
    while(cnt != 0) 
    {
      case 0:
        arrayA[--cnt] = arrayB[cnt];
      case 5:
        arrayA[--cnt] = arrayB[cnt];
      case 4:
        arrayA[--cnt] = arrayB[cnt];
      case 3:
        arrayA[--cnt] = arrayB[cnt];
      case 2:
        arrayA[--cnt] = arrayB[cnt];
      case 1:
        arrayA[--cnt] = arrayB[cnt];
    }
  }
}

维基百科有相关信息，请访问 Duff's device

这段代码被称为 **Duff's device** 或 "Duff's machine"。它用于提高效率，部分展开循环。请注意 while() 嵌套在 switch 语句中的奇怪方式？两个整数数组传递给函数，在 while 循环的每次迭代中，会将 6 个连续元素从 arrayB 复制到 arrayA。由于 switch 语句位于 while 循环之外，因此它只在函数开始时发生。对变量 cnt 进行模 6 操作。如果 cnt 不能被 6 整除，那么模运算将在旋转的中间某个位置开始循环，从而防止循环在每次迭代后测试当前计数的情况下导致缓冲区溢出。

Duff's Device 被认为是用于复制字符串、数组或数据流的最有效通用方法之一。