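What the GCC “-fomit-frame-pointer” compile option means

I ran into this option in a makefile and didn't really understand it, so I looked it up. There isn't much material on it yet, so I'm copying what I found here.

First, this article, http://blog.csdn.net/byzs/arti…, explains it well and clearly:

When you optimize your software, you'll find that the “-fomit-frame-pointer” option is quite useful.

The GCC manual says: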
Don’t keep the frame pointer in a register for functions that don’t need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.

On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn’t exist. The machine-description macro “FRAME_POINTER_REQUIRED” controls whether a target machine supports this flag.

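This introduces the notion of a “frame pointer”. So what is a “stack frame pointer (SFP)”?

We know that backtrace walks the chain of function calls level by level using information stored on the stack; that information is the SFP.
Normally every function's stack area is delimited by a pair of pointers, a base pointer and a top pointer. On x86, assuming the stack grows downward (base at the high address, top at the low address), the ESP register is usually the top pointer and EBP the base pointer. The space between EBP and ESP is the function's stack frame.
By default GCC inserts some stack setup code at the start of every function and restores the previous state when the function exits; the SFP is set up at that point. Let's look at the assembly.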
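
To make the frame address concrete, here is a minimal sketch that reads it from C. It relies on __builtin_frame_address, a GCC builtin that Clang also accepts. The function names are invented for illustration, and the comparison assumes a stack that grows toward lower addresses, as on x86 and ARM.

```c
#include <stdint.h>

/* Hypothetical helper: returns this function's own frame address.
   noinline keeps it a real call with its own stack frame. */
__attribute__((noinline)) uintptr_t callee_frame(void)
{
    return (uintptr_t)__builtin_frame_address(0);
}

/* Returns 1 if the callee's frame lies below the caller's frame,
   i.e. the stack grows toward lower addresses (an assumption that
   holds on x86 and ARM). */
__attribute__((noinline)) int callee_is_below_caller(void)
{
    uintptr_t mine = (uintptr_t)__builtin_frame_address(0);
    uintptr_t theirs = callee_frame();
    return theirs < mine;
}
```

Calling callee_is_below_caller() from main and printing the result is enough to try it out.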

Environment: x86 + Red Hat 9.0, gcc 3.2.2

The source file:

$ cat test.c
void a(unsigned long a, unsigned int b)
{
        unsigned long i;
        unsigned int j;

        i = a;
        j = b;

        i++;

        j += 2;
}

With the default compile options:
$ gcc -c test.c -o with_SFP.o

The disassembly looks like this:
$ objdump -D with_SFP.o

with_SFP.o:     file format elf32-i386

Disassembly of section .text:

00000000 <a>:
0:   55                      push   %ebp
1:   89 e5                   mov    %esp,%ebp
3:   83 ec 08                sub    $0x8,%esp
6:   8b 45 08                mov    0x8(%ebp),%eax
9:   89 45 fc                mov    %eax,0xfffffffc(%ebp)
c:   8b 45 0c                mov    0xc(%ebp),%eax
f:   89 45 f8                mov    %eax,0xfffffff8(%ebp)
12:   8d 45 fc                lea    0xfffffffc(%ebp),%eax
15:   ff 00                   incl   (%eax)
17:   8d 45 f8                lea    0xfffffff8(%ebp),%eax
1a:   83 00 02                addl   $0x2,(%eax)
1d:   c9                      leave
1e:   c3                      ret
Disassembly of section .data:

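As you can see, on entry the function first pushes the caller's EBP and sets its own EBP, then adjusts ESP according to the number of local variables and the alignment requirements. That creates the function's stack frame.
Now the return: the “leave” instruction is equivalent to “mov %ebp,%esp; pop %ebp”, i.e. it undoes the two instructions executed on entry, and the “ret” that follows pairs with the caller's “call”.
From the current function's EBP, backtrace can find the caller's EBP: the word at the frame base holds the caller's saved EBP, and the word just above it holds the return address (the saved EIP). Following that chain yields the call path.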
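
The walk described above can be sketched in portable C on a toy model. This is an illustration under stated assumptions, not how any real backtrace is implemented: the stack is an array of words, addresses are array indices, and the names walk_frames, demo_walk and END_OF_CHAIN are invented here.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: each frame stores the caller's saved frame pointer at
   stack[fp] and the return address at stack[fp + 1], mirroring
   [EBP] and [EBP+4] on 32-bit x86. */
#define END_OF_CHAIN ((uint32_t)0xFFFFFFFFu)

/* Follows the saved-frame-pointer chain starting at fp and records
   each frame's return address. Returns the number of frames visited. */
size_t walk_frames(const uint32_t *stack, size_t stack_len, uint32_t fp,
                   uint32_t *return_addrs, size_t max_out)
{
    size_t n = 0;
    while (fp != END_OF_CHAIN && (size_t)fp + 1 < stack_len && n < max_out) {
        return_addrs[n++] = stack[fp + 1]; /* where this frame returns to */
        fp = stack[fp];                    /* hop to the caller's frame */
    }
    return n;
}

/* Builds a three-deep chain (main calls b, b calls a) and walks it
   starting from a's frame. The addresses are made-up values. */
size_t demo_walk(uint32_t out[3])
{
    uint32_t stack[8] = {0};
    stack[6] = END_OF_CHAIN; stack[7] = 0x1000; /* main's frame */
    stack[3] = 6;            stack[4] = 0x2000; /* b's frame, caller is main */
    stack[0] = 3;            stack[1] = 0x3000; /* a's frame, caller is b */
    return walk_frames(stack, 8, 0, out, 3);
}
```

Once the saved-EBP slot is no longer written, this chain has nothing to follow, which is why omitting the frame pointer defeats this style of backtrace.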

The SFP can be optimized away at compile time with the “-fomit-frame-pointer” option.

Compile:
$ gcc -fomit-frame-pointer -c test.c -o no_SFP.o

$ objdump -D no_SFP.o

no_SFP.o:     file format elf32-i386

Disassembly of section .text:

00000000 <a>:
0:   83 ec 08                sub    $0x8,%esp
3:   8b 44 24 0c             mov    0xc(%esp,1),%eax
7:   89 44 24 04             mov    %eax,0x4(%esp,1)
b:   8b 44 24 10             mov    0x10(%esp,1),%eax
f:   89 04 24                mov    %eax,(%esp,1)
12:   8d 44 24 04             lea    0x4(%esp,1),%eax
16:   ff 00                   incl   (%eax)
18:   89 e0                   mov    %esp,%eax
1a:   83 00 02                addl   $0x2,(%eax)
1d:   83 c4 08                add    $0x8,%esp
20:   c3                      ret
Disassembly of section .data:

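Here EBP has been dropped, and ESP takes over part of its job (addressing the local variables).
The code is obviously harder to read ;-P. The instruction count drops from 13 to 11, which should help performance, although the encoded size actually grows from 31 to 33 bytes because ESP-relative addressing needs an extra SIB byte. The annoying part is that a frame-pointer-based backtrace no longer works for debugging.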
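
The offsets in the two x86 listings are related by simple arithmetic, sketched below. The helper names are invented, and the formulas assume the layout shown above: 32-bit words, a fixed frame size (8 bytes here), and no other pushes in the function.

```c
/* Byte offsets for 32-bit x86, matching the two listings above. */
enum { WORD = 4 };

/* Local variable: EBP-relative offset (negative) to ESP-relative offset.
   In the framed version ESP = EBP - frame_size, and the frameless
   version keeps the locals in the same ESP-relative slots. */
int local_esp_offset(int ebp_offset, int frame_size)
{
    return ebp_offset + frame_size;
}

/* Argument: EBP-relative offset (8 or more) to ESP-relative offset.
   Without "push %ebp" the saved-EBP slot does not exist, so everything
   above the locals sits one word closer to ESP. */
int arg_esp_offset(int ebp_offset, int frame_size)
{
    return ebp_offset - WORD + frame_size;
}
```

With frame_size 8, the local i at -4(%ebp) maps to 0x4(%esp) and the argument a at 0x8(%ebp) maps to 0xc(%esp), as in the listings.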

Now let's look at ARM.
The version with the SFP:
$ arm-linux-objdump -D SFP_arm.o

SFP_arm.o :     file format elf32-littlearm

Disassembly of section .text:

00000000 <a>:
0:   e1a0c00d        mov     ip, sp
4:   e92dd800        stmdb   sp!, {fp, ip, lr, pc}
8:   e24cb004        sub     fp, ip, #4      ; 0x4
c:   e24dd010        sub     sp, sp, #16     ; 0x10
10:   e50b0010        str     r0, [fp, -#16]
14:   e50b1014        str     r1, [fp, -#20]
18:   e51b3010        ldr     r3, [fp, -#16]
1c:   e50b3018        str     r3, [fp, -#24]
20:   e51b3014        ldr     r3, [fp, -#20]
24:   e50b301c        str     r3, [fp, -#28]
28:   e51b3018        ldr     r3, [fp, -#24]
2c:   e2833001        add     r3, r3, #1      ; 0x1
30:   e50b3018        str     r3, [fp, -#24]
34:   e51b301c        ldr     r3, [fp, -#28]
38:   e2833002        add     r3, r3, #2      ; 0x2
3c:   e50b301c        str     r3, [fp, -#28]
40:   e91ba800        ldmdb   fp, {fp, sp, pc}
Disassembly of section .data:

The optimized version:
$ arm-linux-objdump -D no_SFP_arm.o

no_SFP_arm.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <a>:
0:   e24dd010        sub     sp, sp, #16     ; 0x10
4:   e58d000c        str     r0, [sp, #12]
8:   e58d1008        str     r1, [sp, #8]
c:   e59d300c        ldr     r3, [sp, #12]
10:   e58d3004        str     r3, [sp, #4]
14:   e59d3008        ldr     r3, [sp, #8]
18:   e58d3000        str     r3, [sp]
1c:   e59d3004        ldr     r3, [sp, #4]
20:   e2833001        add     r3, r3, #1      ; 0x1
24:   e58d3004        str     r3, [sp, #4]
28:   e59d3000        ldr     r3, [sp]
2c:   e2833002        add     r3, r3, #2      ; 0x2
30:   e58d3000        str     r3, [sp]
34:   e28dd010        add     sp, sp, #16     ; 0x10
38:   e1a0f00e        mov     pc, lr
Disassembly of section .data:

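Here “fp” plays the role of “EBP”. On x86, “leave” restores ESP implicitly, so no explicit instruction is needed for it, whereas the ARM epilogue reloads sp explicitly (ldmdb fp, {fp, sp, pc}).
The effect of “-fomit-frame-pointer” looks even more pronounced on ARM: 17 instructions become 15, and the multi-register store and load on entry and exit disappear.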

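Later I came across this, http://zhouzhen1.blogbus.com/l…, which also covers the ebp and esp pointers during the push sequence of a function call, though I don't feel I fully understood it:

Before leaving work today I looked at the compile options used in the VS.net environment for the C++ programs I work on. Many of them were unfamiliar to me. The company manual only tells me how to set each one; it doesn't say what the parameter means or why it is set that way.

I just looked up some material on omit frame pointer online: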

http://www.codeproject.com/tip…

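At the assembly level a function call shows up as operations on the stack, and the frame pointer is a register called ebp that holds the base address of the function's stack frame. On a call, the caller first pushes the function's arguments and the return address, and control then transfers to the callee. The callee first pushes the caller's base address (which the caller stored in ebp when it started) and records its own base address in ebp, then adjusts the stack-top pointer esp to make room for its own local variables. On exit, it restores esp to its own base address (taken from ebp), then pops the caller's base address back into ebp and returns.

So ebp records the function's base address and is used to clean up the stack when the function returns. In the link above, the author suggests using the assembly generated for this sequence to locate stack-manipulation code.

So why can VS.net omit the frame pointer? Without it, wouldn't the function not know its own base address when it returns, and how would it then unwind the stack? Nobody online mentions this. I wonder whether it's because this information is already known at compile time, so the compiler can simply write an immediate value into esp?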
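
The guess is close. In the no_SFP.o listing earlier, the compiler knows the frame size at compile time and undoes “sub $0x8,%esp” with “add $0x8,%esp”, adding the constant back rather than assigning an absolute value. The toy model below contrasts the two exit sequences. It is only a sketch: the Machine struct and function names are invented, and stack slots are array indices rather than byte addresses.

```c
#include <stdint.h>

typedef struct {
    uint32_t mem[64]; /* toy stack memory, one word per slot */
    uint32_t sp;      /* stack pointer: index of the top-of-stack slot */
    uint32_t fp;      /* frame pointer (ebp in the text) */
} Machine;

static void push(Machine *m, uint32_t v) { m->mem[--m->sp] = v; }
static uint32_t pop(Machine *m) { return m->mem[m->sp++]; }

/* With a frame pointer: push ebp; mov esp,ebp; sub N,esp ... leave */
void call_with_fp(Machine *m, uint32_t locals)
{
    push(m, m->fp);  /* save the caller's base */
    m->fp = m->sp;   /* record our own base */
    m->sp -= locals; /* reserve space for locals */
    /* ... function body would run here ... */
    m->sp = m->fp;   /* "leave", part 1: restore sp from fp */
    m->fp = pop(m);  /* "leave", part 2: restore the caller's base */
}

/* Without a frame pointer: sub N,esp ... add N,esp */
void call_without_fp(Machine *m, uint32_t locals)
{
    m->sp -= locals; /* reserve space for locals */
    /* ... function body would run here ... */
    m->sp += locals; /* the same compile-time constant undoes it */
}
```

Both versions leave sp and fp exactly as they found them. When the frame size is not a compile-time constant, for example with alloca or variable-length arrays, compilers generally keep a frame pointer for that function.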

Then I found this, http://stackoverflow.com/quest…, which also says that freeing up ebp can reduce stack operations:

-fomit-frame-pointer allows one extra register to be available for general-purpose use. I would assume this is really only a big deal on 32-bit x86, which is a bit starved for registers.*

One would expect to see EBP no longer saved and adjusted on every function call, and probably some additional use of EBP in normal code, and fewer stack operations on occasions where EBP gets used as a general-purpose register.

Your code is far too simple to see any benefit from this sort of optimization– you’re not using enough registers. Also, you haven’t turned on the optimizer, which might be necessary to see some of these effects.

* ISA registers, not micro-architecture registers.

And this one, http://stackoverflow.com/quest…, also points out that in simple contexts the option doesn't really matter much:

Most smaller functions don’t need a frame-pointer – larger functions MAY need one.

It’s really about how well the compiler manages to track how the stack is used, and where things are on the stack (local variables, arguments passed to the current function and arguments being prepared for a function about to be called). I don’t think it’s easy to characterize the functions that need or don’t need a frame-pointer [technically, NO function HAS to have a frame-pointer – it’s more a case of “if the compiler deems it necessary to reduce the complexity of other code”].

I don’t think you should “attempt to make functions not have frame-pointer” as part of your strategy for coding – like I said, simple functions don’t need them, so use -fomit-frame-pointer, and you’ll get one more register available for the register allocator, and save 1-3 instructions on entry/exit to functions. If your function needs a frame-pointer, it’s because the compiler decides that’s a better option than not using a frame pointer. It’s not a goal to have functions without frame pointer, it’s a goal to have code that works both correctly and fast.

Note that “not having framepointer” should give better performance, but it’s not some magic bullet that gives enormous improvements – particularly not on x86-64, which already has 16 registers to start with. On 32-bit x86, since it only has 8 registers, one of which is stackpointer, and taking up another as the frame-pointer means 25% of register-space is taken. To change that to 12.5% is quite an improvement. Of course, compiling for 64-bit will help quite a lot too.

Finally, the GNU GCC documentation here, http://gcc.gnu.org/onlinedocs/…, describes the default setting of this option:

-fomit-frame-pointer
Don’t keep the frame pointer in a register for functions that don’t need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.

On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn’t exist. The machine-description macro FRAME_POINTER_REQUIRED controls whether a target machine supports this flag. See Register Usage.

Enabled at levels -O, -O2, -O3, -Os.
