May 12, 2014 - Problems Encountered with the Optimizer

My current project is nearing completion. I have been testing it both with no optimization and with a high level of optimization in GCC. I have also tried a few other optimization flags at one time or another, but some of them still produce broken builds, because I don't care enough about those optimization levels to fix the problems I ran into. I've been learning a lot about what the optimizer does, since each level of optimization creates its own set of problems that I have to figure out. It does a lot of things I didn't expect.

First of all, a lot of weird things happen in the math functions. I originally implemented only sin, cos, and atan2 because those are the only math functions I call in my C code. This works fine with no optimization, but GCC's optimizer decides to change these calls. At -O2 and -O3, GCC combines sin and cos calls on the same argument into one sincos call. This is great and all, except when I haven't implemented sincos. However, implementing sincos was about as easy as implementing sin and cos separately (actually, easier). Then at -Ofast, atan2 calls end up as __atan2_finite, which I am not going to bother with right now. To me that seems a little funny: GCC makes assumptions about your math library. sincos is not so bad, because it's a widely available function (a GNU extension rather than part of standard C), but __atan2_finite not so much. I didn't do the research, but that looks like a glibc-specific function, so maybe -Ofast could break builds with non-glibc math libraries.
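To give a rough idea of why sincos can be easier than sin and cos separately, here is a minimal sketch. It is not the code from this project: it assumes a plain Taylor series is accurate enough, and the names my_sincos, reduce_to_pi, MY_PI and MY_TWO_PI are made up for the example. A freestanding build would presumably export the function as sincos(double, double *, double *), since that is the symbol GCC emits calls to.

```c
/* Hypothetical sketch: compute sin and cos together, sharing the
 * range reduction and the x*x factor between both series. */

#define MY_PI     3.14159265358979323846
#define MY_TWO_PI (2.0 * MY_PI)

/* Bring x into roughly [-pi, pi] by subtracting the nearest multiple of
 * 2*pi. Only meant for moderate |x|; huge inputs would overflow the long. */
double reduce_to_pi(double x)
{
    double k = x / MY_TWO_PI;
    long n = (long)(k + (k >= 0.0 ? 0.5 : -0.5)); /* round to nearest */
    return x - (double)n * MY_TWO_PI;
}

/* Same argument order as the GNU sincos(x, &sin_out, &cos_out). */
void my_sincos(double x, double *s, double *c)
{
    double r = reduce_to_pi(x);
    double r2 = r * r;
    double sin_term = r, cos_term = 1.0;  /* first term of each series */
    double sin_sum = r, cos_sum = 1.0;
    int i;

    /* Each pass derives the next Taylor term from the previous one:
     * cos: 1 - r^2/2! + r^4/4! - ...   sin: r - r^3/3! + r^5/5! - ... */
    for (i = 1; i <= 14; i++) {
        cos_term *= -r2 / ((2 * i - 1) * (2 * i));
        sin_term *= -r2 / ((2 * i) * (2 * i + 1));
        cos_sum += cos_term;
        sin_sum += sin_term;
    }

    *s = sin_sum;
    *c = cos_sum;
}
```

Note that the loop never calls sin or cos itself. Writing sincos as a thin wrapper around sin(x) and cos(x) looks tempting, but with optimization on, the compiler could in principle merge those two calls right back into a call to sincos.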

However, the much more troubling issue occurred this afternoon. I was just polishing up "everything mode", which gives you waves of different enemies each time, when I noticed that asteroids mode was now causing a segfault within a few seconds of starting. A weird regression, since I hadn't changed that code much in a long while.
So I ran it through a debugger, and this is what the debugger told me:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000406ed1 in draw_circle (pixel=16777215, radius=20, cy=200, cx=500) at fb.c:264
264		if (cx-radius<=1 || cx+radius>=vinfo.xres)

Which, of course, made me really confused. I spent hours trying to figure it out, including some futile programming by permutation. The crazy part was that the circle drawing code worked perfectly everywhere except in the asteroid drawing code. At one point I was getting ready to report the problem to the GCC maintainers as a bug (but I didn't do it). Eventually I decided to take a look at the assembly code in the debugger. This is where the segfault was occurring:
8206		movapd	%xmm9, 32(%rsp)

It took some time for the answer to occur to me, but I did eventually figure it out, and then I realized how deep this whole thing went. Turns out "movapd" is one of them SSE2 instructions, one that faults if its memory operand is not aligned to 16 bytes. Somehow, every other call to that function happened to come with a properly aligned stack, except when the asteroid wanted to be drawn.
Now, of course this is my own fault for not aligning the stack properly, but it points out that the GCC optimizer makes more assumptions. I guess it's not really a problem, since the optimizer does have to make some assumptions, but it sure took me a lot longer to fix than it needed to, especially considering that the fix is one instruction in _start in crt.s. Given the GDB output, it's really not obvious that the actual problem is in the startup code and nowhere near the actual segfault.
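To make the alignment rule concrete, here is a small sketch of the address arithmetic in C. It is only an illustration: the helper names and the example addresses are hypothetical, and the post does not show the actual instruction added to _start. A common approach on x86-64 is to clear the low four bits of the stack pointer, which is what align_down16 mimics.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of 16, which is what movapd requires
 * of its memory operand. */
bool is_aligned16(uintptr_t addr)
{
    return (addr & 0xF) == 0;
}

/* Round addr down to the previous 16-byte boundary by clearing the
 * low four bits (assumed to be the effect of the startup-code fix). */
uintptr_t align_down16(uintptr_t addr)
{
    return addr & ~(uintptr_t)0xF;
}

/* Would an aligned store to disp(%rsp) be legal for this stack pointer? */
bool movapd_operand_ok(uintptr_t rsp, uintptr_t disp)
{
    return is_aligned16(rsp + disp);
}
```

For example, with a made-up stack pointer of 0x7ffc1008, the operand 32(%rsp) lands on 0x7ffc1028, which is only 8-byte aligned, so the store would fault. After align_down16 the stack pointer becomes 0x7ffc1000 and the same operand is fine. Because the offset 32 is itself a multiple of 16, the operand is aligned exactly when the stack pointer is.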

Oh well, gotta learn somehow, that's all for now. Align your stack!