: An attempt to make computer machines run better


April 24, 2014 - Floating Point Numbers

Last time I wrote, I was implementing "cat" in C with no libc functions. I ran into a problem with redirected text streams. Since then I have solved the problem: I was simply mixing up STDIN and STDOUT. Fixing that solved all the issues. Silly me.
So anyways, I have been encountering many issues using this very low-level style of programming, but I am succeeding in accomplishing things, and in the process I am gaining a lot of knowledge, so I am going to stick with it.

Today, I am revisiting my framebuffer code indirectly. It is what I have been working on, but not exactly what I want to write about. I had the idea to implement a simple game (something like UntitledOne) that runs on the framebuffer and requires no libc. My simple game idea involves a ship that can turn and propel itself forward (like Asteroids). This usually requires only some very simple code, but I have restricted myself to not using libc functions. That rules out the math library, and with it the sine and cosine functions I need for this task.
So, I had to implement sine and cosine functions myself. I quickly realized that this wouldn't be an easy task. I looked at the Musl libm source and did research online, eventually finding myself reading glibc sources as well. Turns out the common implementations of sine and cosine are really complicated.
However, I figured there must be an easier way, and then I discovered that amd64 processors have instructions to do sine and cosine (fsin and fcos).
It turns out, though, that these instructions are part of the FPU, which has its own stack using its own set of registers. In addition, moving values from SSE registers (XMM0-15 on amd64) to FPU registers is expensive because they have to go through memory, and the implementation of these trig functions in the FPU is not that fast or accurate.
So it was a bit of a dilemma, but I decided that my goal was to learn as much low-level programming as possible and to use the lowest level functionality available, so I decided to go the FPU route even though technically it goes against the BetterOS philosophy. My rationalization is that later it can be easily replaced with equivalent SSE code, but for now, the simplest implementation is the best implementation. This is an experiment after all and not an attempt to build a full-fledged application.
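For comparison, here is a rough idea of what a later libc-free replacement could look like in plain C, which a compiler for amd64 normally turns into scalar SSE instructions. This is only a sketch and not what I actually used, nor how Musl or glibc do it: it reduces the angle to a small range and then evaluates a truncated Taylor series. The names sketch_sin, sketch_cos and SK_PI are made up for this example, and it assumes the angle is of moderate size (the range reduction goes through a long long). The accuracy should be around six decimal places, which is likely enough for steering a ship but nowhere near a real libm.

```c
/* Hypothetical libc-free sine/cosine sketch: range reduction followed by
   a Taylor polynomial. Names and accuracy target are assumptions. */
#define SK_PI 3.14159265358979323846

static double sketch_sin(double x)
{
    /* Subtract whole turns so that x lands in [-pi, pi].
       Assumes x is small enough for the count to fit in a long long. */
    double turns = x / (2.0 * SK_PI);
    long long k = (long long)(turns + (turns >= 0 ? 0.5 : -0.5));
    x -= (double)k * (2.0 * SK_PI);

    /* Fold into [-pi/2, pi/2] using sin(pi - x) = sin(x). */
    if (x > SK_PI / 2)
        x = SK_PI - x;
    else if (x < -SK_PI / 2)
        x = -SK_PI - x;

    /* Taylor series x - x^3/3! + x^5/5! - ... up to x^11, in Horner form. */
    double x2 = x * x;
    return x * (1.0 + x2 * (-1.0 / 6
                  + x2 * (1.0 / 120
                  + x2 * (-1.0 / 5040
                  + x2 * (1.0 / 362880
                  + x2 * (-1.0 / 39916800))))));
}

static double sketch_cos(double x)
{
    /* cos(x) = sin(x + pi/2) */
    return sketch_sin(x + SK_PI / 2);
}
```

The trade-off is the usual one: this is short and needs no FPU stack juggling, but the truncated series is only good to a handful of digits, while the FPU instructions and the real libm versions aim much higher.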

So I began the process of learning how to use the fsin and fcos instructions. Both instructions operate on the top value of the FPU stack. In order to get the value onto the FPU stack, we must use the fld instruction to "load" the value onto the stack. The fsin and fcos instructions place the result on the top of the stack as well (replacing the original value). So, to get the result of the trig function, we must use the fstp instruction to store the result and pop it off the stack.
So far, everything is straightforward. However, I have to write this as a function which gets called by my C program, and since I am calling CPU instructions directly, this all has to be done in assembly. This is where the problems begin.
In the amd64 architecture, the first floating point argument gets passed in the first SSE register (XMM0), and a floating point result is returned in XMM0 as well. The XMM0 register is 128 bits wide, unlike the general purpose registers which are 64 bits wide, and unlike the FPU registers which are 80 bits wide. So we can already tell that this is going to be trouble. We cannot move values directly from SSE registers into FPU registers, so we have to first store the value in memory and then load it from memory into the FPU register.
So the first thing I tried was to copy the value from XMM0 straight into the memory pointed to by [rsp]. That was a silly idea, since that slot holds the return address for the function, and overwriting it makes the function return to the wrong place and probably segfault. Which is exactly what happened: a segfault. So I figured I should push the value onto the stack instead, so that it wouldn't overwrite the return address. It seems there is no push or pop instruction for the SSE registers, though. The solution I finally came to was to manually make some room on the stack, mov the value from XMM0 into that space, load it from there onto the FPU register stack, perform the fsin/fcos instruction, pop the result off the FPU register stack back into memory, and finally mov it from memory back into the XMM0 register.
The finished assembly code looks like this:
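		finit						; Reset the FPU to its default state
		sub		rsp,16				; Make room on the stack for the value
		movsd	QWORD [rsp],xmm0	; Store the argument from XMM0 in memory
		fld		QWORD [rsp]			; Load it onto the FPU stack (ST0)
		fsin						; Replace ST0 with sin(ST0)
		fstp	QWORD [rsp]			; Store ST0 back to memory and pop it
		movsd	xmm0,QWORD [rsp]	; Load the result into XMM0 for the return
		add		rsp,16				; Release the stack space
		ret							; Return to the caller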

Then I spent a long time testing because I ran into another problem. My test code draws a line from (x, y) = (300, 300) to (x + cos(dir), y + sin(dir)), where dir starts at zero and can be incremented/decremented by pressing a and w on the keyboard (read through /dev/input/event5). However, my line drawing function needed a signum function. That would be easy enough to implement in C, but since it was so simple I thought I would implement it in asm as well. So I did, and it appeared to work at first. That was until I actually tested it.
Basically the code just compares rdi to 0 and then sets rax to 1, 0, or -1 depending on the result. However, there is a logical error in that code. Integer arguments do get passed in rdi, but rdi is a 64 bit register and an int in C is 32 bits here. The upper 32 bits of rdi are not guaranteed to carry the sign, so a negative int typically shows up as a large positive 64 bit value. The sign bit was never being interpreted properly, resulting in my function always returning 1 or 0, but never -1.
Fortunately, the fix was easy: I just had to use edi instead of rdi. Since edi is the lower 32 bits of rdi, the comparison now sees the sign bit of the int and the sign is interpreted correctly. Took me an embarrassing amount of time to find the problem though.
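The bug can be modelled in plain C to make it easier to see. This is not my asm routine, just a sketch of what the two comparisons look at. The function names are made up for the example, and rdi_for_int assumes the caller loaded the int with a 32 bit mov, which zeroes the upper half of the register (the calling convention itself leaves those bits unspecified).

```c
#include <stdint.h>

/* Model of the buggy version: tests the full 64-bit register value,
   the way a comparison of rdi against 0 would. */
static int signum_using_rdi(uint64_t rdi)
{
    if (rdi == 0)
        return 0;
    return (rdi >> 63) ? -1 : 1;    /* sign bit of a 64-bit value is bit 63 */
}

/* Model of the fixed version: tests only the low 32 bits,
   the way a comparison of edi against 0 would. */
static int signum_using_edi(uint64_t rdi)
{
    uint32_t edi = (uint32_t)rdi;
    if (edi == 0)
        return 0;
    return (edi >> 31) ? -1 : 1;    /* sign bit of a 32-bit value is bit 31 */
}

/* Assumed register contents after the caller loads a 32-bit int with a
   32-bit mov: the low half holds the int, the upper half is zero. */
static uint64_t rdi_for_int(int value)
{
    return (uint64_t)(uint32_t)value;
}
```

With these, signum_using_rdi(rdi_for_int(-5)) gives 1, because -5 arrives as 0x00000000FFFFFFFB and bit 63 is clear, while signum_using_edi(rdi_for_int(-5)) gives -1 as intended.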

In other news, as I already mentioned, I plan to write a small game similar to UntitledOne that uses the framebuffer and no libc. I have implemented some simple math functions as described in this post, and I have also implemented simple input handling using Linux event device files. I may decide later to give more details about how I did that, but right now I don't think it's worth it because it wasn't really a challenge and there wasn't much to learn from it.
Anyways, that's all for now. Till next time.