April 24, 2014 - Floating Point Numbers
Last time I wrote, I was implementing "cat" in C with no libc functions. I ran
into a problem with redirected text streams. Since then I have solved the
problem: I was simply mixing up STDIN and STDOUT. Fixing that solved all the
issues. Silly me.
So anyways, I have been encountering many issues using this very low level
style of programming, but I am succeeding in accomplishing things, and in the
process I am gaining a lot of knowledge, so I am going to stick with it.
Today, I am revisiting my framebuffer code indirectly. That has been what I
have been working on, but not what I want to write about exactly. I had the
idea to implement a simple game (something like UntitledOne) that runs on the
framebuffer and requires no libc. My simple game idea involves a ship that can
turn and propel itself forward (like Asteroids). Accomplishing this usually
requires some very simple code, but I have restricted myself to not using libc
functions, which includes math library functions, which includes the sine and
cosine functions that I need for this task.
So, I had to implement sine and cosine functions myself. I quickly realized
that this wouldn't be an easy task. I looked at the Musl libm source and did
research online, eventually finding myself reading glibc sources as well. Turns
out the common implementations of sine and cosine are really complicated.
However, I figured there must be an easier way, and then I discovered that
amd64 processors have an instruction to do sine and cosine.
Unfortunately, this instruction is part of the x87 FPU, which has its own
stack using its own set of registers. In addition, moving values from SSE
registers (XMM0-15) to FPU registers is expensive, and the implementation of
these trig functions in the FPU is not that fast or accurate.
So it was a bit of a dilemma, but I decided that my goal was to learn as much
low level programming as possible and to use the lowest level functionality
available, so I decided to go the FPU route even though technically it goes
against the BetterOS philosophy. My rationalization is that later it can be
easily replaced with equivalent SSE code, but for now, the simplest
implementation is the best implementation. This is an experiment after all and
not an attempt to build a full-fledged application.
So I began the process of learning how to use the fsin and fcos instructions.
Both instructions operate on the top value of the FPU stack. In order to get the
value onto the FPU stack, we must use the fld instruction to "load" the value
onto the stack. The fsin and fcos instructions place the result on the top of the
stack as well (replacing the original value). So, to get the result of the
trig function, we must use the fstp instruction to pop the result off the stack.
So far, everything is straightforward. However, I have to write this as a
function which gets called by my C program, and since I am calling CPU
instructions, this all has to be done in assembly. This is where the problems
begin.
In the amd64 calling convention used on Linux, the first floating point
argument gets passed in the first SSE register (XMM0), and a floating point
return value comes back the same way. The XMM0 register is 128 bits wide,
unlike the general purpose registers which are 64 bits wide, and unlike the FPU
registers which are 80 bits wide. So we can already tell that this is going to
be trouble. We cannot move values directly from SSE registers into FPU
registers, so we have to first store the value in memory and then load it from
memory into the FPU register.
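To make the widths concrete, here is a minimal C sketch of a trip through memory. It is an illustration only, not part of the game code: the helper names are made up, it assumes a double is a 64-bit IEEE 754 value (as it is on amd64), and it uses memcpy from libc purely for the demonstration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a double through memory to view its raw bits.
 * Assumes a 64-bit IEEE 754 double, as on amd64. */
uint64_t double_bits(double value) {
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Hypothetical helper: the reverse trip, raw bits back to a double. */
double bits_to_double(uint64_t bits) {
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Under those assumptions double_bits(1.0) is 0x3FF0000000000000. The value is only 64 bits, so it fills just the low half of XMM0, and a 64-bit (QWORD) memory slot is enough for the transfer.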
So the first thing I tried was just to copy the value from XMM0 into the memory
pointed to by [rsp], which was a silly idea, since that slot holds the return
address for the function. Overwriting it makes the function return to the wrong
place and probably segfault, which is exactly what happened. So I figured that
I should push the value onto the stack instead, so that it wouldn't overwrite
the return address. It turns out there is no push or pop instruction for the
SSE registers. The solution I finally came to was to manually make some room on
the stack, mov the value from XMM0 into that space, load it from there onto the
FPU register stack, perform the fsin/fcos instruction, pop the result off the
FPU register stack back into memory, and finally mov it from memory back into
the XMM0 register.
The finished assembly code looks like this:
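sin:
    finit                       ; Reset the FPU to a known state
    sub rsp,16                  ; Make room for the value
    movsd QWORD [rsp],xmm0      ; Store the argument from XMM0 in memory
    fld QWORD [rsp]             ; Load it onto the FPU stack (ST0)
    fsin                        ; ST0 = sin(ST0)
    fstp QWORD [rsp]            ; Pop ST0 back into memory
    movsd xmm0,QWORD [rsp]      ; Return the result in XMM0
    add rsp,16                  ; Release the stack space
    ret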
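For comparison, roughly the same round trip can be sketched in C with GCC-style inline assembly. This is an illustration under assumptions, not the code used here: fpu_sin is a made-up name, the "t" constraint (top of the x87 stack) is specific to GCC and Clang on x86, and the non-x86 branch is only a crude Taylor-series stand-in so the sketch still compiles on other CPUs.

```c
/* Hypothetical wrapper around the x87 fsin instruction. */
double fpu_sin(double x) {
#if defined(__x86_64__) || defined(__i386__)
    double result;
    /* "0"(x) places x in ST0, fsin replaces ST0 with its sine, and
     * "=t" reads the result back from ST0. On amd64 the compiler emits
     * the moves between XMM registers, memory, and the x87 stack. */
    __asm__ ("fsin" : "=t"(result) : "0"(x));
    return result;
#else
    /* Stand-in for other CPUs (an assumption, not the FPU route):
     * a few Taylor terms, only sensible for small |x|. */
    double sum = x, term = x;
    for (int k = 1; k <= 8; k++) {
        term *= -x * x / ((2.0 * k) * (2.0 * k + 1.0));
        sum += term;
    }
    return sum;
#endif
}
```

A cosine version would look the same with fcos in place of fsin. The trade-off is that this leans on compiler extensions, whereas the hand-written routine keeps every move visible.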
Then I spent a long time testing because I ran into another problem. My test
code draws a line from (300,300) to the point offset from there by
(cos(dir), sin(dir)), where dir starts at zero and can be incremented/decremented
by pressing a and w on the keyboard (read through /dev/input/event5). However,
my line drawing function
needed a signum function, which would be easy enough to implement in C, but I
figured it was simple so I thought I would implement that in asm as well. So I
did, and it appeared to work at first. That was until I actually tested it.
Basically the code just compares rdi to 0 and then sets rax to 1, 0, or -1
depending on the result. However, there is a logical error in that code: integer
arguments do get passed in rdi, but rdi is a 64 bit register and an int in C
is 32 bits here. The 64 bit comparison looks for the sign in bit 63, but a 32
bit int keeps its sign in bit 31, and the upper half of rdi was not sign
extended. So the sign bit was never being interpreted properly, resulting in
my function always returning 1 or 0, but never -1.
Fortunately, the fix was easy: I just had to use edi instead of rdi, since edi
is the lower 32 bits of rdi, causing the sign to be interpreted correctly. It took
me an embarrassing amount of time to find the problem though.
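The difference between the two comparisons can be modeled in plain C. This is only a sketch: the function names are invented, the original routine was written in assembly, and as_register assumes the caller wrote the int with a 32 bit move, which on amd64 clears the upper half of the 64 bit register.

```c
#include <stdint.h>

/* Models a caller passing an int in rdi via a 32-bit move (assumption):
 * the low 32 bits hold the value and the upper 32 bits are zero. */
uint64_t as_register(int32_t x) {
    return (uint64_t)(uint32_t)x;
}

/* Like "cmp rdi, 0": the sign is taken from bit 63. */
int signum_rdi(uint64_t reg) {
    if (reg == 0) return 0;
    return (reg >> 63) ? -1 : 1;
}

/* Like "cmp edi, 0": the sign is taken from bit 31 of the low half. */
int signum_edi(uint64_t reg) {
    uint32_t low = (uint32_t)reg;
    if (low == 0) return 0;
    return (low >> 31) ? -1 : 1;
}
```

With these definitions, signum_rdi(as_register(-5)) comes out as 1 while signum_edi(as_register(-5)) comes out as -1, which matches the behavior described above.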
In other news, as I already mentioned, I plan to write a small game similar to
UntitledOne that uses the framebuffer and no libc. I have implemented some
simple math functions as described in this post, and I have also implemented
simple input handling using Linux event device files. I may decide later to
give more details about how I did that, but right now I don't think it's worth
it because it wasn't really a challenge and there wasn't much to learn from it.
Anyways, that's all for now. Till next time.