BetterOS.org : an attempt to make computer machines run better

introduction

Webassembly is still very new, but is now supported in all recent releases of major browsers. Compilers now exist for several high-level languages targeting webassembly, with lots of tutorials detailing how to compile to webassembly.
However, there are practically no tutorials about actually writing the assembly-like webassembly text code directly. And those that do exist don't even get up to the standard "hello world" example. This tutorial intends to solve this.

what is webassembly?

Webassembly is poorly and unfortunately named. It is not an assembly language by any definition; it's actually an instruction set architecture for a virtual machine and a binary executable format. However, it also defines a "webassembly text format", which is somewhat assembly-like, although it is certainly different from any assembly language you may have worked with before. This makes searching for information about webassembly nearly impossible, since most sources conflate webassembly with the text format and use "webassembly" to describe both.
For the purposes of this article I will use the term "webassembly byte-code" to refer to the binary webassembly format, and "text format" to refer to the webassembly text format. If I slip and accidentally write just "webassembly", it is likely that I meant "text format".
Is the text format an assembly language, though? In a way, it is. Its instructions map nearly 1-to-1 to the webassembly byte-code, but it is still much more structured than any real assembly language. Its use of s-expressions was a very strange choice, which makes the text format look and feel even less like a real assembly language. There are also some quirks in the language that should not be present in an assembly language. Webassembly is also different in that it has both variables and functions, something no real assembly language has (these are usually considered high-level constructs). Still, the near 1-to-1 mapping to webassembly virtual machine byte-code is a property of an assembly language. For this reason, I would like to define it as (and coin the term) a "virtual assembly language".
Don't let any of this dissuade you from learning the text format, though. It may not be perfect, but it's really the only candidate for a possible future javascript replacement, and at least it has integers.

webassembly vs javascript

Currently, webassembly can't fully replace javascript. Some javascript is required to set up the webassembly virtual machine, load the webassembly byte-code, and allow webassembly code to interact with the DOM. There are future plans to allow webassembly to interact with the DOM directly, but at the time of this writing, intervening javascript code is still required.
Since this is a webassembly tutorial, I will try to be light on javascript, but all examples will still require some javascript glue. To avoid repeating this glue, I will omit the javascript from some examples; in those cases, the javascript is the same as in the previous example.

software

The software required for this tutorial is simple. You will need a web browser that supports webassembly. I personally use firefox, but chromium also supports webassembly. A reasonably recent version is required, since webassembly is quite new. You will also need a text editor for writing text format code. I prefer geany because it's open-source, offers syntax highlighting, and, most importantly, has an embedded terminal emulator, but there are plenty of other text editors available. You will also need a terminal emulator and some kind of shell, such as bash. I will assume that you already know how to use your shell. You will also need a webassembly assembler. I prefer the one in the WebAssembly Binary Toolkit (wabt), which includes an assembler (wat2wasm) and a disassembler (wasm2wat). If you prefer another assembler (such as emscripten), you may need to adapt some (or all) of the examples given. Finally, you will need access to a web server. I use my own custom web server, "unweave", but other servers will work just as well. Cherokee is a good choice, but apache or nginx are fine too. You could even use a remote web hosting service if you are unable to run a server locally.
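In list form:
- a recent web browser that supports webassembly (such as firefox or chromium)
- a text editor (such as geany)
- a terminal emulator and a shell (such as bash)
- a webassembly assembler (such as wat2wasm from wabt)
- a web server (such as unweave, cherokee, apache, or nginx), local or remote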

hello integers

Unfortunately, the traditional "hello world" example is a bit more complicated in webassembly. We will get there, but let's start with something simpler: a webassembly function that returns a constant integer.
(module
  (type (func (result i32)))
  (func (type 0)
    i32.const 14
  )
)
This example is completely valid, but it is pretty useless by itself: it can't even be called from javascript. However, it shows how webassembly text format code is structured.
The entire module is organized using s-expressions (everything is organized with deeply nested parentheses). At the top level of each module is the module expression. Each expression nested directly in the module expression represents something to be added to a section in the webassembly byte-code. Similar to native executable binary formats (like ELF, Mach-O, or PE), the webassembly byte-code binary is organized into sections. The "type" section stores records describing the data types of a function's arguments and return values (its signature). The "function" section lists the functions defined in the module, each as a reference to an entry in the type section. The "code" section stores the body of each entry in the function section.
In text format, "type" specifies an entry in the type section. At the time of writing, only type records of type "func" are allowed. Each func type record defines an optional return type and zero or more parameters, each with a corresponding data type.
"func" specifies an entry in the function section. Each function entry refers back to an entry in the type section. Notice that entries in the type section are indexed sequentially starting at 0. So type 0 refers to the first entry, and type 1 would refer to the second entry. Functions are also indexed this way, which we will see in the next example.
Although the actual body of a function is stored in the code section of the byte-code, in the text format it can be written inside the "func" expression; the assembler takes care of building the binary properly.
In this example, the function body consists of a single instruction with one operand. "i32.const" pushes its operand onto the stack. When the function ends, the value left on top of the stack becomes the function's return value.

running webassembly

The previous example shows how webassembly is structured, but it can't actually be executed, because there is no way to call the function it defines. To do this, we need to add two things: an export section to make the function visible to javascript code, and javascript code to call the webassembly function. Let's first add the export section to our webassembly text format code.
(module
  (type (func (result i32)))
  (func (type 0)
    i32.const 14
  )
  (export "fourteen" (func 0))
)
Next, we need some javascript. There are several ways to load and execute webassembly from javascript. The binary can be stored in a separate file or inlined as hexadecimal bytes in a javascript ArrayBuffer. I prefer a separate file. The byte-code can then be loaded in one of two ways. One way (WebAssembly.instantiateStreaming) requires the server to send the file with a specific MIME type (application/wasm). My web server does not support this, so I prefer the other method: download the bytes into an ArrayBuffer and pass them to WebAssembly.instantiate. Below is an example of javascript code used to load and execute the exported webassembly function.
fetch('fourteen.wasm')
  .then(response => response.arrayBuffer())
  .then(bytes => WebAssembly.instantiate(bytes, {}))
  .then(obj => {
    console.log(obj.instance.exports.fourteen());
  });

I would rather not explain all the javascript. In effect, it downloads the webassembly file, stores it in an ArrayBuffer, compiles and instantiates the module, and finally calls the function and writes the result to the browser's javascript console. This code assumes that the webassembly byte-code module is stored on the server as a file called "fourteen.wasm".
The webassembly module also needs to be assembled. To do this, save the text format code in a text file (the extension .wat seems to be preferred), then assemble it using wat2wasm (or your assembler of choice), for example with "wat2wasm fourteen.wat -o fourteen.wasm". The resulting .wasm file is the one the javascript code should load.
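As mentioned above, the byte-code can also be inlined in the javascript instead of being fetched from a separate file. Here is a minimal sketch of that approach for the "fourteen" module. It also shows the type, function, export, and code sections byte by byte. I assembled the bytes by hand, so treat them as an assumption: they should match what wat2wasm produces for this module, but compare them against a hex dump of your own fourteen.wasm if in doubt. The sketch uses the synchronous WebAssembly.Module and WebAssembly.Instance constructors so that it also runs outside a browser (for example in node). The variable names are my own.

```javascript
// Hand-assembled byte-code for the "fourteen" module (assumed to match
// what wat2wasm produces for the text format version).
const fourteenBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  // type section (id 1): one func type, no params, one i32 result
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,
  // function section (id 3): one function, using type 0
  0x03, 0x02, 0x01, 0x00,
  // export section (id 7): "fourteen" -> func 0
  0x07, 0x0c, 0x01, 0x08,
  0x66, 0x6f, 0x75, 0x72, 0x74, 0x65, 0x65, 0x6e, // "fourteen"
  0x00, 0x00, // export kind 0 (func), function index 0
  // code section (id 10): one function body, 4 bytes long
  0x0a, 0x06, 0x01, 0x04,
  0x00,       // no local variable declarations
  0x41, 0x0e, // i32.const 14
  0x0b,       // end
]);

// Synchronous compile + instantiate; no imports are needed.
const fourteenModule = new WebAssembly.Module(fourteenBytes);
const fourteenInstance = new WebAssembly.Instance(fourteenModule, {});
console.log(fourteenInstance.exports.fourteen()); // 14
```

In a browser you would normally use the asynchronous WebAssembly.instantiate(bytes, {}), as in the fetch example. Some browsers may refuse to compile larger modules synchronously on the main thread, but a module this small should be fine.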

data types

Webassembly has data types (not to be confused with entries in the type section). There are 4 primitive data types in webassembly: i32, i64, f32, and f64.
i32 and i64 are 32 bit and 64 bit sign-agnostic integers, respectively. f32 and f64 are 32 bit and 64 bit IEEE floating point numbers, respectively.
If you are familiar with javascript, you may notice that our example returned a 32-bit integer to javascript code. Yet javascript numbers are 64-bit floating point values, and javascript has no 32-bit integer type. Nevertheless, the javascript code handles the return value correctly, because the browser's javascript engine converts between javascript and webassembly values automatically.
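To make that automatic conversion a little more concrete, here is a simplified model, in plain javascript, of what happens to a javascript value passed to a webassembly parameter. The helper names (toI32, toF32, toI64) are hypothetical, and this is not how an engine is actually written. The rules follow the WebAssembly javascript API as I understand it:
- i32 parameters go through ToInt32 (truncate toward zero, then wrap modulo 2^32).
- f32 parameters are rounded to the nearest 32-bit float.
- In engines that support it, i64 values are exchanged as BigInt rather than as ordinary numbers.

```javascript
// Simplified model (not engine code) of converting a javascript value
// for a webassembly parameter of each type.

// i32: ToInt32, i.e. truncate toward zero, then wrap modulo 2^32
function toI32(value) {
  return value | 0;
}

// f32: round to the nearest 32-bit float
function toF32(value) {
  return Math.fround(value);
}

// i64: must already be a BigInt; wrapped to a signed 64-bit value
function toI64(value) {
  if (typeof value !== "bigint") {
    throw new TypeError("i64 values must be passed as BigInt");
  }
  return BigInt.asIntN(64, value);
}

console.log(toI32(14));          // 14
console.log(toI32(3.9));         // 3
console.log(toI32(2 ** 32 + 5)); // 5
console.log(toF32(0.1) === 0.1); // false (0.1 is not exactly representable as f32)
console.log(toI64(2n ** 63n));   // -9223372036854775808n
```

Returned i32 values go the other way and come back as signed numbers. So a function that returns the bit pattern 0xffffffff gives -1 in javascript.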

the stack

Webassembly is mostly a stack-based architecture. This means that most operations operate on "the stack" instead of registers. For instance, to add two integers, you first push the two integers onto the stack, then issue an add instruction. The webassembly VM then pops the top two integers off the stack, adds them together, and pushes the result onto the stack. In our examples so far, we have also used the stack to return a value from a webassembly function.
The stack-based nature of webassembly can make writing webassembly feel unfamiliar if you are used to real assembly languages, but in reality, it isn't difficult to get the hang of it.
There are some exceptions though, which we will look at later when we talk about function arguments and local variables.

If you are unfamiliar with the concept of a stack, I will give a brief explanation, but I recommend you research it on your own. The stack is a central concept in webassembly, and it is important to have a good understanding of how it works.
A stack is a data structure, similar to an array. A stack is special because data can only be added to the end of the stack (referred to as the "top" of the stack), using an operation called a "push". Likewise, data can only be read from the end of the stack. Reading a value from the stack also removes it from the stack. The read operation is called a "pop".
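A plain javascript array already behaves like a stack (push adds to the top, pop removes from the top), so it makes a handy model. This sketch walks through the addition described above (i32.const 2, i32.const 3, i32.add) by hand. It only illustrates the push/pop order, not how the VM is implemented. The "| 0" mimics i32 wrap-around on overflow.

```javascript
// Model "the stack" with a plain javascript array: push adds to the top,
// pop removes and returns the top value.
const stack = [];

stack.push(2); // like "i32.const 2"
stack.push(3); // like "i32.const 3"

// like "i32.add": pop two operands, then push their 32-bit wrapped sum
const b = stack.pop(); // 3 (the top of the stack)
const a = stack.pop(); // 2
stack.push((a + b) | 0);

console.log(stack); // [ 5 ]
```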

addition example

I've already mentioned addition and function arguments, so let's look at some example code that takes two integer arguments, adds them, and returns the result.
(module
  (type (func (param i32 i32) (result i32)))
  (func (type 0)
    local.get 0
    local.get 1
    i32.add
  )
  (export "add" (func 0))
)
And the corresponding javascript:
fetch('addition.wasm')
  .then(response => response.arrayBuffer())
  .then(bytes => WebAssembly.instantiate(bytes, {}))
  .then(obj => {
    console.log(obj.instance.exports.add(1, 2));
  });


arguments

Notice that in this example we have changed the function signature in the type section. It now includes a "param" expression. param is followed by two i32 keywords, indicating that this function takes two arguments, both 32-bit integers.
Also note the changes in the function body. There are now 3 instructions instead of just one. This is where things start to get weird, in my opinion. Because webassembly is stack-based, I would have expected arguments to be on the top of the stack at the beginning of a function's execution. This is NOT the case.
Although arguments are passed to functions by pushing them onto the stack before calling the function, when the called function gets control, the arguments are no longer on the stack, but instead stored somewhere else and need to be fetched using special instructions. The "local.get" instruction takes one operand, which is the index of the argument or local variable to get, and pushes that value onto the stack.
To add the function's two arguments together, we first need to fetch them and push them onto the stack. Once both arguments are on the stack, we can issue the "i32.add" instruction to pop them off the stack, add them, and push their sum onto the stack. This sum is the value we want to return, and since it is already on the top of the stack, there is nothing else to do.
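If you would like to try the addition module without setting up a server, here is a sketch that inlines it as hand-assembled byte-code. As before, it is an assumption on my part that these bytes match wat2wasm's output. The function body bytes mirror the text format one-to-one: local.get 0, local.get 1, i32.add, end. The sketch also shows the automatic conversions at work. Arguments are converted to i32 on the way in, and the i32 result wraps on overflow and comes back as a signed number.

```javascript
// Hand-assembled byte-code for the addition module (assumed equivalent
// to assembling the text format version with wat2wasm).
const additionBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  // type section: (func (param i32 i32) (result i32))
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // function section: one function of type 0
  0x03, 0x02, 0x01, 0x00,
  // export section: "add" -> func 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  // code section: one 7-byte function body
  0x0a, 0x09, 0x01, 0x07,
  0x00,       // no local variable declarations
  0x20, 0x00, // local.get 0
  0x20, 0x01, // local.get 1
  0x6a,       // i32.add
  0x0b,       // end
]);

const { add } = new WebAssembly.Instance(
  new WebAssembly.Module(additionBytes),
  {}
).exports;

console.log(add(1, 2));          // 3
console.log(add(2147483647, 1)); // -2147483648 (i32 addition wraps around)
console.log(add(1.9, 2));        // 3 (1.9 is converted to the i32 value 1)
```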

On a side note, since this is the first example with multiple instructions in the function body, notice that the instructions are written linearly and not structured with s-expressions. There actually is a way of writing all instructions as s-expressions (called "folded expressions"), but I think it makes reading and writing webassembly very difficult, and I avoid it at all costs. If you are interested in this format, wasm2wat can output instructions this way using the --fold-exprs command line option. It's pretty terrible; it's exactly what you would expect of an assembly language designed by javascript and web developers.

calling javascript

Webassembly can't do very much on its own. It can't interact with the browser or the user at all. However, it can call back to javascript code, which can do anything that javascript would normally do. Using this, we can write javascript wrapper functions which effectively allow us to perform actions which webassembly normally wouldn't be able to do. Let's look at how this can be done.
Just like we can export a function from webassembly, making it available for javascript to call, we can also import javascript functions into the webassembly module, making the imported function callable from within the webassembly module.
Our next example shows how this can be done.

(module
  (type (func (param i32)))
  (import "js" "console_log" (func (type 0)))
  (func (type 0)
    local.get 0
    call 0
  )
  (export "write_to_console" (func 1))
)
And the corresponding javascript:
fetch('console_log.wasm')
  .then(response => response.arrayBuffer())
  .then(bytes => WebAssembly.instantiate(bytes, { js: { console_log: console.log } }))
  .then(obj => {
    obj.instance.exports.write_to_console(14);
  });


imported functions

Notice that a few things have changed in this example. We have added an import section to our webassembly, which specifies the function to be imported into the webassembly module. If you look at the javascript for this example, you can see how we map the imported function to a javascript function. In this case we just import console.log, but you could just as easily use a user-defined function.
In our text format code, we define and export one function, just like we did before. But if you look at the export section, you can see that we are now exporting function 1 instead of function 0. This is because imported functions are indexed together with functions defined in webassembly. Imported functions are always indexed before the module's own functions, so the imported function is func 0, and our webassembly function becomes func 1.
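The same inline approach works for imports. This sketch uses hand-assembled bytes, with the same caveat as before. Its import section maps "js" "console_log" to a user-defined function named record (a hypothetical name of mine) instead of console.log, so we can check the value that crosses from webassembly into javascript. Note the index in the export section: the import is func 0, so our own function is exported as func 1.

```javascript
// Hand-assembled byte-code for the console_log module (an assumption, as
// in the earlier sketches). Imported function = func 0, ours = func 1.
const consoleLogBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  // type section: (func (param i32))
  0x01, 0x05, 0x01, 0x60, 0x01, 0x7f, 0x00,
  // import section: "js" "console_log" as a func of type 0
  0x02, 0x12, 0x01,
  0x02, 0x6a, 0x73,                               // "js"
  0x0b, 0x63, 0x6f, 0x6e, 0x73, 0x6f, 0x6c, 0x65,
  0x5f, 0x6c, 0x6f, 0x67,                         // "console_log"
  0x00, 0x00,                                     // kind func, type 0
  // function section: one function of type 0 (it becomes func 1)
  0x03, 0x02, 0x01, 0x00,
  // export section: "write_to_console" -> func 1
  0x07, 0x14, 0x01, 0x10,
  0x77, 0x72, 0x69, 0x74, 0x65, 0x5f, 0x74, 0x6f,
  0x5f, 0x63, 0x6f, 0x6e, 0x73, 0x6f, 0x6c, 0x65, // "write_to_console"
  0x00, 0x01,                                     // kind func, index 1
  // code section: no locals, local.get 0, call 0, end
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x20, 0x00, 0x10, 0x00, 0x0b,
]);

// A user-defined javascript function used instead of console.log: it
// records every value the webassembly code passes to it.
const received = [];
function record(value) {
  received.push(value);
}

const instance = new WebAssembly.Instance(
  new WebAssembly.Module(consoleLogBytes),
  { js: { console_log: record } }
);
instance.exports.write_to_console(14);
console.log(received); // [ 14 ]
```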

string examples

So far we have only looked at handling and passing integers. However, on the web, strings are also very important. This is where things start to get more complicated, but bear with me. Conversions from a javascript number (an IEEE 64-bit floating point value) to a webassembly i32, f32, or f64 happen automatically (i64 values are exchanged as javascript BigInts instead). Passing a javascript string to webassembly, however, is a more manual process.
Let's look at how we can handle string data and pass that data from a javascript context to a webassembly context.

memory

Previously, we looked at how a javascript function can be imported into a webassembly module. In a similar fashion, we can import other things into webassembly as well. In particular, we can import a WebAssembly.Memory object (which wraps a javascript ArrayBuffer) and use it within webassembly as memory (well ... "memory").
Webassembly is a load/store architecture, which means reading from and writing to memory requires the use of load or store instructions. This makes a lot of sense for a stack-based architecture.

encoding

When we start dealing with text in webassembly, we have to decide what encoding to use. This is unlike javascript, where most people aren't even aware of what encoding is in use (hint: it's not utf8). The easiest thing to do would be to store one character per 32 bits, since that would allow us to store a character in an i32. However, for mainly latin-based text (such as english), this would be a huge waste of memory. It would be better to use something like utf8, since that is much more space efficient for latin-based text, but handling multibyte characters would be more complicated. Using something like utf16 (which javascript uses) would be both inefficient and complicated to encode and decode. So for the purposes of this example, let's just use simple ascii encoding.
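To put some numbers on those trade-offs, this sketch measures how many bytes a string takes in each encoding. The helpers encodedSizes and isAscii are hypothetical names. TextEncoder, which always produces utf8, is a standard javascript API. The utf16 size is just the number of javascript code units times 2, and the utf32 size (one i32 per character) is the number of code points times 4.

```javascript
// Compare how many bytes a string takes in different encodings.
function encodedSizes(str) {
  return {
    // TextEncoder always produces utf8
    utf8: new TextEncoder().encode(str).length,
    // javascript strings are sequences of 16-bit utf16 code units
    utf16: str.length * 2,
    // one 32-bit value per code point, i.e. one i32 per character
    utf32: [...str].length * 4,
  };
}

// ascii only covers character codes 0 to 127
function isAscii(str) {
  return /^[\x00-\x7f]*$/.test(str);
}

console.log(encodedSizes("Hello World"));  // { utf8: 11, utf16: 22, utf32: 44 }
console.log(encodedSizes("h\u00e9llo"));   // { utf8: 6, utf16: 10, utf32: 20 }
console.log(isAscii("Hello World"), isAscii("h\u00e9llo")); // true false
```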
Let's look at two examples, the first will be an ugly example where we build a string in the function itself, and a second, prettier example where we just tell webassembly to initialize the memory at load time.

the ugly example

(module
  (type (func (result i32)))
  (import "js" "mem" (memory 1))
  (func (type 0)
    i32.const 32
    i32.const 0x48
    i32.store8
    i32.const 33
    i32.const 0x65
    i32.store8
    i32.const 34
    i32.const 0x6c
    i32.store8
    i32.const 35
    i32.const 0x6c
    i32.store8
    i32.const 36
    i32.const 0x6f
    i32.store8
    i32.const 37
    i32.const 0x00
    i32.store8
    i32.const 32
  )
  (export "hello" (func 0))
)
Corresponding javascript:
// read a null-terminated ascii string out of webassembly memory
function mem_to_str(buffer, pointer) {
  var bytes = new Uint8Array(buffer);
  var str = "";
  var i = pointer;
  while (bytes[i]) {
    str += String.fromCharCode(bytes[i]);
    i++;
  }
  return str;
}

var memory = new WebAssembly.Memory({ initial: 1 });

fetch('wasmtest.wasm')
  .then(response => response.arrayBuffer())
  .then(bytes => WebAssembly.instantiate(bytes, { js: { mem: memory } }))
  .then(obj => {
    var pointer = obj.instance.exports.hello();
    console.log(mem_to_str(memory.buffer, pointer));
  });
I know, I know... It's ugly... I tried to warn you.
In this example, there are a couple of things we need to look at. First of all, we have an import section, just like before, but this time, instead of a func expression, we use "memory". Note the 1 in the memory expression. It looks like an index, but it is not. That 1 is actually the number of "pages" of memory we want to make available to webassembly. In webassembly, a "page" is 64 KiB, so 1 would mean 65536 bytes, and 2 would mean 131072 bytes. Since we only need to store the string "Hello" plus a terminating null byte, we only need 6 bytes, so 1 page is plenty. In javascript, WebAssembly.Memory also takes the size of the allocated space in pages.
In the function body, we use i32.const to push two values onto the stack for each character. The first is the address in memory where we want to place it, and the second is the ascii code for the character. We can then call i32.store8, a store instruction that pops two i32 values off the stack and stores the top value in memory at the address given by the other value. store8 only stores the lower 8 bits of the i32 value. There are also i32.store16 and i32.store instructions, which store the lower 16 bits or all 32 bits, respectively. For i64 values, there are i64.store8, i64.store16, i64.store32, and i64.store.
Once we have all 5 characters, plus a null character, stored in memory, we can return the index of the start of the string to the javascript code. Then, in javascript, we need a function to iterate over the memory arrayBuffer, starting at the index we returned and ending when it finds a null byte, and building a javascript string from that.
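Passing a string in the other direction, from javascript into webassembly, works the same way in reverse. javascript writes the bytes into the shared memory and hands webassembly only the address. Below is a sketch of that, with a hypothetical helper str_to_mem alongside a version of mem_to_str. It uses the same null-terminated ascii layout as the ugly example. It runs without any module, since a WebAssembly.Memory can be created and inspected from javascript alone. It also confirms the page size: one page is 65536 bytes.

```javascript
// One page of webassembly memory, created from javascript alone.
const memory = new WebAssembly.Memory({ initial: 1 });
console.log(memory.buffer.byteLength); // 65536 (1 page = 64 KiB)

// Hypothetical helper: write str into memory at pointer as null-terminated
// ascii, the same layout the "ugly" example builds with i32.store8.
function str_to_mem(wasmMemory, pointer, str) {
  const bytes = new Uint8Array(wasmMemory.buffer);
  for (let i = 0; i < str.length; i++) {
    // like i32.store8, a Uint8Array element keeps only the lowest 8 bits
    bytes[pointer + i] = str.charCodeAt(i);
  }
  bytes[pointer + str.length] = 0; // null terminator
}

// Read a null-terminated ascii string back out, stopping at the end of
// the buffer if no null byte is found.
function mem_to_str(buffer, pointer) {
  const bytes = new Uint8Array(buffer);
  let str = "";
  for (let i = pointer; i < bytes.length && bytes[i] !== 0; i++) {
    str += String.fromCharCode(bytes[i]);
  }
  return str;
}

str_to_mem(memory, 32, "Hello");
console.log(mem_to_str(memory.buffer, 32)); // Hello
```

A webassembly function would then need only that address (32 here) as an i32 argument, and could read the characters with i32.load8_u.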

Now that we have seen how this can be done, let's look at a prettier way of doing the same thing.

the prettier example

(module
  (type (func (result i32)))
  (import "js" "mem" (memory 1))
  (data (i32.const 32) "Hello World\00")
  (func (type 0)
    i32.const 32
  )
  (export "hello" (func 0))
)
Looks a lot better, right?
In this example, we used a new section, the data section, to store the string in memory. The data section basically just copies data into webassembly memory when the module is instantiated. We can specify where in the memory we want the data to be placed, and then in our function, just return that index and all the javascript we wrote for the last example will work exactly the same way.
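To watch the data section do its job, here is the prettier module as hand-assembled byte-code. The same caveat applies: I assembled it myself and assume it matches wat2wasm. It includes an explicit null terminator ("Hello World" plus a 0 byte, 12 bytes in all). The data section comes last, after the code section. The string is already in memory by the time the WebAssembly.Instance constructor returns, before hello is ever called.

```javascript
// Hand-assembled byte-code for the "prettier" module (an assumption, as
// before): it imports js.mem and uses a data segment to place
// "Hello World\0" at address 32 when the module is instantiated.
const prettierBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  // type section: (func (result i32))
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,
  // import section: "js" "mem" as a memory with a minimum of 1 page
  0x02, 0x0b, 0x01, 0x02, 0x6a, 0x73, 0x03, 0x6d, 0x65, 0x6d, 0x02, 0x00, 0x01,
  // function section: one function of type 0
  0x03, 0x02, 0x01, 0x00,
  // export section: "hello" -> func 0
  0x07, 0x09, 0x01, 0x05, 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x00, 0x00,
  // code section: no locals, i32.const 32, end
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x41, 0x20, 0x0b,
  // data section: one active segment at (i32.const 32), 12 bytes long
  0x0b, 0x12, 0x01, 0x00, 0x41, 0x20, 0x0b, 0x0c,
  0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72, 0x6c, 0x64, 0x00,
]);

const memory = new WebAssembly.Memory({ initial: 1 });
const instance = new WebAssembly.Instance(
  new WebAssembly.Module(prettierBytes),
  { js: { mem: memory } }
);

// Same approach as mem_to_str: read bytes until a null byte.
const bytes = new Uint8Array(memory.buffer);
let str = "";
for (let i = instance.exports.hello(); i < bytes.length && bytes[i] !== 0; i++) {
  str += String.fromCharCode(bytes[i]);
}
console.log(str); // Hello World
```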

comments

We did get the second example looking much nicer than the mess we made by adding characters to memory one by one. Still, that mess highlights how easily webassembly code can get complicated and hard to read. Fortunately, just like other languages, webassembly has comments. We are actually incredibly lucky that we got these, since webassembly was developed to work with javascript, and a lot of javascript devs seem to hold the (insane) belief that comments are outdated and only for "bad" code. Another format that spun off from javascript, JSON, has no support for comments at all. And given that webassembly wasn't designed to be hand-written, I'm surprised (but grateful) that they bothered.
Comments allow you to write anything you want in your code, and the assembler will completely ignore it. This allows you to write plain text explanations of what your code is supposed to be doing, so later, when you have forgotten what you were thinking when you originally wrote the code, or if someone else is reading your code, it will be a lot easier to understand. Good comments can make any code, no matter how messy, easily understandable and maintainable. Comments are especially
Webassembly has both line-based comments and block comments. Line-based comments begin with a double semicolon and go to the end of the line. Block comments start with an open parenthesis followed by a semicolon, and continue until a semicolon followed immediately by a closing parenthesis. Personally, I like line-based comments, but both types have their purposes. Block comments can be placed mid-line, in the middle of code, which can sometimes be useful.

Let's look at the previous example again with comments added.

(module
  (type (func (result i32)))
  (import "js" "mem" (memory 1))             ;; import 1 page of memory from js
  (data (i32.const 32) "Hello World\00")     ;; copy null-terminated string into memory at address 32
  (func (type 0)                             ;; i32 hello(void)
    i32.const (;string address;) 32          ;; return address of string
  )
  (export "hello" (func 0))                  ;; export function to js
)

shorthand and labels

In webassembly, there are a few things we can do to make our code shorter, at the cost of no longer mirroring the resulting binary exactly. For instance, we can completely omit the type section expressions in our code and instead write the type declaration directly into our func expressions. This can make the code easier to read, since it keeps a function and its type signature in the same place and avoids numerical indexes for type section entries. Webassembly assemblers will automatically figure out what needs to be put into the type section and map those entries to the correct functions.
Additionally, webassembly has labels. These aren't shorthand, but they allow us to (in most cases) replace numerical indexes with text-based names. By default, the resulting binary won't contain these names: the assembler discards them and replaces them with the corresponding indexes. However, using labels in hand-written code can make intentions clearer. Labels can be used to name functions, parameters, local variables, and blocks of code.
Let's look at the same example again, this time with an implicit type section and a text label for the function.

(module
  (import "js" "mem" (memory 1))             ;; import 1 page of memory from js
  (data (i32.const 32) "Hello World\00")     ;; copy null-terminated string into memory at address 32
  (func $hello (result i32)                  ;; i32 hello(void)
    i32.const (;string address;) 32          ;; return address of string
  )
  (export "hello" (func $hello))             ;; export function to js
)

looping example

Just like in other languages, webassembly has loops and branches. Control flow is an important part of all programming and webassembly is no different.
However, webassembly's implementation of control flow may seem a bit foreign both to anyone used to high-level languages and to anyone familiar with assembly language programming, because webassembly implements a little bit of both.
Before we go in-depth into control flow in webassembly, let's take a look at some example code. Below is a straightforward implementation of a fibonacci number generator. The function takes 1 argument, n, the position in the sequence of the number we want to generate.

fibonacci generator

(module
  (type (func (param i32) (result i32)))
  (func $fibonacci (type 0) (param $n i32) (result i32)
    (local $previous_1 i32)
    (local $previous_2 i32)
    (local $index i32)
    ;; fibonacci sequence begins with 0, 1, ...
    i32.const 0
    i32.const 1
    local.set $previous_1     ;; pops 1
    local.set $previous_2     ;; pops 0
    i32.const 2
    local.set $index          ;; index of the next number to compute
    loop $loop
      local.get $previous_2   ;; get second to last number
      local.get $previous_1   ;; get last number
      local.tee $previous_2   ;; save last number as 2nd-to-last for next iteration
      i32.add                 ;; add last and 2nd-to-last
      local.set $previous_1   ;; save result as last for next iteration (or return if this is the last.)
      local.get $index        ;; get current index
      i32.const 1
      i32.add                 ;; increment index
      local.tee $index        ;; save new index
      local.get $n            ;; get target sequence number
      i32.le_u                ;; check if new index <= target number
      br_if $loop             ;; if so, restart loop
    end
    local.get $previous_1     ;; return result
  )
  (export "fibonacci" (func $fibonacci))
)
Each fibonacci number is simply the sum of the previous two, starting with 0 and 1. The simplest way to calculate these numbers is with a loop, as shown above.
However, to make sense of the above example, there are a few other things we need to cover first.
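While working through the details that follow, it may help to have a plain javascript transcription of the fibonacci function to compare against. This is my own step-for-step translation, and it is an assumption that it is faithful; it was not generated by any tool. "| 0" stands in for i32.add wrap-around, and ">>> 0" for the unsigned comparison done by i32.le_u. The do/while reflects the fact that the loop body always runs once before br_if checks the condition. Two consequences are worth knowing about. For n = 0 the function returns 1 rather than 0. Past fibonacci(46), the result no longer fits in a signed i32, so javascript sees a negative number.

```javascript
// Step-for-step javascript transcription of the webassembly fibonacci.
function fibonacci(n) {
  let previous_1 = 1; // sequence begins with 0, 1, ...
  let previous_2 = 0;
  let index = 2;
  // "loop ... br_if $loop" runs the body at least once, like do/while
  do {
    const sum = (previous_2 + previous_1) | 0; // i32.add wraps at 32 bits
    previous_2 = previous_1;                   // local.tee $previous_2
    previous_1 = sum;                          // local.set $previous_1
    index = (index + 1) | 0;                   // increment, local.tee $index
  } while ((index >>> 0) <= (n >>> 0));        // i32.le_u, br_if $loop
  return previous_1;
}

console.log(fibonacci(2));  // 1
console.log(fibonacci(10)); // 55
console.log(fibonacci(0));  // 1 (the body always runs once)
console.log(fibonacci(47)); // -1323752223 (2971215073 wrapped to a signed i32)
```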

local variables

We already looked at function parameters and how they are accessed from within a function, and local variables work pretty much the same way, except that they aren't passed to the function. Instead, local variables need to be set from within the function. They can be used as a place to store temporary data needed by the function.
Local variables are declared just like parameters, but using a "local" s-expression instead of "param". To access local variables, we use the local.get and local.set instructions. Local variables share an index space with the function parameters and are numbered after them. So in the above example, $n is local 0, $previous_1 is 1, $previous_2 is 2, and $index is 3.
Along with local.get and local.set, there is another instruction that can be used with local variables that we haven't previously talked about, but we make use of it in this example. The "local.tee" instruction works just like local.set, except that instead of popping a value off the stack, it just peeks at the top of the stack. That is, it sets the value of the variable based on the value on the top of the stack, but it does not remove that value from the stack.
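The difference between local.set and local.tee is easiest to see on a toy interpreter. The run function below is a hypothetical sketch that understands only five instructions and keeps locals in a plain array. It is nothing like a real webassembly VM, but it shows that local.set pops the value while local.tee leaves it on the stack.

```javascript
// Toy interpreter (hypothetical, nothing like a real webassembly VM) for
// five instructions. Each instruction is an [opcode, operand] pair, and the
// locals live in a plain array indexed like webassembly locals.
function run(program, locals) {
  const stack = [];
  for (const [op, arg] of program) {
    switch (op) {
      case "i32.const":
        stack.push(arg | 0);
        break;
      case "i32.add": {
        const b = stack.pop();
        const a = stack.pop();
        stack.push((a + b) | 0);
        break;
      }
      case "local.get":
        stack.push(locals[arg]);
        break;
      case "local.set":
        locals[arg] = stack.pop(); // pops the value off the stack
        break;
      case "local.tee":
        locals[arg] = stack[stack.length - 1]; // peeks: value stays on the stack
        break;
      default:
        throw new Error("unknown instruction: " + op);
    }
  }
  return stack;
}

const locals = [0, 0];
console.log(run([["i32.const", 7], ["local.set", 0]], locals)); // []
console.log(run([["i32.const", 7], ["local.tee", 1]], locals)); // [ 7 ]
console.log(locals); // [ 7, 7 ]
```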

loops

To implement a loop, webassembly provides the "loop" instruction. The loop instruction takes one operand, a block signature describing what values it will use from the stack from outside of the loop, and what values will be left on the stack when the loop exits. The signature is very similar to function signature entries like those in the type section of a webassembly module. In our example, the signature is omitted, which indicates that the stack will be in the same state when the loop exits: no values popped off and no extra values pushed on. We do, however, give the loop a label. This is not an operand, just a human-readable label. When the module is assembled, the label is removed, and the loop is referred to in the byte-code by index only.
The loop instruction itself does not actually implement a loop. Instead, it sets up a point in the code that we can use to jump back to later (based on some condition). To implement this, we will need to learn a few more instructions.

comparisons

Before jumping back for another iteration, we need to decide whether we should loop at all. In high-level languages, this condition is usually built into the loop syntax or implemented using an if statement. In webassembly, however, we need to set up the stack and do a comparison using the values on the stack, which pushes an i32 result of 1 for true or 0 for false. We then loop (or not) using a conditional branch instruction.
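The comparison instructions for i32 types are as follows:
- i32.eq
- i32.ne
- i32.lt_s, i32.lt_u
- i32.gt_s, i32.gt_u
- i32.le_s, i32.le_u
- i32.ge_s, i32.ge_u
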
Note that along with instructions for i32 comparisons, there are also corresponding instructions for i64, f32, and f64 type values.
The eq instruction pushes 1 (true) onto the stack if the top two values on the stack were equal to each other, and 0 (false) otherwise. The ne instruction does the opposite, returning false if the values were equal (ne stands for "not equal"). lt_u and lt_s return true if the first value (2nd from the top of the stack) was less than the second value (top of the stack). le_u and le_s are similar, but mean "less than or equal". gt_u and gt_s mean "greater than", and ge_u and ge_s mean "greater than or equal".
Also note that all of the inequality instructions include a _u or _s suffix. This indicates whether the comparison should treat the values as unsigned or as signed. This is needed because the i32 and i64 types are sign-agnostic: they just store 32 or 64 bits. The bit pattern is the same whether a value is considered signed or unsigned, and which interpretation applies depends on how it is used. The eq and ne instructions do not need signed and unsigned versions because sign is irrelevant to them: either the bits are the same or they are not.
Additionally, there is no way to compare a signed integer with an unsigned integer, since the instructions cannot tell them apart. So, as i32 values, 4294967295 and -1 are equal (both are 0xffffffff).
f32 and f64 comparison instructions do not include the _u and _s versions. This is because IEEE floating point numbers are always signed. So there is no "f32.lt_u", there is only "f32.lt".
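The effect of the _s and _u suffixes can be reproduced in javascript, because "x | 0" reinterprets a number as a signed 32-bit integer and "x >>> 0" reinterprets it as unsigned. The helper names below are hypothetical stand-ins for the real instructions. Like the instructions, they return 1 for true and 0 for false.

```javascript
// The same 32 bits compared as signed ("| 0") or unsigned (">>> 0").
const i32_lt_s = (a, b) => ((a | 0) < (b | 0) ? 1 : 0);
const i32_lt_u = (a, b) => ((a >>> 0) < (b >>> 0) ? 1 : 0);
const i32_eq = (a, b) => ((a | 0) === (b | 0) ? 1 : 0);

console.log(i32_lt_s(-1, 1));         // 1: -1 < 1
console.log(i32_lt_u(-1, 1));         // 0: as unsigned, -1 is 4294967295
console.log(i32_eq(4294967295, -1));  // 1: both are 0xffffffff
console.log((-1 >>> 0).toString(16)); // ffffffff
```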

branching

Next, let's look at branching instructions. Branching instructions tell the VM to jump to another position in the code and start executing there instead of at the next instruction. The conditional branch (br_if) pops an i32 off the stack and only performs the jump if that value is nonzero (true), such as the result of a preceding comparison. There is also an unconditional branch (br), which always jumps.
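If the loop-plus-branch structure still feels backwards, a rough javascript analogy may help. The "loop" instruction corresponds to the top of a labeled javascript loop, and "br_if $loop" corresponds to a conditional "continue". Falling through to "end" corresponds to leaving the loop. The function countIterations is a hypothetical example, and this is an analogy, not an exact translation.

```javascript
// Rough javascript analogy for
//   loop $loop  ...body...  <comparison>  br_if $loop  end
// "loop" only marks a point to jump back to; br_if jumps there when the
// popped i32 is nonzero, otherwise execution falls through to "end".
function countIterations(limit) {
  let iterations = 0;
  loop: while (true) {
    iterations++;                                   // ...body...
    const condition = iterations < limit ? 1 : 0;  // comparison pushes 1 or 0
    if (condition !== 0) continue loop;            // br_if $loop
    break;                                         // end: fall out of the loop
  }
  return iterations;
}

console.log(countIterations(5)); // 5
console.log(countIterations(0)); // 1 (the body runs before the first check)
```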

stack-based control flow

The way that control flow works in webassembly is a little bit strange. In real assembly languages, jumps and branches target memory addresses. In webassembly, branches work based on a stack instead. However, it is not the same stack used to store values for instructions. There is a separate stack entirely for control flow.
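Because a branch target is resolved through that control stack, the byte-code does not store label names at all. Each branch instead stores a relative depth, where 0 means the innermost enclosing block or loop, 1 the one around it, and so on. Here is a hypothetical sketch of how an assembler could turn a label into that depth. Branching to a loop jumps back to its start, while branching to a block jumps to its end, but that difference does not affect the depth calculation.

```javascript
// Hypothetical sketch: turn a branch label into the relative depth stored
// in the byte-code. controlStack lists the labels of the enclosing blocks
// and loops, outermost first; the innermost one is depth 0.
function labelToDepth(controlStack, label) {
  for (let depth = 0; depth < controlStack.length; depth++) {
    if (controlStack[controlStack.length - 1 - depth] === label) {
      return depth; // innermost match wins if a label is reused
    }
  }
  throw new Error("no enclosing block or loop named " + label);
}

// Entering "block $outer" and then "loop $inner" pushes both labels.
const controlStack = ["$outer", "$inner"];
console.log(labelToDepth(controlStack, "$inner")); // 0
console.log(labelToDepth(controlStack, "$outer")); // 1
```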