: an attempt to make computer machines run better


RISC-V is a fairly new instruction set. It has a lot of potential to solve some of the biggest problems we have today in CPU designs. RISC-V can potentially handle all kinds of computing jobs, from microcontrollers, to mobile embedded devices, to desktops, to servers, to supercomputers. However, what really makes it interesting is the fact that the ISA is free and open source.
Until recently, RISC-V was only implemented in software emulators or on FPGAs, but in November 2016, SiFive released the HiFive1 development board, containing the first ever RISC-V chip, and for a reasonable price.
Since then, several other companies have announced future RISC-V based products, and SiFive has released another even more powerful RISC-V chip that is capable of running Linux.
In 2018, ARM Holdings attempted a smear campaign against RISC-V. This did not go well for them, but did prove that ARM is scared.

RISC-V is clearly an interesting architecture. Now that there are a few board choices available, experimenting with RISC-V is easier and more fun than ever before. But experimenting with a new ISA is really only fun when you get to deal directly with the instruction set. Otherwise, programming for RISC-V feels no different from programming for any other architecture. So let's get started learning RISC-V assembly language!


For this tutorial, we are going to be working with 32-bit RISC-V. Since at the time of this writing, there is only one RV32 chip on the market, this tutorial is going to assume you are using the FE310 SoC on a HiFive1 development board. It is entirely possible that when you read this, there will be other boards on the market, and if you use one of these other boards, you will need to adapt the examples to match your board's specifics. Likewise, if you do not have any RISC-V hardware, but wish to use an emulator instead, you will need to figure out how to adapt the examples to match the emulated environment. If you have a RISC-V 64 (64 bit) based board, such as the HiFive Unleashed, running Linux, you probably will want to look at the RV64 assembly tutorial instead.


For software, you will need to have an assembler for RISC-V. If you are working on a typical x86 based computer, you will need a cross assembler. You can get this either by compiling your own, or you can use the one packaged with the official SiFive Freedom Everywhere SDK. I have used both, but I prefer my own compiled toolchain because it can also build 64-bit code for the HiFive Unleashed.
If you have a HiFive Unleashed, and have installed a native compiler on it, you can also use that to assemble the code from this tutorial. However, since the HiFive Unleashed lacks a host USB port, it may be more difficult to program the HiFive1 (you can probably use the UART, but I have not tried yet).
For programming the board, you can use a combination of OpenOCD and GDB. Both OpenOCD and GDB need to be built with RISC-V support. If you build a full GCC cross compiler, it will typically include GDB. If you go this route, OpenOCD will need to be built separately, and RISC-V support is not yet available upstream, so you need SiFive's version.
If you want to avoid compiling your own toolchain, SiFive's SDK includes both OpenOCD and GDB.
Freedom Everywhere SDK:
RISC-V GCC Toolchain (if compiling your own):
OpenOCD with RISC-V support (if compiling your own):

about RISC-V

RISC-V is, as its name suggests, a Reduced Instruction Set Computing architecture. That means that it uses a very simple set of instructions. This makes learning the assembly language a lot easier than it would be on a CISC architecture like x86, but it also means we will need to use some "tricks" to do things that x86 can do directly. It also means that it will sometimes require more instructions than x86 to accomplish the same task, but note that this does not mean that RISC-V is inherently slower or less efficient.
RISC-V uses an almost fixed size instruction encoding. I say "almost" because there is an extension for 16-bit compressed instructions, but all others are encoded using 32 bits.
The fixed size instruction encoding makes memory alignment almost a non-issue, but it has implications for things like dealing with immediates.
RISC-V is also a load-store instruction set. This means that it requires load or store instructions to access or modify memory. Other instructions can operate only on registers. This helps to keep the instruction set small and simple.
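As a minimal sketch of what load-store means in practice, incrementing a 32-bit value that lives in memory takes three instructions. The register choices here (a0 holding the address, t0 as scratch) are arbitrary assumptions for illustration, and the lw ("load word") instruction is not otherwise used in this chapter.

```asm
// assume a0 already holds the address of a 32-bit value in memory
lw   t0, 0(a0)      // load the word at address a0 into t0
addi t0, t0, 1      // do the arithmetic on the register
sw   t0, 0(a0)      // store the result back to the same address
```

There is no single instruction that adds directly to a memory location. The value always passes through a register.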


RISC-V has 32 general purpose registers. In RV32, they are all 32 bits wide. Some RISC-V documentation lists them as x0-x31. However, there is also another set of names for these registers, which suggests the intended usage of each register in the ABI. The GNU assembler uses a combination of these two naming conventions. The register names are enumerated in the table below:
register    ABI name    GAS name    description
x1          ra          x1          return address
x2          sp          x2          stack pointer
x3          gp          x3          global pointer
x4          tp          x4          thread pointer
x8          s0 / fp     s0          saved / frame pointer

Most of these registers can be used for any purpose, because RISC-V avoids instructions which implicitly update a register. However, there is one register which behaves differently than the others: x0.
x0 is hard-wired to 0. That means that no matter what, if you read from x0, the result will be 0. x0 can be written to, but it will not change the result of subsequent reads from x0. This might seem pointless, but this is one of the little tricks that enables RISC-V's instruction set to be so simple. We will take a more in-depth look at how x0 is used later.


Before we look at code, let's quickly talk about general assembly language syntax. This isn't specific to RISC-V, and it can vary from assembler to assembler - even assemblers for the same ISA. The syntax we will discuss is that of GNU AS (aka GAS, or GNU Assembler), which is the assembler I will assume you are using for this tutorial.
Assembly language grammar is very simple, much simpler than any other programming language that I know. In general, each line of code represents something, and every line comes in a similar form: the first token on the line is a directive or instruction, and it is followed by a list of comma separated operands, which are somewhat like arguments. Whitespace is ignored, except to separate tokens.
This doesn't cover the entire grammar, as we will see soon, but it covers most of it, and hopefully knowing this will make it easier to understand the code we will be looking at next chapter.
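To make that general shape concrete, here is a sketch of the kinds of lines you will see. The names loop_count and my_label are made up for this illustration, and each piece is explained properly later on.

```asm
.set loop_count, 16              // a directive, followed by two operands
my_label:                        // a label: a name followed by a colon
    addi t0, x0, loop_count      // an instruction, followed by three operands
    addi   t0,t0,  -1            // extra whitespace between tokens does not matter
```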

hello world

Now, since we are working with a simple microcontroller (with no screen), the normal "hello world" output becomes a bit more complex. So instead of outputting text, let's instead just blink a built-in LED.
Before we tackle this task though, we need to learn about our target platform, the HiFive1. The HiFive1 has 3 built-in LEDs: one indicates power input, another indicates an active 3.3v circuit, and the last is an RGB LED connected to GPIO pins 3 (green), 5 (blue), and 6 (red). They are also wired such that bringing the GPIO low turns on the LED. So, all we need to do is toggle a GPIO pin.
To interact with the GPIOs on the HiFive1, we need to write to a few special registers on the SoC. These are not the same as the RISC-V general purpose registers we just talked about, so we can't just write instructions that directly interact with them. Instead, they are memory-mapped, which means we need to treat them as memory.
The memory locations of the GPIO registers, and the correct values to set can all be determined by studying the FE310 Manual:

Now, enough about the board, let's take a look at some code:
// definitions for memory mapped register bases
.set gpio0_base, 0x10012000

// gpio memory mapped registers (relative to gpio0_base)
.set gpio_out_en, 0x08
.set gpio_port, 0x0c

.section .text
.align 2
.globl _start

_start:
    // load the GPIO base memory offset to s1.
    // all base addresses occupy only the upper 20 bits of an address, so only lui is needed for bases
    lui s1, %hi(gpio0_base)

    // load bitmask for gpio pins into t0
    lui t0, %hi(0x00080000) // 19th bit means pin 3

    // store pin bitmask in gpio_base + gpio_out_en to enable pin 3 as output
    sw t0, gpio_out_en(s1)

    // set up a0 and a1, these will serve as timeouts for the loops
    lui a0, %hi(0x10000)
    lui a1, %hi(0x15000)

1:
    // clear t1 by adding 0 to x0 and storing in t1
    addi t1, x0, 0

    // write bitmask to gpio_base + gpio_port to turn off LED
    sw t0, gpio_port(s1)

2:
    // increment t1 and loop until t1 reaches a0
    addi t1, t1, 1
    bne t1, a0, 2b

    // clear gpio_base + gpio_port to turn on LED
    sw x0, gpio_port(s1)

3:
    // increment t1 and loop until t1 reaches a1
    addi t1, t1, 1
    bne t1, a1, 3b

    // begin cycle again
    jal x0, 1b

Note that this syntax is specific to GAS (the GNU assembler - part of binutils).


Lines beginning with // are just comments. The assembler simply ignores them. They are placed there only to help a human reader understand the code.

assembler directives

The lines beginning with . are assembler directives. These give direction to the assembler, but they don't directly result in anything being added to the assembled binary. The first three directives in this example are "set" directives. These define a "symbol" and assign it a value. Symbols are used for a lot of things, and may or may not end up in the final executable, depending on the symbol's properties. These particular symbols we are just using as a sort of constant. We define gpio0_base to be the base memory address of the GPIO registers on the HiFive1. gpio_out_en and gpio_port are the addresses of two registers relative to gpio0_base.
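Because symbols are just named numbers at assembly time, they can also be combined in expressions. A small sketch, where gpio_out_en_addr is a hypothetical extra symbol that the program above does not use:

```asm
.set gpio0_base, 0x10012000
.set gpio_out_en, 0x08

// hypothetical extra symbol: the full address, computed by the assembler
.set gpio_out_en_addr, gpio0_base + gpio_out_en    // 0x10012008

// .equ is another spelling of .set in GAS
.equ gpio_port, 0x0c
```

None of these lines places any bytes in the output. They only give the assembler names to substitute later.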


Next, we start a new section with the "section" directive. Sections are subdivisions of an executable. Normally, executable code goes in the .text section, and data goes either in the .bss or .data section. But, this only applies to ELF executables. The HiFive1 doesn't load ELF executables because it doesn't have an operating system, it just has executable code placed somewhere in memory, and data placed somewhere else. We still need to define the sections though, because when we prepare this program for uploading to the HiFive1, the tools we use will use these sections to determine how the code should be written to the HiFive1's memory.


The next directive, the "align" directive, tells the assembler to make sure the code following the directive is aligned. This means that it must start on a memory address divisible by 2, 4, or 8, depending on the operand given to .align. On RISC-V, GAS treats the operand as a power of two, so ".align 2" aligns to 2^2 = 4 bytes. RISC-V requires that all code be aligned to at least a 16-bit boundary (32-bit if the compressed extension is not in use), so this is more than enough. Since all instructions are either 32 or 16 bits long, it's not possible for instructions to become unaligned using only valid instructions.
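Since the meaning of the plain .align operand differs between GAS targets, it can be clearer to use one of the two unambiguous spellings that GAS also provides. A sketch, with the labels first and second invented for the example:

```asm
.section .text
.balign 4            // pad until the address is a multiple of 4 bytes
first:
    addi t0, x0, 1

.p2align 2           // pad until the low 2 address bits are zero (2^2 = 4 bytes)
second:
    addi t0, x0, 2
```

Both directives request the same 4-byte alignment here. .balign takes a byte count, while .p2align takes a power of two.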

global symbols

Next, the "globl" directive. This tells the assembler that a specific symbol should be made visible to other programs or files. Since we only have one source file for this program, we don't need to share any symbols with other source files during linking, but this directive also makes the linker aware of the symbol. We specify the _start symbol here because we need to make the linker aware of that symbol so it can place the code following that symbol in the correct place in the HiFive1's memory.


We are almost ready to get into actual code. The next line defines a "label". Labels are symbols, just like the ones created earlier with the .set directives. However, for this symbol, we are not explicitly defining its value. A label's value is equal to the memory address of the assembled code following it. This allows us to have a sort of pointer to the current position in memory. Although I use the term pointer, a label is not the same as a pointer in C. Labels' values cannot be changed at runtime, so they don't work like variables in higher level languages. However, they can be used to implement what would be called a variable or a function in a higher level language.
Here, we are using a label to define the _start symbol, pointing to the beginning of our executable code. The linker will use this symbol to place the executable code at memory address 0x20400000, which is where the HiFive1 starts executing code after booting. So, the _start symbol's value will also be 0x20400000. It's rarely important to know the actual number a label will be equal to, but in this case it's easy to determine because it's part of the way the HiFive1 boots, and part of the linker script.
In this case, we are using the label to implement what is effectively a "function". In fact, if we linked this code with RISC-V C code, we could call _start as if it were a function, from the C code.


Now it's finally time to talk about CPU instructions. CPU instructions are the real executable code. These get read and executed directly by the CPU. When the assembler runs, it reads the assembly code you have written, and it takes each instruction and converts it into 32 bits (RV32) of native machine code. These 32 bits contain numerical representations of the instruction (this numerical representation is called an "operation code" or "opcode") and all of its operands. In RISC-V, there is a 1 to 1 relationship between opcode and instruction mnemonic (the code we use in assembly language to represent an instruction). It's this 32 bit value that the CPU reads, interprets, and executes when your program is running.
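As a sketch of what "converted into 32 bits" means, each pair of lines below should assemble to identical bytes. The hexadecimal words were worked out by hand from the instruction layouts in the RISC-V specification, so treat them as an illustration and check them with a disassembler rather than taking them on faith. The .option norvc line is an assumption about your setup: it stops the assembler from substituting 16-bit compressed forms if your toolchain targets the compressed extension.

```asm
.option norvc             // keep every instruction in its 32-bit form
.section .text
    addi t1, x0, 0        // imm=0, rs1=x0, funct3=0, rd=x6 (t1), opcode=0x13
    .word 0x00000313      // the same instruction written as a raw 32-bit word

    lui  s1, 0x10012      // imm=0x10012, rd=x9 (s1), opcode=0x37
    .word 0x100124b7      // the same instruction written as a raw 32-bit word
```

Notice how the register numbers and the immediate sit in fixed bit positions next to the opcode. The next sections lean on that layout.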


The first instruction we use in this program is the "lui" instruction, which stands for "Load Upper Immediate".
Immediates are literal integer values given as operands to instructions. Immediates are given a special name (instead of just value or literal) because they are actually encoded directly into the instruction. This means they are not stored somewhere else in memory until needed, they are stored in the 32 bit encoded instruction read by the CPU.
This is especially important to understand in RISC-V, because RISC-V uses a fixed 32 bit encoding for instructions, and also 32 bit wide general purpose registers. So if we want to load 32 bits of data into a register, there is no way to fit 32 bits of immediate data, the opcode for the instruction, and the register number into 32 bits. So instead, RISC-V requires 2 instructions to fill a register with an arbitrary 32 bit immediate. The first is lui, which fills the upper 20 bits of the register with the upper 20 bits of the desired value and clears the lower 12 bits. We would then issue another instruction to add in the remaining lower 12 bits to get the full 32 bit value.
lui takes two operands: the first is the register to load the data into, and the second is the immediate to load. The immediate must be only 20 bits long, but since it's the upper 20 bits of the actual value we want, we usually don't want to calculate this ourselves, and the assembler will not do it automatically. So we use a macro here to do the calculation. The %hi() macro replaces its argument with only the high 20 bits of its argument at assembly time, sort of like how the C preprocessor works. We could rewrite this line as "lui s1, 0x10012", manually removing the low 12 bits of the immediate, and the assembled code would be identical.
For the gpio0_base address, and any base address on the HiFive1, we only need lui to load the value into a register. This is because they all use only the upper 20 bits of the address. If we were to add in the lower bits, we would always be adding 0.
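For a value whose low 12 bits are not zero, the second instruction is normally an addi using the matching %lo() macro. The constant below is an arbitrary example, not an address from the HiFive1. A sketch:

```asm
.set example_value, 0x12345678      // hypothetical constant for illustration

lui  t0, %hi(example_value)         // t0 = 0x12345000
addi t0, t0, %lo(example_value)     // t0 = 0x12345000 + 0x678 = 0x12345678

// GAS also accepts the li pseudo-instruction, which expands to
// whatever lui/addi sequence the constant needs
li   t1, 0x12345678
```

One subtlety: addi sign-extends its 12-bit immediate, so when bit 11 of the value is set, %hi() rounds the upper part up by one to compensate. Using %hi() and %lo() as a pair takes care of this for you.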
Next, we need to select which GPIO pin we want to toggle. We can do this by preparing a bitmask to place into the GPIO output enable register. We can find which pin maps to which bit in the register by reading the HiFive1's documentation. A good choice is pin 3, because that pin controls the green LED already attached to the board. Pin 3 is enabled by bit 19, so our bitmask should have that bit set, and no others. 1 * 2^19 = 0x00080000. Again, only bits in the upper 20 bits are set, so we need only lui to load this value into a register.
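If you would rather not work out the mask by hand, GAS can evaluate the shift for you. A small sketch, where led_bit and led_mask are names invented for this example:

```asm
.set led_bit, 19                 // bit position of the green LED's GPIO
.set led_mask, 1 << led_bit      // the assembler computes 0x00080000

lui t0, %hi(led_mask)            // same effect as lui t0, %hi(0x00080000)
```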

storing data

Next, we can write the value from the t0 register to the memory mapped GPIO output enable register. For this we use the "sw" instruction, which stands for "store word". sw takes three operands, two registers and an immediate. The first operand is the source register. This is a general purpose register that contains the data you want to store. The last two operands are given in a slightly different form. Instead of being comma separated like other operands, we use the form imm(reg).
This is because, although there isn't anything that different about how sw is encoded, the CPU calculates the destination memory address by adding these operands together. So, in our example, we are storing the value currently in t0 (which is the gpio bitmask) into s1 (gpio0_base = 0x10012000) + gpio_out_en (0x08). So the result is that the bitmask is written to address 0x10012008, which is the full memory address of the gpio_out_en register on the HiFive1.
Note that this instruction stands for store word. Word here means the natural data size processed by the CPU. On RISC-V 32, it means 32 bits, the same size as a register. To store smaller amounts of data, there exist sh (store half-word) and sb (store byte) instructions. To store more data than a word, you must use sw more than once.
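The three store sizes can be sketched side by side. This assumes s1 holds the address of some ordinary writable memory (a hypothetical buffer, not the GPIO registers) and that t0 and t2 already hold the data:

```asm
sw t0, 0(s1)      // store all 32 bits of t0 at s1 + 0
sh t0, 4(s1)      // store the low 16 bits of t0 at s1 + 4
sb t0, 6(s1)      // store the low 8 bits of t0 at s1 + 6

// a 64-bit value held in two registers (low half in t0, high half in t2)
// takes two word stores
sw t0, 8(s1)      // low word at s1 + 8
sw t2, 12(s1)     // high word at s1 + 12
```

The low word goes at the lower address because RISC-V is little-endian.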

The next two lines of code are just timeout values. We will use these to implement "busy loops" later, so these will define how long the LED will remain off or on. I use only lui because these are tight loops that run very quickly. Anything in the lower 12 bits would not add enough delay for your eye to notice.

local labels

The next line is another label. However, this one consists of only a number. Numeric labels are treated differently by the assembler. These labels are never added to the symbol table of the assembled binary, never marked global, and duplicate definitions of the same label are allowed. They are meant to only be used by code in the same file, and relatively near where they are defined. When we refer to one of these local labels, we affix a letter to the end of the label, either an 'f' or 'b'. 'f' means forward, and 'b' means backward. The assembler would understand "1b" to mean the most recent definition of label 1 before the current line of code. Likewise, "1f" would mean the first definition of label 1 after the current line of code. These are very convenient when we need to jump or branch to another place in the code, but don't want to give that location a unique name.
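Here is a short sketch that defines the same numeric label twice and refers to it in both directions. It is not part of the blink program, and the loop bound of 3 is arbitrary. beq is the "branch if equal" counterpart of the bne instruction used in the program.

```asm
    addi t1, x0, 0      // t1 = 0
    addi t2, x0, 3      // t2 = 3 (arbitrary loop bound for illustration)
1:
    addi t1, t1, 1      // count up
    beq  t1, t2, 1f     // 1f: the next "1:" below this line
    jal  x0, 1b         // 1b: the nearest "1:" above this line
1:
    // execution arrives here once t1 == t2
    addi t1, x0, 0
```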

clearing registers

In the next line, we want to clear the t1 register. We will be using this register as a counter, so it makes sense for it to start counting from 0. This is not the only way it could be done, but I think this way is fairly straightforward.
If you are familiar with x86 assembly, you might think that this would be the place for a mov instruction, to move 0 into t1. However, RISC-V does not have a mov instruction. So instead, we can use the "addi" or "add immediate" instruction. In x86, it would not make sense to use an add instruction to clear a register, because we would need to be very aware of what data was already in the register and add its two's complement to get it back to 0. But in RISC-V, things are a little easier. RISC-V's add instructions all take 3 operands, unlike x86's, which take 2 operands. In RISC-V, the first operand is the destination register, the second is the source register, and the last is (in addi's case) the immediate to add to the register. This makes the add instructions very flexible: you can use them like x86's add instructions by setting the source register and destination register to be the same, or you can use different source and destination registers, which was not possible in x86. If your source register happens to have a value of 0, then you can effectively use it to place any value (as long as it fits in the lower 12 bits) into a register. In RISC-V, finding a register that has 0 is always easy, because we have x0, which is hard-wired to 0.
This is our first x0 "trick". We can use x0 as the source register to addi, to effectively emulate a mov like instruction. Here, we just want to clear all bits, so we add 0 to x0 and store the result (which will be 0) into t1, effectively clearing all bits of t1. There are other ways we could accomplish the same thing, for instance we could have used "lui t1, 0", or we could have used the non-immediate form of add to add x0 to x0 and store the result in t1. If this were x86, we would have avoided using immediate operands because those instructions typically have longer encodings, but in RISC-V, it doesn't matter as much because all instructions are 32 bits long.
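The alternatives can be written out side by side. Each line below leaves 0 in t1. The last two are pseudo-instructions that GAS accepts as shorthand and translates into real instructions, and they are not used elsewhere in this chapter.

```asm
addi t1, x0, 0     // the form used in this program: 0 + 0
add  t1, x0, x0    // register-register add: x0 + x0
lui  t1, 0         // upper 20 bits = 0, and lui zeroes the lower 12 bits
li   t1, 0         // pseudo-instruction: "load immediate"
mv   t1, x0        // pseudo-instruction: shorthand for addi t1, x0, 0
```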

toggling the LED

Next we want to turn off the LED on the HiFive1. To do this, we need to write to the gpio_port register. Normally we would want to clear the bit for the LED to turn it off, but because of the way the HiFive1 board was designed, we set the GPIO pin high to turn off the onboard LED. If we connected another LED directly to the GPIO pin ourselves, setting the pin high would light the LED. However, for this example, let's use only the onboard LED.
We already know how to write to a memory mapped register, and this one is no different. We also know that we need to set bit 19, the same bit we set on the gpio_out_en register. We used t0 to store the bitmask before, and if we look carefully through our code, we can see that we have not modified t0 since then (we did this intentionally), so we know t0 already has the correct value in it and we can just reuse that. Likewise, s1 still holds the correct base address, so we can reuse that too. So turning the LED off requires only one sw instruction.

busy loops

Next, we want to implement a busy loop to waste some time. If we neglect this step, we will not be able to see the LED turning on and off, because it will happen way too fast for our human eyes to pick up the changes, it would just look like it was always on, just not quite as bright.
Now a busy loop isn't actually the best solution. In fact, on any CPU designed to run an OS, busy looping is usually a very bad idea. We use busy looping here because it is the simplest solution, and the FE310 can do this without any issue.
This technique is called "busy looping" because, during the loop, the CPU is still "busy". Although nothing useful gets done, the CPU is still actively fetching and executing instructions, which uses power and generates heat. Inside of an operating system, this becomes a bigger problem because it can prevent other programs from running at the same time, since the busy loop is using all of the available CPU time. A better solution would be to set up a timer based interrupt and then put the processor into wait mode, but we will look at how to do this later.


For now, let's look at how the busy loop works. First, we set up a local label at the beginning. Then, we use an addi instruction to add 1 to t1 and store the result in t1. This causes t1 to be incremented when the instruction executes. Finally, we run a new instruction, "bne", which means "branch if not equal". If you are familiar with x86 assembly, you probably understand conditional jumps. Branching is basically the same thing. A branch instruction tells the CPU that it should do one of two things: jump to a different location in memory and start executing there, or not jump and continue executing the next instruction after the branch instruction. How it decides what to do depends on the instruction used. In this case, we use "branch if not equal", which takes three operands. The first two operands are registers. If these two registers do not contain the same value (not equal), then the CPU jumps to the third operand (an immediate) and begins executing instructions from there. If the first two operands do contain equal values, then execution continues past the branch instruction.
The t1 register was cleared before this loop, so when the loop starts, it equals 0. Then each time the loop executes, 1 is added to t1. If t1 is not equal to a0 (which contains 0x10000), then we jump back to the beginning of the loop. Once the loop runs 65536 times (0x10000), then t1 will be equal to a0, and the loop ends. Because instructions take some time to be executed, this effectively causes a short delay before we continue to the next part of the code.
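A common variation, which this tutorial's program does not use, is to count down instead and compare against x0. That way no second register is needed to hold the limit. A sketch of the same 65536-iteration delay:

```asm
    lui  t1, %hi(0x10000)   // start the counter at 0x10000
2:
    addi t1, t1, -1         // decrement t1
    bne  t1, x0, 2b         // loop until t1 reaches 0
```

The program above counts up instead because it reuses t1's final value as the starting point for the second loop.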

LED on

Next, we want to turn the LED on, which we will do by setting the GPIO pin low. To do this, we must clear the corresponding bit in the memory mapped gpio_port register. Since we aren't using any other GPIO pins, we can just clear the whole register. To do this, we will again use the sw instruction, but this time we can use x0, the hard-wired 0 register as the source register.
Next, we enter another busy loop so that we can actually perceive the LED being turned on. Remember that the t1 register is already set to 0x10000 when we start this loop, so a1 must be higher than that for the loop to work correctly. I set a1 to 0x15000, so the loop will run a total of 20480 times (0x5000).

unconditional jumps

Finally, we need to jump back to the beginning of the program. Because there is no operating system running, our program cannot exit. There is no operating system to take over if it ends. We also don't want it to end because then we will only see the LED flicker once, not continually. So we need to jump back to the very first local label that we created. However, we want to make sure that we always jump to that location, not only if a condition is met. To do this we can use the jal instruction.
The jal instruction means "jump and link". We only care about the "jump" part right now. The "link" part stores the address of the instruction following the jump in a register, which can be used later to return execution to that point and continue past the jump, which we do not want to do. So, to avoid this, we can give x0 as the instruction's first operand. This will cause the return address to be written to x0, but since x0 ignores writes, it effectively just throws away the return address.
The second operand is the place we want to jump to. We use 1b here to jump back to the first local label we created. Take a look through the example one more time to see how it will execute the second time through.
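For contrast, here is a sketch of what the "link" part is for. It is not part of the blink program: the labels main_loop and delay are invented for the example, and the jalr instruction ("jump and link register") is not otherwise covered in this chapter.

```asm
main_loop:
    jal  ra, delay          // jump to delay, saving the address of the next instruction in ra
    jal  x0, main_loop      // plain jump: the return address is discarded

delay:
    lui  t1, %hi(0x10000)   // start the counter at 0x10000
1:
    addi t1, t1, -1         // count down
    bne  t1, x0, 1b         // loop until t1 reaches 0
    jalr x0, 0(ra)          // jump back to the address saved in ra
```

Here the first jal keeps the return address in ra so that delay can hand control back, while the second discards it, exactly as in our program. GAS also accepts the pseudo-instructions "j label" for "jal x0, label" and "ret" for "jalr x0, 0(ra)".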