BetterOS.org : an attempt to make computer machines run better

introduction

There are many C tutorials to choose from online. However, most of them are pretty terrible and many of them contain factual errors. Learning from the wrong tutorial can leave a new programmer with a misunderstanding of the fundamentals of the language. The problem is that most tutorials focus on getting the reader to start producing code as quickly as possible instead of first getting the reader to understand the concepts involved, and often this is how the authors of those tutorials learned as well. This tutorial will focus first on explaining the fundamental concepts and second on producing correct code, because I believe this will help you become a better programmer faster in the long run.

what is C?

C is a language, plain and simple. More specifically, it is a compiled language, meaning that a program written in C needs to be converted into a binary machine code before it can be run.

what is C not?

C is not a program, it is not a set of commands, it is not an old version of C++ or C#. Most importantly, it is not magic. Programming in C never involves magic, it can (and should) be understood fully, from the highest level, down to the lowest level.

what is this tutorial?

This tutorial is designed to teach programming using C. It focuses on the language itself, instead of on the 'c library'. It also focuses on C for a unix-like environment, because I believe that is the best environment for learning the language, and once the language is well understood, the platform becomes less of an obstacle. If you insist on using windows for this tutorial, you will need to replace all instances of 'write()' with '_write()', and all instances of 'read()' with '_read()'.

hello world

Let's take a look at the simple "hello world" program. This is where most programmers start out. After we look at it, we will go through it line by line.
int write(int fd, char *str, int len);

int main(int argc, char **argv)
{
    char hello_str[15] = "Hello World!!\n";
    write(1, hello_str, 14);
    return 0;
}


If you tried other tutorials, you might notice that my version is a little bit different. That is because I designed my version to demonstrate the core concepts of the language. I intentionally avoid 3 concepts that are a common source of confusion and misunderstanding for beginners (the preprocessor, implicit array sizes, and variadic functions). We will cover all of these advanced concepts in the future, but for now we will avoid them and focus on the most important basic concepts of the language.

function prototypes

Let's look through the code line by line and take a look at what each line does. The first line
int write(int fd, char *str, int len);
is called a "function prototype". A function is some code which does something. In this case, we are talking about the function called 'write', which performs the task of writing some data, either to a file or to the terminal. A function prototype, however, is not the definition of a function, it is just a way of telling the C compiler that the function exists. It must still be defined somewhere else. In the case of 'write', it is defined by a C library.

Notice the form that the prototype takes. Function definitions and function prototypes look very similar. The main difference is that function definitions include additional code in their body, while function prototypes end with a semicolon and no additional code.

The first word of the prototype, 'int', is called the 'return type'. This is not very important right now, since our example doesn't use the return value. Return values and types are something we will get into later.

The second word, as you might have already guessed, is the name of the function. When you create your own functions, you can use almost any name you want, as long as it follows some simple rules. However, you will frequently be using functions that are defined by some library, such as this one, in which case, you need to know the name of the function in the library.

Next there is an opening parenthesis. Inside the parentheses is a list of the arguments that the function takes. Each argument is defined with a data type and a name. In this case, there are three arguments:
fd, which is an integer
str, which is a character pointer
len, which is an integer

That's it for the prototype, now when the program needs to call write later, the compiler will have all the information it needs to make that happen. As for the different types, we will be talking about those soon.

the main function

The next line is "whitespace", but since whitespace doesn't really matter in C, let's just skip to the next line with something on it:
int main(int argc, char **argv)
Hopefully, you can see that this has a form that is very similar to the function prototype we just talked about. Based on what we already learned, you should be able to guess what this line is. Take a few seconds, look back at the last few paragraphs and see if you can figure it out. Once you think you know, or if you give up, move on to the next paragraph.

This line has different arguments and a different function name, but there is one thing that makes this different from a function prototype. There is no semicolon at the end of this line, which makes this a function definition. This line defines a function called "main", which returns an integer, and takes two arguments:
argc, an integer
argv, a pointer to a character pointer

The main function is usually considered the starting point for C programs. The real starting point is determined by your linker, and is usually called "_start", but that part of the program is usually part of the operating system, or compiler, or standard C library, and is usually written in assembly, so you don't have to worry about that. _start's main job is to get the cpu ready and then call main, so for all intents and purposes, main is the program's start point. The only reason I explain _start is because it's important for you to know that there is no magic in the main function; it's just the function that _start calls. And likewise, there is no magic in _start; it's just the entry point chosen by the linker or operating system.

code blocks

Following the function definition is an opening curly bracket. Curly brackets in C define blocks of code. In this case, because the block of code immediately follows the function definition, the code in the block is the body of the function.

Side note: In most cases I prefer not to consider code enclosed in curly brackets to be "blocks" of code, however, because of the way that C handles function definitions, it is actually accurate here. We will see later how curly brackets can be used in other ways like in compound statements.

variables, types, and arrays

The next line:
char hello_str[15] = "Hello World!!\n";
creates a variable called "hello_str". A variable is kind of like a named container for some data. In programming, variables are the main way that you interact with any kind of data. In C, each variable has a "type". A variable's type helps the compiler figure out how to work with the data in that variable, and it also determines how much memory the compiler needs to reserve for the variable. This variable is of the type 'char', which means "character". In all modern systems, a char is 8 bits long, which is enough space to store one single ascii letter. If you are familiar with ascii, you may already have guessed, but in C and computers in general, a character is actually just a small number (8 bits gives 256 possible values). It's your operating system's job to change that into a letter when you want to display it.

However, the phrase that we want to display is "Hello World!!\n", which is 14 characters long, so one character is not enough to hold that. The solution for that problem is to create an array of characters instead of a single character, which is what we have done here. The square brackets after the variable name tell the compiler that we want an array, and the number inside them indicates the length of the array. There are a couple of things you might be confused about right now. First, we made a character array big enough to store 15 characters, but I said we want to print a 14 character long string, so why do we need 15 characters? Because in C, when you specify a string (like "Hello World!!\n") the compiler adds a 0 (ascii value 0, not the character '0') to the end of it. The compiler does this because in C, many functions rely on the 0 at the end to figure out how long the string is. Second, if you counted the characters in the string, you might count 15 instead of 14, but that '\n' at the end is actually only one single character. C treats a \ together with the character after it as a single character, and some of these combinations have a special value. For example, '\n' gets interpreted by the C compiler as a newline character, so the string we will actually be printing is "Hello World!!" followed by a new line (like pressing enter).

The rest of this line is pretty straightforward. The '=' sign is called the assignment operator. It assigns the variable on the left the value on the right. The "Hello World!!\n" is the value we want to use to initialize the character array. And the line ends with a semicolon, which terminates the statement.

function calls

Next is a function call; it's the one that actually prints the text out on the screen. We are calling the "write" function, which as we know from its prototype, takes 3 arguments. The three arguments that we are passing to it are within the parentheses and separated by commas. The first argument we give to it (fd) is 1. If you are familiar with shell scripting, you might be able to guess what that 1 is for. If not, I'll just tell you: it's "standard output". That first argument is the place we want to write the data to. If you look at the prototype you can see that this first argument should be an integer, which the number 1 is. The next argument is the string we want to write, and we are passing our array as the second argument. However, you might have noticed that the function prototype shows that the second argument should be of the type "char *" (which means "pointer to character"), but our variable is a character array. This is OK, because when an array is passed to a function, the compiler automatically converts it to a pointer to the array's first character. The final argument is another integer, 14. This argument is the length of the string we want to print. I know I told you that many functions in C use the 0 at the end of a string to figure out how long it is, but "write" is not one of those functions (and there are several reasons for this). When our program runs and write is called, write will send 14 characters starting from the beginning of hello_str to standard output. Note that this line also ends with a semicolon (as almost every line in C does).

the return statement

Next is the return statement. The return statement is used to send information from a function back to the code that called the function. Remember how function prototypes and function definitions specify a return type? That's used for the return statement. In the case of the main function, we need to return an integer, and the code that gets main's return value sends it back to your operating system, and that becomes the "exit code" for your program. Usually, an exit code of 0 means "success", so that's what we want to return.

Finally, there is the closing curly bracket, which signifies the end of the main function, and then the end of the file.

compilation process

Now that we have a basic understanding of the code, let's look at how the compiler would turn this from human readable C code into machine executable native code, and how your operating system will take that native code and run it. Having an understanding of this process is essential if you want to be a good C programmer. Knowing this information will help you write better code and also simplify debugging certain types of issues that you will encounter at one time or another. It's also not hard to understand, so don't skip ahead.

Believe it or not, C is designed to be easy for humans to read, not computers. Your processor can't read C code on its own; it can only read native code. In order for the C code to be converted into native code, you need the help of a program called a "C Compiler", and several other programs (together sometimes known as a toolchain). We will be referring to the whole process as the "compilation process".

The compilation process happens in several stages. The first stage is actually writing the code, which we have already done (hello_world.c).

preprocessing

The next stage is "preprocessing", which is handled by a program known as the "preprocessor". The preprocessor reads the C code and makes changes to the code based on rules written in the code known as "preprocessor directives". These are usually simple substitutions, but in some cases can be quite complicated. Our example program, hello world, did not include any preprocessor directives, so this stage will do nothing for this example.

compilation

The next stage is known as "compilation", and is handled by the compiler. The compiler reads the C code and converts it into assembly code. The assembly code is closer to what machines can understand. The compiler will also check your code at this stage to see if you made any mistakes in the syntax or types you used. It will also check for code that it thinks might have been a mistake and give you warning messages if it isn't sure.

assembly

The next stage after that is known as "assembly", and is handled by an assembler. This stage takes the assembly code from the compiler and converts it into binary machine code. On modern systems, this is called "object code". It is almost the same as native executable code, but it is still missing a few pieces. For example, in our hello world, we used the "write" function, but we didn't create that function; it's part of a library. The finished executable needs either the code for the write function, or enough information for the OS to find the write function when it runs. The object code does not include this. That's the job of the next stage.

linking

The next stage, and last stage in the compilation process, is known as "linking", and is handled by a program called the "linker". The linker takes the object code from the assembler, and all the libraries you need for the final executable and links them together. It does this either by adding code from the libraries into your code, or by adding enough information to it so that the OS can figure out how to find the right library code when it runs. Also, if your C code was separated into more than one file, the linker will combine them. The linker will then output the final native executable.

loading

Then, when you want to run your program, the operating system reads the executable file and copies the code into main memory. It will also copy into memory any libraries that are needed by the program, if the library was not already loaded by another program. The operating system then finds the start position and gives control over to the copy of the executable in memory.

execution

The program starts executing from the position set by the linker, which would be at _start. _start sets up a few things needed by the operating system and processor, and then it calls main(). Code executes from the start of a function and proceeds downward. The first line, our variable declaration and initialization, saves the string "Hello World!!\n" somewhere in memory. Next, the code calls write() and passes to it the three arguments: stdout, the memory address of the string, and the length of the string. The write function, which is part of an external library, will take the arguments provided and do something with them (specifically, the write function will send the arguments to the OS), and then, once finished, it will return control back to the executable code. The executable will continue executing where it left off. When the return 0; statement is reached, the program gives control back to the function that called main, which will exit the program and give control back to the operating system.

command line walkthrough using gcc

To compile C code, I recommend using gcc. In my opinion it's the best C compiler available, and it's open source. There are other options. Clang is alright; I would place it below gcc, but it does one thing better, which is that it gives much better warnings and error messages, so it could be good for beginners. Tcc is also alright; it's small and fast, but it does not do any optimization, so the finished executables will likely be somewhat slower.

To compile with gcc, make sure it is installed on your computer. Different distributions will have different methods of installing it. Then save your source code as a file with a .c extension. Make sure you know where you saved it.
Open a terminal emulator and use 'cd' to navigate to the location you saved the file. Finally, execute 'gcc' with the name of your C source code file as an argument. This will create a file called "a.out", which is your final executable. To run this, type "./a.out". For the sake of this example, let's assume you saved your file as:
/home/betteros/c/hello.c
The following command line session would compile and execute that file:
$ cd /home/betteros/c
$ ls
hello.c
$ gcc hello.c
$ ls
a.out hello.c
$ ./a.out
Hello World!!
$


expressions, input, and references

Now that we know some basic syntax and have an idea of how the compilation process works, let's try to write something a little bit more complicated. Let's write a program that can do some single digit math and output the result. To do this, we will need to learn several new concepts.

program skeleton

For this program, we will write the program as we go, instead of just looking at it line-by-line from the top down. So let's first start with an empty main() function, because we know for sure we will need that.
int main(int argc, char **argv) { }


The first thing we will need to do is ask the user to enter a number, so we can use the write function. However, in order to use the write function, we should first provide a prototype so the compiler knows how to call it.
int write(int fd, char *data, int len);


Once we have provided a prototype, we can call the write function and print out a string on the screen.
write(1, "Enter a single digit number: ", 29);


Note that it is OK to include the string directly in a function call like this. The C compiler will take care of storing the string somewhere in memory and giving the memory address of that data to the function.

the read() function

Next, we are going to need to get input from the user, sort of like the opposite of what we did with write. The function we want to use is called "read", and its prototype looks pretty much exactly like the one for write.
int read(int fd, char *data, int len);


The difference with this function is that instead of giving it data to write on the screen, we will be getting data back from it. In order to do this, we need to reserve enough memory for the data we expect to get back. In our case, we expect to get back one character (because our calculator program will operate only on single digits). We already know that creating a variable allocates enough space to store that variable's data, so let's do that. Since we need one character, a variable of type char will work well.
char digit1;


Since we don't need it to start with a value, we can leave it uninitialized. Uninitialized variables will have a value; it's just left up to the operating system or compiler to decide what it is. It could be 0, or it could be whatever previously occupied that memory location. Either way it doesn't matter, we just need the storage space.

segmentation faults

Next, we need to use read to get something from the user and store it there. We want to read from stdin this time instead of stdout, and we only want 1 byte (because that is the size of a character).
read(0, digit1, 1);


Congratulations, you just encountered your first segmentation fault. There is a bug in this code, and it is the kind of bug that causes segmentation faults (depending on your system, the compiler may reject the line, or the kernel may catch the bad address and make read fail instead of crashing the program). What is a segmentation fault though? A segmentation fault occurs when a program tries to read, write, or execute memory that it is not allowed to read, write, or execute respectively. This is the most common cause of what is commonly known as a "crash". But how did this happen? We allocated enough storage, we gave the proper function prototype, so what did we do wrong?
The answer is that we gave the read function the value of the variable digit1, but read needs the memory address of digit1, otherwise it won't know where to store the data it reads.

reference operator

For example, if digit1 were stored in memory location 0x800032, and the operating system set its initial value to 0, then our function call would actually look like this: read(0, 0, 1); which won't work because read will get data from the user and try to store that data at memory location 0, which is outside of the program's memory space. So, what we need to do is give the function the memory address of the variable instead of its value. To do this, we can use the reference operator, also known as the address operator. In C, & is used as the reference operator, and to use it, you only need to prefix the variable name with the operator.
read(0, &digit1, 1);

Now that we have that figured out and working correctly, let's print out what we read from the user. This is a simple method of debugging, since we can make sure that what gets printed out is the same as what the user entered, and if not, it can reveal information about what is wrong with our program (as will be demonstrated shortly).

We already know about the write function, but until now, we have only used it with constant strings. Now we need to print out user entered data. I also want to print out a message with the data that says "The number you entered: " so that we can easily recognize what is happening and to make it a little bit more user friendly.
If you have worked with a higher level language before, you might expect to be able to embed the variable in a string, maybe with some kind of expansion operator, or maybe use some kind of string concatenation operator to combine the message with the user entered data, but C does not have any of these. There are many reasons for this, strings in C are actually just arrays of characters, they aren't handled differently from normal arrays in C. Also, C allocates exactly the memory needed (or specified) for each variable, and if you were to concatenate strings, you would suddenly be using more memory than you allocated, which would mean over-writing other memory, possibly destroying other variables, and possibly causing a segmentation fault. This may seem like a big disadvantage of C compared to other programming languages, but if you ask me it is the main reason that C is better than other languages. Although this seems like a weakness, it actually makes C a more powerful language, because it gives the programmer greater control over how data is stored, and enables certain techniques which are only possible in C.
There are many ways to work around this. We could allocate enough memory for the concatenated string in the beginning, and write our own code to do the concatenation, but we won't do that quite yet. Or we could write the first part of the string, then write the user's input, and then write a newline character. This method is not the most efficient way to do it (it is more efficient to do as few writes as possible), but it has the advantage of being very simple to do and to understand.
write(1, "The number you entered: ", 24);
write(1, &digit1, 1);
write(1, "\n", 1);


Notice that we still had to use the reference operator to pass the variable to the function. Remember that in our prototype for write, the second argument is of type "char *" (pointer to char), but digit1 is only a char. So we need to pass the address of the data (the pointer to the data) to the function; thus, we need the reference operator.

Now, if we were to compile and run the program, we can see that we get a prompt for a digit successfully, and we can enter a digit by typing a number, and when we press enter, the program repeats back to us what we entered.
Note that when asked to enter a digit, the program will sit and wait until we have entered something and pressed enter. It will not continue if we enter nothing, and it will not continue if we type a digit but do not press enter.

We now have one of the two digits we need. The code to read the second digit is going to be very similar to the first, just with a different variable, let's call it digit2.
char digit2;
write(1, "Enter a single digit number: ", 29);
read(0, &digit2, 1);
write(1, "The number you entered: ", 24);
write(1, &digit2, 1);
write(1, "\n", 1);


input line buffering

Now, if you can, I would recommend compiling the code you have now and testing it.
If you do, you might notice that it doesn't perform exactly as you might have expected. When you enter a single digit as instructed, it will print out the digit you entered, but then it will seem to skip the second input and tell you that you have entered nothing. If you enter 4 for the first digit, the output would look like this:
$ ./a.out
Enter a single digit number: 4
The number you entered: 4
Enter a single digit number: The number you entered: 

$


So what went wrong? If you are familiar with command line interfaces, especially on unix-like systems with pipes and text streams, you may be able to figure it out. If not, let me explain. stdin is a text stream, but in your terminal, it is line-buffered. That means that when your program executes "read", it starts waiting for data to become available on that text stream. At first, there is no data available, so the call "blocks" (which means it keeps waiting and prevents your program from continuing while it is waiting).
Then the user presses the number 4 key. Your terminal emulator tells the kernel (through the tty) that the number 4 was pressed, and the kernel (through the tty) echoes that number 4 back to the terminal emulator, and the terminal emulator draws that 4 character on the screen. However, that '4' character is not made available to the program running on the tty (your program), because it is not a complete line. Instead, it gets added to the line buffer (which is hidden by the kernel). So at this point, read is still blocking, waiting for 1 character to become available on stdin.
Then the user presses the enter key. The terminal emulator tells the tty that enter was pressed, the tty sends back the '\n' character, the terminal emulator advances to the next line, and then the tty adds the '\n' character to the line buffer. Now the line buffer does contain a full line of text (because it ends with '\n'), so it sends the entire content of the line buffer ('4\n') to stdin. Now read has two characters available to it on stdin, so it can move the data it needs to the memory location specified in its second argument and stop blocking. However, the third argument specifies that it only needs 1 character's worth of data, so it starts from the beginning of the data available on stdin, moves 1 character into &digit1, and then removes that character from stdin. Now, stdin still contains one character, '\n', because only the '4' has been removed.
Then, several write functions are called, and then the program calls read a second time. This time, read checks stdin for available data, and there is still one character left in stdin ('\n'), so it moves that character from stdin to &digit2 and removes it from stdin, and then it does not need to block, so the program continues executing.

There are several ways to solve this problem. We could disable line buffering on stdin before running the program (which is not very user friendly), or we could learn about loops and flow control and use that to check what type of character was entered and deal with it appropriately (which we will do a little bit later), or we could use another function to clear stdin before trying to read again, or we could just add another read after the first and throw away the result. For now, let's implement that last one, since we won't need any flow control or other functions for that and we can finish this program.

int write(int fd, char *data, int len);
int read(int fd, char *data, int len);

int main(int argc, char **argv)
{
    char digit1;
    char digit2;
    char garbage;

    write(1, "Enter a single digit number: ", 29);
    read(0, &digit1, 1);
    read(0, &garbage, 1);
    write(1, "The number you entered: ", 24);
    write(1, &digit1, 1);
    write(1, "\n", 1);

    write(1, "Enter a single digit number: ", 29);
    read(0, &digit2, 1);
    read(0, &garbage, 1);
    write(1, "The number you entered: ", 24);
    write(1, &digit2, 1);
    write(1, "\n", 1);
}
Now the code works correctly. All that is left to do is to actually add the digits and print out the result.

addition and ascii

In C, adding 2 integers together is very easy. Unfortunately, we don't have 2 integers, we have 2 ascii encoded digits, and adding ascii encoded digits together will not get you the result you want. Fortunately, converting a single ascii encoded digit to a number is also very easy. ascii works by assigning an integer to each character. The value of the digit '0' in ascii is 48, '1' is 49, '2' is 50, '3' is 51 and so on. ascii keeps all the digits together and in order. So we can easily convert an ascii encoded digit to a number by simply treating it as a number and subtracting the value of '0'.
Then we can add these integers together and convert that back to ascii by adding the value of '0' (assuming the result is also a single decimal digit).

Doing addition and subtraction in C is very easy. All that you need is the numbers you want to add, and the addition or subtraction operator (+ or -). C uses infix notation just like you probably learned in elementary school: the operator goes between the numbers to be added or subtracted (e.g. 1 + 1). But we can add more than just literal integers; we can also add or subtract the values of variables.
int write(int fd, char *data, int len);
int read(int fd, char *data, int len);

int main(int argc, char **argv)
{
    char digit1;
    char digit2;
    char digitresult;
    char garbage;
    int num1;
    int num2;
    int numresult;

    write(1, "Enter a single digit number: ", 29);
    read(0, &digit1, 1);
    read(0, &garbage, 1);
    write(1, "The number you entered: ", 24);
    write(1, &digit1, 1);
    write(1, "\n", 1);

    write(1, "Enter a single digit number: ", 29);
    read(0, &digit2, 1);
    read(0, &garbage, 1);
    write(1, "The number you entered: ", 24);
    write(1, &digit2, 1);
    write(1, "\n", 1);

    num1 = digit1 - 48;
    num2 = digit2 - 48;
    numresult = num1 + num2;
    digitresult = numresult + 48;

    write(1, "The sum is: ", 12);
    write(1, &digitresult, 1);
    write(1, "\n", 1);
}
Hopefully you understand why this code works the way it does, as well as the general process of problem solving that leads to creating a functioning program. Notice, however, that this program we wrote kind of sucks. It has a lot of problems. It's very long for such a simple task, it really relies on the user entering exactly what it expects in just the right way, and it only prints out the right answer if the right answer is less than 10. So, in the next example, we will look at some slightly more advanced concepts that will make this process a little less tedious and more interesting, and fix a lot of the problems with this example.

terminology

Before we go further, I want to make sure we understand a few things that will make communicating new concepts easier. There have been a few cases so far where I (intentionally) may not have used the best or most correct terms to describe something. Everything in programming has a very concrete definition, which should make communication between programmers like you and me very easy and efficient. However, a lot of these terms might seem foreign to non-programmers, and many are borrowed from other fields but with slightly different meanings. So it's important to understand exactly what these terms mean in computer science, because it will help you avoid confusion later, and it will make it easier for me to explain important concepts to you.
The following is a list of terms that you should understand moving forward:

expressions

Expressions are a very important part of any programming language. The simplest and most accurate way to define an expression is anything that yields a value.
This stands in contrast to statements, which do not yield a value. An expression can be as simple as a single literal integer (14 has a value of 14) or a variable (x has whatever value was assigned to x), or it can be made up of other expressions combined by an operator (14 + x is also an expression). Because expressions can contain other expressions, an expression can be arbitrarily complex; there is no limit (32 * ((14 + x) - y) is also an expression). Even a function call that returns a value is an expression.
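Here is a small sketch of the kinds of expressions just described. The function names are made up for illustration. In each function, whatever follows the word return is an expression, and its value is what the function hands back.

```c
int literal_expression(void)
{
    return 14;                            /* a literal is an expression */
}

int nested_expression(int x, int y)
{
    return 32 * ((14 + x) - y);           /* expressions built from other expressions */
}

int call_expression(void)
{
    return nested_expression(2, 6) + 1;   /* a function call is an expression too */
}
```

With x as 2 and y as 6, the nested expression works out to 32 * ((14 + 2) - 6), which is 32 * 10, or 320.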

operators and operands

Operators are a representation of an operation to be performed. An operator has one or more operands, yields a value, and forms an expression. Operands are simply the values to be operated on. There are actually different types of values, though (r-values and l-values), but I would suggest you don't worry about that distinction quite yet.
There are quite a few operators in C, examples include +, -, /, *, ++, ~, =, and many more. We will go over what each does when we need them.
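Just to give a taste before we cover them properly, here is a rough sketch that uses each of the operators listed above once. The function names are hypothetical, and the comments show the value of total after each step.

```c
int operator_demo(void)
{
    int x;
    int total;

    x = 7;               /* = assigns the value 7 to x */
    total = x + 3;       /* 10 */
    total = total - 4;   /* 6 */
    total = total * 5;   /* 30 */
    total = total / 4;   /* 7, integer division discards the remainder */
    total++;             /* 8, ++ adds one to its operand */

    return total;
}

int complement_of_zero(void)
{
    return ~0;           /* ~ flips every bit; this is -1 on two's complement machines */
}
```

Notice that + and - take two operands, while ++ and ~ take only one.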

statements

Statements are the essential building blocks of a program. A statement is basically the equivalent of a sentence in a human language. Previously, I used the term "lines" to refer to pieces of code, but the compiler doesn't actually care about what is on a line, only what is contained within a statement. A single statement can span multiple lines, or there could even be multiple statements on a single line. There are actually multiple kinds of statements. The simplest is the simple statement, which is any code terminated by a semicolon. There are also compound statements, which are made up of other statements. There are also special statements like if statements, for loop statements, while loop statements, do while loop statements, and switch and case statements (we will look at these statements in the coming chapters). There is also a goto statement, but most programmers believe it should never be used.
Unlike expressions, statements do not and cannot yield a value. A statement can contain an expression, but an expression cannot contain a statement. A statement can even be nothing but an expression terminated by a semicolon. For instance: 1 + 1 is an expression, but 1 + 1; is a statement because it is terminated by a semicolon.
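The following sketch shows that the compiler cares about statements and not lines. The function name statement_demo is made up for illustration. It contains statements sharing a line, one statement spread over several lines, and a compound statement (here I am assuming the usual form, a group of statements wrapped in braces).

```c
int statement_demo(void)
{
    int a; int b;        /* two declarations on one line */
    int total;

    a = 1; b = 2;        /* two simple statements on one line */

    total = a            /* one statement spread over three lines */
          + b
          + 3;

    {                    /* a compound statement made of two simple statements */
        total = total * 2;
        total = total + 1;
    }

    return total;
}
```

The compiler treats this exactly as if every statement had been written on its own line. total ends up as ((1 + 2 + 3) * 2) + 1, which is 13.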

declarations and definitions

Declarations are a special type of statement that tells the C compiler about something, but doesn't necessarily define what it is. For instance, function prototypes are a type of declaration. When we create variables, we also use a declaration statement. Definitions, on the other hand, define what something is. Frequently, programmers declare and define something at the same time, although there are cases where it makes sense to keep them separate.
Declaration statements are a very special type of statement. In fact, they are the only type of statement allowed outside of a function body.
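Here is a small sketch of a declaration and a definition kept separate. The names add, counter, and add_to_counter are hypothetical. It also uses the extern keyword, which we have not covered; for now, just read it as "this variable exists, but it is defined somewhere else".

```c
int add(int a, int b);   /* declaration (prototype): says add() exists */
extern int counter;      /* declaration: says counter exists somewhere */

int counter = 5;         /* definition: creates counter and gives it a value */

int add_to_counter(int amount)
{
    /* add() can be called here because it was declared above,
       even though its definition comes later in the file. */
    return add(counter, amount);
}

int add(int a, int b)    /* definition: says what add() actually does */
{
    return a + b;
}
```

Everything outside the function bodies here is a declaration, and the prototype for add() is what lets add_to_counter() use it before the compiler has seen its definition.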

improving the previous example

The previous example really needs some work. However, we will need to learn more before we can make the necessary improvements. I have titled this chapter "refactoring", which is a common term you will hear as a programmer. Basically, it means to improve the code without drastically changing what it does. However, in addition to refactoring, we will also be enhancing the code to increase its functionality.

header files and the preprocessor

Previously in this tutorial, we learned about function prototypes. Function prototypes are a type of declaration that tells the compiler that a function exists, but doesn't actually define what it does. In our program, we wrote function prototypes for read() and write(), which are defined in an external library. These functions are both specified by a standard called POSIX, which specifies that these functions should be defined in the standard C library (libc), which is typically linked automatically by the linker.
It is a little bit inconvenient having to correctly enter all function prototypes for functions defined in external libraries, which is why most libraries also come with a file listing all the prototypes you would need for all the functions the library exposes. These files are called headers and usually come with a .h extension.
To use these files, you could open up the correct header, find the prototype that you need, and then copy and paste it into your code, but that would still be quite tedious. Fortunately, there is a better way, the C Preprocessor.
The C Preprocessor can automatically copy these prototypes out of a header file and paste them into your code. It actually copies the entire contents of the header file and pastes it into your code, but that's fine because these header files always contain valid C code. This will allow you to avoid manually writing prototypes for these library-defined functions. Please note that a header file is not the library, it is only a helper file that will allow you to access the library more easily.

To use this preprocessor functionality, we need to replace our prototypes with a preprocessor directive. Preprocessor directives are weird because they are not actually C code, they are handled by the preprocessor instead of the C compiler. When the preprocessor runs, it finds all the preprocessor directives, does what they direct it to do, and then removes them from the code and passes the result to the C compiler. So when the C compiler compiles C code, there are no preprocessor directives left for it to see. If we intentionally skipped the preprocessor and tried to compile code containing preprocessor directives, the compiler would issue an error and fail to compile the code. Fortunately, most compiler packages automatically run the preprocessor as the first step, so you don't need to do anything manually to use these directives.

The specific preprocessor directive that we want to use is called include. All preprocessor directives begin with a # character, which must be the first thing on its line (only spaces or tabs may come before it). The include directive has the following syntax:
#include <filename>

or
#include "filename"

Note that filename represents the name of the header file to be included. Also note that these lines do not end with a semicolon. That's because they are not statements, they are not C code at all, they are preprocessor directives.
The difference between the <filename> and "filename" versions is that using <filename> causes the preprocessor to search for a file called filename in the places where headers are installed on the system, and using "filename" causes the preprocessor to first search for a file called filename in a location you control, typically the directory of the file that contains the directive. Generally the first form is for headers installed with a library, and the latter form is for headers that are part of your current project. Since read() and write() are part of a library, we will use the first form.
The header file that contains prototypes for these 2 functions that we need to use is called unistd.h. unistd.h stands for "unix standard header" (but it really refers to POSIX instead of unix), and it contains a lot of standard POSIX function prototypes.

To put this all together, we can replace this code:
int write(int fd, char *data, int len);
int read(int fd, char *data, int len);

with this:
#include <unistd.h>
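As a minimal sketch of what this looks like in use, here is a small helper that calls write() with no hand-written prototype at all. The name say_hello is made up for illustration. One thing to be aware of: the prototype that unistd.h supplies is slightly more precise than the one we wrote by hand (POSIX declares it as ssize_t write(int, const void *, size_t)), but our calls to it do not need to change.

```c
#include <unistd.h>   /* supplies the prototype for write() */

/* Hypothetical helper: writes a fixed greeting to standard output (fd 1)
   and returns the number of bytes write() reports having written. */
int say_hello(void)
{
    return write(1, "hello\n", 6);
}
```

If you delete the #include line and put nothing in its place, the compiler no longer knows what write() is and will complain, which is exactly the job our hand-written prototypes were doing before.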



the if statement







Please note that this tutorial is not yet complete, it was last updated: October 19th, 2019. If you have any comments or corrections, please email me at prushik@betteros.org.