This is the twelfth article in the Making Sense of C series. In this article, we're going to discuss the symbol table, which will explain how the compiler functions, and header files, which will allow us to use the standard library functions to print things to the terminal, read from files, and write to files.
As we stated back in the introduction to the series, we want a standard library for C that contains all the basic functionality that a user would need, including reading from and writing to files, printing things out to the terminal, getting user input in the terminal, manipulating strings, etc.
So, let's say that we've already written the standard library, likely in a combination of machine-specific assembly and C, and we want to let programmers use it.
Since we want to compile our programs quickly and organize our code, we have
decided to write our code in multiple files, compile them separately, then
combine them together into a single program.
That way, the next time we make a change, we only have to recompile the changed
files and then link everything together.
Furthermore, it's easier to read code and find things if you split them into
multiple files.
Everything is all well and good until we look at how the C compiler works.
The C compiler will go through each file it needs to compile from top to bottom exactly once and then either compile the code or throw an error.
So now, let's say that you have a function like this:
```c
int main(void) {
    a = 4;
    int a = 5;
    return 0;
}
```
This function obviously won't compile because the first line that uses the variable `a` doesn't declare it with its type.
To really understand where the error is, let's read through this like a compiler.
- `int main(void) {`: Define a function called `main` that returns an `int` and takes in no arguments. All good.
- `a = 4;`: Set `a` to be 4. What is `a`? We've never allocated any memory for it and we don't know its type, so we can't store anything in it, so this line of code is an error.
- `int a = 5;`: Allocate memory for an `int` called `a` and store the value of 5 in that memory.
- `return 0;`: Return 0 from `main`.
- `}`: End of the function.
```
test.c: In function ‘main’:
test.c:2:2: error: ‘a’ undeclared (first use in this function)
  a = 4;
  ^
```
If you were to correct the code above, the first line containing the variable `a` would declare it with `int a`, and none of the lines after it would repeat `int a`, since we've already declared the variable and allocated memory for it.
```c
int main(void) {
    int a = 4;
    a = 5;
    return 0;
}
```
When the compiler reads this code, everything makes sense and we don't find any errors.
It should be clear that the order in which we make statements in a C file makes a big difference, as we would expect from our compiler reading the file from the top to the bottom, so if we want to have working code, we have to declare things before we use them.
When the compiler gets to `a = 5` in the corrected C file, how does it know that `a = 5` is valid code?
Furthermore, when the compiler gets to `a = 4` in the incorrect C file, how does it know that `a = 4` is invalid code?
Without going into too much detail about the inner workings of a compiler, your compiler will build a list of all the valid symbols that have been declared as it reads through the file.
This list is called the symbol table.
In the incorrect C file, `a` in `a = 4` isn't in the symbol table because we haven't declared it yet.
When the compiler reads `int a;` or `int a = 4;`, it assigns a memory address to `a`, which it stores in the symbol table.
When the compiler reads `a = 5`, it looks in the symbol table for the name `a` and substitutes the assigned memory address in the output assembly code.
If `a` is not in the symbol table, you will get an error; otherwise, the compiler uses the correct memory address.
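To make the idea concrete, here is a minimal sketch of a symbol table as a small array of entries. It is only an illustration, not how `gcc` or any real compiler stores its symbols: the names `struct symbol`, `declare_symbol`, and `lookup_symbol`, the fixed table size, and the made-up `offset` field are all assumptions for this example. It also leans on a few C features (structs, pointers, and `string.h`) that this series hasn't covered yet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 64  /* arbitrary limit chosen for this sketch */

/* One entry in the toy symbol table: a name, its type, and the
   (made-up) memory offset the compiler assigned to it. */
struct symbol {
    const char *name;
    const char *type;
    int offset;
};

static struct symbol table[MAX_SYMBOLS];
static int symbol_count = 0;

/* Return the entry for `name`, or NULL if it was never declared. */
struct symbol *lookup_symbol(const char *name) {
    assert(name != NULL);
    for (int i = 0; i < symbol_count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return &table[i];
        }
    }
    return NULL;
}

/* Record a declaration such as `int a;`. Returns 0 on success, or -1
   if the table is full or the name has already been declared. */
int declare_symbol(const char *name, const char *type, int offset) {
    if (symbol_count == MAX_SYMBOLS || lookup_symbol(name) != NULL) {
        return -1;
    }
    table[symbol_count].name = name;
    table[symbol_count].type = type;
    table[symbol_count].offset = offset;
    symbol_count++;
    return 0;
}
```

In this sketch, calling `lookup_symbol("a")` before `declare_symbol("a", "int", 0)` returns `NULL`, which corresponds to the "undeclared" error, while calling it afterwards returns the entry holding the assigned offset.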
The symbol table stores more than just local variables. It also stores functions and a few other things, such as macros and typedefs, that you don't need to worry about for now.
Let's say I have the following code:
```c
int main(void) {
    int n = 300;
    int triangular_number = nth_triangular_num(n);
    return 0;
}

int nth_triangular_num(unsigned int n) {
    int val = 0;
    for (int i = 0; i <= n; i++) {  // add up 0 + 1 + ... + n
        val += i;
    }
    return val;
}
```
You might think that this should compile properly. Everything is defined properly and there are no syntax errors, so we should be fine. If I try to compile this code, however, I get this warning:
```
test.c: In function ‘main’:
test.c:3:29: warning: implicit declaration of function ‘nth_triangular_num’ [-Wimplicit-function-declaration]
     int triangular_number = nth_triangular_num(n);
                             ^~~~~~~~~~~~~~~~~~
```
which means that something might be wrong with the code, but the compiler is going to allow it because it might still work.
When most compilers find an error or a warning in your code, they will tell you the name of the file and the line number, then print out the line itself, and the error or warning.
`gcc` uses the syntax:
```
[FILE]: In function ‘[function name]’:
[FILE]:[LINE]:[COLUMN]: error: [ERROR DESCRIPTION]
 actual line of code with the error clearly marked
 ^~~~~~~~~~~~~~~~~~~~~~~~
```
where everything in square brackets is something that can vary depending on the specifics of the error or warning.
Also, it's possible to have errors outside of functions (which we're going to see in a second), in which case the first line of the output will say something like `At top level:` instead of `In function ...:`.
Since you can technically turn off warnings (don't), `gcc` will also tell you which warning was triggered off to the side, like it did with `[-Wimplicit-function-declaration]`.
Don't worry about it for now.
Other compilers can print out error codes, but you mostly need just the file, line number, and what the error was.
Lastly, if at any point you get an error or a warning, copy either the error description or the error code into a search engine and search for it. Feel free to remove anything specific to your project like the line number or file.
Unlike with variables, if the compiler sees a call to a function it hasn't seen before, it will implicitly assume that the function returns an `int` and takes in any number of parameters of any type.
Our function just happens to return an `int`, so it fits the implicit declaration.
(Implicit declarations were removed from the language in C99, so newer compilers may treat this warning as an error.)
On the other hand, if I change the return type of `nth_triangular_num` to `unsigned long long` and the type of `triangular_number` to `unsigned long long` like so:
```c
int main(void) {
    int n = 300;
    unsigned long long triangular_number = nth_triangular_num(n);
    return 0;
}

unsigned long long nth_triangular_num(unsigned int n) {
    unsigned long long val = 0;
    for (int i = 0; i <= n; i++) {  // add up 0 + 1 + ... + n
        val += i;
    }
    return val;
}
```
I get the following warning and error:
```
test.c: In function ‘main’:
test.c:3:41: warning: implicit declaration of function ‘nth_triangular_num’ [-Wimplicit-function-declaration]
     unsigned long long triangular_number = nth_triangular_num(n);
                                            ^~~~~~~~~~~~~~~~~~
test.c: At top level:
test.c:7:20: error: conflicting types for ‘nth_triangular_num’
 unsigned long long nth_triangular_num(unsigned int n) {
                    ^~~~~~~~~~~~~~~~~~
test.c:3:41: note: previous implicit declaration of ‘nth_triangular_num’ was here
     unsigned long long triangular_number = nth_triangular_num(n);
                                            ^~~~~~~~~~~~~~~~~~
```
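Notice that the compiler thinks we declared (more on what it means to declare a function later in the article) `nth_triangular_num` on line 3, where it assumed a return type of `int`.
That assumption conflicts with the actual definition on line 7, which returns an `unsigned long long`.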
As we did with the variables above, we can solve this by defining `nth_triangular_num` before we call it.
```c
unsigned long long nth_triangular_num(unsigned int n) {
    unsigned long long val = 0;
    for (int i = 0; i <= n; i++) {  // add up 0 + 1 + ... + n
        val += i;
    }
    return val;
}

int main(void) {
    int n = 300;
    unsigned long long triangular_number = nth_triangular_num(n);
    return 0;
}
```
So now, we just need to make sure that we define each function before we call it. Should be easy enough, right? We have two problems with this approach:
- If two functions call each other and neither of them returns an `int`, we can't write a correct program, period.
- If we split our code across multiple files, the C compiler will run into functions that were defined in another file.
Putting all our code into one file makes it almost impossible to navigate our code and it means we have to recompile our entire code base every time we make a change. For small projects, the compile time will be about the same, but for larger projects, the compile time can get significantly longer.
We need some way to tell the compiler that we want to add something to its symbol table without defining it within the file.
Let's say you're a compiler and you see the following code:
```c
int a = 4;
char sample_string[] = "This is a test. Wouldn't you agree, Jean Pierre Polnareff?";
int array[] = { 0, 1, 1, 2, 3, 5, 8, 13, 21 };
a = some_func(a, sample_string, array);
```
What would you, as a compiler, need to know to determine if `a = some_func(a, sample_string, array)` is a valid line of code?
Try to work it out on your own before reading the answer below.
In order to know if a function is going to be used properly, the compiler needs to know
- the return type,
- the types of its arguments,
- and the name of the function.
If we know these details about the function, we should be able to add the function to our symbol table, so we just need to find a way to tell this information to the compiler. We don't want to waste any typing, so let's just use the syntax:
```c
return_type no_argument_function(void);
return_type one_argument_function(first_type);
return_type two_argument_function(first_type, second_type);
return_type three_argument_function(first_type, second_type, third_type);
```
For `nth_triangular_num`, we would use

```c
unsigned long long nth_triangular_num(unsigned int);
```
Even though it isn't strictly necessary to include the parameter names, I do so that I can look back and figure out what the arguments should be:

```c
unsigned long long nth_triangular_num(unsigned int n);
```
In our code, we can use

```c
unsigned long long nth_triangular_num(unsigned int n);

int main(void) {
    int n = 300;
    unsigned long long triangular_number = nth_triangular_num(n);
    return 0;
}

unsigned long long nth_triangular_num(unsigned int n) {
    unsigned long long val = 0;
    for (int i = 0; i <= n; i++) {  // add up 0 + 1 + ... + n
        val += i;
    }
    return val;
}
```

and our code will compile just fine since we've added `nth_triangular_num` to the symbol table before calling it.
As a shortcut, just copy the first line of the definition of the function and replace everything after `)` with a semicolon (`;`).

```c
unsigned long long nth_triangular_num(unsigned int n) { // Definition
unsigned long long nth_triangular_num(unsigned int n);  // Declaration
```
You can declare a function as many times as you would like since you're just adding to the symbol table, but you can only define it once since the compiler needs to know exactly what you want to do when you call `nth_triangular_num`.
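Declarations also help when two functions call each other, so that neither one can be defined before the other. The following is a minimal sketch using two hypothetical functions, `is_even` and `is_odd` (names invented for this example), that both return `unsigned int` rather than `int`. It also repeats one declaration on purpose to show that the compiler accepts it.

```c
/* Declarations: add both names to the symbol table up front. */
unsigned int is_even(unsigned int n);
unsigned int is_odd(unsigned int n);
unsigned int is_odd(unsigned int n);  /* repeated declaration: allowed */

/* Definitions: each function gets exactly one body. */
unsigned int is_even(unsigned int n) {
    if (n == 0) {
        return 1;           /* 0 is even */
    }
    return is_odd(n - 1);   /* is_odd is only defined further down */
}

unsigned int is_odd(unsigned int n) {
    if (n == 0) {
        return 0;           /* 0 is not odd */
    }
    return is_even(n - 1);
}
```

Without the declarations at the top, the call to `is_odd` inside `is_even` would be implicitly declared as returning an `int`, which conflicts with the real definition, and swapping the two definitions would only move the problem to `is_even`.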
Given that we've gotten to this point without mentioning header files in an article called Header Files in C, you can probably guess that header files are files of declarations that we can copy into our source files.
In general, we use header files to store everything we want to be in the symbol table (including functions, macros, typedefs, etc.) that we don't want to put in our code manually, which makes our code easier to read and less error prone, since copying declarations by hand invites mistakes.
While C source files have the extension `.c`, header files have the extension `.h`.
There is no hard requirement for the extensions on the files, but these are standard across all of C.
Making header files is good, but we need a way to tell the C compiler to copy and paste them into our code without modifying the source code itself.
Since the part of the compiler that does this processes the input before the actual compilation, we call it the preprocessor.
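Your compiler consists of
- the preprocessor, which handles the lines that start with `#` (we'll only have to work with one preprocessor directive for now, which we'll discuss in the next section),
- the compiler proper, which turns the preprocessed C code into assembly,
- the assembler, which turns the assembly into machine code, and
- the linker, which combines the compiled files into a single program.
Different compilers might have another subprogram or two inside them or run these steps in a slightly different order, but it really doesn't matter.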
The #include Statement
The `#include` statement will copy and paste the contents of the file it is given into the current file.
There are two use cases that have slightly different syntax:
- `#include "file.h"`
- `#include <std_file.h>`
The difference between `<std_file.h>` and `"file.h"` is just that the angle brackets tell the preprocessor to look for the file in the standard library directories (which are the same for every C and C++ program on your machine unless you're doing something weird) and the quotes tell it to look for the file in local directories first.
We could write our previous code with the triangular numbers as

```c
// triangular_numbers.h
// If you're experienced in C, you know I'm missing header guards, but I'm
// going to neglect them for now and bring them back later in the tutorial
// since showing people things before they have a clear application will
// often just confuse them.

unsigned long long nth_triangular_num(unsigned int n);
```

in the header file for the triangular numbers (`triangular_numbers.h`),

```c
// triangular_numbers.c

unsigned long long nth_triangular_num(unsigned int n) {
    unsigned long long val = 0;
    for (int i = 0; i <= n; i++) {  // add up 0 + 1 + ... + n
        val += i;
    }
    return val;
}
```

in the source file for the triangular numbers (`triangular_numbers.c`), and

```c
// main.c
#include "triangular_numbers.h"

int main(void) {
    int n = 300;
    unsigned long long triangular_number = nth_triangular_num(n);
    return 0;
}
```

in the source file for the `main` function (`main.c`).
As you can see, the source file and the header file containing information about the `nth_triangular_num` function have the same name, `triangular_numbers`, but a different extension (`.c` vs `.h`).
You should generally follow this pattern in simple cases. There are some cases in which you have to use different file names, but we're not going to worry about those for now.
If you stop this tutorial right now, you need to make sure to use header guards in your headers or else you could get errors while compiling.
Header guards look like
```c
#ifndef NAME_OF_THIS_FILE_H
#define NAME_OF_THIS_FILE_H

// Everything you would normally put inside the header file.

#endif
```
I will explain header guards in a later article in this series.
Later on, we can improve our algorithm to calculate the triangular numbers by just modifying the file `triangular_numbers.c` like so:

```c
//triangular_numbers.c

unsigned long long nth_triangular_num(unsigned int n) {
    return ((unsigned long long)n * (n - 1)/2 + n);
}
```
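Before swapping one implementation for another, it is worth checking that they agree. Here is a minimal sketch of such a check: it compares a loop that adds up 1 through n against the closed-form expression above. The names `nth_triangular_num_loop`, `nth_triangular_num_formula`, and `versions_agree` are made up for this example so that both versions can live in one file.

```c
/* Loop version: adds up 1 + 2 + ... + n. */
unsigned long long nth_triangular_num_loop(unsigned int n) {
    unsigned long long val = 0;
    for (unsigned int i = 1; i <= n; i++) {
        val += i;
    }
    return val;
}

/* Closed-form version: n * (n - 1) / 2 + n, which equals n * (n + 1) / 2. */
unsigned long long nth_triangular_num_formula(unsigned int n) {
    return ((unsigned long long)n * (n - 1)/2 + n);
}

/* Returns 1 if both versions agree for every n from 0 to limit,
   and 0 as soon as they disagree. */
int versions_agree(unsigned int limit) {
    for (unsigned int n = 0; n <= limit; n++) {
        if (nth_triangular_num_loop(n) != nth_triangular_num_formula(n)) {
            return 0;
        }
    }
    return 1;
}
```

Because callers only see the declaration in the header, a check like this is all that stands between the old body and the new one: `main.c` does not need to change at all.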
In this simple example, it might not make sense to make three different files since we're saving microseconds of compile time and it's still easy to navigate our code, so let's look at a more realistic example: our word counter program.
For now, let's say we're going to use the standard library functions `fopen`, `fclose`, `fscanf`, and `printf`.
We'll also use the "type" (it's actually a `struct`, but we'll get into that later) `FILE`.
To use all these functions, we would have to write (don't worry about the specifics, just look at how much we have to type each time)

```c
typedef struct _IO_FILE FILE;
int fscanf(FILE *, const char *, ...);
int fclose(FILE *);
FILE * fopen(const char *, const char *);
int printf(const char *, ...);

int main(void) {
    // Do stuff
}
```
except that's not right because I also need to come up with a definition of an `_IO_FILE` (you will never use `_IO_FILE` directly, it's just an under-the-hood thing to make C work with C++; you will use `FILE` instead), which itself is around 120 lines of code, and I might have missed something going through the standard library C code, so it could be even longer.
In other words, if you want to print things out to the terminal or do anything with files, you need to copy at least 120 lines into every file that uses these functions.
With header files, however, we can use `stdio.h` (standard input and output header file), which has all the stuff we need written for us already.
There are other header files, but we're just going to use `stdio.h` for now.
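As a small taste of what `stdio.h` gives us, here is a sketch of a helper that prints a word and a count with `printf`. The function name `print_word_count` and the output format are assumptions made for this example, not part of the word counter program yet. The only thing needed to make `printf` known to the compiler is the single `#include` line at the top.

```c
#include <stdio.h>

/* Prints "word: count" followed by a newline and returns the number
   of characters that printf reported writing. */
int print_word_count(const char *word, int count) {
    return printf("%s: %d\n", word, count);
}
```

For instance, `print_word_count("the", 3)` prints `the: 3` on its own line. The declaration of `printf` that the compiler checks this call against comes from `stdio.h`, so we never had to type it ourselves.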
In this article, we learned about the symbol table, function declarations, header files, and the `#include` statement.
Here is everything we have up to this point in code:
```c
#include <stdio.h>

int check_if_strings_differ(char * str1, char * str2);

int main(int argc, char ** argv) {
    char * program_name = argv[0];
    if (argc < 3) {
        // TODO: Print Usage Message
        return -1;
    }
    char * filename = argv[1];
    char * word = argv[2];
    // TODO: Count number of occurrences in a file
    return 0;
}

int check_if_strings_differ(char * str1, char * str2) {
    int i = 0;
    while (str1[i] && str2[i] && (str1[i] == str2[i])) {
        i += 1;
    }
    return str1[i] != str2[i];
}
```
The only new thing is the line `#include <stdio.h>`, but this line will allow us to use the standard library functions our program needs.
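All we have left to do to write our word counter program is to learn how to use the standard library functions we need (`printf`, `fopen`, `fclose`, and `fscanf`), and then to compile the finished code.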
In the next article, we're going to figure out how to work with Files in C to write our first completed program.
After that, we're going to discuss how to interact with files and the terminal in C.
Then, we're going to figure out how to put our code into an IDE/compiler/build system and then compile it.
After that point, we will have written our first C program!