Header Files in C

The Glue of C Programs

Posted 15 January 2020 at 1:26 PM

By Joseph Mellor

This is the twelfth article in the Making Sense of C series. In this article, we're going to discuss the symbol table, which will explain how the compiler functions, and header files, which will allow us to use the standard library functions to print things to the terminal, read from files, and write to files.

As we stated back in the introduction to the series, we want a standard library for C that contains all the basic functionality that a user would need, including reading and writing from files, printing out things to the terminal, getting user input in the terminal, doing string things, etc. So, let's say that we've already written the standard library, likely in a combination of machine specific Assembly and C, and we want to let programmers use it.

The Problem

Since we want to compile our programs quickly and organize our code, we have decided to write our code in multiple files, compile them separately, then combine them together into a single program. That way, the next time we make a change, we only have to recompile the changed files and then link everything together. Furthermore, it's easier to read code and find things if you split them into multiple files. Everything is all well and good until we look at how the C compiler works.

The C compiler will go through each file it needs to compile from top to bottom exactly once and then either compile the code or throw an error. So now, let's say that you have a function like this

int main(void) {
    a = 4;
    int a = 5;
    return 0;
}

This function won't compile because the first line of its body uses the variable a before we've declared it with its type. To really understand where the error is, let's read through this like a compiler.

```
int main(void) {
```
This line means that we want to create a function called main that returns an int and takes in no arguments. All good.
```
a = 4;
```
So we want to set a to be 4. What is a? We've never allocated any memory for it and we don't know its type, so we can't store anything in it, so this line of code is an error.
We can no longer compile this file, so we're just going to go through the file and find the rest of the errors.
```
int a = 5;
```
We should allocate some memory for a and store the value of 5 in that memory.
```
return 0;
```
There isn't an error in this line.
```
}
```
There isn't an error in this line.

We've reached the end of the file, so let's spit out the errors and then finish.

test.c: In function ‘main’:
test.c:2:2: error: ‘a’ undeclared (first use in this function)
  a = 4;
  ^

If you were to correct the code above, your first line containing the a variable would have an int a, and all the lines after it would not have an int a since we've already declared and allocated memory for that variable.

int main(void) {
    int a = 4;
    a = 5;
    return 0;
}

When the compiler reads this code, everything makes sense and we don't find any errors. It should be clear that the order in which we make statements in a C file makes a big difference, as we would expect from our compiler reading the file from the top to the bottom, so if we want to have working code, we have to declare things before we use them.

The Symbol Table

When the compiler gets to a = 5 in the corrected C file, how does it know that a = 5 is valid code? Furthermore, when the compiler gets to a = 4 in the incorrect C file, how does it know that a = 4 is invalid code? Without going into too much detail about the inner workings of a compiler, your compiler will build a list of all the valid symbols that have been declared as it reads through the file. We call this list the symbol table. In the incorrect C file, a in a = 4 isn't in the symbol table because we haven't declared it yet. When the compiler reads int a; or int a = 4; it assigns a memory address to a, which it stores in the symbol table. When the compiler reads a = 5, it looks in the symbol table for the name a and substitutes the assigned memory address in the output assembly code. If a is not in the symbol table, you will get an error. Otherwise, you will get the correct memory address.

The symbol table stores more than just local variables. It also stores functions and other stuff (macros, typedefs, etc., if you're curious) that you don't need to worry about for now.
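To make the idea concrete, here is a minimal sketch of a symbol table, assuming it only needs to remember a name and a type. Real compilers store far more (scopes, addresses, and so on), and the names struct symbol, declare, and lookup are made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 16

/* One hypothetical entry: a declared name and the type it was declared with. */
struct symbol {
    const char *name;
    const char *type;
};

static struct symbol table[MAX_SYMBOLS];
static size_t symbol_count = 0;

/* Return the entry for name, or NULL if name has not been declared yet. */
const struct symbol *lookup(const char *name) {
    for (size_t i = 0; i < symbol_count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return &table[i];
        }
    }
    return NULL;
}

/* Record a declaration such as "int a;". Returns 0 on success, -1 if full. */
int declare(const char *type, const char *name) {
    if (symbol_count == MAX_SYMBOLS) {
        return -1;
    }
    table[symbol_count].name = name;
    table[symbol_count].type = type;
    symbol_count++;
    return 0;
}
```

Reading the incorrect file from earlier, lookup("a") would return NULL at a = 4, which corresponds to the undeclared error. After declare("int", "a"), the same lookup succeeds.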

Functions and the Symbol Table

Let's say I have the following code:

int main(void) {
    int n = 300;
    int triangular_number = nth_triangular_num(n);
    return 0;
}

int nth_triangular_num(unsigned int n) {
    int val = 0;
    for (unsigned int i = 0; i < n; i++) {
        val += i + 1;  // add 1, 2, ..., n
    }
    return val;
}

You might think that this should compile properly. Everything is defined properly and there are no syntax errors, so we should be fine. If I try to compile this code, however, I get this warning:

test.c: In function ‘main’:
test.c:3:29: warning: implicit declaration of function ‘nth_triangular_num’ [-Wimplicit-function-declaration]
     int triangular_number = nth_triangular_num(n);
                             ^~~~~~~~~~~~~~~~~~

which means that something might be wrong with the code, but the compiler is going to allow it because it might still work. (Newer compilers, such as recent versions of gcc, treat this as an error by default.)

Syntax of Compiler Warnings

When most compilers find an error or a warning in your code, they will tell you the name of the file and the line number, then print out the line itself, and the error or warning.

gcc uses the syntax:

[FILE]: In function ‘[function name]’:
[FILE]:[LINE]:[COLUMN]: error: [ERROR DESCRIPTION]
  actual line of code with the error clearly marked
                           ^~~~~~~~~~~~~~~~~~~~~~~~

where everything in [SQUARE BRACKETS], along with the echoed line of code, is something that can vary depending on the specifics of the error or warning. Also, it's possible to have errors outside of functions (which we're going to see in a second), in which case the first line of the output will say something like At top level: instead of In function ...:.

Since you can technically turn off warnings (don't), gcc will also tell you what warning was triggered off to the side like it did with [-Wimplicit-function-declaration]. Don't worry about it for now.

Other compilers can print out error codes, but you mostly need just the file, line number, and what the error was.

Lastly, if at any point you get an error or a warning, copy either the error description or the error code into a search engine and search for it. Feel free to remove anything specific to your project like the line number or file.
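Since the gcc format above is so regular, you can pick it apart mechanically. The following is only a sketch, assuming a line shaped like [FILE]:[LINE]:[COLUMN]: error: ... or the same with warning:. The names struct diagnostic, parse_diagnostic, and is_error are hypothetical helpers, not part of any library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the pieces of one gcc-style diagnostic line. */
struct diagnostic {
    char file[64];
    int line;
    int column;
    char kind[16];  /* "error", "warning", or "note" */
};

/* Split "FILE:LINE:COLUMN: KIND: message" into its parts.
 * Returns 1 if all four parts were found, 0 otherwise. */
int parse_diagnostic(const char *text, struct diagnostic *out) {
    int fields = sscanf(text, "%63[^:]:%d:%d: %15[^:]:",
                        out->file, &out->line, &out->column, out->kind);
    return fields == 4;
}

/* Returns 1 if the parsed diagnostic is an error rather than a warning. */
int is_error(const struct diagnostic *d) {
    return strcmp(d->kind, "error") == 0;
}
```

A line such as test.c: In function ‘main’: has no line or column number, so parse_diagnostic returns 0 for it.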

Unlike with variables, if the compiler sees a function it hasn't seen before, it will implicitly assume that the function returns an int and takes in any number of parameters of any type. Our function just happens to return an int, so it fits the implicit declaration. On the other hand, if I change the return type of nth_triangular_num to unsigned long long and the type of triangular_number to unsigned long long like so:

int main(void) {
    int n = 300;
    unsigned long long triangular_number = nth_triangular_num(n);
    return 0;
}

unsigned long long nth_triangular_num(unsigned int n) {
    unsigned long long val = 0;
    for (unsigned int i = 0; i < n; i++) {
        val += i + 1;  // add 1, 2, ..., n
    }
    return val;
}

I get the following warning and error:

test.c: In function ‘main’:
test.c:3:41: warning: implicit declaration of function ‘nth_triangular_num’ [-Wimplicit-function-declaration]
  unsigned long long triangular_number = nth_triangular_num(n);
                                         ^~~~~~~~~~~~~~~~~~
test.c: At top level:
test.c:7:20: error: conflicting types for ‘nth_triangular_num’
 unsigned long long nth_triangular_num(unsigned int n) {
                    ^~~~~~~~~~~~~~~~~~
test.c:3:41: note: previous implicit declaration of ‘nth_triangular_num’ was here
  unsigned long long triangular_number = nth_triangular_num(n);
                                         ^~~~~~~~~~~~~~~~~~

Notice that the compiler thinks we declared (more on what it means to declare a function later in the article) nth_triangular_num on line 3. As we did with the variables above, we can solve this by defining nth_triangular_num before we call it.

unsigned long long nth_triangular_num(unsigned int n) {
    unsigned long long val = 0;
    for (unsigned int i = 0; i < n; i++) {
        val += i + 1;  // add 1, 2, ..., n
    }
    return val;
}

int main(void) {
    int n = 300;
    unsigned long long triangular_number = nth_triangular_num(n);
    return 0;
}

So now, we just need to make sure that we define each function before we call it. Should be easy enough, right? We have two problems with this approach:

If two functions call each other and neither returns an int, we can't write a correct program, period.
We have to put all our code into one file, otherwise the C compiler will run into functions that were defined in another file.

Putting all our code into one file makes it almost impossible to navigate our code and it means we have to recompile our entire code base every time we make a change. For small projects, the compile time will be about the same, but for larger projects, the compile time can get significantly longer.

We need some way to tell the compiler that we want to add something to its symbol table without defining it within the file.

Declaring Functions

Let's say you're a compiler and you see the following code:

int a = 4;
char sample_string[] = "This is a test. Wouldn't you agree, Jean Pierre Polnareff?";
int array[] = { 0, 1, 1, 2, 3, 5, 8, 13, 21 };
a = some_func(a, sample_string, array);

What would you, as a compiler, need to know to determine if a = some_func(a, sample_string, array) is a valid line of code? Try to find out on your own, then read on for the answer.

In order to know if a function is going to be used properly, the compiler needs to know

the return type,
the types of its arguments,
and the name of the function.

If we know these details about the function, we should be able to add the function to our symbol table, so we just need to find a way to tell this information to the compiler. We don't want to waste any typing, so let's just use the syntax:

return_type no_argument_function(void);
return_type one_argument_function(first_type);
return_type two_argument_function(first_type, second_type);
return_type three_argument_function(first_type, second_type, third_type);

For nth_triangular_num, we would use

unsigned long long nth_triangular_num(unsigned int);

Even though it isn't strictly necessary to include the parameter names, I do so that I can look back and figure out what the arguments should be.

unsigned long long nth_triangular_num(unsigned int n);

In our code, we can use

unsigned long long nth_triangular_num(unsigned int n);

int main(void) {
    int n = 300;
    unsigned long long triangular_number = nth_triangular_num(n);
    return 0;
}

unsigned long long nth_triangular_num(unsigned int n) {
    unsigned long long val = 0;
    for (unsigned int i = 0; i < n; i++) {
        val += i + 1;  // add 1, 2, ..., n
    }
    return val;
}

and our code will compile just fine since we've added nth_triangular_num to the symbol table. As a shortcut, just copy the first line of the definition of the function and replace everything after ) with a semicolon (;).

unsigned long long nth_triangular_num(unsigned int n) {  // Definition
unsigned long long nth_triangular_num(unsigned int n);   // Declaration

You can declare a function as many times as you would like since you're just adding to the symbol table, but you can only define it once since the compiler needs to know exactly what you want to do when you call nth_triangular_num.
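Declarations also take care of the first problem from earlier, where two functions call each other and neither returns an int. Here's a small sketch using two made-up functions, is_even and is_odd, that return bool (from stdbool.h). Declaring is_odd twice is deliberate, to show that repeated declarations are harmless.

```c
#include <stdbool.h>

/* Declarations only: these add is_odd to the symbol table before it is
 * defined. Repeating a declaration is allowed. */
bool is_odd(unsigned int n);
bool is_odd(unsigned int n);

/* is_even can call is_odd because is_odd has already been declared. */
bool is_even(unsigned int n) {
    if (n == 0) {
        return true;
    }
    return is_odd(n - 1);
}

/* The one and only definition of is_odd. */
bool is_odd(unsigned int n) {
    if (n == 0) {
        return false;
    }
    return is_even(n - 1);
}
```

Without the declaration at the top, the compiler would reach the call to is_odd inside is_even before it had ever seen is_odd, and we would be back to the implicit declaration problem.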

Header Files

Now that we can declare functions, we can finally get to the header files promised in the title. A header file is a file of declarations that gets copied into our source files. In general, we use header files to store everything we want to be in the symbol table (including functions, macros, typedefs, etc.) that we don't want to put in our code manually, which makes our code easier to read and less error prone, since we might copy something incorrectly by hand. While C source files have the extension .c, header files have the extension .h. There is no hard requirement for the extensions, but the convention is standard across all of C.

Making header files is good, but we need a way to tell the C compiler to copy and paste them into our code without modifying the source code itself. Since this part of the compiler is going to process the input before the compiler, we call it the preprocessor.

The Full Compiler

Your compiler consists of

the preprocessor, which processes the files before the compiler proper runs by carrying out preprocessor directives (as a working definition, statements that start with a #; we'll only have to work with one directive for now, which we'll discuss in the next section),
the compiler, which converts each preprocessed source file into assembly,
the assembler, which converts that assembly into an object file containing machine code and other useful information for the linker,
and the linker, which glues the object files together into one singular file, the finished program.

Different compilers might have another subprogram or two inside them or they might merge some of these steps together, but it really doesn't matter.

The `#include` Statement

The #include statement will copy and paste the contents of the file it's been given into the current file. There are two use cases that have slightly different syntax:

If it's a header file you've written or copied and pasted into your local project and it's in a local directory, then use:
```
#include "file.h"
```
If it's a file included in the standard library, then use:
```
#include <std_file.h>
```

The difference between <std_file.h> and "file.h" is just that the angle brackets tell the preprocessor to look for the file in the system include directories, where the standard library headers live (these are the same for every C and C++ program on a given system unless you're doing something weird), and the quotes tell it to look in local directories first, typically starting with the directory of the file that contains the #include.

A Simple Example

We could write our previous code with the triangular numbers as

// triangular_numbers.h

// If you're experienced in C, you know I'm missing header guards, but I'm
// going to neglect them for now and bring them back later in the tutorial
// since showing people things before they have a clear application will
// often just confuse them.
unsigned long long nth_triangular_num(unsigned int n);

in the header file for the triangular numbers (triangular_numbers.h),

// triangular_numbers.c

unsigned long long nth_triangular_num(unsigned int n) {
    unsigned long long val = 0;
    for (unsigned int i = 0; i < n; i++) {
        val += i + 1;  // add 1, 2, ..., n
    }
    return val;
}

in the source file for the triangular numbers (triangular_numbers.c), and

// main.c
#include "triangular_numbers.h"

int main(void) {
    int n = 300;
    unsigned long long triangular_number = nth_triangular_num(n);
    return 0;
}

in the source file for the main function (main.c). As you can see, the source file and the header file containing information about the nth_triangular_num function have the same name, triangular_numbers, but different extensions (.c vs .h). You should generally follow this pattern in simple cases. There are some cases in which you have to use different file names, but we're not going to worry about that for now.

WARNING

If you stop this tutorial right now, you need to make sure to use header guards in your headers or else you could get errors while compiling.

Header guards look like

#ifndef NAME_OF_THIS_FILE_H
#define NAME_OF_THIS_FILE_H

// Everything you would normally put inside the header file.

#endif

I will explain header guards in a later article in this series.
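Applied to triangular_numbers.h from the example above, the pattern would look something like this. The macro name TRIANGULAR_NUMBERS_H is just a naming convention I'm assuming here. Any name that no other header uses would work.

```c
// triangular_numbers.h

#ifndef TRIANGULAR_NUMBERS_H
#define TRIANGULAR_NUMBERS_H

// Declaration only. The definition still lives in triangular_numbers.c.
unsigned long long nth_triangular_num(unsigned int n);

#endif
```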

Later on, we can improve our algorithm to calculate the triangular numbers by just modifying the file triangular_numbers.c like so:

//triangular_numbers.c

unsigned long long nth_triangular_num(unsigned int n) {
    return ((unsigned long long)n * (n - 1)/2 + n);
}
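The formula above is n(n - 1)/2 + n, which simplifies to n(n + 1)/2, the sum 1 + 2 + ... + n. If you want some reassurance that a rewrite like this hasn't changed the results, one option is to keep both versions around briefly and compare them. This is only a sketch: the names triangular_by_loop, triangular_by_formula, and versions_agree are made up for the comparison.

```c
/* Hypothetical names for this comparison; both compute 1 + 2 + ... + n. */
unsigned long long triangular_by_loop(unsigned int n) {
    unsigned long long val = 0;
    for (unsigned int i = 0; i < n; i++) {
        val += i + 1ULL;  /* add 1, 2, ..., n */
    }
    return val;
}

unsigned long long triangular_by_formula(unsigned int n) {
    /* n(n - 1)/2 + n equals n(n + 1)/2 without ever computing n + 1 */
    return (unsigned long long)n * (n - 1) / 2 + n;
}

/* Returns 1 if the two versions agree for every n from 0 through limit. */
int versions_agree(unsigned int limit) {
    for (unsigned int n = 0; ; n++) {
        if (triangular_by_loop(n) != triangular_by_formula(n)) {
            return 0;
        }
        if (n == limit) {
            return 1;  /* stop here so n never wraps around */
        }
    }
}
```

Because only triangular_numbers.c changes, main.c doesn't need to be touched or recompiled. We just recompile the one file and link again.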

In this simple example, it might not make sense to make three different files since we're saving microseconds of compile time and it's still easy to navigate our code, so let's look at a more realistic example: our word counter program.

Using Standard Library Functions

For now, let's say we're going to use the standard library functions fopen, fclose, fscanf, and printf. We'll also use the "type" (it's actually a struct, but we'll get into that later) FILE. To use all these functions, we would have to write (don't worry about the specifics, just look at how much we have to type each time)

typedef struct _IO_FILE FILE;
int fscanf(FILE *, const char *, ...);
int fclose(FILE *);
FILE * fopen(const char *, const char *);
int printf(const char *, ...);

int main(void) {
    // Do stuff
}

except that's not right because I also need to come up with a definition of an _IO_FILE (you will never use _IO_FILE directly, it's just an under the hood thing to make C work with C++; you will use FILE instead), which itself is around 120 lines of code. I might have missed something going through the standard library C code, so it could be even longer. In other words, if you want to print things out to the terminal or do anything with files, you need to copy at least 120 lines into every file that uses these functions.

With header files, however, we can use stdio.h (standard input and output header file), which has all the stuff we need written for us already. There are other header files, but we're just going to use stdio.h for now.
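As a small taste, here's a sketch of what that one #include line buys us. FILE, fopen, and fclose are all declared by stdio.h, so we don't have to write any of the declarations ourselves. The helper name can_open_for_reading is made up for this example, and we'll cover how these functions actually work later.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the file at path can be opened for
 * reading, 0 otherwise. */
int can_open_for_reading(const char *path) {
    FILE *file = fopen(path, "r");  /* fopen returns NULL on failure */
    if (file == NULL) {
        return 0;
    }
    fclose(file);
    return 1;
}
```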

Summary

In this article, we learned about

the symbol table, which helps the compiler recognize valid code,
function declarations, which allow us to add functions to the symbol table,
the syntax of compiler warnings, which will help us debug our code later,
the preprocessor, which can generate code for us during compilation without modifying the original source file,
and header files, which contain function declarations and other stuff that we'll learn about later, and which allow us to automate some of the process of adding things to the symbol table.

What's Next

Here is everything we have up to this point in code:

#include <stdio.h>

int check_if_strings_differ(char * str1, char * str2);

int main(int argc, char ** argv) {
    char * program_name = argv[0];
    if (argc < 3) {
        // TODO: Print Usage Message
        return -1;
    }
    char * filename = argv[1];
    char * word = argv[2];
    // TODO: Count number of occurrences in a file
    return 0;
}

int check_if_strings_differ(char * str1, char * str2) {
    int i = 0;
    while (str1[i] && str2[i] && (str1[i] == str2[i])) {
        i += 1;
    }
    return str1[i] != str2[i];
}

The only new thing is the line #include <stdio.h>, but this line will allow us to use the functions we need to use for our program.

All we have left to do to write our word counter program is:

figure out the syntax for a few standard library functions (printf, fopen, fclose, and fscanf),
use the functions to write our code,
set up and use a compiler,
and familiarize ourselves with an IDE/text editor.

In the next article, we're going to figure out how to work with Files in C to write our first completed program. After that, we're going to discuss how to interact with files and the terminal in C. Then, we're going to figure out how to put our code into an IDE/compiler/build system and then compile it. After that point, we will have written our first C program!