Since the focus for this article will be on build systems, we won't spend extra
effort on making sure that we can handle lines of arbitrary lengths. For this
article, we're going to assume that all lines are guaranteed to be less than
4095
characters. To deal with longer lines, we'd have to know how to
dynamically allocate memory so we could print out the entire line or we could
just print out the section of the line with an ellipsis before and after, e.g.
user@computer:~/dev/c-tutorial$ ./print-lines-with-word file.txt the
200: ... which I had given to the last individual I had found ...
As always, we need to start with our goal and break it up into high-level tasks. Just like the last article, we need to know the word and the file to read from. Then, if we find the word in a line, we need to print out the line number and the line.
Our algorithm currently looks like
I decided to reuse the name of the section from the last article because what we're going to write here will be almost identical to the last program since we're taking in the same exact info in the exact same way. In fact, I'm just going to copy a part of the program from last time.
1 2 3 4 5 6 7 8 9 10 11 12 13 | #include <stdio.h> int main(int argc, char ** argv) { if (3 > argc) { fprintf(stderr, "./print-lines-with-word file_name word_to_count\n"); return -1; } char * program_name = argv[0]; char * file_name = argv[1]; char * word = argv[2]; // TODO: Loop through the file and print out lines containing the word return 0; } |
Again, we can do the exact same thing with the file stuff that we did with the word counter.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | #include "str-operations.h" #include <stdio.h> int main(int argc, char ** argv) { if (3 > argc) { fprintf(stderr, "./print_lines_with_word file_name word_to_count\n"); return -1; } char * program_name = argv[0]; char * file_name = argv[1]; char * word = argv[2]; FILE * reader = fopen(file_name, "r"); const int line_sz = 4096; char line[line_sz]; while (fgets(line, line_sz, reader)) { // TODO: Print line if word is found } fclose(reader); return 0; } |
strncpy
. We'll need to include string.h
to do so.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | #include "str-operations.h" #include <stdio.h> #include <string.h> int main(int argc, char ** argv) { if (3 > argc) { fprintf(stderr, "./print_lines_with_word file_name word_to_count\n"); return -1; } char * program_name = argv[0]; char * file_name = argv[1]; char * word = argv[2]; unsigned int count = 0; FILE * reader = fopen(file_name, "r"); const int line_sz = 4096; char modified_line[line_sz]; char unmodified_line[line_sz]; while (fgets(unmodified_line, line_sz, reader)) { strncpy(modified_line, unmodified_line, line_sz - 1); if (count_word_in_line(modified_line, word) > 0) { printf("%s", unmodified_line); } } fclose(reader); return 0; } |
We need to keep count of the line we're on and print it out. We also want to add
a bit of white space to the left so the numbers align. We'll assume that the
number of lines is less than 100000, which means that we need 5 digits of
padding. We can use the format string %5u
to print out the line number with
the necessary padding. I added a :
to delineate the line and the number.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | #include "str-operations.h" #include <stdio.h> int main(int argc, char ** argv) { if (3 > argc) { fprintf(stderr, "./print_lines_with_word file_name word_to_count\n"); return -1; } char * program_name = argv[0]; char * file_name = argv[1]; char * word = argv[2]; unsigned int count = 0; FILE * reader = fopen(file_name, "r"); const int line_sz = 4096; char modified_line[line_sz]; char unmodified_line[line_sz]; unsigned int line_num = 1; while (fgets(unmodified_line, line_sz, reader)) { strncpy(modified_line, unmodified_line, line_sz - 1); if (count_word_in_line(modified_line, word) > 0) { printf("%5u: %s", line_num, unmodified_line); } line_num += 1; } fclose(reader); return 0; } |
And we're done with the code.
We could compile the code using
user@computer:~/dev/c-tutorial$ gcc print-lines-with-word.c str-operation.c -o print-lines-with-word
and we could use it with
user@computer:~/dev/c-tutorial$ ./print-lines-with-word test-file.txt the
3: trivial using regular expressions. In fact, we could replace a lot of the things
5: including functions in the standard library such as toupper and strcmp. In fact,
6: if I were tasked to write a program that counts the number of times a word shows
13: Put simply, you have to have something round and roll it on the ground before
16: You might be expected to solve the integral manually in a Calculus class, but
17: in any other class, you would look it up. At this part of the tutorial, we're
22: starting out neither at the fundamental rules of the field nor at the level of
26: the middle and spread towards both ends.
28: The quick brown fox jumped over the lazy dog.
30: the
31: thE
32: tHE
33: tHe
35: THe
36: THE
37: ThE
38: The
40: There should be twenty-one instances of "the."
If we want to compile like this manually, we have to recompile every .c
file
involved in an executable (or library, which we haven't covered yet) every time
we run gcc
. While this problem might not sound like a big deal for us since we
have two small files, something like the Linux kernel has
hundreds of C
files, maybe even thousands. Taking an hour and a half to
recompile the entire project every time you make a change is unacceptable from
both an efficiency and an emotional standpoint (Imagine getting an error an hour
in because you forgot a semicolon.).
This scenario gets even worse when you consider that we're redoing work that we shouldn't need to do. If a file doesn't change, then the compiler is going to compile it in exactly the same way. As you can imagine, we should be able to store the output of that compilation.
gcc
has exactly the feature we're looking for. If we add a -c
, it will
produce the output of the compilation of the file instead of an executable (or
library). So now, we can do
user@computer:~/dev/c-tutorial$ gcc print-lines-with-word.c -c -o print-lines-with-word.o user@computer:~/dev/c-tutorial$ gcc str-operations.c -c -o str-operations.o
where the .o
stands for an object file. If you type ls
, you'll notice
that there are two new files: print-lines-with-word.o
and str-operations.o
. These two
files are the output of compiling the files without trying to link them. If we
want our program, we can then use the object files instead of the C
source
code files like so
user@computer:~/dev/c-tutorial$ gcc print-lines-with-word.o str-operations.o -o print-lines-with-word
Now, let's say we make a change to word-counter.c
. In this case, we can
recompile it and link it using the commands
user@computer:~/dev/c-tutorial$ gcc word-counter.c -c -o word-counter.o user@computer:~/dev/c-tutorial$ gcc word-counter.o str-operations.o -o word-counter
Note that we only had to compile word-counter.c
because str-operations.o
had
already been compiled when we made print-lines-with-word
. If we were working on the Linux
kernel, we'd only have to wait an hour and a half when we first compile it. If
we only change ~10 files and then recompile, we'd only have to wait a few
minutes to see our changes.
So, we seem to have solved the problem. If we want to recompile a project, we
recompile every .c
file we modified and every .c
file that
includes a header file that we modified since we last compiled the project
into object files, and then we link all the relevant object files. Remember that
the #include
directive pastes the contents of the .h
file into the .c
file. The .o
file needs to therefore reflect the .c
file and any included
.h
files.
Before we move on, we're going to remove all the .o
files from the folder
since they'll interfere with our build system demos in the next part. We can do
so with a simple rm *.o
in the terminal.
As programmers, we like to automate as much work as possible because we're lazy
and computers don't make dumb mistakes like forgetting to compile certain files.
We should only have to tell the computer which output files depend on which
input files and we're done. The program that figures out what to compile, the
project file, and the organization of files up what is known as a build
system. People can organize their projects in different ways and all build
system programs can work with them, so I'm not going to focus on the
organization. Instead, I'm going to focus on two programs used in build systems:
make
and cmake
. We're going to start with make
since it's a bit simpler,
but then we're going to move to cmake
since it's cross-platform and it takes
care of a lot of little issues.
make
make
is simple to start, but a little complex to automate. Luckily for us, we
can hard-code everything to get the basic principles and then move to cmake
.
The basic premise behind make
is that you write a file called Makefile
with
a bunch of rules of the form
target: dependency_one dependency_two command (optional) command (optional)
and make
will create a file named target
from the command
s if
dependency_one
or dependency_two
have been modified since target
was last
modified. For example, the rules for print-lines-with-word
would look like
print-lines-with-word: print-lines-with-word.o str-operations.o gcc print-lines-with-word.o str-operations.o -o print-lines-with-word
This would be gross for us to write everywhere, so we can actually automatic
variables to simplify the above code. In this case, we'll want to use $@
for the target
and $^
for all dependencies. We can replace the above line
with
print-lines-with-word: print-lines-with-word.o str-operations.o gcc $^ -o $@
and it will work identically to the previous version. Since the rest of the
Makefile
looks exactly like the rule above, I'll just give it to you.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | print-lines-with-word: print-lines-with-word.o str-operations.o gcc $^ -o $@ word-counter: word-counter.o str-operations.o gcc $^ -o $@ print-lines-with-word.o: print-lines-with-word.c str-operations.h gcc print-lines-with-word.c -c -o $@ str-operations.o: str-operations.c str-operations.h gcc str-operations.c -c -o $@ word-counter.o: word-counter.c str-operations.h gcc word-counter.c -c -o $@ |
Note that I had to add the .c
file into the command since we don't compile
.h
files. If we then go to the terminal we can run make
with the command
make
followed by the target
, like so
user@computer:~/dev/c-tutorial$ make print-lines-with-word
gcc print-lines-with-word.c -c -o print-lines-with-word.o
gcc str-operations.c -c -o str-operations.o
gcc print-lines-with-word.o str-operations.o -o print-lines-with-word
As you can see, make
will print out all the commands that ran. In this case,
to make print-lines-with-word
, it had to first make print-lines-with-word.o
and str-operations.o
,
which it did by following the corresponding rules. Note that it did not compile
word-counter
or even word-counter.o
since we did not need it to make
print-lines-with-word
. We can then make the word-counter
with
user@computer:~/dev/c-tutorial$ make word-counter
gcc word-counter.c -c -o word-counter.o
gcc word-counter.o str-operations.o -o word-counter
Note that make
did not recompile str-operation.o
since it was already
compiled when we made print-lines-with-word
.
If you want, you can test make
by changing one of the files and running make
again. If you run make
without changing any of the files, it will tell you
everything is up to date.
.PHONY
TargetsIt's a bit annoying to make both projects manually, so make
has a solution. If
you declare a .PHONY target
, make
won't treat the target
like a file. So
now, we can modify the Makefile
to look like
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | .PHONY: all all: word-counter print-lines-with-word print-lines-with-word: print-lines-with-word.o str-operations.o gcc $^ -o $@ word-counter: word-counter.o str-operations.o gcc $^ -o $@ print-lines-with-word.o: print-lines-with-word.c str-operations.h gcc print-lines-with-word.c -c -o $@ str-operations.o: str-operations.c str-operations.h gcc str-operations.c -c -o $@ word-counter.o: word-counter.c str-operations.h gcc word-counter.c -c -o $@ |
Furthermore, if you don't give make
a target, it will assume you wanted the
first target in the Makefile
, which is all
in this case. Before we can show
off make
with the new all
rule, we have to first remove all the .o
files
(and it's good practice to remove the targets too, so we'll do that). As it's a
little inconvenient to do so manually, we can have make
do it for us using
another target
, which we'll call clean
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | .PHONY: all all: word-counter print-lines-with-word print-lines-with-word: print-lines-with-word.o str-operations.o gcc $^ -o $@ word-counter: word-counter.o str-operations.o gcc $^ -o $@ print-lines-with-word.o: print-lines-with-word.c str-operations.h gcc print-lines-with-word.c -c -o $@ str-operations.o: str-operations.c str-operations.h gcc str-operations.c -c -o $@ word-counter.o: word-counter.c str-operations.h gcc word-counter.c -c -o $@ .PHONY: clean clean: rm -f *.o print-lines-with-word word-counter |
Note that we used rm -f
to force removal so that we don't get an error if
there are no files to remove. Now, run make clean
first to remove all the
output, then run make
.
user@computer:~/dev/c-tutorial$ make clean rm -f *.o print-lines-with-word word-counter user@computer:~/dev/c-tutorial$ make gcc word-counter.c -c -o word-counter.o gcc str-operations.c -c -o str-operations.o gcc word-counter.o str-operations.o -o word-counter gcc print-lines-with-word.c -c -o print-lines-with-word.o gcc print-lines-with-word.o str-operations.o -o print-lines-with-word
You can run make -j [num-threads]
to compile multiple files in parallel, which
can decrease compilation time.
user@computer:~/dev/c-tutorial$ make -j 4
Since we have a small project, we won't get any noticeable benefit. Compiling multiple files at the same time can also lead to warnings and errors getting mixed up and being harder to read and I haven't been in a situation where compiling the program took a long enough time for me to care, so I normally compile the files without parallelism.
cmake
Now that we know how to write a Makefile
, let's talk about cmake
.
Makefiles
tend to either be tedious to write or annoyingly
complex. This one just requires you to put your .c
files in src
, your
.h
files in include
, external includes in external_includes
, and external
libraries in libs
, and it will take care of everything else. Furthermore, they
aren't cross-platform, so you can't use them on a Windows system without some
extra work. cmake
tries to solve both problems by generating the right project
file for you no matter what platform you're on. Run make clean
again to clean
up and delete the old Makefile
, since we're going to use cmake
in the
future.
cmake
Just in case you haven't yet or don't remember how to, the commands are
user@computer:~/dev/c-tutorial$ sudo apt update user@computer:~/dev/c-tutorial$ sudo apt install cmake
on Ubuntu, Linux Mint, and the Windows Subsystem for Linux and
computer:~ user$ brew update computer:~ user$ brew install cmake
on Mac computers. Now might also be a good time to upgrade your system by
running sudo apt upgrade
or brew upgrade
.
Right now, everything is kind of gross since the output and the input are in the
same folder, meaning you might end up accidentally deleting everything and you
don't care about the .o
files, so they should be hidden away. With those facts
in mind, I have two main build systems I like to use:
src
: Contains .c
files.
include
: Contains .h
files.
bin
: Contains output and intermediate files like object files.
CMakeLists.txt
project1
src
include
CMakeLists.txt
project2
src
include
CMakeLists.txt
bin
CMakeLists.txt
I separate include
and src
because it's annoying when I end up editing the
header file when I wanted to edit the source file, but you could honestly dump
them all in the same directory for project files. That being said, I recommend
that all your output be separated from your input by at least one directory
(e.g. put all your output in bin
and all your input in src
where both are in
the same directory) and your project file (in this case, CMakeLists.txt
) be at
the root directory of your project. For now, we're doing a single project and
we're going to go with separating include
and src
simply because there's an
option in CMake
to specify the include
directory and I want to use it.
First, we'll need to create the directories, then move our files into the proper
directory, then we'll go from there.
user@computer:~/dev/c-tutorial$ mkdir src include bin user@computer:~/dev/c-tutorial$ make clean user@computer:~/dev/c-tutorial$ mv *.c src user@computer:~/dev/c-tutorial$ mv *.h include user@computer:~/dev/c-tutorial$ rm Makefile user@computer:~/dev/c-tutorial$ ls * test-file.txt bin: include: str-operations.h src: print-lines-with-word.c str-operations.c word-counter.c
Now, we'll need to make the cmake
project file, which is known as
CMakeLists.txt
. Create the new file in ~/dev/c-tutorial/
as you did
earlier with the previous files before you moved them. It should be in the same
directory as bin
, include
, and src
.
CMakeLists.txt
First, we specify the minimum version of cmake
we want to use based on the
functions we're using. In this case, we'll go with 3.0.0
because we're not
using anything fancy and 2.X.X
is outdated. We can do this with the command
cmake_minimum_version
like so
1 | cmake_minimum_required(VERSION 3.0.0) |
Let's also give it a project name like c-tutorial
, which we can do with the
project
command like so
1 2 | cmake_minimum_required(VERSION 3.0.0) project(c-tutorial) |
We don't need to give the project a name, but it's nice to have. We could also
use it later. Anyway, let's add the executables. Unlike with make
where we
specify rules, with cmake
, we specify just which .c
files are involved with
which target. In our case, we want to create two executables, so we'll use the
add_executable
command.
1 2 3 4 5 6 7 8 9 10 11 12 | cmake_minimum_required(VERSION 3.0.0) project(c-tutorial) add_executable(word-counter src/word-counter.c src/str-operations.c ) add_executable(print-lines-with-word src/print-lines-with-word.c src/str-operations.c ) |
As you might be able to guess, the first argument is the name of the executable
and the other arguments are the source files needed for it to compile. Note that
we had to specify the path relative to the directory containing
CMakeLists.txt
.
cmake
DirectoriesIn cmake
, there are two main directories: CMAKE_SOURCE_DIR
and
CMAKE_BINARY_DIR
. The first one is wherever the CMakeLists.txt
file is
located. The second is wherever you run the cmake
command. Generally, you want
these two directories to be different so that you don't get build files mixed in
with your code files. For this reason, we're going to be doing what is known as
an out of source build. At this point, all we need to do is move into the
bin
directory and run cmake
with the location of CMakeLists.txt
passed in
as the argument. This process will look like
user@computer:~/dev/c-tutorial$ cd bin user@computer:~/dev/c-tutorial/bin$ cmake .. -- The C compiler identification is GNU 10.2.0 -- The CXX compiler identification is GNU 10.2.0 -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /usr/bin/cc - skipped -- Detecting C compile features -- Detecting C compile features - done -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/c++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Configuring done -- Generating done -- Build files have been written to: ~/dev/c-tutorial/bin/bin user@computer:~/dev/c-tutorial/bin$ ls CMakeCache.txt CMakeFiles cmake_install.cmake Makefile
You'll be using an earlier version of gcc
than me because we're using
different distros of Linux, but that should be the only difference. If you look
at all the files in the current directory, you'll see the Makefile
that
cmake
created for us. If we then type make
, we'll get output that looks like
user@computer:~/dev/c-tutorial/bin$ make Scanning dependencies of target print-lines-with-word [ 16%] Building C object CMakeFiles/print-lines-with-word.dir/src/print-lines-with-word.c.o ~/dev/c-tutorial/bin/src/print-lines-with-word.c:1:10: fatal error: str-operations.h: No such file or directory 1 | #include "str-operations.h" | ^~~~~~~~~~~~~~~~~~ compilation terminated. make[2]: *** [CMakeFiles/print-lines-with-word.dir/build.make:82: CMakeFiles/print-lines-with-word.dir/src/print-lines-with-word.c.o] Error 1 make[1]: *** [CMakeFiles/Makefile2:97: CMakeFiles/print-lines-with-word.dir/all] Error 2 make: *** [Makefile:103: all] Error 2
We're getting this error because our compiler has no idea where the file
str-operations.h
is since it's not in the same directory as the file it's
compiling, so we need to tell it where it is using the
include_directories
command, which we can do by changing the
CMakeLists.txt
file to
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | cmake_minimum_required(VERSION 3.0.0) project(c-tutorial) include_directories("${CMAKE_SOURCE_DIR}/include") add_executable(word-counter src/word-counter.c src/str-operations.c ) add_executable(print-lines-with-word src/print-lines-with-word.c src/str-operations.c ) |
Now, we need to rerun cmake
as before.
user@computer:~/dev/c-tutorial/bin$ cmake .. -- Configuring done -- Generating done -- Build files have been written to: ~/dev/c-tutorial/bin/bin user@computer:~/dev/c-tutorial/bin$ make Scanning dependencies of target print-lines-with-word [ 16%] Building C object CMakeFiles/print-lines-with-word.dir/src/print-lines-with-word.c.o [ 33%] Building C object CMakeFiles/print-lines-with-word.dir/src/str-operations.c.o [ 50%] Linking C executable print-lines-with-word [ 50%] Built target print-lines-with-word Scanning dependencies of target word-counter [ 66%] Building C object CMakeFiles/word-counter.dir/src/word-counter.c.o [ 83%] Building C object CMakeFiles/word-counter.dir/src/str-operations.c.o [100%] Linking C executable word-counter [100%] Built target word-counter
As you can see, cmake
by default comes with a lot more color, is a bit easier
to use, and takes care of figuring out the header file dependencies.
While there are other build systems (ninja
, premake
, whatever Windows uses,
etc.), make
and cmake
should be enough to set you up for C/C++
development
in the future. If something more complicated comes up, I'll discuss the build
system in more depth. Otherwise, I'm content with this introduction.
In the next article, Compiler Flags, we're going
to discuss the flags of gcc
and how you can use them to do things from
optimize your code to including debugging information to checking for illegal
memory accesses.