TU ACM logo.
blog

Printing Lines Containing a Specific Word

Our first experience with build systems.
Posted 4 January 2021 at 11:33 AM
By Joseph Mellor

This is the seventeenth article in the Making Sense of C series. In this article, we're going to write some new code and combine it with the code we've written in the last article to create a new program that prints out every line in a user specified file that contains a user-specified word.

Disclaimer

Since the focus for this article will be on build systems, we won't spend extra effort on making sure that we can handle lines of arbitrary lengths. For this article, we're going to assume that all lines are guaranteed to be less than 4095 characters. To deal with longer lines, we'd have to know how to dynamically allocate memory so we could print out the entire line or we could just print out the section of the line with an ellipsis before and after, e.g.

user@computer:~/dev/c-tutorial$ ./print-lines-with-word file.txt the
200:    ... which I had given to the last individual I had found ...

Writing the Code

As always, we need to start with our goal and break it up into high-level tasks. Just like the last article, we need to know the word and the file to read from. Then, if we find the word in a line, we need to print out the line number and the line.

Our algorithm currently looks like

  1. Get the user input.
  2. Read through each line and print out the line and the line number if it contains the word.

Boilerplate and Trivial Code

I decided to reuse the name of the section from the last article because what we're going to write here will be almost identical to the last program since we're taking in the same exact info in the exact same way. In fact, I'm just going to copy a part of the program from last time.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
#include <stdio.h>

int main(int argc, char ** argv) {
    if (3 > argc) {
        fprintf(stderr, "./print-lines-with-word file_name word_to_count\n");
        return -1;
    }
    char * program_name = argv[0];
    char * file_name = argv[1];
    char * word = argv[2];
    // TODO: Loop through the file and print out lines containing the word
    return 0;
}

File Stuff

Again, we can do the exact same thing with the file stuff that we did with the word counter.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
#include "str-operations.h"
#include <stdio.h>

int main(int argc, char ** argv) {
    if (3 > argc) {
        fprintf(stderr, "./print_lines_with_word file_name word_to_count\n");
        return -1;
    }
    char * program_name = argv[0];
    char * file_name = argv[1];
    char * word = argv[2];
    FILE * reader = fopen(file_name, "r");
    const int line_sz = 4096;
    char line[line_sz];

    while (fgets(line, line_sz, reader)) {
        // TODO: Print line if word is found
    }
    fclose(reader);
    return 0;
}

Printing Out Lines

We already have the mechanism for checking if a word is in a line with count_word_​in_line. If count_word_​in_line is greater than zero, it means we found the word at least once in the line. Now, since count_word_​in_line changes the line, we need to create a copy, which we can do with strncpy. We'll need to include string.h to do so.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
#include "str-operations.h"
#include <stdio.h>
#include <string.h>

int main(int argc, char ** argv) {
    if (3 > argc) {
        fprintf(stderr, "./print_lines_with_word file_name word_to_count\n");
        return -1;
    }
    char * program_name = argv[0];
    char * file_name = argv[1];
    char * word = argv[2];
    unsigned int count = 0;
    FILE * reader = fopen(file_name, "r");
    const int line_sz = 4096;
    char modified_line[line_sz];
    char unmodified_line[line_sz];

    while (fgets(unmodified_line, line_sz, reader)) {
        strncpy(modified_line, unmodified_line, line_sz - 1);
        if (count_word_in_line(modified_line, word) > 0) {
            printf("%s", unmodified_line);
        }
    }
    fclose(reader);
    return 0;
}

Line Numbers

We need to keep count of the line we're on and print it out. We also want to add a bit of white space to the left so the numbers align. We'll assume that the number of lines is less than 100000, which means that we need 5 digits of padding. We can use the format string %5u to print out the line number with the necessary padding. I added a : to delineate the line and the number.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
#include "str-operations.h"
#include <stdio.h>

int main(int argc, char ** argv) {
    if (3 > argc) {
        fprintf(stderr, "./print_lines_with_word file_name word_to_count\n");
        return -1;
    }
    char * program_name = argv[0];
    char * file_name = argv[1];
    char * word = argv[2];
    unsigned int count = 0;
    FILE * reader = fopen(file_name, "r");
    const int line_sz = 4096;
    char modified_line[line_sz];
    char unmodified_line[line_sz];
    unsigned int line_num = 1;

    while (fgets(unmodified_line, line_sz, reader)) {
        strncpy(modified_line, unmodified_line, line_sz - 1);
        if (count_word_in_line(modified_line, word) > 0) {
            printf("%5u: %s", line_num, unmodified_line);
        }
        line_num += 1;
    }
    fclose(reader);
    return 0;
}

And we're done with the code.

Compiling and Running the Code

We could compile the code using

user@computer:~/dev/c-tutorial$ gcc print-lines-with-word.c str-operation.c -o print-lines-with-word

and we could use it with

user@computer:~/dev/c-tutorial$ ./print-lines-with-word test-file.txt the
    3: trivial using regular expressions. In fact, we could replace a lot of the things
    5: including functions in the standard library such as toupper and strcmp. In fact,
    6: if I were tasked to write a program that counts the number of times a word shows
   13: Put simply, you have to have something round and roll it on the ground before
   16: You might be expected to solve the integral  manually in a Calculus class, but
   17: in any other class, you would look it up. At this part of the tutorial, we're
   22: starting out neither at the fundamental rules of the field nor at the level of
   26: the middle and spread towards both ends.
   28: The quick brown fox jumped over the lazy dog.
   30: the
   31: 	thE
   32:  tHE
   33: tHe
   35: THe
   36: THE
   37: ThE
   38: The
   40: There should be twenty-one instances of "the."
just as we did last time.

The Problem

If we want to compile like this manually, we have to recompile every .c file involved in an executable (or library, which we haven't covered yet) every time we run gcc. While this problem might not sound like a big deal for us since we have two small files, something like the Linux kernel has hundreds of C files, maybe even thousands. Taking an hour and a half to recompile the entire project every time you make a change is unacceptable from both an efficiency and an emotional standpoint (Imagine getting an error an hour in because you forgot a semicolon.).

This scenario gets even worse when you consider that we're redoing work that we shouldn't need to do. If a file doesn't change, then the compiler is going to compile it in exactly the same way. As you can imagine, we should be able to store the output of that compilation.

Object Files

gcc has exactly the feature we're looking for. If we add a -c, it will produce the output of the compilation of the file instead of an executable (or library). So now, we can do

user@computer:~/dev/c-tutorial$ gcc print-lines-with-word.c -c -o print-lines-with-word.o
user@computer:~/dev/c-tutorial$ gcc str-operations.c -c -o str-operations.o

where the .o stands for an object file. If you type ls, you'll notice that there are two new files: print-lines-​with-word.o and str-operations.o. These two files are the output of compiling the files without trying to link them. If we want our program, we can then use the object files instead of the C source code files like so

user@computer:~/dev/c-tutorial$ gcc print-lines-with-word.o str-operations.o -o print-lines-with-word

Now, let's say we make a change to word-counter.c. In this case, we can recompile it and link it using the commands

user@computer:~/dev/c-tutorial$ gcc word-counter.c -c -o word-counter.o
user@computer:~/dev/c-tutorial$ gcc word-counter.o str-operations.o -o word-counter

Note that we only had to compile word-counter.c because str-operations.o had already been compiled when we made print-lines-​with-word. If we were working on the Linux kernel, we'd only have to wait an hour and a half when we first compile it. If we only change ~10 files and then recompile, we'd only have to wait a few minutes to see our changes.

So, we seem to have solved the problem. If we want to recompile a project, we recompile every .c file we modified and every .c file that includes a header file that we modified since we last compiled the project into object files, and then we link all the relevant object files. Remember that the #include directive pastes the contents of the .h file into the .c file. The .o file needs to therefore reflect the .c file and any included .h files.

Cleaning Up

Before we move on, we're going to remove all the .o files from the folder since they'll interfere with our build system demos in the next part. We can do so with a simple rm *.o in the terminal.

Build Systems

As programmers, we like to automate as much work as possible because we're lazy and computers don't make dumb mistakes like forgetting to compile certain files. We should only have to tell the computer which output files depend on which input files and we're done. The program that figures out what to compile, the project file, and the organization of files up what is known as a build system. People can organize their projects in different ways and all build system programs can work with them, so I'm not going to focus on the organization. Instead, I'm going to focus on two programs used in build systems: make and cmake. We're going to start with make since it's a bit simpler, but then we're going to move to cmake since it's cross-platform and it takes care of a lot of little issues.

make

make is simple to start, but a little complex to automate. Luckily for us, we can hard-code everything to get the basic principles and then move to cmake. The basic premise behind make is that you write a file called Makefile with a bunch of rules of the form

target: dependency_one dependency_two
	command (optional)
	command (optional)

and make will create a file named target from the commands if dependency_one or dependency_two have been modified since target was last modified. For example, the rules for print-lines-with-word would look like

print-lines-with-word: print-lines-with-word.o str-operations.o
	gcc print-lines-with-word.o str-operations.o -o print-lines-with-word

This would be gross for us to write everywhere, so we can actually automatic variables to simplify the above code. In this case, we'll want to use $@ for the target and $^ for all dependencies. We can replace the above line with

print-lines-with-word: print-lines-with-word.o str-operations.o
	gcc $^ -o $@

and it will work identically to the previous version. Since the rest of the Makefile looks exactly like the rule above, I'll just give it to you.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
print-lines-with-word: print-lines-with-word.o str-operations.o
	gcc $^ -o $@

word-counter: word-counter.o str-operations.o
	gcc $^ -o $@

print-lines-with-word.o: print-lines-with-word.c str-operations.h
	gcc print-lines-with-word.c -c -o $@

str-operations.o: str-operations.c str-operations.h
	gcc str-operations.c -c -o $@

word-counter.o: word-counter.c str-operations.h
	gcc word-counter.c -c -o $@

Note that I had to add the .c file into the command since we don't compile .h files. If we then go to the terminal we can run make with the command make followed by the target, like so

user@computer:~/dev/c-tutorial$ make print-lines-with-word
gcc print-lines-with-word.c -c -o print-lines-with-word.o
gcc str-operations.c -c -o str-operations.o
gcc print-lines-with-word.o str-operations.o -o print-lines-with-word

As you can see, make will print out all the commands that ran. In this case, to make print-lines-​with-word, it had to first make print-lines-​with-word.o and str-operations.o, which it did by following the corresponding rules. Note that it did not compile word-counter or even word-counter.o since we did not need it to make print-lines-​with-word. We can then make the word-counter with

user@computer:~/dev/c-tutorial$ make word-counter
gcc word-counter.c -c -o word-counter.o
gcc word-counter.o str-operations.o -o word-counter

Note that make did not recompile str-operation.o since it was already compiled when we made print-lines-with-word.

If you want, you can test make by changing one of the files and running make again. If you run make without changing any of the files, it will tell you everything is up to date.

.PHONY Targets

It's a bit annoying to make both projects manually, so make has a solution. If you declare a .PHONY target, make won't treat the target like a file. So now, we can modify the Makefile to look like

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
.PHONY: all

all: word-counter print-lines-with-word

print-lines-with-word: print-lines-with-word.o str-operations.o
	gcc $^ -o $@

word-counter: word-counter.o str-operations.o
	gcc $^ -o $@

print-lines-with-word.o: print-lines-with-word.c str-operations.h
	gcc print-lines-with-word.c -c -o $@

str-operations.o: str-operations.c str-operations.h
	gcc str-operations.c -c -o $@

word-counter.o: word-counter.c str-operations.h
	gcc word-counter.c -c -o $@

Furthermore, if you don't give make a target, it will assume you wanted the first target in the Makefile, which is all in this case. Before we can show off make with the new all rule, we have to first remove all the .o files (and it's good practice to remove the targets too, so we'll do that). As it's a little inconvenient to do so manually, we can have make do it for us using another target, which we'll call clean.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
.PHONY: all

all: word-counter print-lines-with-word

print-lines-with-word: print-lines-with-word.o str-operations.o
	gcc $^ -o $@

word-counter: word-counter.o str-operations.o
	gcc $^ -o $@

print-lines-with-word.o: print-lines-with-word.c str-operations.h
	gcc print-lines-with-word.c -c -o $@

str-operations.o: str-operations.c str-operations.h
	gcc str-operations.c -c -o $@

word-counter.o: word-counter.c str-operations.h
	gcc word-counter.c -c -o $@

.PHONY: clean

clean:
	rm -f *.o print-lines-with-word word-counter

Note that we used rm -f to force removal so that we don't get an error if there are no files to remove. Now, run make clean first to remove all the output, then run make.

user@computer:~/dev/c-tutorial$ make clean
rm -f *.o print-lines-with-word word-counter
user@computer:~/dev/c-tutorial$ make
gcc word-counter.c -c -o word-counter.o
gcc str-operations.c -c -o str-operations.o
gcc word-counter.o str-operations.o -o word-counter
gcc print-lines-with-word.c -c -o print-lines-with-word.o
gcc print-lines-with-word.o str-operations.o -o print-lines-with-word

Parallelism

You can run make -j [num-threads] to compile multiple files in parallel, which can decrease compilation time.

user@computer:~/dev/c-tutorial$ make -j 4

Since we have a small project, we won't get any noticeable benefit. Compiling multiple files at the same time can also lead to warnings and errors getting mixed up and being harder to read and I haven't been in a situation where compiling the program took a long enough time for me to care, so I normally compile the files without parallelism.

cmake

Now that we know how to write a Makefile, let's talk about cmake. Makefiles tend to either be tedious to write or annoyingly complex. This one just requires you to put your .c files in src, your .h files in include, external includes in external_includes, and external libraries in libs, and it will take care of everything else. Furthermore, they aren't cross-platform, so you can't use them on a Windows system without some extra work. cmake tries to solve both problems by generating the right project file for you no matter what platform you're on. Run make clean again to clean up and delete the old Makefile, since we're going to use cmake in the future.

Installing cmake

Just in case you haven't yet or don't remember how to, the commands are

user@computer:~/dev/c-tutorial$ sudo apt update
user@computer:~/dev/c-tutorial$ sudo apt install cmake

on Ubuntu, Linux Mint, and the Windows Subsystem for Linux and

computer:~ user$ brew update
computer:~ user$ brew install cmake

on Mac computers. Now might also be a good time to upgrade your system by running sudo apt upgrade or brew upgrade.

File Organization

Right now, everything is kind of gross since the output and the input are in the same folder, meaning you might end up accidentally deleting everything and you don't care about the .o files, so they should be hidden away. With those facts in mind, I have two main build systems I like to use:

  1. Single-Project
    • src: Contains .c files.
    • include: Contains .h files.
    • bin: Contains output and intermediate files like object files.
    • CMakeLists.txt
  2. Multi-Project
    • project1
      • src
      • include
      • CMakeLists.txt
    • project2
      • src
      • include
      • CMakeLists.txt
    • bin
    • CMakeLists.txt

I separate include and src because it's annoying when I end up editing the header file when I wanted to edit the source file, but you could honestly dump them all in the same directory for project files. That being said, I recommend that all your output be separated from your input by at least one directory (e.g. put all your output in bin and all your input in src where both are in the same directory) and your project file (in this case, CMakeLists.txt) be at the root directory of your project. For now, we're doing a single project and we're going to go with separating include and src simply because there's an option in CMake to specify the include directory and I want to use it. First, we'll need to create the directories, then move our files into the proper directory, then we'll go from there.

user@computer:~/dev/c-tutorial$ mkdir src include bin
user@computer:~/dev/c-tutorial$ make clean
user@computer:~/dev/c-tutorial$ mv *.c src
user@computer:~/dev/c-tutorial$ mv *.h include
user@computer:~/dev/c-tutorial$ rm Makefile
user@computer:~/dev/c-tutorial$ ls *
test-file.txt

bin:

include:
str-operations.h

src:
print-lines-with-word.c  str-operations.c  word-counter.c

Now, we'll need to make the cmake project file, which is known as CMakeLists.txt. Create the new file in ~/dev/​c-tutorial/ as you did earlier with the previous files before you moved them. It should be in the same directory as bin, include, and src.

CMakeLists.txt

First, we specify the minimum version of cmake we want to use based on the functions we're using. In this case, we'll go with 3.0.0 because we're not using anything fancy and 2.X.X is outdated. We can do this with the command cmake_minimum_version like so

1
cmake_minimum_required(VERSION 3.0.0)

Let's also give it a project name like c-tutorial, which we can do with the project command like so

1
2
cmake_minimum_required(VERSION 3.0.0)
project(c-tutorial)

We don't need to give the project a name, but it's nice to have. We could also use it later. Anyway, let's add the executables. Unlike with make where we specify rules, with cmake, we specify just which .c files are involved with which target. In our case, we want to create two executables, so we'll use the add_executable command.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
cmake_minimum_required(VERSION 3.0.0)
project(c-tutorial)

add_executable(word-counter
	src/word-counter.c
	src/str-operations.c
)

add_executable(print-lines-with-word
	src/print-lines-with-word.c
	src/str-operations.c
)

As you might be able to guess, the first argument is the name of the executable and the other arguments are the source files needed for it to compile. Note that we had to specify the path relative to the directory containing CMakeLists.txt.

cmake Directories

In cmake, there are two main directories: CMAKE_SOURCE_DIR and CMAKE_BINARY_DIR. The first one is wherever the CMakeLists.txt file is located. The second is wherever you run the cmake command. Generally, you want these two directories to be different so that you don't get build files mixed in with your code files. For this reason, we're going to be doing what is known as an out of source build. At this point, all we need to do is move into the bin directory and run cmake with the location of CMakeLists.txt passed in as the argument. This process will look like

user@computer:~/dev/c-tutorial$ cd bin
user@computer:~/dev/c-tutorial/bin$ cmake ..
-- The C compiler identification is GNU 10.2.0
-- The CXX compiler identification is GNU 10.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to:
~/dev/c-tutorial/bin/bin
user@computer:~/dev/c-tutorial/bin$ ls
CMakeCache.txt
CMakeFiles
cmake_install.cmake
Makefile

You'll be using an earlier version of gcc than me because we're using different distros of Linux, but that should be the only difference. If you look at all the files in the current directory, you'll see the Makefile that cmake created for us. If we then type make, we'll get output that looks like

user@computer:~/dev/c-tutorial/bin$ make
Scanning dependencies of target print-lines-with-word
[ 16%] Building C object CMakeFiles/print-lines-with-word.dir/src/print-lines-with-word.c.o
~/dev/c-tutorial/bin/src/print-lines-with-word.c:1:10: fatal error: str-operations.h: No such file or directory
    1 | #include "str-operations.h"
      |          ^~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [CMakeFiles/print-lines-with-word.dir/build.make:82: CMakeFiles/print-lines-with-word.dir/src/print-lines-with-word.c.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:97: CMakeFiles/print-lines-with-word.dir/all] Error 2
make: *** [Makefile:103: all] Error 2

We're getting this error because our compiler has no idea where the file str-operations.h is since it's not in the same directory as the file it's compiling, so we need to tell it where it is using the include_​directories command, which we can do by changing the CMakeLists.txt file to

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
cmake_minimum_required(VERSION 3.0.0)
project(c-tutorial)

include_directories("${CMAKE_SOURCE_DIR}/include")

add_executable(word-counter
	src/word-counter.c
	src/str-operations.c
)

add_executable(print-lines-with-word
	src/print-lines-with-word.c
	src/str-operations.c
)

Now, we need to rerun cmake as before.

user@computer:~/dev/c-tutorial/bin$ cmake ..
-- Configuring done
-- Generating done
-- Build files have been written to: ~/dev/c-tutorial/bin/bin
user@computer:~/dev/c-tutorial/bin$ make
Scanning dependencies of target print-lines-with-word
[ 16%] Building C object CMakeFiles/print-lines-with-word.dir/src/print-lines-with-word.c.o
[ 33%] Building C object CMakeFiles/print-lines-with-word.dir/src/str-operations.c.o
[ 50%] Linking C executable print-lines-with-word
[ 50%] Built target print-lines-with-word
Scanning dependencies of target word-counter
[ 66%] Building C object CMakeFiles/word-counter.dir/src/word-counter.c.o
[ 83%] Building C object CMakeFiles/word-counter.dir/src/str-operations.c.o
[100%] Linking C executable word-counter
[100%] Built target word-counter

As you can see, cmake by default comes with a lot more color, is a bit easier to use, and takes care of figuring out the header file dependencies.

Summary

While there are other build systems (ninja, premake, whatever Windows uses, etc.), make and cmake should be enough to set you up for C/C++ development in the future. If something more complicated comes up, I'll discuss the build system in more depth. Otherwise, I'm content with this introduction.

In the next article, Compiler Flags, we're going to discuss the flags of gcc and how you can use them to do things from optimize your code to including debugging information to checking for illegal memory accesses.

A picture of Joseph Mellor, the author.

Joseph Mellor is a Senior at TU majoring in Physics, Computer Science, and Math. He is also the chief editor of the website and the author of the tumd markdown compiler. If you want to see more of his work, check out his personal website.
Credit to Allison Pennybaker for the picture.