TU ACM logo.
blog

Coming Up with Our First Program in C

Before we can figure out what we want C to do, we need to figure out what we want to do.
Posted 15 January 2020 at 1:26 PM
By Joseph Mellor

This is the fourth article in the Making Sense of C series. In this article, we're going to come up with the requirements for a basic program so that we can figure out what features we need to implement in C. This article will serve as an intermission of sorts, so it's not going to be too long or cover as many topics.

As always, there's a (sort of) relevant xkcd for our situation.

In our case, a lot of the difficulty will come from a difference between what we as humans consider easy and what computers consider easy. For example, reading a million books is almost impossible for humans to do in a reasonable amount of time, but trivial for computers. Understanding a single sentence, however, is trivial for humans and extremely hard for computers to do. Likewise, recognizing elements in a picture or finding patterns in data are difficult for computers, but easy for humans.

Anyway, the point of this whole discussion is to emphasize that programs you might think will be easy to write may be way harder than you would expect. For example, making a game, even a small one, is an insanely huge task. In fact, it's such a huge task that almost all game development tutorials will tell you to use a game engine, or at least a rendering engine. I know of exactly one in-depth tutorial on how to make a game engine, and even then it requires that you know C++ and OpenGL. Luckily, TheChernoProject has tutorials for all three. Besides the actual code, you'll have to do all the artwork, sound design, etc. on your own.

Of course, companies make games all the time after writing all their own code, but you have to remember that these companies have millions of dollars to pay multiple teams of experienced developers, artists, actors, etc. to make a game. As a novice programmer in C, you don't want to bite off more than you can chew, so we're going to try to come up with a program simple enough for you to write without understanding linear algebra, physics, collision detection algorithms, networking, etc., but complex enough to where you need to come up with features you would need in C.

Projects We Won't Do (For Now!)

First, anything to do with graphics, images, or audio is out. While most graphical and audio software is written in C or C++ in one way or another (even if it's written in python, it still uses C or C++ under the hood through the libraries used and the interpreter), graphics processing requires you to, at the very least, understand the format of different file types, know how to use the GPU, understand shaders, linear algebra, etc., while audio processing would require you to learn stuff like Fourier Transforms if you wanted to do anything cool.

Once again, most software involving graphics uses C or C++, it's just that you as a beginner would have a hard time trying to make any graphical software.

Second, anything specific to an operating system is out for now (though we might try to implement platform specific software in a cross-platform way later in the series) because I don't want anyone reading this tutorial to end up unable to go any further.

Third, anything involving a computer understanding a natural language like English or Spanish is out because natural language processing is still an active area of research. We can still read English text, we just can't require it to understand why the sentence Colorless green ideas sleep furiously is meaningless.

Rudimentary text-based games would not fall into the previous category, but we still shouldn't do them for now because a full text-based game might be too large for a beginner.

Lastly, anything involving a ton of math or theory (AI, Machine Learning, Statistics, Numerical Methods, etc.) is out. While you should learn a lot of relevant math and CS theory if you want to be a great programmer, we have to take things one step at a time.

What Projects Can We Do?

I have four related projects that I think are simple enough for us to implement but require us to set up the features of the language.

For example, if the first program is given the text "The quick brown fox jumps over the lazy dog." and the word "the", it should print out 2. If the second program is given the text (Gettysburg Address) below

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
Fourscore and seven years ago our fathers brought forth, on this continent, a
new nation, conceived in liberty, and dedicated to the proposition that all men
are created equal. Now we are engaged in a great civil war, testing whether that
nation, or any nation so conceived, and so dedicated, can long endure. We are
met on a great battle-field of that war. We have come to dedicate a portion of
that field, as a final resting-place for those who here gave their lives, that
that nation might live. It is altogether fitting and proper that we should do
this. But, in a larger sense, we cannot dedicate, we cannot consecrate—we cannot
hallow—this ground. The brave men, living and dead, who struggled here, have
consecrated it far above our poor power to add or detract. The world will little
note, nor long remember what we say here, but it can never forget what they did
here. It is for us the living, rather, to be dedicated here to the unfinished
work which they who fought here have thus far so nobly advanced. It is rather
for us to be here dedicated to the great task remaining before us—that from
these honored dead we take increased devotion to that cause for which they here
gave the last full measure of devotion—that we here highly resolve that these
dead shall not have died in vain—that this nation, under God, shall have a new
birth of freedom, and that government of the people, by the people, for the
people, shall not perish from the earth.

and told to search for the word "the", it should print out:

2:new nation, conceived in liberty, and dedicated to the proposition that all men
9:hallow—this ground. The brave men, living and dead, who struggled here, have
10:consecrated it far above our poor power to add or detract. The world will little
12:here. It is for us the living, rather, to be dedicated here to the unfinished
14:for us to be here dedicated to the great task remaining before us—that from
16:gave the last full measure of devotion—that we here highly resolve that these
18:birth of freedom, and that government of the people, by the people, for the
19:people, shall not perish from the earth.

The third program, when given the same text, should print out something like

13	that
11	the
10	we
9	here
8	to
7	a
6	and
5	of
5	nation
5	it
5	have
5	for
4	this
4	in
4	dedicated
3	who
3	us
3	they
3	so
3	shall
3	people
...

Finally, the last program, when given the same text, should print out something like

the people: 0.272727272,3
the proposition: 0.090909090,1
the earth: 0.090909090,1
the great: 0.090909090,1
...
that nation: 0.153846153,2
that all: 0.076923076,2
...

where each line follows the format [first word] [second word]: [probability that second word comes after first word],​[number of times second word follows first word].

Why Choose These Projects?

I'm in a weird situation where I don't want to come up with any contrived usage for features in C, but I still want to bring in features from C. I think this project strikes the right balance is simple enough for a beginner to do but still useful enough for anyone who wants to do things like machine learning for natural language processing.

Coming Up With the Features We Need

Now that we've decided what we want to do, we need to come up with the features we need for each project. Once we've come up with the features we need, we can then start discussing how we can implement them.

First Project

For the first project, we need to be able to count, so we're going to need some basic arithmetic and some way of storing the count. We're also going to need a way to print our results to the screen and read from a file. We'll also need some way of changing what words and what files we use. Finally, we'll also need to come up with some way to get individual words from text, which means we'll have to write our own basic parsing algorithm.

Second Project

We'll need everything from the first project and we'll need to expand our printing capabilities to more than just numbers. Lastly, since we don't want to fill up the screen with text, we'll need some way to write data to files.

Third Project

This project is similar to the first project, except we'll need to do the first project for every word. While we could do this project inefficiently by writing a program that finds all the words in the file and then runs the first project for every word, we're going to introduce and implement a hash table, which will organize our data in such a way that we can efficiently read and write it.

Fourth Project

This last project will require everything from the third project plus floating point numbers to represent a probability. We'll also have to change up our algorithm a bit.

Other Features

In implementing these features, we're going to end up needing other features from the language, but we'll burn those bridges when we come to it.

Summary

Up to this point, we've decided that we're going to put our code into a file and give compiler the name of the file so it can turn it into machine code. We've also established that we can use comments with // or /* and */ to tell the compiler to ignore sections of the file and that our file will consist of a series of statements separated by semicolons.

In this article, we started moving away from just adding features we think we'll probably need to come up with clear goals that will help us figure out what features we need. We decided to stay with numerical and text-based projects for now, since those projects would be easiest for a beginner but still require us to have the necessary features for the language. We decided that C should have the ability to

Since we're putting ourselves in the shoes of Ritchie et al, we're going to design C in such a way that we can use all these features.

What's Next

While we could start with almost any feature of the language, we'll start simple with some basic arithmetic and storing values for later use.

A picture of Joseph Mellor, the author.

Joseph Mellor is a Senior at TU majoring in Physics, Computer Science, and Math. He is also the chief editor of the website and the author of the tumd markdown compiler. If you want to see more of his work, check out his personal website.
Credit to Allison Pennybaker for the picture.