This is the fourth article in the Making Sense of C series. In this article, we're going to come up with the requirements for a basic program so that we can figure out what features we need to implement in C. This article will serve as an intermission of sorts, so it's not going to be too long or cover as many topics.
As always, there's a (sort of) relevant xkcd for our situation.
In our case, a lot of the difficulty will come from a difference between what we as humans consider easy and what computers consider easy. For example, reading a million books is almost impossible for humans to do in a reasonable amount of time, but trivial for computers. Understanding a single sentence, however, is trivial for humans and extremely hard for computers to do. Likewise, recognizing elements in a picture or finding patterns in data are difficult for computers, but easy for humans.
Anyway, the point of this whole discussion is to emphasize that programs you
might think will be easy to write may be way harder than you would expect.
For example, making a game, even a small one, is an insanely huge task.
In fact, it's such a huge task that almost all game development tutorials will
tell you to use a game engine, or at least a rendering engine.
I know of exactly one in-depth
tutorial on how to make a game engine, and even then it requires that you
know C++
and OpenGL.
Luckily, TheChernoProject has tutorials for all three.
Besides the actual code, you'll have to do all the artwork, sound design, etc.
on your own.
Of course, companies make games all the time after writing all their own code,
but you have to remember that these companies have millions of dollars to pay
multiple teams of experienced developers, artists, actors, etc. to make a game.
As a novice programmer in C
, you don't want to bite off more than you can
chew, so we're going to try to come up with a program simple enough for you to
write without understanding linear algebra, physics, collision detection
algorithms, networking, etc., but complex enough to where you need to come up
with features you would need in C
.
First, anything to do with graphics, images, or audio is out.
While most graphical and audio software is written in C
or C++
in one way or
another (even if it's written in python
, it still uses C
or C++
under the
hood through the libraries used and the interpreter), graphics processing
requires you to, at the very least, understand the format of different file
types, know how to use the GPU, understand shaders, linear algebra, etc., while
audio processing would require you to learn stuff like Fourier Transforms if you
wanted to do anything cool.
Once again, most software involving graphics uses C
or C++
, it's just that
you as a beginner would have a hard time trying to make any graphical software.
Second, anything specific to an operating system is out for now (though we might try to implement platform specific software in a cross-platform way later in the series) because I don't want anyone reading this tutorial to end up unable to go any further.
Third, anything involving a computer understanding a natural language like English or Spanish is out because natural language processing is still an active area of research. We can still read English text, we just can't require it to understand why the sentence Colorless green ideas sleep furiously is meaningless.
Rudimentary text-based games would not fall into the previous category, but we still shouldn't do them for now because a full text-based game might be too large for a beginner.
Lastly, anything involving a ton of math or theory (AI, Machine Learning, Statistics, Numerical Methods, etc.) is out. While you should learn a lot of relevant math and CS theory if you want to be a great programmer, we have to take things one step at a time.
I have four related projects that I think are simple enough for us to implement but require us to set up the features of the language.
For example, if the first program is given the text "The quick brown fox jumps
over the lazy dog." and the word "the", it should print out 2
.
If the second program is given the text (Gettysburg Address) below
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | Fourscore and seven years ago our fathers brought forth, on this continent, a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived, and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting-place for those who here gave their lives, that that nation might live. It is altogether fitting and proper that we should do this. But, in a larger sense, we cannot dedicate, we cannot consecrate—we cannot hallow—this ground. The brave men, living and dead, who struggled here, have consecrated it far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us—that from these honored dead we take increased devotion to that cause for which they here gave the last full measure of devotion—that we here highly resolve that these dead shall not have died in vain—that this nation, under God, shall have a new birth of freedom, and that government of the people, by the people, for the people, shall not perish from the earth. |
and told to search for the word "the", it should print out:
2:new nation, conceived in liberty, and dedicated to the proposition that all men 9:hallow—this ground. The brave men, living and dead, who struggled here, have 10:consecrated it far above our poor power to add or detract. The world will little 12:here. It is for us the living, rather, to be dedicated here to the unfinished 14:for us to be here dedicated to the great task remaining before us—that from 16:gave the last full measure of devotion—that we here highly resolve that these 18:birth of freedom, and that government of the people, by the people, for the 19:people, shall not perish from the earth.
The third program, when given the same text, should print out something like
13 that 11 the 10 we 9 here 8 to 7 a 6 and 5 of 5 nation 5 it 5 have 5 for 4 this 4 in 4 dedicated 3 who 3 us 3 they 3 so 3 shall 3 people ...
Finally, the last program, when given the same text, should print out something like
the people: 0.272727272,3 the proposition: 0.090909090,1 the earth: 0.090909090,1 the great: 0.090909090,1 ... that nation: 0.153846153,2 that all: 0.076923076,2 ...
where each line follows the format [first word] [second word]: [probability
that second word comes after first word],[number of times second
word follows first word]
.
I'm in a weird situation where I don't want to come up with any contrived usage
for features in C
, but I still want to bring in features from C
.
I think this project strikes the right balance is simple enough for a beginner
to do but still useful enough for anyone who wants to do things like machine
learning for natural language processing.
Now that we've decided what we want to do, we need to come up with the features we need for each project. Once we've come up with the features we need, we can then start discussing how we can implement them.
For the first project, we need to be able to count, so we're going to need some basic arithmetic and some way of storing the count. We're also going to need a way to print our results to the screen and read from a file. We'll also need some way of changing what words and what files we use. Finally, we'll also need to come up with some way to get individual words from text, which means we'll have to write our own basic parsing algorithm.
We'll need everything from the first project and we'll need to expand our printing capabilities to more than just numbers. Lastly, since we don't want to fill up the screen with text, we'll need some way to write data to files.
This project is similar to the first project, except we'll need to do the first project for every word. While we could do this project inefficiently by writing a program that finds all the words in the file and then runs the first project for every word, we're going to introduce and implement a hash table, which will organize our data in such a way that we can efficiently read and write it.
This last project will require everything from the third project plus floating point numbers to represent a probability. We'll also have to change up our algorithm a bit.
In implementing these features, we're going to end up needing other features from the language, but we'll burn those bridges when we come to it.
Up to this point, we've decided that we're going to put our code into a file and
give compiler the name of the file so it can turn it into machine code.
We've also established that we can use comments with //
or /*
and */
to
tell the compiler to ignore sections of the file and that our file will consist
of a series of statements separated by semicolons.
In this article, we started moving away from just adding features we think we'll
probably need to come up with clear goals that will help us figure out what
features we need.
We decided to stay with numerical and text-based projects for now, since those
projects would be easiest for a beginner but still require us to have the
necessary features for the language.
We decided that C
should have the ability to
Since we're putting ourselves in the shoes of Ritchie et al, we're going to
design C
in such a way that we can use all these features.
While we could start with almost any feature of the language, we'll start simple with some basic arithmetic and storing values for later use.