This is the first article in the Making Sense of C series.
Most programming tutorials seem to take one of three approaches: the Dictionary Approach, the One-Off Approach, or the Toy Approach.
While these tutorials have their use, none of these approaches will teach you how to program. To see why, forget programming for a second and consider what a tutorial for learning a foreign language would look like if it tried to teach the language the way a programming tutorial does.
I personally learned from working on several projects, using the Dictionary Approach when I needed to know about a specific feature and the One-Off Approach to expose myself to something new to learn. I also went through the Toy Approach, but I didn't learn much from it. Plus, I can only name one tutorial or class that used the Toy Approach, so I won't focus on it.
I can't speak for everyone, but given that I was unable to find any tutorials for C in the first two pages of Google's search results that didn't follow these approaches, I don't think I'm an exception.
Now, experienced programmers know of an exception to the approaches I listed for tutorials on the C programming language: Brian Kernighan and Dennis Ritchie's book The C Programming Language. This book, written by the creators of C themselves, is a good resource for learning the language.
This tutorial does not intend to be a history lesson, but I will make it clear when I'm making a historically incorrect claim, with an explanation off to the side. My goal with this tutorial is to help you understand C, which may mean simplifying the history of the language. C's direct predecessor, B, had plenty of influence on the syntax of C, as did several other languages, but B's influence on C isn't necessary to understand the design of C, so I probably won't mention it that often.
In fact, I'm going to present the historical development of C as a linear progression. Once again, there are tons of programming languages that had an influence on C (such as CPL and the ALGOL family), and there were previous programming languages that had little to do with it.
If you do want to read about the history of C, check out Dennis Ritchie's paper on the evolution of C. It also does a good job of explaining design choices for the language.
The Problems C Needs to Solve
Since I neither want to write yet another C dictionary nor rehash The C Programming Language, this tutorial will explain why C is the way it is by looking at the problems Ritchie et al. faced when designing the language and how C solved them.
In other words, we're not just going to list all the syntax in C; we're going to build the language from the ground up.
This tutorial will also present C in such a way that you only need to read the previous articles in the series to understand each article.
At the end of this tutorial, you shouldn't just know the features and syntax of C, but how you should use them to write good C code.
First, let's talk about a program from the computer's perspective. Most, if not all, computers use a von Neumann architecture, which means that, when given a program, the computer will repeatedly fetch the next instruction from memory, decode it, and execute it, where instructions are basic operations (like addition or fetching memory) that are performed directly by the CPU, ALU, or GPU.
Check out Tom Scott's video on what your computer actually does for a more in-depth explanation.
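To make that cycle concrete, here's a minimal sketch of a fetch-decode-execute loop written in C: a toy virtual machine whose opcodes (LOAD, ADD, PRINT, HALT) are my own inventions for illustration, not real machine instructions.

#include <stdio.h>

enum { HALT, LOAD, ADD, PRINT };  /* made-up opcodes for this sketch */

int main(void)
{
    /* A tiny "program" in our toy machine code: each instruction is
     * an opcode followed by its operands. */
    int program[] = {
        LOAD, 0, 2,   /* reg[0] = 2        */
        LOAD, 1, 3,   /* reg[1] = 3        */
        ADD,  0, 1,   /* reg[0] += reg[1]  */
        PRINT, 0,     /* print reg[0]      */
        HALT
    };
    int reg[2] = {0, 0};
    int pc = 0;  /* program counter */

    for (;;) {
        int op = program[pc++];  /* 1. fetch the next instruction */
        switch (op) {            /* 2. decode it                  */
        case LOAD: {             /* 3. execute it                 */
            int r = program[pc++];
            reg[r] = program[pc++];
            break;
        }
        case ADD: {
            int dst = program[pc++];
            int src = program[pc++];
            reg[dst] += reg[src];
            break;
        }
        case PRINT:
            printf("%d\n", reg[program[pc++]]);
            break;
        case HALT:
            return 0;
        }
    }
}

Real CPUs do all of this in hardware, of course, but the loop structure is the same: fetch, decode, execute, repeat.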
Of course, it gets a little more complicated once you introduce stuff like threads, vectorized instructions, and other forms of parallelism, but it's still the same basic process. To actually get a program running, it needs to be in machine code, so why not just write our programs in machine code?
To see why programming in machine code was painful, consider the following example that I pulled from David A. Patterson and John L. Hennessy's Computer Organization and Design: The Hardware/Software Interface, Fifth Edition (p. A-5):
0010 0111 1011 1101 1111 1111 1110 0000
1010 1111 1011 1111 0000 0000 0001 0100
1010 1111 1010 0100 0000 0000 0010 0000
1010 1111 1010 0101 0000 0000 0010 0100
1010 1111 1010 0000 0000 0000 0001 1000
1010 1111 1010 0000 0000 0000 0001 1100
1000 1111 1010 1110 0000 0000 0001 1100
1000 1111 1011 1000 0000 0000 0001 1000
0000 0001 1100 1110 0000 0000 0001 1001
0010 0101 1100 1000 0000 0000 0000 0001
0010 1001 0000 0001 0000 0000 0110 0101
1010 1111 1010 1000 0000 0000 0001 1100
0000 0000 0000 0000 0111 1000 0001 0010
0000 0011 0000 1111 1100 1000 0010 0001
0001 0100 0010 0000 1111 1111 1111 0111
1010 1111 1011 1001 0000 0000 0001 1000
0011 1100 0000 0100 0001 0000 0000 0000
1000 1111 1010 0101 0000 0000 0001 1000
0000 1100 0001 0000 0000 0000 1110 1100
0010 0100 1000 0100 0000 0100 0011 0000
1000 1111 1011 1111 0000 0000 0001 0100
0010 0111 1011 1101 0000 0000 0010 0000
0000 0011 1110 0000 0000 0000 0000 1000
0000 0000 0000 0000 0001 0000 0010 0001
Unless you've written a MIPS assembler, you will not be able to tell me what the program does. Also, you probably wouldn't be able to write the program like this, as every eight bits would be a byte (basically, a character) and there would be no spaces or newlines. If you really wanted to, you could convert the instructions into an assembly language.
Fortunately, memory and register sizes on these early computers were small and they could only perform a few operations, meaning each of these instructions would have been between one and two bytes. Unfortunately, rendering text to the screen would have been too computationally expensive for these machines, so you would have had to set all the memory in your program manually by flipping switches or using punch cards.
If you want to see what that would look like, Ben Eater made a video where he built a simple computer with sixteen bytes of memory and programmed it to calculate the Fibonacci numbers less than 256. He made a slight logic error in his code where it prints a number before it calculates the next one, meaning it will calculate 233 but reset before it prints it out.
As computers got more powerful, Kathleen Booth figured that, instead of remembering all the binary instructions off the top of her head, she could write her code using keywords related to the instruction she wanted to run. For example, she would type something like add $1, $2, $3 instead of typing some string of ones and zeros. She would then have to write a program to convert her code into machine code. In doing so, she wrote the first assembler for the first assembly language.
To see the improvement, let's look at the bare-bones MIPS version that directly converts to the program above:
 1  addiu $29, $29, -32
 2  sw    $31, 20($29)
 3  sw    $4, 32($29)
 4  sw    $5, 36($29)
 5  sw    $0, 24($29)
 6  sw    $0, 28($29)
 7  lw    $14, 28($29)
 8  lw    $24, 24($29)
 9  multu $14, $14
10  addiu $8, $14, 1
11  slti  $1, $8, 101
12  sw    $8, 28($29)
13  mflo  $15
14  addu  $25, $24, $15
15  bne   $1, $0, -9
16  sw    $25, 24($29)
17  lui   $4, 4096
18  lw    $5, 24($29)
19  jal   1048812
20  addiu $4, $4, 1072
21  lw    $31, 20($29)
22  addiu $29, $29, 32
23  jr    $31
24  move  $2, $0
This program can be converted into machine code quite easily, as each line has a one-to-one conversion rule.
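To see that one-to-one conversion rule in action, here's how line 1 of the assembly maps to the first 32-bit word of the machine code above. addiu is a MIPS I-type instruction, built from a 6-bit opcode, two 5-bit register fields, and a 16-bit immediate:

addiu $29, $29, -32

opcode    rs       rt       immediate
001001    11101    11101    1111111111100000
(addiu)   ($29)    ($29)    (-32 in two's complement)

Concatenated, that's 0010 0111 1011 1101 1111 1111 1110 0000, exactly line 1 of the machine code.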
Assembly languages are much easier to follow than the machine code (mult- does multiplication, j- jumps around in the code, s- stores values in memory, l- loads values from memory, $n refers to register n, slti checks if one value is less than another, b- is a conditional branch, a.k.a. an if statement, etc.), but it's still hard to read.
The lines that control the loop (lines 6, 7, 10, 11, 12, and 15), for example, say "Set the value at memory location 28($29) to zero, get the value at 28($29), increment it by one, check if it's less than 101, store the new value back in the specified memory location, jump backwards 9 lines from the next line to line 7 if it is, and go to the next line if it isn't."
If you have any experience with programming, you may recognize this as a loop with the iteration variable being the memory at 28($29).
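In fact, the whole program corresponds to something like the following C code, a loop that sums the squares of the integers from 0 to 100. (The variable names here are my guesses; only the logic is recoverable from the assembly.)

#include <stdio.h>

int main(void)
{
    int sum = 0;                      /* lives at 24($29) */
    int i;                            /* lives at 28($29) */
    for (i = 0; i < 101; i = i + 1)   /* the loop in lines 7-16 */
        sum = sum + i * i;            /* multu $14, $14 squares i */
    printf("%d\n", sum);              /* the jal at line 19 */
    return 0;
}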
While you can improve assembly with stuff like macros, labels, ASCII text, and named registers instead of numbers, assembly languages suffer from a few fundamental problems. From the code snippet provided, it's easy to see several problems with writing code in assembly. For example, if you insert or delete a line inside the loop, you have to update the -9 part of line 15 so that it still jumps to the proper line, or use labels.
There are several other issues with assembly languages, such as the use of goto or jump statements, which leads to what is known as spaghetti code: code characterized by a bunch of nonlocal and nonlinear jumps throughout a program such that the control flow of the program is almost impossible to follow. If you were to draw the logical program flow of spaghetti code, you would get something that looks like a plate of spaghetti, hence the name. Spaghetti code is sometimes used as a general pejorative for bad code, but not all bad code is spaghetti code.
Spaghetti code strips lines of code from their context. If you run your code and get an error on line 27, you won't be able to tell me what caused the error without diving deep into the code and following the execution from beginning to end. You also won't have any semblance of a call stack because you've turned the unit of execution into individual lines of code. Lastly, there's no indication of what data each line of code can modify, so you end up with global variables.
Ravioli code, on the other hand, uses functions to break code into separate modules that interact through well-defined interfaces, usually argument passing or some sort of shared state like in object-oriented programming. Doing so leads to a linear program flow with only local jumps and local variables.
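Here's a small, contrived C example of both styles to illustrate the difference; both fragments compute the same sum, and all the names are mine:

#include <stdio.h>

int total;  /* spaghetti code tends toward globals like this */

/* Spaghetti style: control flow driven entirely by jumps. */
void spaghetti(void)
{
    int i = 0;
    total = 0;
loop:
    if (i >= 10)
        goto done;
    total = total + i;
    i = i + 1;
    goto loop;
done:
    printf("%d\n", total);  /* prints 45 */
}

/* Ravioli style: the same work behind a well-defined interface,
 * with only local variables and a linear flow. */
int sum_below(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum = sum + i;
    return sum;
}

int main(void)
{
    spaghetti();
    printf("%d\n", sum_below(10));  /* also prints 45 */
    return 0;
}

Even at ten lines, following the spaghetti version means tracing the jumps by hand; the ravioli version reads straight down.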
All the tasks listed are menial tasks that you should be able to automate, but there's still one huge problem with writing code in assembly: it won't work on any CPU that does not accept the instruction set you wrote it in. The code provided will run on any system with the MIPS architecture, which doesn't include Intel, AMD, or any CPU you would find on a smartphone. Furthermore, you actually have different instructions available for different CPUs within the same architecture, though most of the code is still similar. You just want to loop through a range of numbers, but you're going to have to learn multiple instruction sets to implement the same program.
Ideally, you would like the computer to take care of these menial but easy-to-automate tasks so that you can deal with the higher level algorithms and data organization.
Just as Booth came up with the idea of using a program to convert something intelligible into machine code, people like Grace Hopper came up with the idea of writing code in a more intelligible language and then using a different program to convert it into assembly (or directly into machine code). In doing so, these people developed some of the first programming languages. Since they couldn't have a programming language without a compiler or interpreter, they also made compilers or interpreters for their languages. If you want to use a language on a specific architecture, you can write your own compiler or modify an existing interpreter.
A compiled language uses a compiler to convert source code into machine code, producing a whole new executable file. When you run the executable, you give the generated machine code directly to the operating system. You do not need the compiler to run the generated executable.
An interpreted language uses an interpreter to read source code and generate the proper machine code line by line. When you run the program, you give the original source code to the interpreter, which converts it line by line into machine code that gets sent to the operating system. You do need the interpreter to run the program, though you can almost always generate a standalone executable that comes with the interpreter built in.
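As a concrete (and simplified) example of the compiled workflow, here's a complete C program with the build-and-run commands in the comments, assuming a POSIX-style cc compiler is available:

/* hello.c
 *
 * Compile once:  cc hello.c -o hello   (produces machine code in ./hello)
 * Run any time:  ./hello               (no compiler needed from here on)
 *
 * An interpreted language would instead re-read the source on every
 * run, e.g. `python hello.py`, and you'd need the interpreter installed
 * on every machine that runs the program.
 */
#include <stdio.h>

int main(void)
{
    printf("Hello, world!\n");
    return 0;
}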
Since C is a compiled language, we will completely ignore interpreters and only talk about compilers, even when a claim we're making about compilers also applies to interpreters.
For a programming language to be useful, it has to distribute the work fairly between the compiler and the programmer. The programmer should give the compiler enough information for it to generate the proper code and nothing more.
The first iteration of C was implemented on the PDP-11 system, which had 24 KB of memory with 12 KB reserved for the Unix operating system. At best, you could get around 300,000 instructions per second. To put that into perspective, my low-end Intel i5 laptop has an L1 cache 1.5 times larger than the entire available memory of the PDP-11. In other words, the high speed memory built into the chip of my computer is large enough for me to simulate the entirety of the PDP-11. It can also handle around 53,000,000,000 instructions per second, which is around 180,000 times faster.
If you want a programming language on the PDP-11, it has to be close to the metal and simple enough for the compiler itself to run on the machine. Neither the compiler nor the program itself can take up a ton of memory or time.
In the modern era, most code doesn't run on the PDP-11, but instead on advanced computers hundreds of thousands of times faster, with multithreaded, multicore processors and around six orders of magnitude more RAM. C still has the low-level access to memory and almost direct conversion into machine code that it did when it ran on the PDP-11, meaning it takes little memory and runs incredibly fast compared to many other languages.
If you don't believe me, check out the benchmarks for different programming languages for yourself. If you're programming embedded software, however, you're almost guaranteed to use C or an assembly language, because C was made for computers with limited memory and processing power.
Types with defined rules are easier for the compiler to convert into machine code. For example, there are only 128 possible ASCII characters (though there are extensions), so it would make sense to use only a byte for each character. Ten ASCII characters should take up ten bytes and a thousand characters should take up a thousand bytes. Integers and floating point numbers, however, require more memory.
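If you're curious what this looks like on your own machine, C's sizeof operator reports how many bytes each type takes. A minimal sketch (the exact sizes of int and double depend on the platform, though char is always one byte in C):

#include <stdio.h>

int main(void)
{
    char text[10] = "ten chars";  /* 9 characters plus a terminating zero byte */

    printf("char:   %zu byte\n",  sizeof(char));    /* always 1 */
    printf("text:   %zu bytes\n", sizeof(text));    /* 10: one byte per character */
    printf("int:    %zu bytes\n", sizeof(int));     /* commonly 4 */
    printf("double: %zu bytes\n", sizeof(double));  /* commonly 8 */
    return 0;
}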
There is a lot of basic functionality (such as printing things to a terminal or reading from a file) that everyone should be able to use without having to write the code themselves. While some wheels should be reinvented, printing stuff to the terminal shouldn't be one of them.
Furthermore, if your goal is to write code that can be compiled on any computer, the language should take care of anything that could differ from computer to computer. Therefore, if we want to make C work, we need a standard library that contains all the basic functionality a programmer would need.
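For a taste of what that standard library provides, here's a minimal sketch that prints to the terminal and reads from a file using only stdio.h (the file name example.txt is made up; any text file would do):

#include <stdio.h>

int main(void)
{
    printf("Printing without writing my own terminal driver.\n");

    FILE *f = fopen("example.txt", "r");  /* hypothetical input file */
    if (f == NULL) {
        printf("Could not open example.txt.\n");
        return 1;
    }

    int c;
    while ((c = fgetc(f)) != EOF)  /* echo the file to the terminal */
        putchar(c);
    fclose(f);
    return 0;
}

Neither printf nor fopen cares whether the terminal is a teletype or a window, or how the operating system stores files; that's exactly the machine-to-machine variation a standard library hides.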
In short, for C to be a good programming language, it must automate the menial tasks of assembly, like computing jump offsets (the -9 in line 15 of the MIPS program); provide structured control flow instead of arbitrary jumps; compile to different instruction sets; provide types; come with a standard library; and still be small and fast enough to run on limited hardware.
We didn't actually discuss any specific features, design choices, or syntax of the C language in this article, but we did lay the groundwork for what we need to accomplish to make the C language a good language.
The rest of the articles in this series will discuss these topics in the order I listed them above.
Some of these topics may take an article or two, and some may be merged into other topics, depending on how much I need to say about each and whether it fits in with another topic.
Each topic will lay the groundwork for future topics. Every article will only require you to have read the previous articles to fully understand it, though I might mention future topics in passing.