This is the ninth article in the Making Sense of C series. In this article, we're going to come up with ways to modify the behavior of the program at runtime.
So far, we've
//
for single line comments and /*
and */
for multiline comments,
+-*/%
for arithmetic,
[type] [variable] = [expression]
which will allow us to store
values for later use,
char
, short
, int
, and long long
) and
the floating point types (float
and double
),
char
type and invented the
NULL
character, which indicates that we're ending a string,
char
.
&
),
*
),
type * variable_name;
,
type
array[num_elements];
,
char
array using double quotes
("Hello!"
),
variable_name[offset]
.
Once we get to the projects and we have some actual example code, I'll stop listing everything we've covered up to this point. For now, I think it's important to keep all these ideas in your working memory so that once we get to the projects, you won't have to go back to the earlier articles and forget everything up to this point.
In the last article, we ended by saying that we need some way to tell if two strings have the same characters, which means we'll need the ability to do two things:
In this article, we're going to introduce these methods.
==
, !=
, <
, >
, <=
, >=
&&
, ||
, and !
if
and else
while
and do
for
First, we're going to need some sort of way to evaluate whether or not a statement is true, so we're going to introduce a few operators to help us out. In a natural language (e.g. English), you can read these operators as
==
:equals
!=
:does not equal
<
:less than
<=
:less than or equal to
>
:greater than
>=
:greater than or equal to
and they all use the syntax [expression 1] ▢ [expression 2]
, where
▢
represents any of the relational operators.
a ▢ b
returns 1
if a ▢ b
and 0
otherwise.
For example:
5 == 5
⇒1
because "5
equals 5
" is a true statement
5 == 6
⇒0
because "5
equals 6
" is a false statement
5 != 5
⇒0
because "5
does not equal 5
" is a false statement
5 != 6
⇒1
because "5
does not equal 6
" is a true statement
25 == 6 * 4 + 1
⇒1
because the right side of the ==
evaluates to 25
and "25
equals 25
" is a true statement.
8 <= 100
⇒1
because "8
is less than or equal to 100
" is a true
statement
8 <= 8
⇒1
because "8
is less than or equal to 8
" is a true
statement
8 < 8
⇒0
because "8
is less than 8
" is a false statement
8 <= -7
⇒0
because "8
is less than or equal to -7
" is a false
statement
I could keep going, but the general idea remains the same.
What if you want to check two things?
For example, what if you want to check if a number is between 4
and 20
?
You can use the logical operators to check for multiple conditions.
These operators represent the fundamental operators of boolean logic.
Both &&
and ||
have the syntax [expression 1] ▢ [expression 1]
,
but !
has the syntax ![expression]
.
All logical operators return 1
or 0
just like the relational operators.
&&
represents and
, which will return 1
if both the expressions to
the left and right of &&
are not zero.
||
represents or
, which will return 1
if either of the expressions to
the left and right of ||
is not zero.
!
represents not
, which will return 1
only if the expression to its
right is zero.
Usually, you would use the logical operators with the relational operators.
For example, say you want a username to be between 4
and 20
characters long.
Assuming you have the length of the username stored in a variable called
username_len
, you could check if it was more than four characters using 4
< username_len
and you could check if it was less than 20
characters
long using 20 > username_len
.
To check if the username is both longer than four characters and less
than twenty charachters, you would use the &&
operator like (4 <
username_len) && (20 > username_len)
.
In C
, 0
is interpreted as false and everything else is interpreted as true.
C
does not have dedicated boolean types built into the langauge, but all
conditional branches and loops work under 0
⇒false
.
The logical and relational operators are mainly used with conditional branches
and loops.
For example, let's go back to the username example in the Logical Operators section, where we figured out an expression that would help us get a username that was more than four characters and less than twenty. We'll go through three cases.
if
and else
Since we need to come up with some keyword or symbol or something to tell the
computer to do one thing if a condition is true and do something else if a
condition is false.
Well, since we used if
and else
in describing what we want the computer to
do and since they're short keywords, let's just use if
to indicate that we
want to do something and else
to indicate that we want to do something else.
A conditional branch has five parts:
if
keyword,
if
statement is true,
else
keyword,
if
statement is false.
We also need some way of separating these parts from the rest of the code.
The grouping symbols we have are (){}[]
.
Since we already use []
for arrays, we can only use (){}
.
We'll put the condition we want to test in ()
and the code we want to execute
in {}
.
If we have an else
statement, we can put it after the code we want to execute.
Our generic conditional branch will look like:
// Other code if (condition) { // Do stuff you would want to do if condition is true } else { // Optional // Do stuff you would want to do if condition is false } // Other code
Here is how we would apply if
statements to deal with the three cases to check
the username.
// FIRST CASE: WE ONLY WANT TO DO STUFF IF THE USERNAME IS VALID if ((4 < username_len) && (20 > username_len)) { // Do stuff you would want to do if the username is valid } // Stuff to do regardless of whether the username is a valid length //------------------------------------------------------------------------------ // SECOND CASE: WE WANT TO DO STUFF IF THE USERNAME IS VALID AND DIFFERENT STUFF // IF IT IS INVALID if ((4 < username_len) && (20 > username_len)) { // Do stuff you would want to do if the username is valid } else { // Do stuff you would want to do if the username is invalid } // Stuff to do regardless of whether the username is a valid length //------------------------------------------------------------------------------ //THIRD CASE: THREE DIFFERENT OPTIONS if (4 > username_len) { // Do stuff you would want to do if the username is too short } else { if (20 < username_len) { // Do stuff you would want to do if the username is too long } else { // Do stuff you would want to do if the username is valid } } // Stuff to do regardless of whether the username is a valid length
The third case is closest to something you would see in actual code, though the
programmer might use a switch
statement instead.
Plus, the last case is kind of weird, so let's come up with the else if
syntax
to shorten everything.
//THIRD CASE: SIMPLER FORM if (4 > username_len) { // Do stuff you would want to do if the username is too short } else if (20 < username_len) { // Do stuff you would want to do if the username is too long } else { // Do stuff you would want to do if the username is valid } // Stuff to do regardless of whether the username is a valid length
Just to be absolutely clear on what the computer will do when it sees this code,
I'm going to list out all the steps the computer will take when it enters the
if
statement.
4 > username_len
.
4 > username_len
is true, then go to step 3, else go to step 5.
if (4 >
username_len)
.
20 < username_len
.
20 < username_len
is true, then go step 6, else go to step 8.
if (20 <
username_len)
.
else
.
if-else
statements.
==
Instead of =
In general, you should not put variables on the left of the ==
operator
because it's common to accidentally type =
instead, which leads to valid C
code that acts differently from what you expected, which is what causes
most bugs and security exploits.
If you want to test if a variable equals an expression, [variable] ==
[expression]
will return a 1
if [variable]
equals [expression]
and a 0
if [variable]
does not equal [expression]
.
On the other hand, [variable] = [expression]
will return [expression]
.
Since only 0
is considered false, what you're actually testing is
[expression] != 0
.
If you do [expression] == [variable]
, however, it functions exactly like
[variable] == [expression]
, but now if you forget the second =
, the compiler
will catch the error because you cannot assign an expression to another
expression.
In code it would look like:
int a = 6; // This form is bad because you can forget one of the equal signs, but it works // exactly as expected. if (a == 5) { // do stuff } // This is what the typo would look like. a = 5 will always return 5, which will // always be interpreted as true, so the stuff inside the if statement will // always run regardless of whether a equals 5 or not if (a = 5) { // do stuff } // This functions exactly like you would expect it to if (5 == a) { // do stuff } // This doesn't compile because you can't assign anything to an rvalue if (5 = a) { // do stuff }
In general, you should try to code in such a way that if you make a mistake, the compiler will catch it before it can even run.
if
in Our CodeNow that we have a way for us to check if some condition is true, we can check
if two words match.
For example, let's say that our word is "the" and we only want a match if the
characters are the same case (It makes our code easier.).
To check if two words match, we have to check if the individual characters
match.
Remember that we can access individual char
s in an array using the syntax
array[offset]
, we can check if two characters match by comparing them using
a == b
, and we can check if every character matches using the syntax cond1
&& cond2
.
Putting it all together, we get
// Assume we store the word we want to compare in the variable word, which is an // array of more than four characters for now // This if statement can go on multiple lines if (('t' == word[0]) && ('h' == word[1]) && ('e' == word[2]) && ('\0' == word[3]) { // Forgetting to check for the null character // means you'll also match words like "these" // and "there" // Do stuff you would want to do if the word matches }
Of course, this code has a ton of problems:
word
has at least four characters.
We should be able to handle pretty much any word of any length without writing the word directly in the source code. In this case, we will have to introduce a loop.
Loops are pretty straightforward.
While some condition is true, it will run through the code in the block, jump
back to the top of the block, and then keep executing the code in the block
again.
We have two types of looping: unindexed and indexed.
To check if two strings match, we need to check if each individual character
matches including the null character ('\0'
) and we should stop once we see the
null character.
Since we don't know how long the string will be beforehand, we should use
unindexed looping.
There are two types of unindexed loops: while
loops and do-while
loops.
A while loop is the simplest, and it has a similar syntax to an if
statement.
while (condition) { // Stuff to do while the condition is true }
While condition
is true, the loop will execute all the code inside the curly
braces, then jump back to the top of the curly braces, check the condition, then
repeat until condition
is false.
For example, let's go back to the username_len
example.
We never specified how we would calculate it because we didn't have the tools to
calculate it properly.
Now that we have a while
loop, we can using this code (assume that the
username is stored in the variable username
and that username
is a char
array):
unsigned short username_len = 0; // Using a short since I don't expect a // username longer than 64000 characters while (username[username_len]) { username_len += 1; } // username_len now has the number of characters in the username.
A few things to note here:
username_len
to 0
before I started.
username[username_len]
returns the character at (username + username_len)
.
username[username_len]
keeps changing because we keep changing the value of
username_len
.
username[username_len]
is the condition, and it works because 0
is false in
C
and the null terminator is actually 0
in ASCII.
Let's say for a moment that our username is "jpm"
.
The value in username
is { 'j', 'p', 'm', '\0' }
.
The code above will execute these exact steps:
username_len
.
while
loop.
while
loop.
username + 0
('j'
) is 0
.'j'
is not 0
, we move into the block.username_len
.while
loop.
username + 1
('p'
) is 0
.'p'
is not 0
, we move into the block.username_len
.while
loop.
username + 2
('m'
) is 0
.'m'
is not 0
, we move into the block.username_len
.while
loop.
username + 3
('\0'
) is 0
.'\0'
is 0
, we exit the while
loop.username_len
now has the value 3
, which is the number of characters in the
string.
We can also use a modification of a while
loop known as a do-while
loop.
do { // Stuff to do while the condition is true } while (condition);
The only difference between a do-while
loop and a while
loop is that a
do-while
loop will check the condition after running the code in the block.
If we had used a do-while
loop for figuring out the length of the username,
the computer would execute the steps in this order:
username_len
.
while
loop.
while
loop.
username_len
.username + 1
('p'
) is 0
.'p'
is not 0
, we jump back to the top of the block.while
loop.
username_len
.username + 2
('m'
) is 0
.'m'
is not 0
, we jump back to the top of the block.while
loop.
username_len
.username + 3
('\0'
) is 0
.'\0'
is 0
, we exit the while
loop.username_len
now has the value 3
, which is the number of characters in the
string.
If you're wondering why we used a while
loop instead of a do-while
loop for
finding out the length of the username even though they produces similar
results, notice that since username_len += 1;
always runs before the condition
is checked, username_len
will always be greater than 1
, even if you pass in
the empty string ""
, which should have a username_len
of 0
.
Since we're covering control flow in this article, we should cover the other
type of looping in C
: for
loops.
We'll have a use for them later, but for now, I'm just going to talk about them
and bring them up later.
If you can figure out the range of values, you should use a for
loop, which
has the syntax:
for (initialization; condition; the next step) { // do stuff }
First, initialization
will run before anything else in the loop runs.
Then, the loop starts.
For every iteration of the loop, it will go to the next iteration of the loop if
condition to check
is true, then it will do whatever is between the curly
braces, then it will do the next step
, then it will move to the next iteration
of the loop.
For example, let's say we want to calculate
12+22+32+…+1002.
Since we know where to start (1
), where to end (100
), and the step between
each number (1
), we should use a for
loop.
int sum_of_squares = 0; for (int i = 1; 100 >= i; i++) { // i is only visible inside the for loop sum_of_squares += i * i; } // sum_of_squares now contains 338 350, which is the sum of the first 100 square // numbers. //------------------------------------------------------------------------------ // EXAMPLE TO CALCULATE THE SUM OF THE ODD SQUARE NUMBERS BETWEEN 17 and 1001 // INCLUSIVE int sum_of_squares = 0; for (int i = 17; 1001 >= i; i += 2) { sum_of_squares += i * i; } // sum_of_squares now contains the number 167 667 821, which is the sum of odd // square numbers between 17 and 1001 inclusive
for
and When to Use while
?To be clear, you can convert any for
loop into a while
loop easily.
for (initialization; condition; next_step) { // do stuff } // is almost exactly (I have to explain scope for you to understand the // difference.) equivalent to initialization; while (condition) { // do stuff next_step; }
In general, I use a for
loop when I know the range and the step I want to
iterate over and a while
loop when I don't know the range.
As an example, I will use a for
loop when you want me to read from the
seventeenth element of a list until the thirtieth and I will use a while
loop
when I want to start reading at the seventeenth element and continue reading
until I reach a specified element.
Since strings end once we reach the null terminator, I will generally use a
while
loop to read strings and a for
loop otherwise.
Invariants are simply things that remain constant between iterations of a loop. For loops, it's better for performance and intelligibility to pull invariants out of the loop body and store them before you get to the loop.
// BAD WAY TO DO THE CALCULATION int total_sum = 0; for (int i = 0; 100 >= i; i++) { int sum_of_squares = 0; for (int j = 0; 100 >= j; j++) { sum_of_squares += j * j; } total_sum += i * sum_of_squares; } // Since sum_of_squares will not change, between iteratrions, we can pull it out // of the for loop so the program doesn't have to waste time recomputing it. // It also improves readability since it separates the code into two different // parts. // TOTAL NUMBER OF OPERATIONS: 10,100 multiplications and 10,100 additions //------------------------------------------------------------------------------ //BETTER WAY TO DO THE CALCULATION int total_sum = 0; for (int i = 1; 100 >= i; i++) { total_sum += i; } int sum_of_squares = 0; for (int i = 1; 100 >= i; i++) { sum_of_squares += i * i; } total_sum *= sum_of_squares; // a * b + a * c is the same as a * (b + c) and it saves you some // multiplications. // TOTAL NUMBER OF OPERATIONS: 101 multiplications and 200 additions
Note about the code above: In practice, it would probably be best to
either calculate the value of total sum using a calculator and just set int
total_sum = 1708667500;
or use the various formulas.
Doing so will guarantee that your compiler can optimize the code out.
Now that we can loop through all the characters in a word and we can check for certain conditions, we should be able to check if two words are the same easily.
For now, we're going to consider "the" and "The" different words because the 't' and the 'T' are different characters in ASCII, but we'll introduce a way to deal with it later.
You should try to come up with a way to check if two words are the same using
what you've learned in this article before going further.
Remember that you must read characters from both words until you've found a
character that doesn't match or until you've read to the end of the word which
is denoted with the null terminator, '\0'
.
Two characters match only if you reach the null terminator for both strings
without finding a mismatch.
Don't worry if you come up with an inefficient or incomplete way. As long as you come up with something to work from, you're fine. If it's inefficient, you can figure out why it's inefficient, and if it's incomplete, you can figure out why it's incomplete. Once you want to see the answer, hover over the program below.
Assuming the strings are properly formatted, I would write the following code:
int i = 0; while (str1[i] && str2[i] && (str1[i] == str2[i])) { i += 1; } if (str1[i] == str2[i]) { // Do stuff you would want to do if the two strings match } else { // Do stuff you would want to do if the two strings don't match }
Let's break this down.
First, we get ourselves an integral index named i
since i
represents an
index in mathematics.
We might be able to get away with an unsigned short
instead of an int
, but
an optimization like that isn't going to do much for us, so we're sticking with
an int
.
Then, we enter the while
loop.
We have three conditions joined with &&
s: str1[i]
, str2[i]
, and
str1[i] == str2[i]
.
The first two conditions check if either str has reached the null terminator,
since '\0'
is 0
, 0
is false, and false && anything_else
⇒false
.
In other wordss, str1[i] && str2[1]
make sure that we're only performing an
iteration of the while loop if both strings still have characters we need to
check.
The other condition, str1[i] == str2[i]
checks if the two characters at the
same position match.
If they don't match, then we don't need to check any more characters since the
strings don't match, so we can exit the while loop.
The line i += 1
adds one to i
, effectively moving us to the next character
in both strings.
Lastly, we leave the while
loop, enter the if
statement, and check if the
characters we stopped on are equal.
If they aren't equal, we would have left the while
loop without changing i
because str1[i] == str2[i]
was false, meaning that str1[i]
and str2[i]
are
still different characters and the condition str1[i] == str2[i]
is still false
when it's evaluated in the if
statement.
If either str1[i]
or str2[i]
is '\0'
, we'll also exit the while loop, at
which point the only way the two strings are equal is if both str1[i]
and
str2[i]
are '\0'
.
You could have also done
int i = 0; while (('\0' != str1[i]) && ('\0' != str2[i]) && (str1[i] == str2[i])) { i += 1; } if (str1[i] == str2[i]) { // Do stuff you would want to do if the two strings match } else { // Do stuff you would want to do if the two strings don't match }
since while (a)
and while (0 != a)
are the same thing.
We've made quite a lot of progress thoughtout this series.
We went from coming up with a vague idea of stuff we should have in our language
to coming up with a consistent format for the file itself, creating some syntax
for comments and arithmetic, declaring and setting variables of different types,
using variables, accessing memory and memory addresses, handling arrays, and
representing strings of text.
In this article, we introduced several new operators (the Relational and
Logical operators) and a few keywords (if
, else
, while
, do
, and
for
) that will help us control the flow of the program.
Now that we have a way to test if two strings are equal, we still need to be able to read from a file and print the count to the screen. Since we're putting ourselves in the shoes of Ritchie et al, we need to come up ways for programmers to do both. We don't want programmers to copy and paste chunks of code into their program because it makes it harder for them to find their code and because it means that if we have any bugs in our code or features we want to add to our code, programmers have to go to each pasted block of code and update every block of code. There could also be an efficiency cost, but it's situation specific. We should have exactly one block of code so that we can fix or upgrade it exactly once for every feature or bug.
In the next article, we'll introduce Functions in C, which will allow us to reuse code without copying it.