This is the fifteenth article in the Making Sense of C series. In this article, we're going to finish our discussion of basic file interaction by introducing a way to print things out to the terminal and to read a file word by word using format strings.
I suggest you go back to the article on Strings in C and look at the escape characters section because we're going to use them in this article.
As you can imagine, at this point we can easily introduce the functionality to do things like "read a line from a file" and "print characters out to a file".
For reading a line from a file, we just need to come up with a function that takes in the FILE * of the file we want to read, some output buffer to store the line in, and the maximum number of characters we want to read (since we need to make sure that the line actually fits in the buffer).
It might also be useful to return a char * that points to the output buffer if it works, and NULL if it doesn't.
Likewise, printing characters out to a file requires us to specify the buffer we want to print to the file and the FILE * of the file we want to write to.
fgets and fputs
These functions also exist in the standard library, and they're known as fgets (file get string) and fputs (file put string).
They have the syntax
char * fgets(char * str, int count, FILE * file_reader);
int fputs(const char * str, FILE * file_writer);
Note that the buffer passed to fgets is not const, since fgets has to write into it.
fputs returns an int, but the exact value it returns on success differs between implementations.
In all implementations, returning a non-negative number means that your program successfully wrote to the file.
If it can't write to the file, then fputs will return a constant known as EOF (End Of File), which you can test for in an if statement.
An example usage of a simple program that copies up to 1023 characters (fgets reserves one slot in the buffer for the terminating '\0') from the first line of a file to the end of another would look like
#include <stdio.h>

int copy_first_line(FILE * source, FILE * dest);

int main(int argc, char ** argv)
{
    if (3 > argc) {
        return -1;
    }
    char * source_file_name = argv[1];
    char * dest_file_name = argv[2];

    FILE * source = fopen(source_file_name, "r");
    FILE * dest = fopen(dest_file_name, "a");

    // copy_first_line checks for NULL, so a failed fopen is reported there.
    int status = copy_first_line(source, dest);

    // fclose(NULL) is undefined behavior, so only close what was opened.
    if (NULL != source) {
        fclose(source);
    }
    if (NULL != dest) {
        fclose(dest);
    }
    return status;
}

int copy_first_line(FILE * source, FILE * dest)
{
    if (NULL == source || NULL == dest) {
        return -1;
    }
    const int buff_sz = 1024;
    char buffer[buff_sz];
    // Read at most buff_sz - 1 characters of the first line.
    if (NULL == fgets(buffer, buff_sz, source)) {
        return -1;
    }
    // Append what we read to the destination file.
    if (EOF == fputs(buffer, dest)) {
        return -1;
    }
    return 0;
}
I put the code for actually copying from one file to the other in its own function so that every error path can simply return, and main closes the files in one place no matter whether opening, reading, or writing failed.
I could probably make the code above a little cleaner by moving the part of main that opens the files, calls copy_first_line, and closes the files into its own function and returning the result from the new function, but it would be diminishing returns.
At least to me, main should handle getting user input from the command line and calling the functions that drive the program and nothing else, though you can make a lot of exceptions.
Since we haven't yet covered how we can implement error messages, I just close the files and exit the program with a -1 to indicate something going wrong.
Later, we can use the return value from main to figure out what went wrong.
There are some specific error numbers reserved in C that we can use in the standard library, but we aren't going to worry about them for now.
In Unix (and later Linux and macOS), everything is a file (descriptor), though most people leave off the (descriptor) part.
In particular, this means that things like printers, the terminal, network connections, etc. can be represented with FILE * objects.
Up to this point, I've been using names like file_reader and file_writer for the FILE * parameters in the standard library functions, but the documentation for C uses the more general parameter name of stream
char * fgets(char * str, int count, FILE * stream);
int fputs(const char * str, FILE * stream);
since you could be taking in a file as an input or a network connection or anything else (there are better functions to use than these, though).
Since the terminal is also a FILE * object, we should be able to pass it into our file manipulation functions.
In C, we have three preexisting FILE * objects:
stdout for printing stuff to the terminal,
stderr for printing errors out to the terminal,
stdin for getting user input from the terminal.
You don't need to open or close them, since they are already open when main starts and are closed for you when the program exits.
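To make the "a stream is a stream" idea concrete, here is a minimal sketch. The helper name write_greeting and its message are made up for this illustration; the point is only that the same function works unchanged whether it is handed stdout, stderr, or a FILE * that came from fopen.

```c
#include <stdio.h>

/* Hypothetical helper: writes a fixed greeting to whichever stream it is
 * given. Returns 0 on success and -1 if the stream is NULL or fputs fails. */
int write_greeting(FILE * stream)
{
    if (NULL == stream) {
        return -1;
    }
    if (EOF == fputs("Hello from a stream!\n", stream)) {
        return -1;
    }
    return 0;
}
```

Calling write_greeting(stdout) prints the greeting to the terminal, write_greeting(stderr) sends it to the error stream, and passing the result of fopen writes it to a file instead.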
As an example, I modified the main function so that it prompts you for the text that you want to add to the file:
int main(int argc, char ** argv)
{
    if (2 > argc) {
        // \n is the newline character
        fputs("Not enough arguments provided.\n", stderr);
        // \t is a tab character
        fputs("usage:\t", stderr);
        fputs(argv[0], stderr);
        fputs(" output_file\n", stderr);
        return -1;
    }
    char * dest_file_name = argv[1];
    FILE * dest = fopen(dest_file_name, "a");

    // \" is the double quote character and we need to use it because just
    // a " would end the string.
    fputs("What line do you want to add to \"", stdout);
    fputs(dest_file_name, stdout);
    fputs("\"?\n", stdout);

    int status = copy_first_line(stdin, dest);

    // fclose(NULL) is undefined behavior, so only close dest if it was opened.
    if (NULL != dest) {
        fclose(dest);
    }
    return status;
}
First, now that I can print things out to the terminal (specifically stderr), I added a few lines to tell the user that they provided too few arguments to the program and to print out the proper usage (printing out the usage is standard if the command line input isn't formatted correctly).
Also, since we only expect to have two arguments (the name of the program and the file we're adding to), I changed the check at the top of main to test whether there are at least two arguments.
Since we're no longer using a source file, I removed all the lines involving it.
I also replaced source with stdin in the call to copy_first_line.
Then, I added a few lines to print out a line to stdout asking the user what line they want to add to the file.
stdin
Reading from stdin will pause your program and allow the user to type something into the terminal.
The program will remain paused until the user hits Enter.
Until then, it's just like typing something into a form or a login screen where you can hit backspace to remove characters.
In IDEs that close the terminal window immediately after the program finishes, some users will add a read from stdin to pause the program so they can see the output.
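As a rough sketch of that "pause until Enter" trick, the hypothetical helper below (wait_for_enter is a name invented here) throws away characters until it sees a newline. It uses fgetc, one of the single-character functions from stdio.h, and takes the stream as a parameter so that it isn't tied to stdin.

```c
#include <stdio.h>

/* Hypothetical helper: consumes characters from `in` up to and including
 * the next newline. Returns 0 if a newline was found and -1 if the stream
 * ended (or failed) first. */
int wait_for_enter(FILE * in)
{
    int c; /* int, not char, so EOF can be told apart from real characters */
    while (EOF != (c = fgetc(in))) {
        if ('\n' == c) {
            return 0;
        }
    }
    return -1;
}
```

Calling wait_for_enter(stdin) right before main returns would keep the terminal window open until the user presses Enter.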
If you look back at the example program, you'll notice that I had to break one line in the output into multiple calls to fputs because I needed to tell it to print out everything before the name of the file, then the name of the file, then everything after the name of the file.
It would be nice to be able to write one line of code to print out the message, so we're going to invent something called a format string.
A format string is a sequence of chars (just like a regular string) that the computer will read as instructions on what to print out.
For example, we want to print out "What line do you want to add to "dest_file_name"?", where dest_file_name is replaced by the actual name of the file.
We should be able to have something like an escape character in our format strings that tells the computer to stop printing out the characters in the string, get something different to print out, then continue printing out the rest of the characters in the string.
We've already reserved the '\' character for the normal escape character for things like newlines and tabs, so we need something else.
For C, Ritchie decided to use the % sign again, probably because it has only been used up to this point for the remainder operation and everything else on a keyboard (letters, numbers, punctuation, etc.) is already in use.
In our case, we want to tell the program to print out the normal series of characters, then print out a different string, then print out the rest of the characters.
Since we're printing out a string, let's use %s to tell the computer to print out a string.
Our format string should then be
"What line do you want to add to \"%s\"?"
We'll also need to tell the computer which information to print out.
printf and fprintf
We'll make a new function called printf (print format string)
int printf(const char * format_string, ...);
which will print data out to the terminal.
The ... in the parameter list makes printf a variadic function.
Variadic functions are functions that can take a variable number of arguments.
You can say that the first few arguments need to have a specific format, such as printf having a const char * format_string as its first argument, then say the rest of the arguments can be whatever.
In our case, whenever printf hits one of our escape sequences (%s), it needs to know what it should print out.
After the format_string argument, the rest of the arguments are what it should print out, in order.
For example:
printf("%s %s %s %s.", "This", "is", "a", "test");
will print out
This is a test.
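If you're curious what a variadic function looks like from the inside, here is a minimal sketch using the macros from stdarg.h. The function sum_ints is invented for this illustration and is not how printf is implemented; printf works out how many arguments to read from its format string, whereas this sketch is simply told the count up front.

```c
#include <stdarg.h>

/* Hypothetical example: adds up `count` int arguments.
 * The caller must pass exactly `count` ints after the first argument. */
int sum_ints(int count, ...)
{
    va_list args;
    va_start(args, count);              /* start reading after `count` */
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);     /* fetch the next argument as an int */
    }
    va_end(args);
    return total;
}
```

With this definition, sum_ints(3, 1, 2, 3) evaluates to 6. Just like with printf, nothing stops a caller from passing the wrong number or type of arguments, which is why matching the format string to the arguments matters.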
As a more realistic example, let's say that we want to print out a word and how many times it shows up in a text file.
In this case, for each word, we would want to print out the word, a space or a colon, and then the number of times it shows up in the text.
Using a format string and assuming that word is a char * and count is an unsigned int (we can't have a negative count), we can use
printf("%s: %u\n", word, count); // %u is for an unsigned int
which tells the computer to print word, then a colon, then a space, then print count, then print a newline.
In our example program, we can replace our three fputs lines with
printf("What line do you want to add to \"%s\"?\n", dest_file_name);
If, instead, we want to print to a file, we can use fprintf, which has the syntax
int fprintf(FILE * stream, const char * format, ...);
where the only difference between fprintf and printf is that you have to specify the FILE * first for fprintf while printf prints to stdout.
You could replace every printf in your program with an fprintf with its first argument being stdout and see no difference in your program.
For example, we could have written our printf above using fprintf:
fprintf(stdout, "What line do you want to add to \"%s\"?\n", dest_file_name);
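Tying this back to the word counting example, a small sketch of a function that prints one "word: count" line to any stream might look like the following. The name print_word_count is hypothetical; the only library call it relies on is fprintf, which returns the number of characters written or a negative number on error.

```c
#include <stdio.h>

/* Hypothetical helper: prints "<word>: <count>" followed by a newline to
 * `stream`. Returns whatever fprintf returns: the number of characters
 * written, or a negative number if the write failed. */
int print_word_count(FILE * stream, const char * word, unsigned int count)
{
    return fprintf(stream, "%s: %u\n", word, count);
}
```

Passing stdout as the stream prints to the terminal, while passing a FILE * from fopen writes the same line to a file.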
Later, we'll extend the format strings to include things like restricting the number of characters we read.
stdio.h Functions
As you can see from the printf vs fprintf example above, a lot of the standard library functions will do similar things but with slightly different arguments.
We're going to go through most of them.
In C, we have four base operations for I/O:
put, which prints out its argument,
get, which takes in input and stores it in a variable,
printf, which prints out its arguments as dictated by a format string,
scanf, which takes in input and stores it in a variable as dictated by a format string.
By default, the four base functions interact with the terminal.
To change the source for get and scanf and to change the destination for put and printf, we add a letter or two to the front of the name.
f added to the front of any of the base functions changes the input source or output destination to a FILE *, which is generally specified as the first argument.
s will change the input source to a const char * and the output destination to a char *, which is generally specified as the first argument.
sn is the same as s, except it has another argument specifying the maximum number of characters it can write to the output buffer, including the terminating '\0'.
sn only applies to later versions of C (C99 onwards) and it only applies to printf.
Note that sget would essentially just copy data that's already in your program, so there are no sget functions.
I'm also skipping the v prefix since it's a more advanced feature that I have never used.
We're done with the two format string functions, printf
and scanf
, so let's
show you the syntax for all the derived functions:
// Write to file
int fprintf(FILE * stream, const char * format, ...);
// Read from file
int fscanf(FILE * stream, const char * format, ...);

// Print to terminal
int printf(const char * format, ...);
// Read from terminal
int scanf(const char * format, ...);

// Write to an array of characters
int snprintf(char * string, size_t n, const char * format, ...);
int sprintf(char * string, const char * format, ...);
// Read from an array of characters
int sscanf(const char * string, const char * format, ...);
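As a sketch of the string versions in action, the two hypothetical helpers below format a word and its count into a buffer with snprintf and then parse such a string back with sscanf. The helper names and the 32-character word limit are assumptions made for this example. snprintf returns the number of characters the full output needs, so comparing that against the buffer size tells us whether the output was cut short.

```c
#include <stdio.h>

/* Hypothetical helper: writes "<word> <count>" into `buffer`, which has
 * room for `size` chars. Returns 0 if everything fit and -1 if the output
 * was truncated or snprintf failed. */
int format_word_count(char * buffer, size_t size,
                      const char * word, unsigned int count)
{
    int needed = snprintf(buffer, size, "%s %u", word, count);
    if (needed < 0 || (size_t)needed >= size) {
        return -1;
    }
    return 0;
}

/* Hypothetical helper: parses "<word> <count>" from `line`. `word` must
 * have room for at least 32 chars; %31s stops sscanf from writing past it.
 * Returns 0 if both values were read and -1 otherwise. */
int parse_word_count(const char * line, char * word, unsigned int * count)
{
    if (2 != sscanf(line, "%31s %u", word, count)) {
        return -1;
    }
    return 0;
}
```

The number in %31s is one less than the buffer size because sscanf also has to store the terminating '\0'.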
Since get and put don't have a format specifier but we would like to print different things and we've already added stuff to the beginning of the word for the input, we need to create new functions with stuff added to the end of the name.
s will tell the computer to expect a string as the input or output (fgets, fputs, puts).
c will read a character from or write a character to a FILE * (fgetc, fputc).
char will read a character from stdin (getchar) or write a character to stdout (putchar), a.k.a. the terminal.
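To show the single-character versions at work, here is a minimal sketch that copies one stream to another with fgetc and fputc. The function name copy_stream is made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: copies every character from `in` to `out`.
 * Returns the number of characters copied, or -1 if a write failed. */
long copy_stream(FILE * in, FILE * out)
{
    long copied = 0;
    int c; /* fgetc returns an int so that EOF fits alongside every char */
    while (EOF != (c = fgetc(in))) {
        if (EOF == fputc(c, out)) {
            return -1;
        }
        ++copied;
    }
    return copied;
}
```

Calling copy_stream(stdin, stdout) would echo everything the user types until the input ends.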
As a quick disclaimer, I don't use the get and put functions that often since I can do everything with format strings and I don't need to remember as much.
I won't go into the syntax here, but you can find the syntax for these functions on the c++ website (C++ contains all the standard library files of C and some more for its standard library, so there isn't as much of a reason to make another website).
If you remember back in the article on Memory Addresses in C, I had a warning about how, by allowing you to directly interact with memory, C introduces several security vulnerabilities that mainly consist of accessing memory outside of a buffer.
I ended the warning with:
As of right now, we neither have the capability to allow or prevent a malicious user from accessing memory outside of a buffer, so we'll save that for a later article.
Now, we have ways for users to change how our program behaves, so we need to prevent malicious users from accessing memory outside of a buffer, which we can do with format string functions, fgets, etc. since we can specify the maximum number of characters we want to read.
We can't, however, prevent malicious users from exploiting gets, which reads a line of input from stdin.
gets doesn't let us specify the maximum number of characters the user can input, meaning that a malicious user could type more characters than our buffer can hold, overwrite the memory after it, and do something like launch a Denial of Service campaign that shut down the internet for a few days.
The "Morris Worm", named after Robert Morris, was a white hat hacking attempt intended to highlight several security vulnerabilities in commonly used programs, including a buffer overflow based on gets.
For this reason, gets was deprecated in a later revision of the C99 standard and removed entirely from the C standard library in C11.
Even then, if you somehow get a program that uses gets to build, your compiler will most likely emit a warning saying that gets is dangerous and should not be used.
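As a sketch of what to reach for instead of gets, the hypothetical helper below wraps fgets so that it never writes more than the buffer can hold, and strips the trailing newline the way gets would have. The name read_line is an assumption for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical replacement for gets: reads at most size - 1 characters of
 * one line from `in` into `buffer` and removes the trailing newline, if
 * there is one. Returns `buffer` on success and NULL on end of input or
 * error. */
char * read_line(char * buffer, int size, FILE * in)
{
    if (NULL == fgets(buffer, size, in)) {
        return NULL;
    }
    /* strcspn gives the index of the first '\n', or of the '\0' if the
     * line had no newline, so this assignment is always in bounds. */
    buffer[strcspn(buffer, "\n")] = '\0';
    return buffer;
}
```

If the line is longer than the buffer, the rest of it stays in the stream and will be returned by the next call, rather than spilling into neighboring memory.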
Reading from streams is similar to writing to streams, except that it's generally easier to write to a stream than to read from one.
For example, let's say that you want the program to print out the ID number of the student when they log in.
In that case, assuming that the student ID is a base 10 number (has the digits zero to nine) and the value is stored in an unsigned int, then you can print it out using
printf("Student ID: %u\n", student_id);
Pretty straightforward, right?
Likewise, if you want to print to a file, you can use fprintf and specify the stream as the first argument.
On the other hand, let's say that you want to read a student ID number from the user.
In that case, you could use
scanf("Enter your student ID: %u", &student_id);
right?
Not quite. The text in a scanf format string isn't printed out; it describes what scanf expects to find in the input.
If we were to use that format string, a user with ID number 1234567 would have to type "Enter your student ID: 1234567", not just "1234567".
If you type anything else, scanf will just return 0 since it couldn't fill any of its arguments.
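You can check this behavior without typing anything by feeding the same format string to sscanf, which reads from a string instead of the terminal. The wrapper name parse_prompted_id is made up for this sketch; it just returns whatever sscanf returns.

```c
#include <stdio.h>

/* Hypothetical helper: tries to match `input` against the format string
 * from the text. Returns the number of arguments sscanf managed to fill. */
int parse_prompted_id(const char * input, unsigned int * student_id)
{
    return sscanf(input, "Enter your student ID: %u", student_id);
}
```

Given "1234567", the function returns 0 because the input doesn't start with the literal text in the format string. Given "Enter your student ID: 1234567", it returns 1 and stores 1234567 in student_id.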
Instead, you should use something like
printf("Enter your student ID: ");
while (1 != scanf("%u", &student_id)) {
    printf("Enter your student ID: ");
}
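Except that won't actually work if the user types something that isn't a number, because scanf won't remove anything it can't convert, so whatever you typed in will still be in stdin waiting to be read.
Your computer will essentially go through this process:
1. Print "Enter your student ID: ".
2. Call scanf, which fails to convert the leftover input and returns 0.
3. Go back to the start of the while loop.
Since the invalid input never leaves stdin, the loop repeats forever.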
The correct way to do it is to clear out standard input if the input is invalid.
printf("Enter your student ID: ");
while (1 != scanf("%u", &student_id)) {
    // Discard the rest of the invalid line (everything up to the newline).
    scanf("%*[^\n]");
    printf("Enter your student ID: ");
}
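A different approach, which is not the one used above, is to read a whole line with fgets and then convert it with sscanf. Because fgets always consumes the line, there is nothing left over to clean up, and the function can also give up when the input ends instead of looping forever. The sketch below assumes a made-up name, read_student_id, and a 128-character line limit, and takes the streams as parameters so that it isn't tied to the terminal.

```c
#include <stdio.h>

/* Hypothetical helper: prompts on `out` and reads lines from `in` until one
 * of them starts with an unsigned number. Returns 0 once an ID was read and
 * -1 if the input ended first. Lines longer than 127 characters are
 * processed in pieces. */
int read_student_id(FILE * in, FILE * out, unsigned int * student_id)
{
    char line[128];
    for (;;) {
        fputs("Enter your student ID: ", out);
        fflush(out); /* make sure the prompt shows up before we wait */
        if (NULL == fgets(line, sizeof line, in)) {
            return -1; /* end of input or read error */
        }
        if (1 == sscanf(line, "%u", student_id)) {
            return 0;
        }
        /* Invalid line: it has already been consumed, so just ask again. */
    }
}
```

Calling read_student_id(stdin, stdout, &student_id) behaves like the loop above for well-formed input, with the extra exit when stdin is closed.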
In short, scanf can only read characters in the exact format you give it.
I tried to write the word counter with scanf, and it didn't really work, so we're going to write our own simple parsing function.
Later, we're going to try to incorporate an external library that will take care of most of this functionality for us.
If you want an in-depth criticism of the scanf functions, read the linked article.
Now, we've introduced the most common I/O functions and their uses. To recap, I'm going to describe them and their use cases.
printf when you want to print something out to the terminal.
fprintf when you want to print something to a file.
fgets when you want to read from a file or get user input.
puts and fputs when you want to print a plain string (i.e. no format strings or anything).
Treat them like shorthand for printf("%s\n", string) and fprintf(stream, "%s", string).
These are the main functions that you will use for file I/O or terminal interaction.
While you can certainly use other functions (besides gets because it's an abomination), I mainly use these functions in the order shown above.
After this article, we now have all the tools we need to write the first of our goal programs: the word counter.