This is the sixth article in the Making Sense of C series.
In this article, we're going to discuss the types built into the C language.
We'll focus mainly on the numerical applications of these types and how they
interact with each other.
I apologize for the type punning in the tagline above. Credit to Jake Doerfler for showing me that pun.
So far, we've determined that we're going to give the compiler a file we want it
to read, we've come up with a few basic rules about the format of the file (e.g.
ending statements with semicolons), we've come up with a way to tell the
compiler to ignore our comments within the code, we've established that we can
do arithmetic, and we've set up a system for declaring variables, but we've only
specified one type, an int.
We've also never discussed how we can use floating point types in
C, nor what
we should do when different types interact.
If you haven't read the article on managing memory, you should read it to understand why we have multiple types to represent whole numbers. If you want to know more about how your computer represents integers, read the article on integer representation. Lastly, if you want to know more about how your computer represents floating point numbers, read the article on representing floating point numbers.
So, we need to have a few different integral types because we want people to use memory effectively. If they want big numbers, we can let them use big numbers. If they don't need to use big numbers, we can let them use numbers that take up less memory to help them write efficient programs. If they're using computers with little memory (such as embedded computers), then they'll be able to save space. We should also let them declare signed and unsigned versions of the types since sometimes they won't need negative numbers, and doing so will allow them to use twice as many positive numbers.
On most non-embedded computers, you'll find the following setup:
|Integral Types in C|
I'm just going to apologize if you're reading on a phone, as I don't have a good
way to display the table without breaking apart the numbers in an unnatural way.
An unsigned char can represent numbers up to 255, an unsigned short can
represent numbers up to about 65 000, an unsigned int can represent numbers up
to about four billion, and an unsigned long long can represent numbers up to
about eighteen billion billion.
Signed types can represent numbers half as large as their corresponding unsigned
types.
If you declare a variable with just the type, it will be signed (the one
exception is char, where the compiler gets to choose).
If you declare a variable with unsigned in front of the type, it will be
unsigned.
    int a;          // a is a signed int that can go from -2 147 483 648 to
                    // 2 147 483 647
    unsigned int b; // b is an unsigned int that can go from 0 to 4 294 967 295
You can also put
signed in front of a type to make it signed, but it doesn't
add any information since integral types are signed by default.
short and long long are technically short int and long long int, but adding the
int is unnecessary.
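If you want to see the actual ranges on your own machine, the standard header limits.h records them. The following is only a sketch: print_integer_ranges is a hypothetical helper name that does not appear elsewhere in this series, and the numbers it prints depend on your compiler and computer.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: prints the range of several integral types on the
   machine that compiles this. The limits come from <limits.h>. */
void print_integer_ranges(void)
{
    printf("unsigned char:      0 to %u\n", (unsigned int)UCHAR_MAX);
    printf("unsigned short:     0 to %u\n", (unsigned int)USHRT_MAX);
    printf("int:                %d to %d\n", INT_MIN, INT_MAX);
    printf("unsigned int:       0 to %u\n", UINT_MAX);
    printf("long long:          %lld to %lld\n", LLONG_MIN, LLONG_MAX);
    printf("unsigned long long: 0 to %llu\n", ULLONG_MAX);
}
```

Calling print_integer_ranges() on a typical desktop computer should print the limits described above, but an embedded computer may print smaller ones.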
An int represents numbers in the normal range for most calculations and a
short represents a smaller range, so both int and short make sense, but
why is the eight bit type a char and why is the 64 bit type a long long?
What happened to the long?
Considering that you can't even store the result of small calculations like
12 * 12 in a signed eight bit type, why even have an eight bit type?
While you can still do arithmetic with
chars, they are mostly used to
represent characters in text.
For example, in ASCII the letter 'A' has the value of 65, the letter 'B' has
the value of 66, and so on.
Since ASCII only uses 128 characters, there's no reason to use more than a byte
of memory for a character.
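To see that a char really is just a small number, here is a minimal sketch. It assumes an ASCII-compatible character set, and letter_after is a hypothetical helper name used only for illustration.

```c
/* Hypothetical helper: returns the character `offset` places after `start`.
   Assumes an ASCII-compatible character set, where the letters of the
   alphabet are stored as consecutive numbers. */
char letter_after(char start, int offset)
{
    /* A char takes part in arithmetic like any other integer. */
    return (char)(start + offset);
}
```

Under that assumption, letter_after('A', 1) gives 'B', which is the same value as the number 66.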
C actually has a long type, but I didn't mention it because the length in
bits of the long type varies from computer to computer.
In the case of the lengths of integer types, the C standard only specifies a
minimum size.
For example, an int must be at least 16 bits long and a long must be at
least 32 bits long.
On some computers, an int is only 16 bits instead of 32 bits, which means
you could see bugs if your program relies on an int being 32 bits.
It's not as much of a problem as you would expect since most compilers use
32 bits for an int and those that don't will make it explicit that they don't.
A long long, however, is guaranteed to be at least 64 bits long, and most
compilers don't go beyond 64 bits, so a long long is effectively guaranteed
to be exactly 64 bits long.
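If your program does rely on a particular size, one option is to have the compiler check that assumption for you. The sketch below assumes a C11 compiler (for _Static_assert), and the bits_in_* functions are hypothetical names chosen for this example. Note that sizeof counts every bit of storage, which is the same as the usable width on ordinary computers.

```c
#include <limits.h>
#include <stddef.h>

/* Compile-time checks (C11): if one of these assumptions is false on the
   target computer, compilation stops and prints the message. */
_Static_assert(sizeof(int) * CHAR_BIT >= 32,
               "this program assumes an int has at least 32 bits");
_Static_assert(sizeof(long long) * CHAR_BIT >= 64,
               "a long long should have at least 64 bits");

/* Hypothetical helpers: number of bits each type occupies on this machine. */
size_t bits_in_int(void)       { return sizeof(int) * CHAR_BIT; }
size_t bits_in_long(void)      { return sizeof(long) * CHAR_BIT; }
size_t bits_in_long_long(void) { return sizeof(long long) * CHAR_BIT; }
```

On a computer with a 16 bit int, the first check would fail at compile time instead of letting the bug show up while the program runs.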
Now that we've set up the integral types, let's move on to the floating point types. Because of how floating point numbers work, it won't make sense to have a floating point number be one or two bytes. Since we do powers of two, let's make floating point types with four bytes and eight bytes.
In C, we have two commonly used standard floating point types:
|Floating Point Types in C|
|Type||Size in Bits||Range||Precision
Both floating point types represent numbers in signed scientific notation, which
is why the ranges for both are in scientific notation.
A float can represent numbers up to a hundred billion billion billion billion,
and a double can represent pretty much any number you'll encounter.
Neither, however, can represent large numbers exactly,
so you shouldn't use either if you're doing anything that requires an exact
amount, like counting money or doing anything with number theory.
You should use floats and doubles when you don't need to be exact (physics
and graphics calculations in video games, sciences where you use scientific
notation).
Tom Scott did a video on
floating point numbers that you should watch for just some general warnings
and use cases for floating point numbers.
Unlike with the integral types, these floating point types are standardized,
meaning you'll practically never see a 64 bit float or a 32 bit double.
Since we're dealing with such big numbers, we should be able to initialize numbers with scientific notation, so let's add that to the language.
We'll use the syntax [decimal]e[exponent] for scientific notation literals:
    double a = 7.03e5; // a now contains 703 000, as the 'e' can be read
                       // as "times ten to the"
    float b = a;
We've established how arithmetic is going to work for numbers of the same type, but what happens if we try to perform arithmetic with numbers of different types?
In general, we should avoid making different types interact with each other.
Like different tools in a toolbox, each type has specific scenarios when you
should use it.
Just as it doesn't make sense to unscrew a screw with a hammer, it doesn't make
sense to use an
int to represent a probability.
If multiple types are interacting, it's likely that at least one of them is
being used for the wrong job.
To be clear, there are specific use cases where you need to use multiple types, such as converting money (which should be stored in an integral type) between countries using an exchange rate (which should be stored in a floating-point type).
So that you have no surprises when you do have to have different types interact, I will discuss the type interactions in the rest of the section.
To figure out what to do, we should try to follow the principle
of least surprise, which essentially states that anything that a user (in
our case, a C programmer) can interact with should work exactly as the user
expects.
As a simple example, the plus sign should always do some kind of addition.
Just because we represent a number with fewer bytes doesn't mean it should stop
acting like a number.
The sum of a short holding 32 and a long long holding 7000000000224 should be
7000000000256 if possible, but the only way to have that happen is to
temporarily convert the short into a long long.
Since temporarily converting a
short to a
long long is trivial, let's just
temporarily convert the
short into a
long long, then do the addition.
When you do arithmetic with two numbers of different types, the number of the smaller type will be converted to the larger type, then the two numbers are added just as if they were of the same type. For example:
    long long a = 1000000000000; // one trillion
    short b = 3000;
    long long c = a + b; // c now contains one trillion three
                         // thousand
In fact, when you perform any arithmetic operation with numbers from two different types, the smaller type will get converted to the larger type before the operation gets applied.
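One way to convince yourself that the conversion really happens is to ask the compiler how big the result of the addition is. This is a small sketch with hypothetical helper names (add_mixed and size_of_mixed_sum); it relies only on sizeof, which reports the size of an expression's type without evaluating it.

```c
/* Hypothetical helper: adds a short to a long long. The short is converted
   to a long long before the addition takes place. */
long long add_mixed(long long big, short small)
{
    return big + small;
}

/* Hypothetical helper: size in bytes of the result of adding a short to a
   long long. sizeof only looks at the type of the sum. */
unsigned long size_of_mixed_sum(void)
{
    long long big = 0;
    short small = 0;
    return (unsigned long)sizeof(big + small);
}
```

Here add_mixed(7000000000224, 32) returns 7000000000256, and size_of_mixed_sum() matches sizeof(long long), not sizeof(short).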
The same thing happens when you add a float to a double.
Just like adding a short to a long long, the smaller type (float) will be
converted to the larger type (double), then the two will be added together like
two doubles.
The same thing also happens when you add an integral type to a floating point
type.
Just like adding a short to a long long, the smaller type (the non-floating
point type) will be converted into the larger type (the floating point type),
then the two will be added together like two floating point numbers.
    int a = 40;
    float b = 45.6;
    float c = a + b; // a becomes 40.0 and is added to 45.6 to get 85.6, which is
                     // then stored in c
Since the larger type can completely represent the shorter type, it gets stored without a problem. This applies to all types, but there can be a loss of precision in converting from an integral type to a floating point type.
Because floating point numbers trade precision for range, you will lose some
precision if the number is larger than 2^24 and you want to store it in a
float, or larger than 2^53 and you want to store it in a double.
    int a = 2147483611;
    float b = 45.6;
    float c = a + b; // c does not contain 2147483656.6, but 2.147483648e9
                     // because a became 2.147483648e9 in the addition (a is
                     // still 2147483611, it's just being temporarily
                     // converted to a floating point number for the
                     // addition) and b was too insignificant to change it,
                     // just like 2e100 + 1 is still 2e100
In the last example, although it looks like
2.147483648e9 has 10 sigfigs, it
only has 8 accurate sigfigs.
But if it only has 8 sigfigs, then why wasn't it 2.1474836e9?
Why did it include the 48 at the end?
Remember that the computer works in binary, not base ten.
When I said that a became 2.147483648e9 in the addition, I was giving you
the number in a correct, but misleading format.
To the computer, a was actually first converted to
1.111 1111 1111 1111 1111 1111 1011 011e0001 1110, which corresponds to
2^31 - 37.
Since the computer can only represent the first twenty four digits as a float,
it will round to the nearest representable number it can, which is
1.000 0000 0000 0000 0000 0000e0001 1111 or 2^31.
Notice that the computer version has 24 significant bits with all the zeros.
In other words, it has the proper number of significant figures in base
two, not in base ten.
If b and c were instead doubles, then we would convert a to a double, which
actually has the required precision to represent a, meaning you would get an
answer within a tiny fraction of 2147483656.6.
The tiny remaining error comes from the fact that 0.6 can't be represented in a
finite number of bits, just as we have to write two thirds as 0.666...6667 in a
finite number of digits.
For more details, check out Tom Scott's video on floating
point numbers, which I referred to earlier.
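The rounding described above can be checked directly by sending an int through a float and back. This is a sketch under a few assumptions: an int has 32 bits, a float is an IEEE 754 single precision number (24 significant bits), and a double is an IEEE 754 double precision number. The names through_float and through_double are hypothetical.

```c
/* Hypothetical helper: converts n to a float, then back to a whole number.
   The result is a long long because the rounded float may be 2^31, which
   would not fit in a 32 bit int. */
long long through_float(int n)
{
    float f = (float)n;   /* rounds to the nearest representable float */
    return (long long)f;  /* the rounded value is a whole number, so this is exact */
}

/* Hypothetical helper: the same round trip through a double, which has
   enough significant bits for every 32 bit int. */
long long through_double(int n)
{
    double d = (double)n;
    return (long long)d;
}
```

Under those assumptions, through_float(16777216) (that is, 2^24) comes back unchanged, through_float(16777217) comes back as 16777216, and through_float(2147483611) comes back as 2147483648, while through_double returns every one of these values unchanged.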
Storing a long long in a short
First, if you try to directly set a type with a value too large for it to
handle, the compiler will print out some warning when you try to compile your
program but it won't throw an error.
If you set
a equal to some expression it can't figure out at compile time,
then it might not print a warning, so you should be on your guard.
    short a = 1000000000; // WARNING: overflow in implicit constant conversion
    long long b = 1000000000;
    a = 5 * b; // The compiler might not print a warning for this
               // line, even though it will overflow.
In general, the compiler will chop off any digits it can't fit into the assigned
type.
In the example above, 1000000000 in base ten is
0011 1011 1001 1010 1100 1010 0000 0000 in binary, but since a short can only
hold 16 bits, it will only take the rightmost sixteen bits and treat the rest as
overflow, meaning that you would be setting a to 1100 1010 0000 0000, which is
-13824.
You could have replaced the first line of the example with short a = -13824;
and the results of the program would be exactly the same.
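The chopping can be reproduced by hand with a little bit arithmetic. The sketch below uses a hypothetical helper, low_16_bits_as_signed, that keeps the rightmost sixteen bits and reads them as a two's complement number. That is what most compilers do when storing into a 16 bit short, but it is an assumption about your compiler rather than something the C standard promises.

```c
/* Hypothetical helper: keeps only the rightmost 16 bits of value and reads
   them as a 16 bit two's complement number. */
int low_16_bits_as_signed(long long value)
{
    /* Converting to an unsigned type is well defined and keeps the low bits. */
    unsigned long low = (unsigned long)((unsigned long long)value & 0xFFFFu);

    /* If the top bit (0x8000) is set, the 16 bit pattern stands for a
       negative number, which is 2^16 below the unsigned reading. */
    if (low >= 0x8000ul)
        return (int)((long)low - 0x10000L);
    return (int)low;
}
```

With this helper, low_16_bits_as_signed(1000000000) gives -13824, matching the value above, and a number that already fits, such as 3000, comes back unchanged.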
Since C was made way back when people were building computers using
nonstandard methods like one's complement and sign-magnitude representation for
integers, and computers with different sizes for int and short, C's designers
wanted to accommodate all the different behaviors of computers, so some things
are left undefined in the C standard and it's up to the people building the
hardware to decide what to do in these cases.
For example, before C23 nothing in the C standard stated that you have to use
two's complement to represent integral types, which is why overflow for signed
integral types is undefined.
All three representations of binary agree that
0111 1111 1111 1111 1111 1111
1111 1111 represents the largest positive signed 32 bit number.
Adding one to that number, however, leads to three different results.
1000 0000 0000 0000 0000 0000 0000 0000 represents -2 147 483 648 in two's
complement, -0 in sign-magnitude, and -2 147 483 647 in one's complement.
The C standard actually specifies that both a short and an int have to be
able to represent at least the numbers from -32 767 to 32 767 because one's
complement and sign-magnitude can't represent -32 768 in sixteen bits and
some computers represent ints with sixteen bits instead of thirty two bits.
Once again, most computers follow the layout I specified above, with the notable
exception of embedded computers.
The trick to avoiding this whole mess is just to avoid undefined behavior in the
first place.
Don't let numbers overflow and don't rely on needing to use
-2 147 483 648.
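One common way to keep an addition from overflowing is to check the operands against the limits before adding. The following is a sketch of that idea, not the only way to do it, and add_would_overflow is a hypothetical name.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if a + b would not fit in an int, and 0 if
   the addition is safe. The check itself never overflows because it only
   subtracts in the direction that moves away from the limit. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b; /* a + b would exceed INT_MAX */
    if (b < 0)
        return a < INT_MIN - b; /* a + b would fall below INT_MIN */
    return 0;                   /* adding zero is always safe */
}
```

A program can call add_would_overflow(a, b) first and only compute a + b when it returns 0.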
If the magnitude of the int is no more than 2^24, then it can be
stored in the float completely; otherwise, it gets rounded to the nearest
float that can be represented.
You already saw an example of this above, in the discussion of loss of
precision.
Simply put, if the value of the
double can fit into the
float, then it gets
stored without a problem.
If the double has more precision than the float can store, the extra
precision is thrown away.
If the magnitude of the value of the double is greater than a float can
store, it just stores ±inf (which represents infinity) in the float.
    double can_fit_in_float = 1.125;
    float example = can_fit_in_float; // example is now 1.125
    double too_precise = 1.0000000009e7;
    example = too_precise; // example is now 1e7, where the
                           // 0.0000000009e7 is dropped
    double too_large = 1e100; // too_large is now the double nearest
                              // to 1e100 because of how double precision
                              // works
    example = too_large; // example is now inf, which represents
                         // infinity, because 1e100 is too large for
                         // a float to represent
    double set_to_infinity = example; // set_to_infinity is now inf because
                                      // example is inf
    set_to_infinity = too_large; // set_to_infinity is now the double
                                 // nearest to 1e100 since too_large can fit
                                 // into set_to_infinity
If the value of the double can fit into an int, then the double gets
converted into an int, which means it just removes any fractional part.
    double a = 1.5;
    int b = a; // b is now 1
    a *= -1;   // a is now -1.5
    b = a;     // b is now -1, since the fractional part gets removed
If the value of the
double can't fit into an
int, then you'll probably get
some garbage value, so don't try to store a
double in an
int if you know
it won't fit.
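If you can't know ahead of time whether the value fits, you can test it before converting. This sketch assumes an int has at most 32 bits, so that a double can hold the limits exactly; fits_in_int is a hypothetical name.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if value can be stored in an int once its
   fractional part is removed, and 0 otherwise.
   Assumes int has at most 32 bits, so INT_MIN - 1 and INT_MAX + 1 are
   exactly representable as doubles. */
int fits_in_int(double value)
{
    /* The fractional part is dropped first, so anything strictly between
       INT_MIN - 1 and INT_MAX + 1 ends up inside the range of an int.
       A "not a number" value fails both comparisons and is rejected. */
    return value > (double)INT_MIN - 1.0 && value < (double)INT_MAX + 1.0;
}
```

For example, fits_in_int(1.5) is 1 while fits_in_int(1e100) is 0, so the second value should not be stored in an int.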
To force a conversion from a type to another type in an expression (you cannot change the type of any variable), you can use a type cast, which consists of putting the resulting type in front of the variable or expression. For example, if you want to exchange 100 USD for some yen, you would multiply 10000 cents (since the cent is the smallest unit of US currency) by the cents to yen conversion rate. Both cents and yen are discrete quantities, so it wouldn't make sense for your resulting money to be floating point numbers. On the other hand, the cents to yen conversion rate is a floating point number, meaning that to convert from cents to yen requires multiplying an integral number by a floating point number, which results in a floating point number, which you have to convert back into an integral number, either implicitly by doing anything involving multiple types as specified above or explicitly by using a type cast.
Since the only thing the compiler needs to know to convert one type to another
is what you're converting and the resultant type, it uses the syntax
(type)expression.
In our example:
    // cents_to_yen has been initialized as a double
    // cents has been initialized as a long long

    // Implicit conversion by storing
    long long result = cents * cents_to_yen;

    // Explicit conversion using type casting
    result = (long long)(cents * cents_to_yen);
Type casts will show up from time to time, so just know that you can convert types to other types.
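Where you put the cast matters, because it applies to whatever comes directly after it. The sketch below contrasts casting the whole product with casting only the rate. Both function names are hypothetical, and the rate of 1.5 used in the note afterwards is a made-up number, not a real exchange rate.

```c
/* Hypothetical helper: multiplies first (as doubles), then removes the
   fractional part of the product. */
long long cents_to_whole_yen(long long cents, double cents_to_yen)
{
    return (long long)(cents * cents_to_yen);
}

/* Hypothetical helper showing a mistake: the cast applies only to the rate,
   so its fractional part is removed before the multiplication. */
long long cast_rate_first(long long cents, double cents_to_yen)
{
    return cents * (long long)cents_to_yen;
}
```

With 10000 cents and the made-up rate of 1.5, cents_to_whole_yen gives 15000, while cast_rate_first turns the rate into 1 and gives 10000.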
Up to this point, we've established that we're going to give our compiler a file
with a series of statements, but we've only used the plain int.
In this article, we explained what ints are doing under the hood and we
introduced several new types that function like ints.
We also introduced the concept of signed and unsigned integers.
Our file should now look like
    int a = 7;
    unsigned long long b = 1000000000000;
    b *= a; // Multiplying an unsigned long long and an
            // int works exactly as expected.
            // b is now seven trillion
    other statement; // We still can't do anything more than
                     // arithmetic.
    short c = 32 * a; // Since 32 * 7 is less than the maximum
                      // value for a short, we're good even though
                      // a is an int.
                      // c is now 224.
    double d = b * 1e7; // 1e7 is 10000000.
                        // d now contains 70000000000000000000.
    b += c; // Adding a short to an unsigned long long
            // works as expected.
            // b is now 7000000000224
    d /= b; // d now contains 9999999.999680
    b *= 0.5; // b is now 3500000000112
Remember that our goal for now is to count the number of times a word shows up in some particular text. We've set up counting, so now our next move should be to handle and manipulate text, which will be the topic of the next article. Since text manipulation will be too complicated to fully explain in a single article, we will first come up with an abstract, kind of hazy system for representing text in the next article, then we will try to come up with features in the language that will allow us to implement the system we came up with. We might also need to modify our system when future constraints come up.
See you in the next article: Representing Text in C.