Getting close to the metal in this article.

*In this article, we're going to discuss how we can represent integers in
binary.
*

**Using a Limited Number of Digits****Signed vs Unsigned****Binary Representation****Overflow**

I'm going to describe a scenario that's weird to humans but that your computer actually has to do all the time.

Let's say that you have a 200 page book with exactly 50 000 words, and I ask you
to tell me what the 37 420^{th} word is.
To make it easier for you, the author of the book wrote the number of words on
each page at the bottom of the page, since the number of words can vary from
page to page.
To determine the actual word, you would have to keep a running total of the
number of words you've seen so far, stop as soon as your total goes above 37
420, and then count words from the beginning of the page until you reach the
word.

Now, let me make a small change.
Let's say you have a 200 page book with exactly 50 000 words **and exactly 250
words on every page**, and I ask you to tell me what the 37 420^{th}
word is.
Well, the first page is going to have 250 words on it, the first two pages will
have 500 words, the first three pages will have 750 words, etc., so the 37
420^{th} word will be the `(37 420 % 250) + 1`

⇒`171`

^{th}
word on page `(37 420 / 250) + 1`

⇒`150`

.
Now, instead of having to go through more than a hundred pages and keep a total
of the number of words so far, you can now just flip to page 150 and find the
171^{th} word.

As you can see, when the pages have a fixed number of words on them, finding an arbitrary word given an index is quick and easy. Types in a lot of high performance languages will have a fixed number of bytes for the same exact reason. If we don't, the computer either won't know when the number ends or will have to waste time trying to figure out when the number ends.

By making the types have a fixed number of bytes, we limit the number of
possible values we can represent.
For example, if we have a type one byte long, then that type can only represent
*2 ^{8}=256* different values.

We have to make a distinction between signed and unsigned integers before we go
any further.
**Signed integers** can represent **zero, positive numbers, and negative
numbers** and **unsigned integers** can only represent **zero and positive
numbers**.

If you want to count the number of people who visited your webpage, you only
want positive numbers since you can't get a negative number of views.
If you have a rating system that allows people to upvote and downvote posts,
then you want both positive and negative numbers for the net upvotes.
If you just want negative numbers, add a '`-`

' in front of an unsigned number
when you write your results.

Since we want people to use the right type for the job, we'll allow people to
use signed and unsigned versions of the `int`

.
We'll let `int`

represent signed integers and `unsigned int`

represent unsigned
integers.

As you already know, computers don't understand anything except `on`

and `off`

,
which we generally abbrieviate with `1`

and `0`

.
For us to represent something any numeric types, we have to work with this
constraint.
We should also be able to handle arithmetic as efficently as possible, so we
should pick a good representation.
To see why picking a good representation for you number system is important, see
how long you can multiply *MMMCDLXXXIII* and *MMCMXVIII* without
giving up.

To start out, we're just going to say that we can use eight bits (a.k.a. a byte) to represent a number, which means

- We can only represent
*2*different numbers since we only have eight bits, just as we could only represent^{8}=256*10*different numbers if we have four digits.^{4}=10 000 - Because we want to represent as many different numbers as possible, each number should only have one binary representation.
- Because we want to be consistent, there should be exactly one base ten number for each binary representation
- If
`a • b`

⇒`c`

in base ten (where`•`

is an arbitrary arithmetic operation), the binary representations of`a`

,`b`

, and`c`

should also satisfy`a • b`

⇒`c`

. - In our representation
`a • b`

should be quick to calculate, where`•`

is any arithmetic operation. - Determining if
`a > b`

should be quick.

Since we have 256 possible numbers and we want to represent all the positive
numbers we can and zero, we can just let the binary representation of a number
be the number in base two and the resulting binary representations will satisfy
everything an unsigned type should satisfy, meaning we'll represent the numbers
from `0`

to `255`

.
Converting from base ten to binary is trivial, so I'm not going to rehash it
here.
If you need a more in-depth explanation, check this
article.

Since the algorithms for arithmetic in binary are just like the algorithms in base ten, we can just use the normal algorithms for arithmetic.

Since we can represent all the numbers from `0`

to `255`

, we should be able to
represent the result of any arithmetic operation so long as the result is
between `0`

and `255`

.
For example:

```
```
Binary
Base 10
111
1
0010 0101
37
+ 0001
0011
+ 19
0011 1000
56

We will have a problem, however, if the result is greater than `255`

or less
than `0`

.
For example, let's calculate `255 + 1`

.

```
```
1 1111 111
1111 1111
+ 0000 0001
~~1~~ 0000 0000

Since we only have eight bits we just drop the ninth bit, meaning ```
255 +
1
```

⇒`0`

.
Since `a + b`

⇒`c`

means `c - b`

⇒`a`

by the basic rules of algebra,
`255 + 1`

⇒`0`

means `0 - 1`

⇒`255`

.
In general, **if you go above the highest number we can represent, you'll loop
back around to the lowest number we can represent**, and, **if you go below
the lowest number we can represent, you'll loop back around to the highest
number we can represent**.

With signed Types, we also need to satisfy a few other constraints, such as:

- For every number, we should be able to represent its negative.
`-a + a`

⇒`0`

, since that's how you define the negative (a.k.a. additive inverse) of a number without changing how the computer adds numbers based on the sign.- Determining if a number is negative or positive should be quick.

Before we can come up with a representation for something, we need to establish what we're representing. In our case, we need to figure out what numbers we want to represent.

We want `0`

since `0`

is pretty useful, so let's include it.
Since `0`

is its own inverse and we only want one binary representation for each
number, we don't need to include a "negative zero".
We'll also want `1`

, so we'll include `1`

.
Since we want to be able to represent the negative of every number we add to the
list, we'll also have to include `-1`

.
Let's also add `2`

and `-2`

into the list, `3`

and `-3`

to the list, and so on
until we run out of numbers.

After adding `127`

and `-127`

, we have a problem: we only have one more binary
representation available.
If we added `128`

and `-128`

, we would have to represent 257 different numbers,
which we can't do since we've run out of space.
Since `0`

is its own inverse, we don't want to include it twice, and we can only
have *2 ^{n}* binary representations, we will never be able to
completely represent a consecutive series of numbers centered around

`0`

.
Since it's only one number, though, we won't have to worry.
Right now, since we don't really know what to do with the last number, we're
just going to say that we definitely want to represent every integer from `-127`

to `127`

.
We'll determine the last number we want to represent when we figure out how
we're going to represent our numbers.

As I showed earlier when I asked you to multiply two numbers in Roman numerals,
choosing a good representation for numbers will help you do math quickly and
effectively.
Ideally, we would like to use the same exact algorithms for arithmetic
operations for signed and unsigned integers so that we don't have to have
separate hardware for signed and unsigned operations.
Since using the number in base two works for all the nonnegative numbers, let's
keep that system for all the nonnegative numbers, meaning `0000 0000`

is `0`

,
`0000 0001`

is `1`

, `0000 0010`

is `2`

, ..., and `0111 1111`

is `127`

.

We have a problem, though.
While representing something like `81`

as `0101 0001`

is all fine and good, how
do we represent `-81`

?

Remember, taking the negative of a number `a`

produces another number `-a`

such
that `-a + a`

⇒`0`

.
To figure out what `-a`

is, we'll use the equivalent statement ```
0 -
a
```

⇒`-a`

.
First, we'll figure out the representation for `-1`

.
Since we want to use the same exact algorithms for arithmetic, we'll subtract
`1`

from `0`

just like we normally would.

Doing the subtraction yields

```
```
^{1}~~0~~^{1}~~0~~^{1}~~0~~^{1}~~0~~ ^{1}~~0~~^{1}~~0~~^{1}~~0~~^{1}~~0~~
-^{ }0^{ }0^{ }0^{ }0 ^{ }0^{ }0^{ }0^{ }1
~~1~~^{ }1^{ }1^{ }1^{ }1 ^{ }1^{ }1^{ }1^{ }1

where we drop the leftmost `1`

from the result since we can't store it anywhere
and doing anything else requires us to modify our addition algorithm.

If we add the binary representations for `-1`

and `1`

, we'll get

, which overflows to ~~1~~ 0000
0000`0`

, which is exactly the behavior we're looking for.
We now have two ways of figuring out the rest of the negative numbers: subtract
a positive number from `0`

or multiplying by `-1`

.
Once we do the calculations, we'll find that `-2`

is `1111 1110`

, `-3`

is ```
1111
1101
```

, ..., and `-127`

is `1000 0001`

.

We could also express `-a`

as `-1 - a + 1`

, which may be easier to understand.
For example to calculate `-81`

:

```
```
Binary
Base 10
1111 1111
-1
- 0101 0001
- 81
1010 1110
-82
+ 0000 0001
+ 1
1010 1111
-81

The new `-1 - a + 1`

process we came up with is equivalent to **flipping the
bits (replacing zeros with ones and ones with zeros) and adding one**, which
is how people normally describe taking the negative of a number in this system.

We've set up a system called two's complement, and it's the standard on the overwhelming majority on most computers since IBM released the System/360 in 1964. The only other options at the time were "one's compliment", in which you just flipped the bits to get the negative of a number, and the "sign-magnitude" format, in which you use the leftmost bit to represent the sign and you use the other bits to represent the absolute value.

One's compliment suffered from several problems, but the worst was `0`

would
have two representations: the standard `0000 0000`

and `1111 1111`

.
Since zero could be coded using either representation, you would have to check
both representations whenever you wanted to make a comparison with zero.

The sign-magnitude format also has two representations for `0`

(`0000 0000`

and
`1000 0000`

), but checking if a number is zero is pretty easy since you can just
ignore the leftmost bit entirely.
Other than checking for zero, however, the sign-magnitude format requires you to
turn addition into subtraction and vice versa and switch the order of the
numbers you're adding or subtracting depending on the sign.

- Every number has one binary representation and every binary representation represents one number.
- Adding signed and unsigned numbers works in exactly the same way.
- Negative numbers work exactly as they should.
- All negative numbers will have a
`1`

as the most significant (leftmost) bit and all non-negative numbers will have a`0`

as the most significant bit. - Determining whether a number is greater than or less than a number is easy, since you can just subtract the two numbers and look at the most significant bit (a >.

We've established that we're going to use two's complement to represent our
integers and that we are definitely going to represent the integers from `-127`

to `127`

, but we're still missing one number.
We can't have a representation we don't know how to interpret, or else our
program could fail.

To figure out how to deal with the missing number, we're going to first find out which binary representation hasn't been used yet and then we're going to use the properties of the binary representation to figure out what it should be.

We could do this by going through all the numbers from `0`

to `127`

, finding
their negatives, but that would take too long.
Plus, it's no fun.
Since we can flip all the bits and add one to any binary representation, we know
that we can negate the binary representation of the missing number.
The missing number, however, can't have any of the numbers from `-127`

to `127`

as its negative since each of them already have a negative and the negative of a
negative is the original number.

Putting all this together means that when we negate the missing number, we have
to end up with the number itself.
Since the number is its own negative, adding it to itself must equal zero, and
since adding the number to itself is equivalent to multiplying by 2 (`10`

in
binary) and multiplying by 2 will be easier in this case, we're going to
multiply by two.
Let's represent the missing number by `abcd efgh`

, where each letter represents
an unknown digit.

```
```
abcd efgh
x 10
0
a bcde fgh0
~~a~~ bcde fgh0

For `abcd efgh`

to be its own inverse `bcde fgh0`

must equal `0000 0000`

,
meaning the missing number must be `a000 0000`

.
Since `a`

can only be a `0`

or a `1`

and `0000 0000`

is zero, `1000 0000`

must
be the representation for the missing number.

`1000 0000`

Represent?The missing number is kind of weird since it's its own negative and not zero.
If we subtract one, we will end up with `127`

, so maybe it should be `127`

.
If we add one to it, however, we will end up with `1000 0001`

, which is `-127`

(to see it for yourself, negate `1000 0001`

and convert from binary to base 10).
Our choices are either the number one more than `127`

or the number one less
than `-127`

, which are `128`

and `-128`

.

Note that the most significant bit (leftmost digit) is a `1`

.
If we were to make it positive, it would be the only positive number that starts
with a `1`

.
As a general principle, every special case you have to check will make your
program more inefficient, and making `1000 0000`

positive would be a special
case, so let's just make it negative.

Since we've decided that our missing number will be negative, `1000 0000`

will
represent `-128`

.

The system we set up works identically with more bits.
If we want unsigned numbers, we use the number in base two as the number's
binary representation, and we can represent numbers from *0* to
*2 ^{n}* with

`0000 ... 0000`

, `0000 ... 0001`

, `1111 ... 1111`

, `0111 ... 1111`

, and
`1000 ... 0000`

.
We explained how computers represent integer numbers using a system called two's complement which will allow us to represent integers of any size.

If you're coming from the *Making Sense of C* series, the next article in
the series is Types in C, but
you should read The Memory
Hierarchy before reading about types in `C`

since it will provide use cases
for the different types.