You can't go very far in computer science and programming without encountering binary and hexadecimal. Binary is important to understand because it's literally what computers are based on. Don't let your eyes glaze over when you see hexadecimal. It's easy to read and kinda useful.
Bits and bytes
A binary choice has only two options, and binary numbers are the same: there are only two possible states. A single binary value is known as a bit, and it can only be 0 or 1. The equivalent in the decimal system is a digit, which can be one of ten values: 0-9. Computers use binary because two states are easier to implement in circuitry.
Having only two possible values doesn't seem all that great. How can we express a value greater than 1 in binary? Well, in decimal, if we want to express a value greater than 9 we just add more digits. Each new digit multiplies the number of possible values by ten:
| 10³ | 10² | 10¹ | number of possible values |
You might remember learning in school how the right-most column in a number is the units, the next one to the left is the tens, the next the hundreds and so on. In general, n columns can express 10ⁿ values. The principle is exactly the same in binary, except we use 2ⁿ:
| 2⁴ | 2³ | 2² | 2¹ | number of possible values |
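The rule that n columns give baseⁿ values is easy to check by computer. Here is a minimal Python sketch; `count_values` is a hypothetical helper name used only for illustration:

```python
def count_values(base: int, columns: int) -> int:
    """Number of distinct values that `columns` digits in `base` can express."""
    return base ** columns


# Compare decimal and binary for one to four columns.
for n in range(1, 5):
    print(f"{n} column(s): decimal {count_values(10, n):>5}, binary {count_values(2, n):>2}")
```

The same function gives 256 for eight binary columns, which comes up again when we get to bytes.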
Try writing out all of the different combinations of three bits (000, 001, 010 and so on). You will find that there are eight. Adding an extra bit doubles the number of possible values, so adding more bits allows for a greater range of values.
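If you'd rather not write the combinations out by hand, Python's `itertools.product` can list them. This sketch enumerates every three-bit pattern and then counts the patterns for a few other widths to show the doubling:

```python
from itertools import product

# Every combination of three bits, from 000 up to 111.
combos = ["".join(bits) for bits in product("01", repeat=3)]
print(combos)
print(len(combos))

# Each extra bit doubles the number of combinations.
for n in range(1, 6):
    print(n, "bits:", len(list(product("01", repeat=n))))
```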
We need to decide how many bits should be in each binary number. Is 010101011010101 one number of 15 bits or three numbers of 5 bits? There's no way to tell just by looking at the bit pattern. We need a convention. The computing industry has managed to agree on making eight bits the basic building block. This is known as a byte. Values in computers are expressed as multiples of bytes.
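To see why the convention matters, here is a small Python sketch that reads the same 15-bit pattern both ways. The numbers it produces are completely different, even though the bits are identical:

```python
pattern = "010101011010101"

# Read the whole pattern as one 15-bit number.
as_one = int(pattern, 2)

# Read it as three consecutive 5-bit numbers.
as_three = [int(pattern[i:i + 5], 2) for i in range(0, len(pattern), 5)]

print(as_one)    # 10965
print(as_three)  # [10, 22, 21]
```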
Why eight and not seven or nine bits to a byte? Why can't we have 1.5 bytes? Firstly, complex components handling large values are built from simpler components handling smaller values by pairing them together. These pairs are then combined with other pairs and so on. Each pairing allows the new component to handle values that are twice as big. We're back to the 2ⁿ scaling of binary. It's much easier to work with values that are a power of two, such as 8 (2³). Once you have a component that works on bytes it's easier to make one that works on multiples of bytes than on fractions.
Secondly, eight bits allow for 2⁸ or 256 different values. This is sufficient for many use cases: it's a happy medium. For example, English-language text can be expressed as bytes using an encoding called ASCII (yes, pronounced “ass-key”). It assigns lowercase letters, uppercase letters, digits, punctuation marks and formatting symbols such as newline to the 128 (2⁷) values from 0 to 127, so it only needs seven of a byte's eight bits. As long as you know the mapping you can convert numbers to letters and back again. The numbers:
72 101 108 108 111
can be mechanically converted (or “decoded”) into:
H e l l o
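Decoding like this is mechanical enough to hand to a computer. Here is a minimal Python sketch using the built-in `bytes` type and the `ascii` codec to go from numbers to text and back:

```python
codes = [72, 101, 108, 108, 111]

# Decode: every value is below 128, so ASCII maps each one to a character.
text = bytes(codes).decode("ascii")
print(text)  # Hello

# Encode: go back from characters to numbers.
print(list(text.encode("ascii")))  # [72, 101, 108, 108, 111]

# chr/ord perform the same mapping one character at a time.
print(chr(72), ord("H"))  # H 72
```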
This was awesome for everyone speaking English, but the rest of Europe was unhappy at not being able to write their favourite ä, é and friends. Extended ASCII encodings make use of the eighth bit, which ASCII leaves unused, to gain space for 128 more values (128 to 255), including letters with diacritics and mathematical symbols.
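"Extended ASCII" is really a family of encodings that disagree about what the upper 128 values mean. The sketch below uses Latin-1 (ISO 8859-1) as one representative. That choice is ours, not something the text specifies, and other variants may map the same byte to a different character:

```python
# Plain ASCII stops at 127, so it has no code for "é".
try:
    "é".encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII can't encode it:", err.reason)

# Latin-1 is one extended ASCII variant: these letters land above 127,
# so their eighth (leftmost) bit is set.
for ch in "äé":
    (value,) = ch.encode("latin-1")  # exactly one byte per character
    print(ch, value, bin(value))
```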
Extended ASCII was sufficient until the computing industry caught up with the fact that non-European speakers would also like to use computers in their native languages and writing systems. That necessitated a shift to encodings that use more than one byte per character and therefore allow a greater range of values. The Unicode standard assigns a number (known as a code point) to each written character and mark, with room for over a million code points in total. Various encodings map code points to actual byte values. Code points need up to 21 bits, so a straightforward fixed-width encoding rounds up to four bytes (32 bits, or 2³² possible values) per code point. This is what the UTF-32 encoding does. However, it's very space-inefficient because most of the bits will be zero (the character ‘p’ is encoded as 0b00000000 00000000 00000000 01110000!). The most common encoding, UTF-8, uses a clever variable-length scheme that stores each code point in one to four bytes, so common characters such as ‘p’ take up just a single byte.
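You can compare the two encodings with Python's built-in codecs. We use `utf-32-be` (big-endian) so that no byte-order mark is added, which keeps the byte counts honest:

```python
# 'p' in UTF-32 really is three zero bytes followed by 01110000.
print(" ".join(f"{b:08b}" for b in "p".encode("utf-32-be")))

for ch in ["p", "é", "€", "😀"]:
    utf32 = ch.encode("utf-32-be")  # fixed width: always 4 bytes
    utf8 = ch.encode("utf-8")       # variable width: 1 to 4 bytes
    print(f"{ch!r} U+{ord(ch):04X}: UTF-32 {len(utf32)} bytes, "
          f"UTF-8 {len(utf8)} bytes ({utf8.hex(' ')})")
```

Characters from the ASCII range cost a single byte in UTF-8, and only rarer characters pay for the extra bytes.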
A byte can be written out bit by individual bit. Here is 255: 0b11111111. Binary numbers are conventionally written with a leading 0b so that you know that 0b11 represents binary three and not decimal eleven. However, it's slow and difficult for humans to read long sequences of 1s and 0s. How long does it take you to count the bits in 0b1111111111111111?
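Python is one language that uses the 0b prefix for binary literals. It also accepts underscores inside number literals (PEP 515) and can insert them when formatting, which eases the counting problem a little:

```python
print(0b11)        # 3, not eleven
print(0b11111111)  # 255
print(bin(255))    # 0b11111111

# Underscores group binary digits in fours; they don't change the value.
n = 0b1111_1111_1111_1111
print(n)               # 65535
print(format(n, "_b")) # 1111_1111_1111_1111
```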
Hexadecimal is a more compact way of expressing numbers. It uses sixteen as its base: the standard 0-9 for the first ten values and then A-F for the remaining six (10-15 in decimal). Looking back at the base two table above, four bits give 2⁴ = 16 possible values, exactly enough to hold a single hexadecimal digit. This means that a byte can be expressed as two hexadecimal digits:
- Take a byte value, e.g. 0b10110011, and split it into halves: 1011 and 0011.
- Convert each half to decimal (if you can't go directly to hexadecimal in your head): 11 and 3.
- Convert each half to hexadecimal: B and 3.
- Squish the halves together: 0xB3.
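The split-into-halves steps can be sketched in Python with a bit shift and a mask. `byte_to_hex` is a hypothetical name for illustration; in real code the built-in `format(value, "02X")` does the same job:

```python
HEX_DIGITS = "0123456789ABCDEF"


def byte_to_hex(byte: int) -> str:
    """Convert one byte (0-255) to two hex digits by splitting it into halves."""
    if not 0 <= byte <= 255:
        raise ValueError("a byte must be between 0 and 255")
    high = byte >> 4     # top four bits, e.g. 0b1011 -> 11
    low = byte & 0b1111  # bottom four bits, e.g. 0b0011 -> 3
    return HEX_DIGITS[high] + HEX_DIGITS[low]


print(byte_to_hex(0b10110011))    # B3
print(format(0b10110011, "02X"))  # B3, using the built-in formatter
```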
Hexadecimal values are conventionally prefixed with 0x to make clear that we're working with hexadecimal: 0x10 is not decimal 10, it's 16. It's important to see that expressing a number in a different base doesn't change its value. Hexadecimal is just a more convenient, human-readable representation that works nicely with the hardware's preferred binary because 16 is also a power of two (2⁴).
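A quick Python check that changing the base changes only the notation, not the value. Python happens to use the same 0x and 0b prefixes described here:

```python
n = 0x10
print(n)                # 16
print(n == 16 == 0b10000)  # True: one value, three notations
print(hex(16), bin(16))    # 0x10 0b10000

# The same digits "10" mean different values in different bases.
print(int("10", 16), int("10", 10), int("10", 2))  # 16 10 2
```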
Decimal is good if you have ten fingers but tricky for eight-bit bytes because it's hard to know where one byte ends and the next begins:
72101108108111
This is the ASCII-encoded “Hello” from above with the spaces removed, and now it's impossible to know how to split the digits. Is the first byte value 7 or 72? In hexadecimal you know that every pair of digits is one byte:
48656C6C6F
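Because every byte is exactly two hex digits, converting in either direction is unambiguous. A short sketch using Python's `bytes.hex` and `bytes.fromhex` (note that `hex()` produces lowercase digits):

```python
hello = bytes([72, 101, 108, 108, 111])
hex_text = hello.hex()

print(hex_text.upper())  # 48656C6C6F

# Two hex digits per byte, so the string splits cleanly into pairs.
pairs = [hex_text[i:i + 2] for i in range(0, len(hex_text), 2)]
print(pairs)  # ['48', '65', '6c', '6c', '6f']

# And back again.
print(bytes.fromhex("48656C6C6F").decode("ascii"))  # Hello
```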
Hexadecimal is commonly used when presenting bit patterns to the user or when showing values that line up with byte boundaries, such as memory addresses. Such details are usually fairly low level, so for many people a blue screen full of hexadecimal is a sign that something has gone badly wrong.
As a web developer your main exposure to hexadecimal is almost certainly going to be CSS colour values. The hexadecimal RGB format is made up of one byte encoding the amount of red, one for the amount of green and one for the amount of blue. White is therefore full red, full green and full blue: 0xFF FF FF. Black is no red, no green and no blue: 0x00 00 00. Armed with this understanding of hexadecimal you can impress your friends and colleagues by changing colours by editing the hex values directly! You might have a grey like #ADADAD but think it looks a bit too dark. You can brighten it by increasing the value for each component by the same amount: #DEDEDE. Hours of fun!
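Here is a naive sketch of that brightening trick in Python, assuming a six-digit #RRGGBB colour. `brighten` is a hypothetical helper name. It adds the same amount to each byte and clamps the result to the 0-255 range a byte can hold:

```python
def brighten(colour: str, amount: int) -> str:
    """Add `amount` to each of the red, green and blue bytes of a #RRGGBB colour."""
    hex_part = colour.lstrip("#")
    if len(hex_part) != 6:
        raise ValueError("expected a colour like #RRGGBB")
    # Two hex digits per channel: RR, GG, BB.
    channels = [int(hex_part[i:i + 2], 16) for i in (0, 2, 4)]
    # Clamp to 0x00..0xFF, since a byte can't go outside that range.
    adjusted = [max(0, min(c + amount, 0xFF)) for c in channels]
    return "#" + "".join(f"{c:02X}" for c in adjusted)


print(brighten("#ADADAD", 0x31))  # #DEDEDE
print(brighten("#ADADAD", 0xFF))  # #FFFFFF, clamped at full brightness
```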