Bits, bytes and hexadecimal

You can't go very far in computer science and programming without encountering binary and hexadecimal. Binary is important to understand because it's literally what computers are based on. Don't let your eyes glaze over when you see hexadecimal. It's easy to read and kinda useful.

Bits and bytes

A binary choice has only two options. Binary numbers are the same. There are only two possible states. A single binary value is known as a bit. It can only be 0 or 1. The equivalent in the decimal system is a digit, which can be one of ten values: 0-9. Computers use binary because it's easier to implement in circuitry.

Having only two possible values doesn't seem all that great. How can we express a value greater than 1 in binary? Well, in decimal if we want to express a value greater than 9 we just add more digits. Each new digit allows ten times more numbers:

10^3  10^2  10^1    number of possible values
               9    10
         9     9    100
   9     9     9    1000

You might remember learning in school how the right-most column in a number is the units, the next one to the left is the tens, the next the hundreds and so on. In general n columns can express 10^n values. The principle is exactly the same in binary, except we use 2^n:

2^4  2^3  2^2  2^1    number of possible values
                 1    2
            1    1    4
       1    1    1    8
  1    1    1    1    16

Try writing out all of the different combinations of three bits (000, 001, 010 and so on). You will find that there are eight. Adding an extra bit doubles the number of possible values. So adding more bits allows for a greater range of values.
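
If you'd rather let the computer do the writing out, here is a minimal Python sketch. The helper name bit_patterns is made up for this illustration; it simply lists every combination of 0s and 1s of a given length.

```python
from itertools import product


def bit_patterns(n):
    """Return every n-bit pattern as a string, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]


three_bits = bit_patterns(3)
print(three_bits)            # ['000', '001', '010', '011', '100', '101', '110', '111']
print(len(three_bits))       # 8, which is 2 ** 3
print(len(bit_patterns(4)))  # 16: one extra bit doubles the count
```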

We need to decide how many bits should be in each binary number. Is 010101011010101 one number of 15 bits or three numbers of 5 bits? There's no way to tell just by looking at the bit pattern. We need a convention. The computing industry has managed to agree on making eight bits the basic building block. This is known as a byte. Values in computers are expressed as multiples of bytes.

Why eight and not seven or nine bits to a byte? Why can't we have 1.5 bytes? Firstly, complex components handling large values are built from simpler components handling smaller values by pairing them together. These pairs are then combined with other pairs and so on. Each pairing allows the new component to handle values that are twice as big. We're back to the 2^n scaling of binary. It's much easier to work with values that are a power of two, such as 8 (2^3). Once you have a component that works on bytes it's easier to make one that works on multiples of bytes than on fractions.

Secondly, eight bits allow for 2^8 or 256 different values. This is sufficient for many use cases. It's a happy medium. For example, English-language text can be expressed as bytes using an encoding called ASCII (yes, pronounced “ass-key”). It assigns lowercase letters, uppercase letters, digits, punctuation marks and formatting symbols to values between 0 and 127 (2^7, or 128, values in total). As long as you know the mapping you can convert numbers to letters and back again. The numbers:

72 101 108 108 111

Can be mechanically converted (or “decoded”) into:

H e l l o
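
A short Python sketch of that mechanical conversion, in both directions. The variable names are just for illustration.

```python
codes = [72, 101, 108, 108, 111]

# Decoding: treat each number as one byte and look it up in the ASCII mapping.
text = bytes(codes).decode("ascii")
print(text)     # Hello

# Encoding: go the other way, from letters back to numbers.
encoded = list("Hello".encode("ascii"))
print(encoded)  # [72, 101, 108, 108, 111]
```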

This was awesome for everyone writing in English, but much of Europe was unhappy at not being able to write their favourite ä, é and friends. ASCII only needs seven of a byte's eight bits. Extended ASCII uses the spare eighth bit to gain space for 128 more values, including letters with diacritics and mathematical symbols.

Extended ASCII was sufficient until the computing industry caught up with the fact that speakers of non-European languages would also like to use computers in their native languages and writing systems. That required a shift to encodings that use more than one byte and therefore allow a greater range of values. The Unicode standard assigns a number (known as a code point) to each written character and mark. It has room for over a million code points, of which well over a hundred thousand are assigned so far. There are various encodings from the Unicode code points to actual byte values. Because there are so many code points, a straightforward encoding needs four bytes (32 bits, or 2^32 possible values) to hold any of them. This is what the UTF-32 encoding does. However, it's very space inefficient because for common characters most of the bits are zeroes (the character ‘p’ is encoded as 0b00000000 00000000 00000000 01110000!). The most common encoding, UTF-8, uses a clever variable-length scheme to store code points more efficiently: between one and four bytes each, with ASCII characters taking just one.
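
To see the difference in space, here is a small Python sketch comparing the two encodings. The helper encoded_sizes and the sample characters are chosen for this illustration only. The "utf-32-be" codec is used so that no byte order mark is added to the output.

```python
def encoded_sizes(ch):
    """Return (UTF-8 length, UTF-32 length) in bytes for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-32-be"))


for ch in ["p", "é", "€", "😀"]:
    utf8_len, utf32_len = encoded_sizes(ch)
    print(ch, hex(ord(ch)), utf8_len, utf32_len)

# UTF-32 spends four bytes on 'p', three of them zero.
print("p".encode("utf-32-be"))  # b'\x00\x00\x00p'
# UTF-8 needs just one byte, the same value as in ASCII.
print("p".encode("utf-8"))      # b'p'
```

The loop shows UTF-8 growing from one byte to four as the code points get larger, while UTF-32 always uses four.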

Hexadecimal

Bytes can be written bit by individual bit. Here is 255: 0b11111111. Binary numbers are conventionally written with a leading 0b so that you know that 0b11 represents binary three and not decimal eleven. However, for humans it's slow and difficult to read long sequences of 1s and 0s. How long does it take you to tell whether 0b1111111111111111 is the same as 0b111111111111111?
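
Python happens to use the same 0b prefix, so you can check this yourself. The sketch below also counts the bits in the two long patterns, which is easier than squinting at them. The variable names are just for illustration.

```python
three = 0b11
print(three)     # 3, not eleven
print(bin(255))  # 0b11111111

a = "0b1111111111111111"
b = "0b111111111111111"

# Count the bits, ignoring the two-character 0b prefix.
print(len(a) - 2, len(b) - 2)  # 16 15

# Different lengths, so different values.
print(int(a, 2), int(b, 2))    # 65535 32767
```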

Hexadecimal is a more compact way of expressing numbers. It uses sixteen as its base value: the standard 0-9 for the first ten values and then A-F to express the remaining six (10-15 in decimal). Looking back to the base two table above we see that four bits give sixteen possible values, exactly enough for a single hexadecimal digit. This means that a byte can be expressed as two hexadecimal digits:

  1. Take a byte value e.g. 0b10110011 and split into halves: 0b1011 0b0011
  2. Convert each half to decimal (if you can't go directly to hexadecimal in your head): 11 3
  3. Convert each half to hexadecimal: 0xB 0x3
  4. Squish halves together: 0xB3
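
The steps above can be sketched in Python as follows. The function name byte_to_hex is made up for this example, and in real code you would normally reach for the built-in hex() or a format string instead.

```python
def byte_to_hex(value):
    """Convert one byte (0-255) to hexadecimal by splitting it into halves."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a single byte (0-255)")
    high = value >> 4     # step 1: the left four bits
    low = value & 0b1111  # step 1: the right four bits
    digits = "0123456789ABCDEF"
    # Steps 2 and 3: each half is a number from 0 to 15, which maps to one digit.
    # Step 4: squish the two digits together.
    return "0x" + digits[high] + digits[low]


print(byte_to_hex(0b10110011))  # 0xB3
```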

Hexadecimal values are conventionally prefixed with 0x to make clear that we're working with hexadecimal. 0x10 is not decimal 10, it's 16. It's important to see that expressing a number in a different base doesn't change its value. Hexadecimal is just a more convenient, human-readable representation that works nicely with the hardware's preferred binary because its base, sixteen, is also a power of two.
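
A quick way to convince yourself that the base is only a matter of notation, sketched in Python (which accepts the same 0b and 0x prefixes):

```python
n = 16

# One value, three ways of writing it.
print(n == 0x10 == 0b10000)  # True
print(bin(n), hex(n), n)     # 0b10000 0x10 16

# The same digits "10" mean different values depending on the base.
print(int("10", 16), int("10", 10), int("10", 2))  # 16 10 2
```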

Decimal is good if you have ten fingers but tricky for eight-bit bytes because it's hard to know where one byte ends and the next begins:

72101108108111

This is the ASCII-encoded “Hello” from above but with the spaces removed it's impossible to know how to split the digits. Is the first byte value 7 or 72? In hexadecimal you know that every pair is one byte:

48656C6C6F
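
Because every byte is exactly two hexadecimal digits, splitting the string back into bytes needs no guesswork. A minimal Python sketch, with illustrative variable names:

```python
hex_string = "48656C6C6F"

# Walk through the string two characters at a time.
pairs = [hex_string[i:i + 2] for i in range(0, len(hex_string), 2)]
print(pairs)   # ['48', '65', '6C', '6C', '6F']

# Each pair is one byte value.
values = [int(pair, 16) for pair in pairs]
print(values)  # [72, 101, 108, 108, 111]

print(bytes(values).decode("ascii"))            # Hello
# Python's built-in bytes.fromhex does the same splitting for you.
print(bytes.fromhex(hex_string).decode("ascii"))  # Hello
```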

Hexadecimal is commonly used when presenting bit patterns to the user or counting things that are naturally hexadecimal (e.g. memory addresses). Such details tend to be fairly low level, so for many people a blue screen full of hexadecimal is a sign that something has gone badly wrong.

As a web developer your main exposure to hexadecimal is almost certainly going to be CSS colour values. The RGB format is made up of one byte encoding the amount of red, one for the amount of green and one for the amount of blue. White is therefore full red, full green and full blue: 0xFF FF FF. Black is no red, no green and no blue: 0x00 00 00. Armed with this understanding of hexadecimal you can impress your friends and colleagues by changing colours by editing the hex values directly! You might have a grey like #ADADAD but think it looks a bit too dark. You can brighten it by increasing the value for each component by the same amount: #DEDEDE. Hours of fun!
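
Here is the grey example as a rough Python sketch. The function brighten is hypothetical and assumes a six-digit #RRGGBB colour. It caps each component at 255 (0xFF) because a byte can't hold anything larger.

```python
def brighten(colour, amount):
    """Add `amount` to each RGB component of a '#RRGGBB' colour, capping at 255."""
    digits = colour.lstrip("#")
    # Two hex digits per component: red, green, blue.
    components = [int(digits[i:i + 2], 16) for i in range(0, 6, 2)]
    brighter = [min(c + amount, 255) for c in components]
    return "#" + "".join(f"{c:02X}" for c in brighter)


# 0xAD + 0x31 = 0xDE (173 + 49 = 222 in decimal)
print(brighten("#ADADAD", 0x31))  # #DEDEDE
```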
