In the space of just a few decades, computer networking has gone from being a curiosity involving screeching modems to being the primary reason for having a computer. Many modern laptops and smartphones aren’t much more than an interface to the Internet. Using a computer without an Internet connection feels so limited that it’s hard to believe that the first version of Windows 95 didn’t even ship with Internet Explorer!
A large amount of modern day software development involves at least some web work. For this reason, it’s absolutely critical that you have a firm understanding of computer networking, the Internet and the associated web technologies. In this chapter we’ll develop a conceptual model of networking using stacked layers of protocols. We’ll then use that model to study the Internet, its major protocols and the web.
Let’s begin by getting our terminology straight. Computers can be connected together to form a network. A host is any device connected to a network. Hosts on a network are able to exchange information. The network might involve physical connections using cables (so retro!) or, more likely these days, wireless connections using WiFi or Bluetooth. Either way, there will be a component in each host known as the network interface card (NIC) that is responsible for encoding and decoding messages to and from whatever physical signals the network uses.
If you want to call someone, you need to know their phone number. It uniquely identifies the phone you’re calling out of all of the phones connected to the phone network. It’s the same for computer networks. Each host on the network has a unique identifying address determined by the NIC. One host can talk to another by using its NIC to broadcast across the network a message addressed to the NIC of the receiving host. The receiving host’s NIC will listen to all the messages on the network. Whenever it detects a message addressed to it, the NIC will excitedly read the message into the computer’s memory and notify the OS using the interrupt system we saw in the OS chapter.
If a network only allows machines to communicate with other machines on the same network, then it is referred to as an intranet because all of the communication stays within (intra) the bounds of the network. Lots of businesses run their own private intranets, accessible only to computers within the office, to share access to storage, printers and so on.
Some networks include machines that are connected to more than one network. Such machines can be configured to function as routers by accepting messages from a host on one network and passing them on to a host on another network. This enables inter-network communication. A message from a sender in one network can reach a receiver in another network even though there is no direct connection between the sender and receiver.
The Internet, with a capital I, is one single, planet-wide inter-network of networks. Any two points on the Internet can communicate with each other by sending a message across their own little sub-network to a router, which sends it across another sub-network to another router and so on across successive sub-networks until the message reaches the destination’s sub-network and can find the destination machine.
Once the bits have arrived at their destination, whether via an intranet or the Internet, the receiver has to be able to correctly interpret them. We use protocols to specify the format of information that’s sent from machine to machine. The Internet is based on a bundle of protocols known as TCP/IP, clever routing algorithms and systems for managing and retrieving device addresses. We will look at all of these in this chapter.
In this diagram we have an intranet on the left. One host on the intranet, the router, also has a connection to the Internet, most likely provided by an internet service provider (ISP). There is a path through the Internet routers to the server, which exists on a separate intranet with just its own router for company. The PCs and servers might be on opposite sides of the world.
Messages are not sent in a single burst but are split into chunks known as packets. Each packet is sent independently of the others and may take a different route depending on network congestion and availability. The Internet is designed, in theory at least, to be decentralised. No single person or body controls the Internet. It is also designed to be resilient. If a node on the network – one of these machines straddling multiple sub-networks – fails for whatever reason, the network traffic will automatically find another route to its destination. It also improves the network’s utilisation by encouraging each packet to find the fastest route. A physical analogy would be a convoy of cars all heading from one place to another. If a car ahead gets stuck in traffic the cars behind can try taking a different route in the hope of avoiding the congestion.
The Internet forms a reliable communication layer on top of which various “applications” have been built. I don’t mean applications in the sense of desktop applications. Internet applications are closer in meaning to use cases for the Internet. Lots of things that you might think of as “the Internet” are actually applications in this sense. Email is a collection of protocols which enable electronic messages to be sent across the network. Another, known in the 1990s as the World Wide Web and now just the web, allows users to request and receive multimedia content. I’m pushing things here, but the Internet is a little like an operating system for networks. It provides a way to reliably and consistently get a message from A to B across diverse networking machinery, just as an OS reliably and consistently runs programs on many disparate computer architectures. You don’t need to worry about how your message is delivered because the Internet handles all of that for you.
In non-technical use, the Internet and the web are basically synonymous. This is because the web has become by far the most dominant application on the Internet. Despite this, it’s important to remember that the Internet and the web are not the same thing. The Internet is a communication layer that enables devices to talk to each other across network boundaries. The web is just one of many uses we have found for the Internet. The Internet predates the web by several decades.
By this point you may be wondering what exactly a protocol is. Can it be found somewhere in your computer? How does it actually do anything? How does the computer know that this huge stream of incoming bits should be interpreted with this or that protocol?
Cast your mind back to how the processor decodes instructions. How did the processor “know” that a particular bit pattern corresponds to a particular instruction? It didn’t, really. The humans who designed the processor imposed meaning by designing the processor’s decoding circuitry in a particular way. The processor is blindly operating according to its design and it relies on the incoming instructions being encoded as expected. The processor’s instruction set architecture forms a protocol between the programmer and the processor: give me an instruction in this format and I’ll execute this behaviour.
A protocol is a set of rules and expectations specifying how data should be transmitted. The target of a protocol is really other people. They’re the ones who will sit down and program their computers to interpret the bitstream you’re sending according to the agreed protocol. As long as every machine follows the protocol correctly, everything will work as desired. Everyone will agree which meanings should be assigned to which bits. The computer doesn’t “know” that the incoming communication is using a particular protocol. It merely attempts to interpret it according to whatever protocol the programmer has instructed it to use. It will happily try and interpret a message with an incorrect protocol if you tell it to do so.
The Internet is on a vastly different scale to the gates and components we’ve looked at in previous chapters. Performance costs that were almost infinitesimally small on the scale of an individual computer suddenly matter very much when we’re working on the scale of a planet.
From the processor’s perspective, it takes an extremely long time for messages to traverse the Internet. Network requests are slower than local requests by several orders of magnitude. Latency measures how long it takes a message to reach its destination. A certain, unavoidable amount of latency is due to the time it takes the message to travel across the Earth. Additional latency can be added by the performance of the machines along the message’s path through the network. Latency is usually measured in milliseconds and lower is better. As a rough estimate, it takes 150ms for a message to go from London to San Francisco. That is thousands of times slower than a local memory access.
The other important performance consideration is how much data can fit into the network’s piping at once. This is known as the bandwidth. Clearly, if we can transmit more bits of data at once, then we can transmit a message more quickly. Bandwidth is measured in bits per second and higher is better. In my lifetime we’ve gone from a humble 56Kbps (kilobits per second) to megabits and now gigabits per second.
Both latency and bandwidth are important for performance but in different ways. If you want to transfer a lot of content (e.g. streaming video), then you’re interested in bandwidth. It doesn’t really matter whether it takes 50ms or 500ms for the first bit of content to reach you so long as the bandwidth is sufficient to keep the data buffers full and your video playing. If the network activity involves lots of round-trips, when the sender transmits something and waits for a response from the receiver, even fairly small changes in latency can have a large impact on the total transmission time. On the other hand, low latency won’t do us much good if a lack of bandwidth means we can only send a minuscule amount of data per second.
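To get a feel for the trade-off, here is a rough back-of-the-envelope model in Python. The formula and the numbers are purely illustrative, not a precise model of any real protocol:

```python
def transfer_time_s(size_bits, bandwidth_bps, latency_s, round_trips=1):
    """Rough model: each round trip costs two one-way latencies,
    plus the time to push the data through the pipe."""
    return round_trips * 2 * latency_s + size_bits / bandwidth_bps

# A 1 MB page over a 100 Mbps link, London to San Francisco (~150 ms one way):
one_shot = transfer_time_s(8_000_000, 100_000_000, 0.150, round_trips=1)

# The same page fetched with 10 chatty round trips:
chatty = transfer_time_s(8_000_000, 100_000_000, 0.150, round_trips=10)
```

Under this toy model the single-shot transfer takes about 0.38 seconds while the chatty version takes over 3 seconds, even though the bandwidth term (0.08 seconds) is identical in both. The latency term dominates as soon as round trips multiply.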
The complexity of the Internet means that each message has to convey a lot of information. Let’s take the example of posting a comment on someone’s blog. So that your message can get to the right host and so that the receiving computer can know what the hell to do with it, your transmission’s packets will need to specify: the receiver’s network address; the application protocol; tracking information so that the complete transmission can be reconstructed; and the actual content of your comment itself.
As ever, when faced with great complexity, we can make it more manageable by breaking down the problem. The solution in networking is to layer multiple protocols on top of each other to create a network stack. Each layer has a different role. This modularisation creates a highly effective separation of concerns. The layer that’s responsible for delivering a message across a network is completely separate to the layer that routes the message between networks and that layer is different from the layer that instructs the receiver how to interpret the message.
There are two networking stack models that you will come across. The more general one is known as the OSI model (after Open Systems Interconnection, though you’ll never need to know that). It has seven layers. The simpler model is that of the Internet protocol suite, which defines how the Internet works. It has four layers that broadly correspond to specific OSI layers. We’ll focus on the Internet protocol suite’s stack model in the interests of simplicity.
| Internet layer | OSI number | Common protocols |
|----------------|------------|------------------|
| Application    | 7          | HTTP, SMTP, FTP  |
| Transport      | 4          | TCP, UDP         |
| Internet       | 3          | IP, ICMP         |
| Link           | 1-2        | Ethernet, WiFi   |
The functionality of each layer is implemented by the various protocols that operate at that layer. At the heart of the communication is the payload: the actual message that you want to transmit. In this example it’s the blog comment you want to post. Each layer wraps around the content of the previous layer, including that layer’s metadata. It’s like the nested Russian dolls known as matryoshki.
The NIC is at the bottom of the stack and user programs are at the top of the stack. Starting from the top, the outgoing payload generated by the user program is wrapped in each layer’s metadata. This means that the link layer, at the bottom, forms the outermost metadata wrapper. On the receiving computer, the process works in reverse. First it unwraps the link layer metadata, then the internet layer and so on until the application layer passes the payload to whatever user program is configured to handle the message. The code that performs all of this work is referred to as the operating system’s networking stack and generally resides in the kernel.
The final, wrapped message that is broadcast or received on the network has layers of metadata that look roughly like this:
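As a rough sketch in Python, with hypothetical, heavily simplified headers (real headers are packed binary bit fields, not dictionaries, and the addresses below are invented for illustration):

```python
payload = "Great post!"  # the blog comment: the application-layer payload

# Each layer wraps the previous layer's output as its data, adding metadata.
segment = {"header": {"src_port": 49152, "dst_port": 80},
           "data": payload}                                   # transport layer
packet = {"header": {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7"},
          "data": segment}                                    # internet layer
frame = {"header": {"src_mac": "60:f8:1d:b1:6f:7a",
                    "dst_mac": "aa:bb:cc:dd:ee:ff"},
         "data": packet}                                      # link layer

# The receiver unwraps in reverse order to recover the payload.
received = frame["data"]["data"]["data"]
```

The outermost wrapper is the link layer frame, exactly as described above, and unwrapping the three `data` fields in sequence recovers the original comment.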
Let’s look at each layer in more detail, starting from the bottom. As mentioned above, the foundation of Internet networking is TCP/IP. This is actually a combination of two protocols working at different layers.
The link layer represents the computer’s physical connection to the network via the NIC. This layer is responsible for sending the packets, technically referred to as frames at this layer, out over the physical network. The protocols at this level can only communicate with hosts that the machine is physically connected to. A common example of a link layer protocol is Ethernet. If you’re not using WiFi, you probably plug your computer into the router using an Ethernet cable.
One of the main tasks of the link layer is to handle situations when multiple devices attempt to broadcast over the network simultaneously. This is resolved by having both parties wait for a random time before reattempting the communication. One will go first and successfully broadcast its message. It’s a bit like that awkward thing when you’re speaking to someone on the phone and there’s a bit of a delay (i.e. latency) so you both try to speak at the same time and then wait a bit for the other person to speak. This is technically referred to as media access control, or MAC, from which we get the name “MAC addresses”.
Every physical connection to the network is assigned a unique MAC address in the form of six bytes, usually expressed in hexadecimal e.g. 60:f8:1d:b1:6f:7a. MAC addresses are usually assigned permanently to the NIC hardware and do not change. The host retains its MAC address as it moves between Ethernet networks.
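Turning the six raw bytes into that familiar colon-separated form is a one-liner; a quick Python sketch:

```python
# The six bytes of the example MAC address above, as stored on the NIC.
nic_bytes = bytes([0x60, 0xF8, 0x1D, 0xB1, 0x6F, 0x7A])

# Format each byte as two hex digits and join with colons.
mac = ":".join(f"{b:02x}" for b in nic_bytes)
```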
Usually the OS will allocate a block of memory that is shared between the OS and the NIC. When the NIC receives a packet, it can write it directly to the shared memory. This is known as direct memory access and is more efficient than the NIC having to ask the processor to copy the packet into memory. The processor carries on with its own work while in the background the NIC takes care of moving the packet into memory. Once the packet is ready, the NIC notifies the kernel via an interrupt. When the kernel wants to transmit data, it writes to the shared memory and signals to the NIC to start transmission.
At this layer we break out of the confines of a single network. Here is where we meet the IP part of TCP/IP – the Internet Protocol (IP). The IP network is an overlay over interconnected link layer networks.
The Internet Protocol provides routing and packet forwarding across networks. Each host on the IP network is assigned its own IP address. The Internet Protocol is responsible for routing the packet across network boundaries to the correct host. It is thanks to the Internet layer that we can communicate with machines that are not directly connected to our network. One important limitation of IP is that it only offers best-effort delivery. It provides no guarantee that the packet will ever reach its destination, nor does it tell you if it arrived. Sending an IP packet is the networking equivalent of screaming into the void.
Since each layer builds on the previous one, a device connected to the Internet will have two addresses: a MAC address from the link layer and an IP address from the Internet layer. Each address only has meaning at its own layer. It doesn’t make sense to address an IP packet to a MAC address. The MAC address represents the physical machine while the IP address, in a more abstract sense, represents a point on the network. Usually your ISP assigns you an IP address when you connect to the Internet.
We will look at IP in more detail below.
The transport layer provides services that the basic Internet layer lacks. For example, we saw that IP connections are unreliable. If we want to be sure that a message has been delivered, we need some additional functionality. Protocols at this layer provide services such as delivery guarantees or the concept of a sender-receiver connection that survives between transmissions.
The Transmission Control Protocol (TCP) is the main protocol the Internet uses at this layer. It builds on IP by providing important functionality that IP does not offer. We’ll see some of these features when we look at TCP in more detail below. For now, just bear in mind that these features come with a performance cost because the protocol has to send various confirmation and acknowledgement messages back and forth. In TCP, packets are technically referred to as segments but we’ll stick with “packets” for simplicity. Just remember that the term means something slightly different at each layer.
Another common transport protocol is the User Datagram Protocol (UDP). This protocol forgoes a lot of TCP’s bells and whistles in order to deliver a higher throughput of data. It’s used for things such as streaming media and online games, where lots of information needs to be sent quickly and it doesn’t matter so much if a few packets get lost along the way.
At the top of the stack we have the application layer. The lower layers are the networking nuts and bolts which ensure that a message is successfully delivered without really caring about the message itself. The protocols on the application layer play no role in the transmission. They are concerned with the message itself. Each application on this layer specifies a protocol for the data that it communicates over the network. “Application” here doesn’t strictly mean a particular program running on your computer. It’s more akin to the use case you have for the network in the first place: sharing a file, sending a textual message and so on.
There are a whole range of protocols covering everything you can do on the Internet. Very important protocols include the File Transfer Protocol (FTP) and the Simple Mail Transfer Protocol (SMTP). Hopefully their purposes are obvious. These protocols have been around for decades; more recent examples include BitTorrent and Bitcoin.
By far the most common application layer protocol is the Hypertext Transfer Protocol (HTTP). We’ll look at it in detail later in the chapter. As we’ll see, HTTP has expanded far beyond its original use case and morphed into a kind of protocol behemoth that’s used as a general communication protocol on the Internet.
Let’s now look at the major Internet suite protocols in more detail. We’ve already seen that the Internet Protocol (IP) is responsible for getting packets of data across network boundaries to the correct destination. Each device on the Internet is allocated an IP address. If a machine needs to be easily accessible, such as a server, it will often have a static IP address that doesn’t change. That makes it easier to locate on the network. When you connect your home router to your ISP’s service, however, your router is likely assigned a dynamic IP address from a pool of available addresses. A dynamic IP address might change at any moment but that’s not a problem because the only Internet traffic going to your address will be in response to requests initiated by you as you go about your online business.
How do the servers you talk to know where to send their responses? The answer lies within the packets themselves. IP packets are made up of a header containing metadata and data containing the message payload. The Internet Protocol defines the structure of the header so that both the sender and receiver have a shared understanding of which bits mean what:
Don’t worry too much about what everything means. The important thing is to understand how the information is laid out. Each little rectangle represents a single bit. The numbers along the top count the bits. We have six rows of 32 bits, so 192 bits in total. Different bit ranges are allocated to different metadata fields. The most interesting sections are the Source Address and Destination Address fields. When your computer sends out an IP packet, it writes its own IP address as the source. When the server generates the response, it sets that address as the destination.
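We can mimic this shared understanding of fixed bit ranges with Python’s struct module, which packs and unpacks fixed binary layouts. This sketch builds a minimal 20-byte IPv4 header (no options, and the checksum left at zero for simplicity) and then slices the address fields back out, just as a receiver would. The addresses are invented for illustration:

```python
import socket
import struct

version_ihl = (4 << 4) | 5  # version 4, header length 5 x 32-bit words

# "!" means network byte order; B = 1 byte, H = 2 bytes, 4s = 4 raw bytes.
header = struct.pack(
    "!BBHHHBBH4s4s",
    version_ihl, 0, 20,               # version/IHL, DSCP/ECN, total length
    0, 0,                             # identification, flags/fragment offset
    64, 6, 0,                         # TTL, protocol (6 = TCP), checksum
    socket.inet_aton("10.0.0.5"),     # source address
    socket.inet_aton("203.0.113.7"),  # destination address
)

# The receiver applies the same layout to recover the fields.
fields = struct.unpack("!BBHHHBBH4s4s", header)
src = socket.inet_ntoa(fields[8])
dst = socket.inet_ntoa(fields[9])
```

Neither end “knows” what the bits mean; both simply apply the same agreed layout, which is the whole point of a protocol.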
IP addresses are 32 bits long. As we know from the computer architecture chapter, a 32-bit value can be split into four bytes, each holding 256 values. This is how an IPv4 address is usually written: four decimal numbers, one per byte, separated by dots (e.g. 192.168.0.1).
This means that there are in total 2^32, or 4,294,967,296, unique IP addresses. More than four billion addresses seems like a huge number but in recent years this limit has become an issue. We’ll see some workarounds and solutions later.
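The conversion between the 32-bit value and the dotted form is easy to check for yourself; a small Python sketch:

```python
addr = 0xC0A80001  # a 32-bit IP address held as a single integer

# Peel off each of the four bytes and join them with dots.
dotted = ".".join(str((addr >> shift) & 0xFF) for shift in (24, 16, 8, 0))

# Four bytes of 256 values each gives the total address space.
total_addresses = 2 ** 32
```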
It’s pretty straightforward to send a message to a computer on the same network. Every device listens to all of the traffic on the network, hoping that there will be something addressed to it. All you need to do is broadcast a message with the correct address and the receiver will notice. Things aren’t so clear when the sender and receiver are on different networks. It isn’t immediately obvious how to get from my computer to, say, Netflix’s servers. This is where IP’s secret sauce comes in.
When the packets leave my home network, the first place they reach is the router belonging to my ISP. Routers aren’t anything special. In theory, any computer can be a router so long as it’s connected to more than one network. What makes a computer a router is that it’s running a program that routes packets. It’s the activity that makes it a router, not some special physical characteristics. You can make your own home router with a Raspberry Pi, if you like. Of course, ISPs’ routers have to handle huge amounts of traffic so they have optimised hardware. They don’t do anything a Pi router couldn’t do – they just do it on a much, much bigger scale.
A router maintains a list of known IP addresses. When the router receives a packet, it will inspect the packet’s destination address. If the router knows the address, it will forward the packet directly to the destination. If the router doesn’t know how to deliver the packet, it will instead try and route it to another router that’s a bit closer to the destination. Each step from router to router is known as a hop. Each hop takes the packet a bit farther along its path until it hops on to a router that does know the destination and can successfully deliver the packet. The specifics of how the next-hop router is chosen depends on the particular algorithm used by the router.
In the packet header you can see a field called the time to live (TTL). This records how many hops the packet is allowed to make. At each hop the receiving router decrements the packet’s TTL. Once the TTL hits zero the router drops the packet and will not forward it any further. This prevents packets from getting caught in a loop, endlessly cycling through the same routers and clogging up the network. Packets can also be dropped if a router can’t process incoming packets quickly enough. Every packet will eventually leave the network by either reaching its destination or getting dropped.
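The TTL mechanism can be sketched in a few lines of Python. This is a toy simulation of the decrement-and-drop rule, not real router code:

```python
def forward(ttl, hops_needed):
    """Simulate a packet crossing routers: each hop decrements the TTL,
    and a router that decrements it to zero drops the packet."""
    for hop in range(1, hops_needed + 1):
        ttl -= 1
        if ttl == 0 and hop < hops_needed:
            return f"dropped at hop {hop}"
    return "delivered"

# A generous TTL survives a twelve-hop route...
result_ok = forward(ttl=64, hops_needed=12)

# ...but a TTL of three only makes it three hops.
result_dropped = forward(ttl=3, hops_needed=12)
```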
You can see the route taken by packets across the Internet by using a tracerouter such as mtr:
This shows the complete route from my computer to Google’s server 1e100.net (10^100 being a googol). Host number one is the one closest to me and number twelve is Google’s server. Each host is a router on the way. The tracerouter sends out a bunch of packets with increasing TTLs to the destination IP address. The first packet, with a TTL of one, will only get as far as the first hop before it gets dropped; the second, with a TTL of two, will get as far as the second hop. The packets sent by mtr use a protocol called the Internet Control Message Protocol (ICMP) to request identifying information from each router. Not every router will agree to provide information, which is why you see ??? in the output above. The pings indicate, in milliseconds, how long each hop took.
Note that it’s possible for an IP communication to fail if the packet’s TTL runs out before it reaches its destination. The packet header also doesn’t include any information about whether the packet is part of a bigger sequence. Because of these limitations, the client cannot know whether the packet was successfully delivered and the receiver cannot know whether any packets were dropped by the network.
To solve these problems we must turn to the Transmission Control Protocol (TCP). Remember that TCP sits in the layer above IP in the network stack. When the OS receives an IP packet, it strips off the header to leave just the data payload. The payload is a TCP segment, which once again consists of a TCP header and a data payload. Here’s what a TCP header looks like:
The TCP header contains useful information such as its position in the overall transmission, known as the sequence number. The receiver can use this to put the segments in the correct order and detect gaps where packets were dropped by the network. The receiver can then notify the sender and request their retransmission. TCP can almost guarantee that the packets are delivered in the correct order. I say “almost” because if the underlying IP layer can’t deliver the packets, there isn’t much TCP can do to remedy that. What it does offer is the certainty that it will successfully deliver the packet or generate an error response. This is a big improvement on IP, which doesn’t give any response.
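Here is a toy Python sketch of the idea: sorting segments by sequence number reassembles the message, and a gap in the numbers reveals a dropped packet. (Real TCP sequence numbers count bytes rather than segments, but the principle is the same.)

```python
# Segments arrive out of order; each carries its sequence number.
arrived = [(2, b"lo wo"), (1, b"Hel"), (3, b"rld!")]

# Reassemble by sorting on the sequence number.
message = b"".join(data for seq, data in sorted(arrived))

# A gap in the sequence numbers would reveal a dropped packet
# that the receiver could ask the sender to retransmit.
expected = set(range(1, 4))
missing = expected - {seq for seq, _ in arrived}
```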
TCP implements a hugely helpful abstraction in the form of a two-way connection between the sender and receiver. The hosts at either end can read and write streams of data from the connection, much as they would with a local file. The protocol hides the fact that it splits the message up into lots of little packets and recombines them on the other end. On the receiving host, the program that consumes the TCP stream just sees the message pop up in the same form as it was originally sent.
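Python’s socket module exposes this stream abstraction directly. In this sketch a locally connected socket pair stands in for the two ends of a TCP connection, so it runs without touching a real network:

```python
import socket

# A connected pair of sockets stands in for the two ends of a connection.
sender, receiver = socket.socketpair()

# The sender just writes a stream of bytes. Over a real network, TCP
# would split this into packets and reassemble them invisibly.
sender.sendall(b"Nice blog post!")
sender.close()  # closing signals end-of-stream to the other side

# The receiver reads the stream until the sender closes its end.
chunks = []
while True:
    chunk = receiver.recv(4096)
    if not chunk:
        break
    chunks.append(chunk)
message = b"".join(chunks)
receiver.close()
```

Note that the receiving loop never deals with packets at all: it just reads bytes until the stream ends, which is exactly the file-like interface described above.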
Remember that TCP sends its messages using IP. It builds the abstraction of a reliable connection on top of something that fundamentally lacks both reliability and the concept of a connection. TCP utilises a sophisticated state machine of messages and acknowledgements to help the sender and receiver maintain the same expectations about who is sending what at any given moment. However, because the intermediate network is unreliable, it is impossible to completely guarantee consistency between the sender and receiver. The sender could send a message to check that the last packet was successfully received but, because the IP layer is unreliable, that message and any acknowledgement from the receiver could both be lost.
In computer science this is known as the Two Generals problem, after a thought experiment in which two generals try to coordinate an attack by sending messages to each other through territory controlled by an enemy. (The similarly named Byzantine Generals problem is a related but distinct problem in which some of the participants may be traitors.) It has been proven that it is impossible for the generals to guarantee that they will both attack at the same time (i.e. achieve consistency). They can never be sure that any message they send will be successfully delivered and not intercepted by the enemy.
Nevertheless, a TCP connection is much more reliable and easier to use than sending raw IP packets. Unfortunately, this does not come for free. Negotiating the initial connection requires a few round trips, inevitably adding latency at each step. Time is also spent waiting to see if missing packets turn up and requesting the retransmission of dropped packets. In addition, neither the sender nor the receiver know what data transfer rate the intervening network can handle. Although TCP has the ability to automatically tune the transfer rate to the network’s capabilities, each connection has to begin at a conservative rate that gradually increases until packets start dropping off. This ramp up is required every time a new connection is made, even between the same hosts.
Thanks to the combination of TCP/IP, we now have the ability to reliably deliver a complete message to an IP address of our choosing. How does the receiver know what to do with the messages it receives? The TCP header has a defined format but the data payload could be anything. Referring back to the network layers diagram, we observe that above the transport layer is the application layer. How does the receiver know which application protocol to use?
Note that the TCP header specifies a source port and a destination port. By convention, each application protocol is assigned a port number. The OS takes the delivered message and passes it to the correct port. Programs running on the receiver can listen on the port by registering themselves with the OS. The convention is that a program listening on a given port will correctly parse messages using the port’s assigned protocol.
The combination of IP address and port number uniquely identifies an application on a particular host:
So we have one number that indicates the host and another that tells the host what to do with the message. If you squint and look sideways, this might remind you of how Linux uses one interrupt number (0x80) for all system calls and disambiguates them with a second system call identifier. Both are examples of multiplexing. Remember from the computer architecture chapter that multiplexing involves sending signals intended for different destinations along the same route. The receiver demultiplexes the signal by examining the destination of each signal and dispatching it correctly. Networking makes heavy use of multiplexing. The central parts of the network carry lots of packets all on their way to different destinations, just like a motorway carries vehicles that are on their way to different destinations. Within a particular connection between two hosts, there is further multiplexing at the port level. This is like two people travelling in the same car to the same building but then going to separate floors within the building.
On Linux, ports are implemented using sockets, a special type of file. At one end of the socket, the OS kernel writes the network messages once they’ve been reconstructed by the TCP/IP stack. The listening application simply reads the messages from the socket. This creates a neat separation between the network-oriented part of the kernel and the listening applications, bringing the usual benefits of separating concerns. It means that user programs can read from and write to the connection as easily as a local file (via system calls, of course). The OS hides all the hard work and provides user programs with a simple interface.
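Here is a minimal Python sketch of that interface: a server listens on a port and both ends treat the connection as an ordinary file via makefile(). The port number is chosen by the OS (by asking for port 0) so the example doesn’t clash with anything already running:

```python
import socket
import threading

def serve_once(server_sock):
    """Accept one connection and answer one line, file-style."""
    conn, addr = server_sock.accept()
    # makefile() lets us treat the connection like an ordinary file.
    with conn, conn.makefile("rw", encoding="utf-8", newline="\n") as f:
        line = f.readline().rstrip("\n")
        f.write(f"Hello, {line}\n")
        f.flush()

# Listen on an unprivileged port on the loopback interface.
server = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks one
port = server.getsockname()[1]
threading.Thread(target=serve_once, args=(server,), daemon=True).start()

# The client addresses the application by (IP address, port) pair.
client = socket.create_connection(("127.0.0.1", port))
with client, client.makefile("rw", encoding="utf-8", newline="\n") as f:
    f.write("world\n")
    f.flush()
    reply = f.readline().rstrip("\n")
```

Neither side ever sees a packet: the kernel’s networking stack handles TCP/IP underneath while the programs read and write lines as if from a local file.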
Ports numbered below 1024 are designated “well-known” ports and have an assigned protocol. There’s nothing stopping you from making your own protocol and using one of the well-known ports, but you won’t be able to use it to communicate with anyone else because they will expect that port to use the standard protocol. Remember that in networking convention is key!
On Unix-like systems, a program normally needs superuser privileges (i.e. using sudo) to listen on a well-known port. The original idea was to stop some random user logging into a shared computer and replacing a well-known service such as FTP with their own nefarious program. This restriction has actually become something of a security issue itself because it means that lots of Internet-facing programs need to run with escalated privileges. Servers such as nginx try to mitigate this by using the superuser privileges to listen on the port and then immediately de-escalating themselves to as low a privilege level as possible.
We’ve seen that every host on the Internet is assigned a unique IP address. Let’s now explore some problems that have arisen as the Internet has grown. The first thing to be aware of is that there are actually two versions of IP in common usage: IPv4 and IPv6 (we don’t talk about v5). IPv4 is the currently dominant form and what we’ve been looking at so far. As we know, IPv4 addresses are 32-bit integers, giving over four billion possible addresses.
Four billion is not so many in a world of billions of Internet-connected phones and devices. The situation is made worse because not all of the address space is available. Various sections of it have been allocated for special purposes. For example, addresses
10.0.0.0 - 10.255.255.255,
172.16.0.0 - 172.31.255.255 and
192.168.0.0 - 192.168.255.255 are designated for use on private networks (see below) and should not be used for addresses on the open Internet. In the early days of the Internet, many companies and institutions were allocated vast, virgin tracts of the IP address space in a way that seems rather overgenerous nowadays. The U.S. Department of Defense, Apple and Amazon all own millions of IP addresses, making them unavailable for general use. Simply put, the Internet has run out of IPv4 addresses.
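Python's ipaddress module makes it easy to check the private ranges above. A sketch (the built-in is_private property performs a similar, slightly broader check):

```python
import ipaddress

# The three IPv4 ranges reserved for private networks.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),      # 10.0.0.0 - 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 - 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.0.0 - 192.168.255.255
]

def is_private(address: str) -> bool:
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in PRIVATE_NETS)
```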
An initial solution involved simply hiding lots of connected devices from the public Internet. You don’t need a public IP address if you’re not on the public Internet! When I ask Google “what is my IP address?”, it tells me
126.96.36.199. Yet when I ask my computer by running
ifconfig | grep inet, I get a different address:
10.6.6.128. What is going on? There is a clue in the IP address my computer tells me. The address is within the range
10.0.0.0 - 10.255.255.255, which is reserved for private networks. This means my computer isn’t directly connected to the Internet! In fact, the only device connected to the Internet is my router. My computer sits behind the router in a private network inaccessible from the public Internet. Every time a device connects to my WiFi network, the router assigns it an IP address on my local private network. This is why my computer has a private IP address.
How then does my computer communicate with hosts on the Internet? Every IP packet my computer sends will have the private address in the source address header field. The first destination of every outgoing packet will be the router. Before routing the packet on to the public Internet, my router will replace the private IP address with the router’s own public IP address. When the router receives the response, it will reverse the process and redirect the packet to my computer’s private IP address.
No matter how many computers are connected to my network, I only need one public IP address for my router. In this way a single public IP address can multiplex a large number of computers. Technically the router is acting as a gateway between my local private network and the wider Internet. As well as drastically reducing the number of IP addresses needed, this also offers benefits to network administrators who can put a firewall in front of the gateway or monitor all the traffic going through it.
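The translation can be sketched as a lookup table. Everything here is invented for illustration (the public address, the packet-as-dict representation); real NAT also rewrites port numbers and tracks full connection state:

```python
# A toy model of the rewriting a NAT gateway performs. A packet is a
# dict with "src" and "dst" (address, port) pairs. For simplicity the
# table is keyed on the source port alone.
PUBLIC_IP = "203.0.113.7"   # invented public address for the router
nat_table = {}              # source port -> private IP of the sender

def outbound(packet: dict) -> dict:
    src_ip, src_port = packet["src"]
    nat_table[src_port] = src_ip                    # remember the sender
    return {**packet, "src": (PUBLIC_IP, src_port)}  # rewrite the source

def inbound(packet: dict) -> dict:
    dst_ip, dst_port = packet["dst"]
    private_ip = nat_table[dst_port]                 # find the original sender
    return {**packet, "dst": (private_ip, dst_port)}
```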
Useful as private networking is, we have not addressed the fundamental problem of insufficient IP addresses. To increase the number of addresses we need to allocate more bits to the address fields in the packet headers, which in turn requires amending the protocol. IPv6 is an updated version of the protocol using 128-bit addressing. 2^128 is an absurdly large number that will provide sufficient IP addresses even for a dystopian, Internet-of-things future in which every kitchen utensil has its own Internet connection. An IPv6 address is written as eight colon-separated blocks of hexadecimal characters, e.g. 2001:0db8:85a3:0000:0000:8a2e:0370:7334.
This means that IPv4 and IPv6 are not compatible. An IPv4 address can be expressed as an IPv6 address but a router that only understands IPv4 won’t know what to do with an IPv6 address. The uptake of IPv6 has been pretty slow, partly because of this, but you will see more and more of it.
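The ipaddress module illustrates the point; the addresses below are standard documentation/example values:

```python
import ipaddress

# IPv6 addresses are 128-bit and parse just like IPv4 ones, but the two
# are distinct types. An IPv4 address can be embedded in IPv6 as an
# "IPv4-mapped" address, yet the result is still an IPv6 address: an
# IPv4-only router has no idea what to do with it.
v6 = ipaddress.ip_address("2001:db8:85a3::8a2e:370:7334")  # compressed form
mapped = ipaddress.ip_address("::ffff:192.168.0.1")        # IPv4-mapped IPv6
```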
Using binary addresses, whether 32-bit or 128-bit, is all well and good for computers but not very useful for humans. How would you like to have to remember
188.8.131.52 (one of Google's IP addresses) every time you wanted to search for something? Not a great user experience. From a developer's perspective, IP addresses tie you very tightly to a particular host. If you redeploy a web app on to a different server, the web app's IP address will change. For most purposes an IP address is simply too low level. A big improvement is human-readable domain addressing. These are the
www.google.com addresses that we all know and love. They are implemented by the Domain Name System (DNS).
DNS allows hosts on the Internet to be identified by a domain name that maps to an IP address. This mapping can be easily updated by the owner of the domain, relegating IP addresses to an implementation detail of no concern or interest to users. The domain owner can update a DNS record to point to a new IP address without the user even realising.
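From a program's point of view, the whole system is hidden behind a single OS resolver call. A minimal sketch in Python (resolving a real domain needs a network connection, so the usage example sticks to localhost):

```python
import socket

# Ask the OS resolver (which consults its cache, /etc/hosts and the
# configured DNS servers) for the IPv4 address of a name.
def resolve(name: str) -> str:
    return socket.gethostbyname(name)
```

resolve("localhost") returns 127.0.0.1 without touching the network; resolve("www.cs.ox.ac.uk") would trigger the full lookup process described below.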
The structure of DNS addressing is hierarchical. This is reflected in the domain name by separating each level of the hierarchy with full stops e.g.
www.example.co.uk. Reading a domain from right to left will take you from the top level to the most specific. At the very top you have the root zone. The identifier for this zone is an empty string so it’s not visible in the domain name. Next are the top-level domains (TLDs) such as
.uk and so on. At each point of the hierarchy there is an administrator who is responsible for domains within their zone. The administrator of each TLD can allocate domains within the zone. A domain can point directly to a host or contain more subdomains.
We will take as our example
cs.ox.ac.uk, the domain of the computer science department at the University of Oxford. After the implicit root we have the
uk TLD, managed by the non-profit company Nominet UK. Within that TLD is the
ac subdomain, which is reserved for academic institutions. The University of Oxford has been assigned the
ox domain. The university’s administrators are responsible for the domains within this zone. They have decided to give the computer science department its own
cs domain. The departmental administrators responsible for this domain have decided to create a
www subdomain pointing to a web server so that whenever you navigate to
www.cs.ox.ac.uk it renders a web page.
www doesn’t have any special meaning. It is only by convention that a
www subdomain points to a web server. As the web has exploded in popularity, it’s more common than not that the only service a domain offers is a web server and so often the domain administrator will set the domain itself to point to the same place as the
www subdomain. We’ll see how this is done soon.
Since humans prefer to use domains and computers prefer to use IP addresses, we need a way for a computer to find the IP address of a given domain. Let’s look at what happens when you type
www.cs.ox.ac.uk into your browser, both in theory and in practice.
In theory, your computer will make a request for the domain’s IP address to the root zone server. The root zone server will tell you to go and ask the authoritative name server of the top-level domain. The TLD’s name server will tell you to ask the name server of the next level down in the hierarchy. This continues until you query a name server that has the IP address mapping. Here’s what it looks like:
First we query the root server, then Nominet UK (
nic.uk), then the
ac domain administrators (
ja.net) and then finally the name server of the ox.ac.uk domain, which returns the IP address mapping for cs.ox.ac.uk.
Starting at the root means we can always find our way to a particular domain but it’s inefficient and puts a lot of pressure on the root servers. In practice, the system works in reverse, thanks to caching. When you navigate to
www.cs.ox.ac.uk, your computer will first check if it has a local cache of the IP address mapping. If not, it requests the address mapping from your ISP’s name server. This server maintains a cache of previously-seen domains and very likely can answer your request straight away. If not, it will certainly have cached the authoritative name server for the
ac.uk subdomain and can request the mapping from that name server. Extensive caching saves the root servers from having to handle every name request on the Internet. They do still play a central role, since they are ultimately authoritative, and so you sometimes see them referred to as the “backbone of the Internet”.
Each cached mapping in the name servers has a time-to-live (TTL) value set by the domain’s authoritative name server. Once this expires, the mapping is removed from the cache and the name server will need to re-fetch the mapping the next time the domain is requested. If you’ve ever configured a domain name, you may have encountered a warning that changes might take several hours to propagate across the Internet. This is because you need to allow time for the name servers to flush the old IP address mapping from their caches.
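The caching behaviour can be sketched with a small TTL cache. The class and its interface are invented for illustration:

```python
import time

# A toy resolver cache: each mapping expires after its TTL, forcing a
# re-fetch from the authoritative name server on the next request.
class DnsCache:
    def __init__(self):
        self._entries = {}   # name -> (ip, expiry time)

    def put(self, name: str, ip: str, ttl: float) -> None:
        self._entries[name] = (ip, time.monotonic() + ttl)

    def get(self, name: str):
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires = entry
        if time.monotonic() >= expires:
            del self._entries[name]   # TTL expired: flush the mapping
            return None
        return ip
```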
DNS stores its mappings in DNS records. There are multiple types but they all map a domain (or subdomain) to another value.
NS records provide the location of the authoritative name server.
MX records specify the mail servers that handle email for the domain.
A records contain the actual IP address mapping.
CNAME (canonical name) records map an alias name to the canonical name of another domain. A name can have multiple
A records (e.g. for load balancing), but a name with a
CNAME record cannot have any other records. A typical set of DNS records for a domain might look like this:
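An illustrative set of records, in zone-file style, with invented server names and IP address:

```
example.com.      IN  NS     ns1.example-host.net.
example.com.      IN  MX     mail.example.com.
example.com.      IN  A      93.184.216.34
www.example.com.  IN  CNAME  example.com.
```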
When you do an
A record lookup to
www.example.com, the server will respond by telling you to do the lookup to
example.com. Repeating the request to the new domain will result in the IP address we’re looking for. Sadly, I always get
CNAME the wrong way round because to me
A sounds like it should stand for “alias” and “canonical” should refer to the IP address. I wish you better luck remembering this.
We end this chapter by looking at the most popular Internet application protocol: the Hypertext Transfer Protocol (HTTP).
HTTP is what powers the web. It’s a protocol for requesting and delivering hypertext. Despite the futuristic name, hypertext is nothing more than blocks of structured text with connections, or hyperlinks, to other blocks of hypertext. The text is structured through the use of HTML (Hypertext Markup Language), which semantically tags each bit of hypertext. A client sends an HTTP request, specifying the requested resource, to a server which provides an HTTP response. And with that we have the foundations of the web!
The server is so called because it has a set of resources, or content, that it is willing to serve out to anyone who requests it. A server is any machine running a program capable of responding to requests. The term “server” is rather overloaded: you’ll see it used to refer to both the physical machine and the program. When the client navigates to a resource, the browser renders the content, requesting any further resources (e.g. images, fonts) that are specified in the content. Clicking on links or otherwise interacting with the resource will generate further requests.
That’s pretty much it. By itself HTTP doesn’t have the concept of an ongoing relationship between the client and server. It’s just a sequence of independent requests. Many requests from one client are dealt with in the same way as one request from many clients.
When the user clicks on a link, the browser generates an HTTP request, described below, to the domain for the specified resource. The browser first asks the OS to perform a DNS lookup to get the server’s IP address. It then opens, via OS system calls, a TCP connection to the IP address on port 80. The browser writes the request to the connection. The OS’s networking stack is responsible for the various protocol steps: negotiating the connection, transmitting the packets, resending dropped packets etc. When the server’s OS receives the packets, it does the work in reverse to reconstruct the original message. The server OS then passes the message off to whatever application is listening on port 80. This will hopefully be an HTTP server. The server parses the request, generates an appropriate response (e.g. retrieving an HTML file from disk), and passes it off to the OS to be sent back to the client over the TCP connection.
HTTP uses TCP port 80. The original idea of different application layer programs using different ports fell foul of network administrators who, in the interests of security, preferred to block traffic on all but the most important ports. HTTP is essential so it’s guaranteed to work. This has led to the current situation, in which HTTP is used as a generalised communication protocol simply because it’s safe to assume that the communication won’t be blocked. Another advantage is that HTTP is simple and easy to read. An HTTP request is just plaintext over multiple lines:
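A minimal example of the kind described below, with just a request line, a single Host header, an empty line and no payload:

```http
GET / HTTP/1.1
Host: my-great-book.com

```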
The first line is known as the request line. The first word (
GET) specifies the HTTP request method. This tells the server which action the client is requesting. The second element specifies the requested resource. At the end of the line we specify the particular form of the HTTP protocol we are using.
HTTP/1.1 has been dominant since the late 90s, though its successor,
HTTP/2, is gradually becoming more widespread.
Here we are requesting the root resource (
/) of the
my-great-book.com domain, which defaults to
index.html. If we want to request a different resource, we simply need to specify it:
GET /static/styles.css HTTP/1.1.
After the request line comes a series of headers, which are encoded as key-value pairs separated by a colon. The example request has only a single header specifying the host name. Other headers include
User-Agent, which provides information about the client, and various
Accept headers, which specify the response types the client is capable of understanding. After the headers comes an empty line and an optional message body. The body is also known as the payload. Data submitted by the client, such as the content of an HTML form, is stored here. The example above has no payload.
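Because the format is plain text, building and parsing a request by hand is straightforward. The helper names in this sketch are invented; it assembles the request line, the headers, the blank line and the optional body:

```python
# An HTTP request is structured plaintext: a request line, "Key: value"
# headers, a blank line and an optional body, with lines separated by \r\n.
def build_request(method, path, host, headers=None, body=b""):
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for key, value in (headers or {}).items():
        lines.append(f"{key}: {value}")
    head = "\r\n".join(lines) + "\r\n\r\n"   # the blank line ends the headers
    return head.encode("ascii") + body

def parse_request_line(raw: bytes):
    first_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, path, version = first_line.split(" ")
    return method, path, version
```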
There are nine request methods but only a few are commonly used. A useful way of understanding the different methods is to map them to four operations: create, read, update and delete. A server that responds to requests in this way is said to provide a RESTful interface.
GET tells the server that we want to read the resource at the specified path e.g.
GET /users/1 requests information on the user with ID 1.
POST says that we want to create a resource using the data submitted in the request body e.g.
POST /users/ creates a new user with the submitted parameters. The response should specify how to access the newly created resource e.g. by providing the new user's ID.
PUT says that we want to update the specified resource with the request payload e.g.
PUT /users/1 updates the user's parameters to match the submitted payload.
DELETE tells the server to delete the resource at the specified path e.g.
DELETE /users/1 deletes the user with ID 1.
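These four methods can be sketched as a toy in-memory service. The handle() function and the user store are invented for illustration; the paths and status codes follow the examples above:

```python
# A toy in-memory "users" service mapping the four HTTP methods onto
# create/read/update/delete.
users = {}
next_id = 1

def handle(method: str, path: str, payload=None):
    global next_id
    if method == "POST" and path == "/users/":
        user_id, next_id = next_id, next_id + 1
        users[user_id] = payload
        return 201, {"id": user_id}   # tell the client where the resource lives
    user_id = int(path.rsplit("/", 1)[-1])
    if method == "GET":
        return (200, users[user_id]) if user_id in users else (404, None)
    if method == "PUT":
        users[user_id] = payload
        return 200, payload
    if method == "DELETE":
        users.pop(user_id, None)
        return 204, None
    return 405, None                  # method not supported here
```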
It is very important to use request methods correctly.
GET is known as a safe method that only retrieves information and does not change the server’s state. A
GET operation should also be idempotent, meaning that performing it multiple times has the same effect as performing it once. POST, PUT and
DELETE may all change the server's state. Webcrawlers, used by search engines to index the web, will follow
GET links but will not follow links with unsafe methods to avoid changing the server’s state. Make sure you get this right. If you implemented a deletion functionality using links like
GET /photos/1/delete, everything would appear to work correctly until one day a webcrawler came across your site, followed every
GET link and deleted every resource on your website!
An HTTP response is also plaintext over multiple lines:
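A representative response (header values are illustrative):

```http
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 48
Server: nginx
Cache-Control: max-age=3600

<html><body><h1>My Great Book</h1></body></html>
```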
The first line is the status line, which specifies the protocol version and the response’s status code. You may already be familiar with
200 OK, meaning success. As with the request, the next section is a list of headers containing the response metadata. There are too many to go through in detail, but from a cursory examination we can see the content type and length, server information, metadata for caching and others. No header is absolutely required but many are very useful. After the headers there is an empty line and then an optional message body. In response to a
GET request it contains the requested resource. In this case, it’s the HTML resource requested by the client.
The response code is a number that indicates the response type. Sometimes the response code alone conveys enough information and the response contains no further data. In other cases, the response includes more data in the message body (e.g. the
200 OK above). Response codes fit into five general categories as indicated by the first digit:
1xx: information responses, rarely used. The server might send
100 Continue if the client is sending a lot of information and the server wants to reassure the client that everything's going fine.
2xx: successful responses.
200 OK is most common and means that the request was successful.
3xx: redirection messages. The resource is available but not at the requested path. The response's Location header indicates where the client should make the new request.
4xx: client error messages. The client has made an invalid request. The classic
404 Not Found means the client requested a resource that doesn't exist.
400 Bad Request means that the request was invalid and the server couldn't understand it.
5xx: server error messages. The request is valid but the server has encountered some kind of error. As a web developer, you’ll become tragically familiar with
500 Internal Server Error.
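Since the category is determined entirely by the first digit, classifying a status code is a one-liner:

```python
# The first digit of a status code gives its general category.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def categorise(status_code: int) -> str:
    return CATEGORIES[status_code // 100]
```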
How browsers handle responses depends on the response code. It’s very important that you use the correct response code. A common mistake is to indicate an error by returning a JSON encoded error in a
200 OK response. The browser sees the 200 and treats the response as a success, so the error is invisible to anything that relies on the status code, such as caches and error handlers. Similarly, the browser will automatically navigate to the new location when it receives a
3xx response code.
HTTP specifies simple, human-readable request and response formats. It’s easy to observe network traffic just by looking at the HTTP messages going back and forth. You can open your browser’s developer tools and read network requests and responses, which is frequently helpful when debugging client-side code.
The main limitation of HTTP is that it is unencrypted. This means that HTTP traffic is visible to anyone watching network traffic. Plain HTTP is obviously unsuitable for anything requiring secrecy. The HTTPS protocol, running on port 443, acts as an encrypted wrapper around HTTP. An observer can see that the HTTPS connection has been made but cannot see what passes through it. HTTPS availability has increased massively in recent years, in large part due to concerns about online snoopers. If you are responsible for any web servers, I strongly encourage you to ensure you serve content over HTTPS. You may not think that your content is important or sensitive enough to justify the effort, but don’t forget that you are also exposing your users’ online activity. You cannot decide on your users’ behalf whether their personal activity should be public and so you should give them the option to use a secure connection.
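From the client's side, HTTPS is simply HTTP written through a TLS-wrapped socket. A sketch using Python's ssl module (calling the function requires a real network connection; merely defining it contacts nothing):

```python
import socket
import ssl

# create_default_context() verifies server certificates and hostnames;
# everything written after wrap_socket() is encrypted on the wire, so
# an observer sees only that a connection was made, not its contents.
def open_https_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    context = ssl.create_default_context()
    raw = socket.create_connection((host, port))
    return context.wrap_socket(raw, server_hostname=host)
```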
In this chapter we saw how computers can communicate through networking. The Internet is a global network of networks. We saw that a useful conceptual model for networking is a stack of layers, each responsible for delivering a different part of networking functionality.
We looked in detail at the Internet protocol stack. At the bottom are the physical transmission media like Ethernet and WiFi. Above them is the Internet protocol, which is responsible for routing packets of data across the Internet. In the next layer, TCP adds the abstraction of a persistent, reliable connection between two hosts. The TCP connection is used by applications in the topmost layer. The most popular of these applications is HTTP, which underpins the modern web. We examined the HTTP request-response cycle to better understand how browsers request and render online resources. We also saw how addressing on the Internet is implemented via IP and domain name addressing.
There are two ways in which you can continue your study of computer networking. You can focus on the upper half of the networking stack and learn how to deploy applications so that they are secure, performant and scalable. Much of this comes under the modern “devops” banner. Alternatively, you can take the perspective of a network administrator and focus on the lower parts of the stack. I recommend that you start from the application level and continue studying down the stack as far as your interest takes you.
A useful practical exercise is to deploy a web application on a virtual machine in the cloud. If you’ve only ever deployed things on a managed service such as Heroku, I’d go so far as to say that this is an essential exercise. Take a plain virtual machine on AWS or Google Cloud and set everything up manually yourself: private local network, Internet gateway, firewall, DNS records, load balancing and so on. You’ll have a much better understanding of how your application connects to the Internet. Once you’ve got everything working, it’s perfectly acceptable to decide that you’d rather just pay Heroku (or some other service) to save you the hassle in future. By knowing what these services are doing, you can make that decision in an informed way.
My favourite resource for developers is High Performance Browser Networking. In a short space it covers the most important Internet protocols (IP, TCP, HTTP) and includes lots of practical advice on how to make your websites speedy. Basically all of it is required knowledge for web developers. It also includes sections on HTTP/2.0, HTTPS and TLS, which I did not have space to discuss in this chapter. For a deeper understanding, I recommend Computer Networking: A top down approach by Kurose and Ross. “Top down” here means that it begins from the perspective of the application developer and works its way down the network stack. Read until you lose interest. Another popular textbook is TCP/IP Illustrated: Volume 1 by Fall and Stevens. It covers much more than just TCP and IP.
If you make it all the way through those textbooks and still want more, you might be interested in networking from the perspective of a network administrator. Should you wish to go this far, the de facto standard is Cisco’s CCNA certification. There is a wealth of free study materials online that you can use without actually signing up to get the certificate.