This post is part of a series:
- Internet Protocol
- User Datagram Protocol
- Transmission Control Protocol
- Domain Name System
Welcome to the final post in this series! It’s been a long time coming, I know. Unfortunately, various life things got in the way and prevented me from writing as much as I’d have liked. But fear not, the wait is over! In this post, we’ll round off the series by looking at the Domain Name System (DNS).
DNS is a wonderful piece of Internet infrastructure that maps fun, easy-to-remember domains such as buymybook.com to boring, hard-to-read IP addresses like 220.127.116.11. People like to use domain names and computers like to use IP addresses. DNS makes both work together.
We’ll first look at domain names themselves and then how the protocol is designed. DNS is especially interesting because it’s our first example of an application layer protocol. The protocol defines a service that runs on the Internet and requires its own global network of servers to operate. We’ll therefore round off the post by briefly reviewing how the system itself is structured.
DNS is one of those sprawling protocols with lots of weird corners that you rarely, if ever, need to go poking around in. As ever, I’ll keep us tightly focused on what a web developer needs to know.
What’s in a domain name?
To recap, IPv4 uses 32-bit addresses for hosts on its network. IPv6 uses 128-bit addresses. That’s all well and good for computers but not very useful for humans. How would you like to have to remember 18.104.22.168 (one of Google’s IP addresses) every time you wanted to search for something? Not a great user experience, is it?
From a developer’s perspective, IP addresses aren’t great either. If everyone accessed your website using the server’s IP address, then migrating the server would require your users to all somehow find and use the new server’s IP address.
For most purposes, an IP address is simply too low level to be very useful.
DNS allows hosts on the Internet to be identified by a consistent domain name that maps a human-readable name to an IP address. Whenever we use a domain name, our computer first queries the DNS for the domain’s current IP address and then uses that IP address for all the TCP/IP stuff we’ve seen already.
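From a program’s point of view, that first lookup is usually a single call. Here’s a minimal Python sketch using the standard library’s socket.getaddrinfo, which hands the name to the operating system’s resolver; the helper name resolve_ipv4 is just for illustration. Bear in mind that the OS may answer from a hosts file or its own cache without sending a DNS query at all.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the distinct IPv4 addresses the OS resolver reports for `hostname`.

    getaddrinfo delegates to the operating system, which may consult a hosts
    file or a local cache before (or instead of) querying DNS.
    """
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses: list[str] = []
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (ip, port) pair.
    for *_, sockaddr in results:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

On a networked machine, resolve_ipv4("example.com") returns whatever A records the resolver finds; passing an IP literal simply returns it unchanged, with no lookup involved.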
Domain names have a hierarchical structure that reflects the structure of the DNS namespace. Let’s take as an example the website of the computer science department at the University of Oxford, www.cs.ox.ac.uk.
The first thing to realise about domain names is that they are read from right to left, going from the top level of the namespace to the most specific.
The DNS namespace is conceptually divided into zones, each managed by an administrator. At the very top is the root zone. The identifier for this zone is an empty string, so it’s not visible in the domain name (though you will sometimes see fully qualified names written with a trailing dot, as in www.cs.ox.ac.uk., to mark the root explicitly). A zone administrator is responsible for allocating domains within their zone. Domains might point to a specific host or contain further domains, known as subdomains. The dots in the domain name separate the domain and subdomain identifiers.
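Reading a name right to left can be made concrete with a short Python sketch. The hypothetical domain_hierarchy helper below splits a name on its dots and rebuilds each ancestor domain, starting from the root (the empty string) and ending at the full name. It only manipulates strings; no DNS lookups happen.

```python
def domain_hierarchy(domain: str) -> list[str]:
    """List each domain from the root down to `domain`, most general first.

    The root is shown as "" because its identifier is the empty string.
    """
    labels = domain.rstrip(".").split(".")  # drop an optional trailing root dot
    path = [""]  # start at the root zone
    # Walk the labels right to left, adding one label to the name each step.
    for i in range(len(labels) - 1, -1, -1):
        path.append(".".join(labels[i:]))
    return path


print(domain_hierarchy("www.cs.ox.ac.uk"))
# ['', 'uk', 'ac.uk', 'ox.ac.uk', 'cs.ox.ac.uk', 'www.cs.ox.ac.uk']
```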
The root zone is administered by ICANN and is subdivided into various top-level domains (TLDs). Each country has its own TLD (e.g. fr) and there are various “generic” TLDs (e.g. org) that are global. Each of these TLDs has its own administrators, who allocate domains within the TLD.
For our example domain we see the uk TLD, which is administered by the non-profit company Nominet UK. Within that TLD is the ac subdomain, which is reserved for academic institutions. The University of Oxford is assigned the ox domain. Presumably that dates from a time when people tried to avoid typing as much as possible?
The university’s administrators are responsible for all the domains within their zone. They have decided to give the computer science department its own cs subdomain. The departmental administrators responsible for this domain have decided to create a www subdomain pointing to a web server so that www.cs.ox.ac.uk serves a web page. Depending on the department’s needs, they may also have set up other subdomains for other useful services, e.g. ftp.cs.ox.ac.uk for a departmental file server (using FTP).
So it seems like domains and zones are kind of the same but not quite. A zone is a part of the DNS namespace managed by the same administrator. A domain is a subdivision of the namespace. One domain can contain multiple zones whenever part of it is handed over to a different administrator, as with the ac.uk zone inside the uk domain. For the most part, though, they’re more or less the same. When you buy a co.uk domain name, Nominet UK will make you the administrator of a zone containing that domain and any subdomains you choose to make.
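One way to picture the difference: a domain is just a name in the tree, while a zone is the part of the tree one administrator answers for, starting at some domain and stopping wherever a subdomain has been handed to someone else. The sketch below assumes a hypothetical set of zone apexes (we don’t actually know how Oxford has split up its own namespace) and finds the zone a name falls into by picking the longest matching suffix. The helper names owning_zone and zones_within are illustrative only.

```python
def owning_zone(domain: str, zone_apexes: set[str]) -> str:
    """Return the most specific zone (by apex name) that contains `domain`.

    "" stands for the root zone, which contains every name.
    """
    labels = domain.rstrip(".").lower().split(".")
    # Try the full name first, then drop one label at a time from the left.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zone_apexes:
            return candidate
    return ""


def zones_within(domain: str, zone_apexes: set[str]) -> list[str]:
    """List the zones whose apex is `domain` itself or lies beneath it."""
    return sorted(z for z in zone_apexes if z == domain or z.endswith("." + domain))


# Hypothetical layout: we assume cs.ox.ac.uk is NOT delegated to its own zone,
# so its names are served from the ox.ac.uk zone.
ZONES = {"", "uk", "ac.uk", "ox.ac.uk"}

print(owning_zone("www.cs.ox.ac.uk", ZONES))  # ox.ac.uk
print(zones_within("uk", ZONES))  # ['ac.uk', 'ox.ac.uk', 'uk']: one domain, three zones
```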
Since so many website addresses begin with www., it’s natural to think that the www. prefix is some sort of instruction to find a website. In fact, the www bit is just another subdomain. It’s merely a convention from the early days of the web that HTTP servers should be available at a www subdomain. Over time, many websites have moved away from this convention by having the domain itself point to the HTTP server or by using other subdomain names in clever ways (del.icio.us was an early example).
Configuring DNS records
Imagine you have bought the domain name mybestdomain.com. You are now the administrator of your very own little fiefdom. What can you do with it? There are two interesting things you might want to do:
- Making subdomains
- Mapping the domain and subdomains to IP addresses
For both we need to set DNS records. DNS servers will then use the information in our records to serve requests, telling users what domains are at what IP addresses.
The exact encoding of DNS records and messages (i.e. client queries and server responses) isn’t super interesting or important. You can think of records as simply a set of mappings from domains to values such as IP addresses. Each mapping is a resource, and each resource has a record type that says what kind of mapping it is. There is a whole plethora of DNS record types, so let’s limit ourselves to just the most useful:
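- NS records name the domain’s authoritative name servers (and thus define the domain’s zone).
- A records contain the actual IP address mapping for that domain. Each domain that points at a server needs at least one, and a domain may have several (e.g. to spread traffic across servers). AAAA records do the same job for IPv6 addresses.
- CNAME records map one domain to another domain.
- MX records specify the mail server for a domain. Note that this might be a totally different server from the one in the domain’s A record.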
If you have set up a web server accessible to the Internet at 22.214.171.124 and an API at 126.96.36.199, you might define the following DNS records for your domain:
    mybestdomain.com        A      22.214.171.124
    www.mybestdomain.com    CNAME  mybestdomain.com
    api.mybestdomain.com    A      126.96.36.199
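To make a listing like this concrete, here’s a minimal Python sketch that reads records in the same simplified “name type value” layout and groups them by name and type. This is not the real zone-file syntax (RFC 1035 zone files also carry TTL and class fields), and the addresses are hypothetical ones from the 192.0.2.0/24 documentation range. The sketch also checks one genuine DNS rule: a name that has a CNAME record must not have records of any other type (RFC 1034, section 3.6.2).

```python
from collections import defaultdict

# The example zone in a simplified "name type value" layout.
# Addresses are hypothetical, from the 192.0.2.0/24 documentation range.
EXAMPLE = """
mybestdomain.com      A      192.0.2.10
www.mybestdomain.com  CNAME  mybestdomain.com
api.mybestdomain.com  A      192.0.2.20
"""


def parse_records(text: str) -> dict[tuple[str, str], list[str]]:
    """Group records by (name, type); one name may hold several values of a type."""
    table = defaultdict(list)
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, rtype, value = line.split()
        # DNS names are case-insensitive, so normalise them for lookups.
        table[(name.lower(), rtype.upper())].append(value)
    return dict(table)


def check_cname_exclusive(table: dict[tuple[str, str], list[str]]) -> None:
    """Raise if a name that has a CNAME record also has records of another type."""
    aliases = {name for (name, rtype) in table if rtype == "CNAME"}
    for name, rtype in table:
        if name in aliases and rtype != "CNAME":
            raise ValueError(f"{name} has a CNAME, so it cannot also have {rtype} records")


records = parse_records(EXAMPLE)
check_cname_exclusive(records)  # passes: www has only its CNAME
print(records[("api.mybestdomain.com", "A")])  # ['192.0.2.20']
```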
Anyone can now access the website at www.mybestdomain.com and the API at api.mybestdomain.com. Note that the two A resources don’t conflict because they are for different domains.
Note that the www subdomain points to the parent domain, which in turn points to the IP address. Whenever a user visits www.mybestdomain.com, their browser will first request the A record for the www subdomain. The DNS server will respond with the CNAME record instead, telling the browser that www is an alias and that it should make the A request for mybestdomain.com. The browser does so and the server then returns the IP address so that the browser can connect to the web server and fetch the resources. (In practice, the name server answering the query usually follows the alias itself and returns the CNAME and A records together in a single response.)
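The alias-following step described above can be sketched in a few lines of Python. This is a simplified model, not how any particular resolver is implemented: records live in a plain dictionary keyed by (name, type), the addresses are hypothetical documentation addresses, and the max_hops limit is an arbitrary guard against CNAME loops.

```python
# Hypothetical records for the example domain (192.0.2.0/24 is a
# documentation range, so these are not real servers).
RECORDS = {
    ("mybestdomain.com", "A"): "192.0.2.10",
    ("www.mybestdomain.com", "CNAME"): "mybestdomain.com",
    ("api.mybestdomain.com", "A"): "192.0.2.20",
}


def lookup_a(name: str, records: dict[tuple[str, str], str], max_hops: int = 8) -> str:
    """Return the IPv4 address for `name`, following CNAME aliases as needed."""
    for _ in range(max_hops):
        if (name, "A") in records:
            return records[(name, "A")]
        if (name, "CNAME") in records:
            # `name` is an alias: restart the lookup at its canonical name.
            name = records[(name, "CNAME")]
            continue
        raise KeyError(f"no A or CNAME record for {name}")
    raise RuntimeError("too many CNAME hops (possible alias loop)")


print(lookup_a("www.mybestdomain.com", RECORDS))  # 192.0.2.10
```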
Sadly, I always get CNAME the wrong way round because to me A sounds like it should mean “alias” to the “canonical” resource that holds the IP address. Instead, try to think of the A resource (A really stands for “address”) as being the “authoritative” resource and the CNAME record(s) as returning the canonical name: the name that owns a CNAME record is the alias, and the value it returns is the canonical name.
Querying the DNS
You will have noticed that in the previous section I talked rather blithely about “querying the website’s DNS server”. That appears to be a bit of a conundrum, since to query the website’s DNS server I would already need to know its IP address. I can get that information from the NS resource, but how can I retrieve the NS resource without already knowing the IP address of the DNS server?
To answer this we consider the implementation of DNS as a system. Let’s look at what happens when you type www.cs.ox.ac.uk into your browser, both in theory and in practice. People love to ask this question in interviews, so with this knowledge you’ll impress far and wide.
In theory, your computer will request the domain’s IP address from a root server. The root zone won’t have the mapping for www.cs.ox.ac.uk, so the root server will give you the address of the top-level domain’s authoritative name server and tell you to go ask there. That name server will tell you to ask the name server of the next level down in the hierarchy and so on, until you query a name server that has the IP address mapping you need. Here’s what it looks like:
    $ dnstracer -4 -r1 -s. www.cs.ox.ac.uk
    Tracing to www.cs.ox.ac.uk[a] via A.ROOT-SERVERS.NET, maximum of 1 retries
    A.ROOT-SERVERS.NET [.] (220.127.116.11)
     |\___ dns2.nic.uk [uk] (18.104.22.168)
     | |\___ ns4.ja.net [ac.uk] (22.214.171.124)
     | | |\___ dns1.ox.ac.uk [ox.ac.uk] (126.96.36.199) Got authoritative answer
First we query the root server (.), then Nominet UK (nic.uk), then the ac.uk domain administrators (ja.net) and then finally the name server of the ox domain (ox.ac.uk), which gives us an authoritative answer.
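Here’s a toy Python model of that walk down the hierarchy. It’s a sketch under heavy simplifying assumptions: each hypothetical server is a dictionary that either holds the answer or refers us to a server for a more specific zone, and we skip the step where a real resolver must find each referred server’s IP address (via glue records or a separate lookup). The server names are taken from the trace above; the final address is a made-up documentation address, not Oxford’s real one.

```python
# Toy model of iterative resolution. Server names mirror the dnstracer trace;
# the answer address is hypothetical (192.0.2.0/24 documentation range).
SERVERS = {
    "a.root-servers.net": {"referrals": {"uk": "dns2.nic.uk"}, "answers": {}},
    "dns2.nic.uk": {"referrals": {"ac.uk": "ns4.ja.net"}, "answers": {}},
    "ns4.ja.net": {"referrals": {"ox.ac.uk": "dns1.ox.ac.uk"}, "answers": {}},
    "dns1.ox.ac.uk": {"referrals": {}, "answers": {"www.cs.ox.ac.uk": "192.0.2.30"}},
}


def in_zone(name: str, zone: str) -> bool:
    """True if `name` is the zone's apex or any name beneath it."""
    return name == zone or name.endswith("." + zone)


def iterative_resolve(name: str, servers: dict, start: str = "a.root-servers.net"):
    """Follow referrals from `start` until some server answers for `name`.

    Returns (address, list_of_servers_asked).
    """
    server = start
    asked: list[str] = []
    while True:
        if server in asked:
            raise RuntimeError(f"referral loop at {server}")
        asked.append(server)
        data = servers[server]
        if name in data["answers"]:
            return data["answers"][name], asked
        for zone, next_server in data["referrals"].items():
            if in_zone(name, zone):
                server = next_server  # go and ask the more specific server
                break
        else:
            raise LookupError(f"{server} has no answer or referral for {name}")


address, path = iterative_resolve("www.cs.ox.ac.uk", SERVERS)
print(address, path)
```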
Starting at the root means we can always find our way to an unknown domain but it’s obviously inefficient and puts immense pressure on the root servers. In practice, therefore, the system works in reverse, thanks to the magic of caching.
When you navigate to www.cs.ox.ac.uk, your computer will first check if it has a local cache of the IP address mapping. There may be a cache entry if you’ve navigated to the same domain before. If not, your computer will ask your ISP’s name server for the DNS record. Since your ISP will have provided its name server’s details when you first connected to the network, this solves the problem of not knowing which name server to ask.
ISP name servers maintain a large cache of previously seen domains and therefore usually have any vaguely popular domain cached already. If not, the name server will almost certainly have cached the authoritative name server for a parent zone (in this case, say, ac.uk), from which it can request the resources and add them to its cache. In the rare event that the name server doesn’t have any useful cache entries, it can ask a root server directly, using a hardcoded list of root server IP addresses that it is configured with. This combination of caching and recursive lookup lets your ISP’s name server find any resource quickly while spreading the load across the system.
Cache entries are only valid for a certain time, set by each record’s time to live (TTL), which ensures that the caching server periodically re-fetches updated entries from the authoritative name server. This explains why changes to DNS records can take a while to become visible: you must wait for caching servers to discard the old record and fetch the new mappings.
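A caching name server’s core behaviour can be sketched as below. This is a minimal illustration, assuming a hypothetical ask_authoritative callback that returns an address together with the record’s TTL in seconds; a real resolver caches whole record sets (including negative answers) and honours the TTL each record carries. The clock is injectable so the expiry can be tested without waiting.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Cache name -> address entries, each expiring `ttl` seconds after it is stored."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so tests can fake the passage of time
        self._entries: dict[str, tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl: float) -> None:
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale: force a fresh lookup next time
            return None
        return address


def cached_lookup(
    name: str,
    cache: TTLCache,
    ask_authoritative: Callable[[str], tuple[str, float]],
) -> str:
    """Answer from the cache if possible; otherwise ask upstream and cache the result."""
    address = cache.get(name)
    if address is None:
        address, ttl = ask_authoritative(name)
        cache.put(name, address, ttl)
    return address
```

Until the TTL runs out, repeated lookups never reach the authoritative server, which is exactly why a changed record can keep resolving to the old address for a while.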
The domain name system is somewhat distributed because no central server holds every mapping. Instead, the responsibility for maintaining authoritative mappings is distributed across servers in each zone and other servers will redirect to the authoritative server. Extensive caching saves the root servers from having to handle every name request on the Internet.
Nevertheless, the design of the system is hierarchical and DNS root servers play a central role, since they are ultimately authoritative for the entire DNS namespace. That’s why you sometimes see them referred to as the “backbone of the Internet”. Root server outages can affect large parts of the Internet and are significant news events.
The Domain Name System (DNS) is an application layer protocol that maps human-readable domain names to IP addresses. The DNS namespace is conceptually divided into zones, each managed by a different administrator. When you buy a domain name, you become the administrator of your domain. By defining DNS records, you can create subdomains and map them to IP addresses. A global network of DNS name servers is responsible for serving DNS requests.
And thus ends our tour of the Internet for web developers! We’ve covered the main networking protocols and seen how they fit together into a protocol stack. The end result is something beautiful: a planet-wide network of computers, each addressable by their own domain name.