The first edition of The Computer Science Book finished with compilers because I felt they represented the perfect capstone project: a complex application that drew together multiple computer science topics to deliver seemingly magical results. In the six years since I wrote the first edition, advances in large language models (LLMs) have made compilers seem almost trivially simple.
In this chapter, we’ll cover how LLMs work at a deep level and then explore how they generalise to artificial intelligence systems. In the sense I’ll use throughout this chapter, artificial intelligence means a system that can perceive some part of the world, build useful internal representations, choose actions toward a goal, and improve those choices from data or feedback.
For the first time in history, humans aren’t the only things that can talk. We’re at the very beginning of working out what this means for humanity. More prosaically, for software engineers this is both a huge opportunity and a possible threat. We can now build capabilities that were previously impossible or severely limited, including intelligent assistants, semantic search, and automated code generation. But LLMs are different from the programs we’ve seen before. We don’t “design” LLMs. We “grow” them, and there is much we don’t understand about them. Yet LLMs are still programs. They are an abstraction, hiding an enormous amount of machinery behind a deceptively simple “text in, text out” interface.
This chapter is in two parts. The first traces the inside of a working LLM: the history that led from symbolic AI to deep learning, a step-by-step walk through a GPT-2-style model, and how post-training turns a next-token predictor into an aligned assistant capable of tool use and multi-step reasoning.
There are already many excellent introductions to LLMs (included in the further reading, of course) and so I don’t want to retread old ground. I want to use this final chapter of the book to point you towards the many exciting capabilities that are just coming into view. The second part therefore looks outward at where AI systems are heading. We’ll look at multimodal perception, action models, world models, and the tools we’re only beginning to develop to evaluate and understand what these systems are actually doing inside.
I’m unashamedly pro-AI and excited by its potential when used in the correct context. Understanding how an LLM works under the surface helps us to evaluate lurid claims about what these systems can and cannot do.