Oct 2, 2023 - Technology

How AI works, in plain English: Three great reads

Illustration: Sarah Grillo/Axios

"Any sufficiently advanced technology is indistinguishable from magic," the science fiction author Arthur C. Clarke famously said — but AI isn't magic, and you don't have to be a computer scientist to learn what makes ChatGPT tick.

Why it matters: AI is going to turn up soon in your workplace and home, if it hasn't already. You probably want to be prepared.

The big picture: In the ten months since OpenAI unleashed ChatGPT on the world, experts and journalists have set about trying to explain, in accessible prose, how AI programs arrive at their answers.

Here are the three best reads out there for getting up to speed on how today's generative AI works — ranked from beginner to advanced level.

1. How transformers work (The Financial Times)

  • Transformers is the name that a team of Google researchers gave in 2017 to their new approach to neural network design, which kicked off today's AI revolution. (And no, it has nothing to do with the movies.)
  • The paper those researchers wrote, "Attention Is All You Need," described a streamlined way to build AI language programs.
  • The FT's visualization walks you through simple examples of how transformers work and why they turbocharged what AI chatbots could do (a toy sketch of the "attention" step at their core follows this list).
  • Word count: 3,000. Math required: minimal.
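
If you want to see the core idea in miniature before diving in, here is a toy Python sketch (mine, not the FT's, with made-up numbers) of the scaled dot-product "attention" step at the heart of a transformer: every word scores its relevance to every other word, and those scores decide how information gets blended.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each word's query is scored against every word's key; the softmaxed
    scores then weight a blend of the value vectors."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # relevance of every word to every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax turns scores into weights that sum to 1
    return weights @ V                               # blend the value vectors by those weights

# Three made-up 4-dimensional word vectors standing in for a tiny sentence.
rng = np.random.default_rng(0)
words = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(words, words, words))
```

Real transformers run many of these attention "heads" in parallel across dozens of layers; this is just the arithmetic at the center.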

2. Go deeper into large language models (Ars Technica)

  • Each time you ask ChatGPT or any other LLM a question, it performs an enormous number of calculations in order to, finally, type some words back to you.
  • Behind the cursor, the LLM has mapped mountains of words to vast arrays of numbers in order to predict the next word in any sequence.
  • Journalists Timothy B. Lee and Sean Trott lay out how these "word vectors" operate, and how dozens of layers of machine-learning "neurons" pass clues along in the AI brain to zero in on a good answer (a toy illustration of word vectors follows this list).
  • If you're only going to read one AI explainer, this is the best I've found.
  • Word count: 6,000. Math required: modest.
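
The piece's central idea, that meaning lives in the geometry of numbers, is easy to see with a toy example. The three-dimensional vectors below are invented by hand for illustration; real LLMs learn vectors with thousands of dimensions from their training text.

```python
import numpy as np

# Hypothetical, hand-picked 3-dimensional "word vectors" for illustration only.
vectors = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.75, 0.20]),
    "apple": np.array([0.10, 0.20, 0.90]),
}

def cosine_similarity(a, b):
    """Words whose vectors point in similar directions are treated as related."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(vectors["king"], vectors["queen"]))  # close to 1: related words
print(cosine_similarity(vectors["king"], vectors["apple"]))  # much lower: unrelated words
```

From there, predicting the next word comes down to scoring every candidate word against the context and favoring the highest scorers, which is where those dozens of layers of "neurons" do their work.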

3. The mathematician's perspective, for the rest of us (Stephen Wolfram)

  • Like all computer software, AI programs work by translating everything they touch into numbers. The latest generative AI programs do this on a scale that's unfathomably large, with billions of words needed to train them and billions of "cycles," or processor operations, needed to form an answer.
  • If you're looking for a deeper understanding of how all that works, and not just for language AI but for image-making programs too, Wolfram's opus is a great read. It also gets into topics the other articles neglect, including the mysterious "temperature" settings that determine how much randomness a system injects into its answers (a toy sketch of that knob follows this list).
  • Between the lucid lines of his explanations, you can also sense Wolfram's frustration that so much of building today's AI depends on shared "lore" and guesswork rather than science.
  • Word count: 18,000. Math required: considerable — but it's still understandable even if, like me, you never took calculus.
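
That "temperature" knob is simpler than it sounds: it rescales the model's raw next-word scores before they become probabilities. Here's a toy Python sketch with invented scores (not Wolfram's code) showing how low temperature makes the favorite word dominate while high temperature spreads the odds around.

```python
import numpy as np

rng = np.random.default_rng(0)

def next_word_probabilities(logits, temperature=1.0):
    """Divide the raw scores by the temperature, then softmax them.
    Low temperature sharpens the distribution; high temperature flattens it."""
    scaled = np.asarray(logits) / temperature
    exp = np.exp(scaled - scaled.max())   # shift for numerical stability
    return exp / exp.sum()

# Hypothetical raw scores ("logits") for four candidate next words.
logits = [2.0, 1.0, 0.5, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = next_word_probabilities(logits, temperature=t)
    picked = rng.choice(len(probs), p=probs)   # sample the next word from those odds
    print(f"temperature {t}: probabilities {probs.round(2)}, picked word #{picked}")
```

At temperature 0.2 the top-scoring word is chosen almost every time; at 2.0 the long shots get real chances, which is why higher settings feel more "creative" and more error-prone.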

Bonus read: For a window onto the most trenchant criticisms of today's AI hype, you can't do better than "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?"

  • The celebrated and controversial 2021 paper (by Emily Bender, Timnit Gebru, Angelina McMillan-Major and Margaret Mitchell) foretold the problems of pushing generative AI into wide public use before its risks have been fully understood and mitigated, including bias, misinformation, privacy violations and failures of transparency.
  • It's more scholarly than the other articles recommended here, but easily approachable by the lay person.

The bottom line: Behind every one of these links is an imaginative effort to translate a complex abstraction into something anyone can grasp. And there's no way that ChatGPT, or any other AI today, could have produced any of them.
