By Han Fang - Part of the Tokens for Thoughts series
Every mature science has its central dogma — a foundational claim so deeply embedded that practitioners forget it's even there. Biology has DNA → RNA → Protein. Thermodynamics has entropy. What is ours?
If you prefer listening over reading, you can check out the audio version generated by NotebookLM.
Not a definition of intelligence — those are abundant and mostly useless. This is a claim about the mechanism. The arrow that connects raw experience to capable behavior. And like Crick's original formulation in molecular biology, it turns out to be both more productive and more subtle than it first appears.
The idea that compression and intelligence are deeply linked has a long pedigree — from Shannon's information theory through Solomonoff's formalization of induction to Hutter's mathematical proof that optimal decision-making reduces to optimal compression. For decades, this was beautiful theory with limited empirical traction. Then came large language models, and suddenly the theory had teeth. But we've mostly treated this as a useful intuition rather than what I think it actually is — a dogma, in the scientific sense. Something foundational enough to build on, and constraining enough to be worth taking seriously.
In biology, the central dogma derives its force from three concrete artifacts — DNA, RNA, Protein — and a claim about the directional flow of information between them. Sequential information flows from nucleic acid to protein, never the reverse.
The AI equivalents:
Data exists on disk — terabytes of text, images, interaction logs. Like DNA, it's the raw blueprint, the store of everything the system could potentially learn.
Weights exist in the model — billions of parameters shaped by optimization. Like RNA, they're the intermediate representation: a compressed encoding of the data's statistical structure, transcribed into a form that can be acted upon.
Behavior is the observable output — the predictions, the generated text, the actions taken. Like protein, it's where the information finally becomes functional. It's what the system does.
And the directionality claim holds: **information flows from data into weights into behavior, and is lost at each step.** You cannot fully recover the training data from the weights. You cannot recover the weights from the behavior. Compression is irreversible, and the losses propagate forward.
This describes what most modern AI systems are doing, regardless of architecture or objective. Supervised learning compresses labeled examples into decision boundaries. Pre-training compresses internet-scale text into a geometry of token predictions. RL compresses reward-bearing trajectories into policies. Diffusion models compress the structure of natural images into learned score functions. Different data, different compression, different behavior. Same flow.
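The prediction-compression equivalence underneath all of these cases can be made concrete. By the arithmetic-coding argument, a model that assigns probability p to the symbol that actually occurs can encode that symbol in -log2(p) bits, so a model's average cross-entropy on the data *is* its compressed size. A minimal sketch, using a toy corpus and a bigram character model (both illustrative, not anything from the post):

```python
import math
from collections import Counter, defaultdict

# Toy corpus: the "data" artifact. Via arithmetic coding, a better
# predictor of this text is literally a better compressor of it.
corpus = "the cat sat on the mat. the cat ate the rat."

# Fit a bigram character model (the "weights"): counts of next-char
# given previous-char, compressed statistics of the corpus.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def prob(prev, nxt):
    """P(next char | previous char) under the fitted bigram model."""
    return counts[prev][nxt] / sum(counts[prev].values())

# Code length under the model: -log2 P(x_t | x_{t-1}) bits per symbol.
bits = sum(-math.log2(prob(p, n)) for p, n in zip(corpus, corpus[1:]))
print(f"bigram model: {bits / (len(corpus) - 1):.2f} bits/char "
      f"(raw bytes would be 8.00)")
```

The model never sees anything but the data, yet its cross-entropy falls well below 8 bits per character: the statistical structure it captured is exactly the compression.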
The empirical evidence bears this out. Kaplan et al. (2020) and Hoffmann et al. (2022) showed that model performance follows remarkably smooth power laws as you increase compute and data — trends spanning seven orders of magnitude. What are scaling laws, really? They're a quantitative measurement of how compression quality improves with resources. The smoothness of these curves is itself strong evidence that compression is a productive lens for understanding what these systems are doing. Hutter even put money on the idea — his Hutter Prize offers €500,000 for better lossless compression of Wikipedia, on the premise that compressing human knowledge more efficiently means building a smarter model of the world.
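The functional form behind those curves is simple: loss falls as a power of compute, L(C) = a * C^(-b) plus an irreducible floor, which plots as a straight line in log-log space. A small sketch with synthetic points (the constants here are illustrative, not the fitted values from Kaplan et al.):

```python
import math

# Synthetic points shaped like a neural scaling law:
# L(C) = a * C**(-b) + floor. Illustrative constants only.
a, b, floor = 12.0, 0.05, 1.7
compute = [10**k for k in range(3, 10)]   # seven orders of magnitude
loss = [a * c**(-b) + floor for c in compute]

# A power law is a line in log-log space: least-squares fit of
# log(L - floor) against log(C) recovers the exponent b as -slope.
xs = [math.log(c) for c in compute]
ys = [math.log(l - floor) for l in loss]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
print(f"recovered exponent: {-slope:.3f}")  # ≈ 0.050 by construction
```

The smoothness the text points to is this linearity: one exponent describes compression quality across the whole compute range.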
But the real force of a central dogma is never in what it says can happen. It's in what it says cannot.
Crick's deepest insight wasn't that DNA makes RNA makes protein — it was that the arrow is irreversible. Once sequential information has passed into protein, it cannot flow back to nucleic acid. That's what made it a dogma rather than a diagram.