Agency Comes from the Model. An Agent Product = Model + Harness.
Before we write any code, one thing needs to be clear.
Agency -- the capacity to perceive, reason, and act -- comes from model training, not from external code orchestration. But a working agent product needs both the model and the harness. The model is the driver. The harness is the vehicle. This repository teaches you how to build the vehicle.
Where Agency Comes From
At the core of every agent is a neural network -- a Transformer, an RNN, a trained function -- shaped by billions of gradient updates on sequences of perception, reasoning, and action. Agency was never bestowed by the surrounding code. It was learned during training.
Humans are the original proof. A biological neural network, refined by millions of years of evolutionary pressure, perceives the world through senses, reasons through a brain, and acts through a body. When DeepMind, OpenAI, or Anthropic say "agent," they all mean the same core thing: a model that learned to act through training, plus the infrastructure that lets it operate in a specific environment.
The historical record is unambiguous:
2013 -- DeepMind DQN plays Atari. A single neural network, receiving only raw pixels and game scores, learned 7 Atari 2600 games -- surpassing prior algorithms and beating human experts in 3 of them. By 2015, scaled to 49 games at professional tester level, published in Nature. No game-specific rules. One model, learning from experience.
2019 -- OpenAI Five conquers Dota 2. Five neural networks played 45,000 years of Dota 2 against themselves over 10 months, then defeated OG -- the TI8 world champions -- 2-0 in a live match. In the public arena, the AI won 99.4% of 42,729 games. No scripted strategies. Models learned teamwork through self-play.
2019 -- DeepMind AlphaStar masters StarCraft II. AlphaStar beat a professional player 10-1 in closed matches, then reached Grandmaster rank on the European server -- top 0.15% of 90,000 players. An incomplete-information, real-time game with a combinatorial action space far exceeding chess or Go.
2019 -- Tencent Jueyu dominates Honor of Kings. Tencent AI Lab's "Jueyu" system defeated KPL professional players in full 5v5 at the World Champion Cup semifinal. In 1v1 mode, pros won just 1 out of 15 matches, lasting under 8 minutes at best. Training intensity: one day equaled 440 human years. A model that learned the entire game from scratch through self-play.
2024-2025 -- LLM agents reshape software engineering. Claude, GPT, Gemini -- large language models trained on the full breadth of human code and reasoning -- are deployed as coding agents. They read codebases, write implementations, debug failures, and coordinate as teams. The architecture is identical to every previous agent: a trained model, placed in an environment, given tools for perception and action.
Every milestone points to the same fact: Agency -- the ability to perceive, reason, and act -- is trained, not coded. But every agent also needs an environment to operate in: an Atari emulator, the Dota 2 client, the StarCraft II engine, an IDE and a terminal. The model supplies the intelligence. The environment supplies the action space. Together they form a complete agent.
Scope
This repository is a 0-to-1 harness engineering learning project: it teaches how to build the working environment around an agent model. To keep the learning path clear, some production mechanisms are intentionally simplified or omitted:
Full event / hook bus behavior, such as
PreToolUse,SessionStart/End, andConfigChange. The teaching code uses minimal lifecycle events where needed.Rule-based permission governance and full trust workflows.
Session lifecycle controls such as resume/fork, plus more complete worktree lifecycle handling.
Full MCP runtime details such as transport, OAuth, resource subscription, and polling.
The JSONL mailbox protocol in this repository is a teaching implementation, not a claim about any specific production internal implementation.
Progressive Lessons
Each lesson adds one harness mechanism. Each mechanism has a motto.
s01 "One loop & Bash is all you need" — one tool + one loop = one agent
s02 "Adding a tool means adding one handler" — the loop stays untouched; new tools register into the dispatch map
s03 "Set boundaries first, then grant freedom" — check what can run, what must stop, and what needs approval
s04 "Hook around the loop, never rewrite the loop" — add extension points without changing the main loop
s05 "An agent without a plan drifts" — list the steps before starting; completion rate doubles
s06 "Big tasks split small, each subtask gets clean context" — subagents do the side work and bring back only the result
s07 "Load knowledge on demand, not upfront" — list skills first, expand them only when needed
s08 "Context always fills up -- have a way to make room" — multi-layer compaction strategies buy you infinite sessions
s09 "Remember what matters, forget what doesn't" — three subsystems: selection, extraction, consolidation
s10 "Prompts are assembled at runtime, not hardcoded" — section-based concatenation, loaded on demand
s11 "Errors aren't the end, they're the start of a retry" — retry, make room, or take another path when things fail
s12 "Big goals break into small tasks, ordered, persisted to disk" — a file-backed task graph that lays the groundwork for multi-agent coordination
s13 "Slow ops go background, agent keeps thinking" — background threads run commands; notifications inject on completion
s14 "Fire on schedule, no human kick needed" — trigger tasks automatically by time
s15 "Too big for one agent -- delegate to teammates" — persistent teammates + async mailboxes
s16 "Teammates need shared communication rules" — use a fixed request-reply format for coordination
s17 "Teammates check the board, claim work themselves" — no leader assigning one by one; self-organizing
s18 "Each works in its own directory, no interference" — tasks own goals, worktrees own directories, bound by ID
s19 "Not enough capability? Plug in more via MCP" — connect external tools into the same tool pool
s20 "Many mechanisms, one loop" — all previous mechanisms return to one complete harness
Learning Path
Main line: act → handle complex work → remember and recover → run long tasks → collaborate → extend and assemble.