Three agents, two sessions, one bug I couldn't remember fixing
How Beads (the graph-shaped task tracker) + optional semantic memory beats scrolling chat until your thumb hurts. Real patterns, honest limits.
It’s Tuesday night. I’ve scrolled the chat until my thumb hurts, and I still can’t tell if task B is unblocked or we only talked about it. Three different agent sessions have touched this codebase. One of them definitely fixed a bug. I’m pretty sure I fixed the same bug again today. This is not sophisticated—this is chaos pretending to be context.
I needed a machine-checkable answer to “what’s next,” not a bigger context window. That led me to Beads.
Two layers of forgetting (and how they’re different)
Merging task tracking with memory retrieval feels obvious until you build it. Then you realize they solve different problems:
| Kind | What breaks | Example scenario | What helps |
|---|---|---|---|
| Workflow memory | Don’t know what’s ready, blocked, or already closed | Rewrote auth across 3 sessions; session 2 closed the token handler; session 3 reopened it anyway | Beads—structured graph with bd ready, deps |
| Semantic memory | Don’t remember why code looks this way | Opened a validation helper months later; forgot it was defensive against race conditions | Issue text, Git commit messages, ADRs—plus retrieval tools if you wire them |
I used to treat these as one problem. They’re not. Beads is how I stop turning “the plan” into archaeology in a chat buffer. ADRs and Git are how I stop re-learning why the code exists.
Beads (bd) is the graph-shaped issue tracker on Dolt I reach for first. State lives under .beads/. Docs are at gastownhall.github.io/beads. It still doesn’t answer why the code looks this way when I reopen a file months later, so I stack optional MCP or SDK retrieval on the same habits. What follows is one arc with the operational detail kept in, not a pretend war story from a job I didn’t have.
What Beads stores (facts, not vibes)
- Engine: Dolt, versioned structured data; the embedded default lives under .beads/embeddeddolt/ per upstream docs.
- IDs: Hash-style ids (e.g. bd-a1b2) so branches and agents collide less often.
- Graph links: relates_to, duplicates, supersedes, the shapes that make bd ready actually work.
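To make “ready” concrete, here is a minimal sketch of how a ready set can be derived from a dependency graph. It is not how Beads implements it; the `Issue` shape and the `blocked_by` field are assumptions for illustration, loosely mirroring the `blockers` field in the sample JSON further down.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Hypothetical issue record; field names are illustrative, not the Beads schema."""
    id: str
    status: str = "open"  # "open" or "closed"
    blocked_by: list[str] = field(default_factory=list)


def ready_ids(issues: list[Issue]) -> list[str]:
    """Return ids of open issues whose blockers are all closed."""
    by_id = {issue.id: issue for issue in issues}
    ready = []
    for issue in issues:
        if issue.status != "open":
            continue
        # An unknown blocker id is treated as still open, to stay conservative.
        blockers_closed = all(
            dep in by_id and by_id[dep].status == "closed"
            for dep in issue.blocked_by
        )
        if blockers_closed:
            ready.append(issue.id)
    return ready


issues = [
    Issue("bd-6c1f", status="closed"),
    Issue("bd-7f2a", blocked_by=["bd-6c1f"]),
    Issue("bd-a1b2", blocked_by=["bd-7f2a"]),
]
print(ready_ids(issues))  # ['bd-7f2a']
```

In this toy graph, bd-a1b2 only becomes ready once bd-7f2a is closed. If nobody closes it, everything downstream stays hidden.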
Here’s how the pieces talk to each other:
View diagram source
graph LR
Agent["Agent / Developer"]
Beads["Beads<br/>(Dolt state)"]
Git["Git / Remote"]
Agent -->|"bd create/update"| Beads
Beads -->|"auto-commit"| Git
Beads -->|"bd ready --json"| Agent
Agent -->|"bd dolt push"| Git
Git -->|"bd dolt pull"| Beads
style Beads fill:#1e1e2e,stroke:#84ffff,color:#ecfdf5
style Git fill:#1e1e2e,stroke:#84ffff,color:#ecfdf5
style Agent fill:#181825,stroke:#84ffff,color:#ecfdf5
Critical pitfall: If you never bd close a task, bd ready becomes fiction. I’ve lied to myself this way more times than I’d admit. The database doesn’t forgive it.
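One cheap guard against that pitfall is to flag tasks that are claimed, still open, and untouched for a while. The sketch below works on plain dicts shaped like the sample JSON in this post (`claimed_by`, `updated_at`, `status`). Whether a real export carries those fields on open issues, the `in_progress` status value, and the 24-hour threshold are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def stale_claims(issues, now, max_age=timedelta(hours=24)):
    """Return ids of issues that are claimed, not closed, and untouched for max_age."""
    stale = []
    for issue in issues:
        if issue.get("status") == "closed" or not issue.get("claimed_by"):
            continue
        # Sample timestamps end in "Z"; normalize so fromisoformat accepts them.
        updated = datetime.fromisoformat(issue["updated_at"].replace("Z", "+00:00"))
        if now - updated > max_age:
            stale.append(issue["id"])
    return stale


issues = [
    {"id": "bd-7f2a", "status": "in_progress", "claimed_by": "agent-session-1",
     "updated_at": "2026-04-08T09:01:22Z"},
    {"id": "bd-6c1f", "status": "closed", "claimed_by": "agent-session-1",
     "updated_at": "2026-04-07T10:00:00Z"},
]
now = datetime(2026, 4, 10, 9, 0, tzinfo=timezone.utc)
print(stale_claims(issues, now))  # ['bd-7f2a']
```

A stale claim is not proof the work is finished. It is a prompt to check Git and either close the task with a reason or release the claim.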
A real morning with Beads (before and after)
Without bd ready (the old way):
09:00 AM: Scroll chat. “Did we finish the token handler?” Thread goes back to yesterday’s session. Maybe it’s done. Session 1 mentions it. Session 2 says “looks good.” Session 3 (today) has me re-explaining validation logic I wrote in session 1. Thirty minutes of context rebuilding.
09:35 AM: Start coding. Make the change. Realize halfway through: this was already done. Undo. Apologize to the model. Waste confirmed.
With bd ready (the real way):
09:00 AM: Open terminal.
$ bd ready --json
[
{
"id": "bd-7f2a",
"title": "Add refresh-token rotation",
"status": "ready",
"claimed_by": null,
"blockers": []
}
]
09:01 AM: Exactly one thing. Claim it.
$ bd update bd-7f2a --claim --json
{
"id": "bd-7f2a",
"claimed_by": "agent-session-3",
"updated_at": "2026-04-10T09:01:22Z"
}
09:02 AM: Check memory if needed. Work. Close it.
$ bd close bd-7f2a --reason "refresh-token rotation complete; added Redis entry expiry" --json
{
"id": "bd-7f2a",
"status": "closed",
"closed_at": "2026-04-10T09:35:12Z"
}
10:00 AM: Session 4 runs. Checks bd ready. That task doesn’t appear because it’s closed. No double work. No archaeology.
View diagram source
graph TD
Start["START: Session begins"]
Query["bd ready --json"]
Empty{Empty?}
Done["Done for now<br/>or new epic"]
Pick["Pick top item"]
Claim["bd update --claim"]
Work["Ship + tests + notes"]
Close["bd close --reason"]
Push["bd dolt push<br/>optional"]
End["END"]
Start --> Query
Query --> Empty
Empty -->|Yes| Done
Empty -->|No| Pick
Pick --> Claim
Claim --> Work
Work --> Close
Close --> Push
Push --> End
style Start fill:#1e1e2e,stroke:#84ffff,color:#ecfdf5
style End fill:#1e1e2e,stroke:#84ffff,color:#ecfdf5
style Claim fill:#181825,stroke:#84ffff,color:#ecfdf5
style Close fill:#181825,stroke:#84ffff,color:#ecfdf5
Upstream AGENT_INSTRUCTIONS.md spells it out: do not use interactive bd edit in automation. Use bd update with flags or stdin. That’s the contract with your future self.
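The loop above can be sketched as a small non-interactive wrapper. This is a rough outline, not an official client: it assumes `bd` is on PATH, that `bd ready --json` prints a JSON array like the sample output, and that “top item” means the first element. `do_work` is a hypothetical callback standing in for the actual implementation step.

```python
import json
import subprocess


def run_bd(args):
    """Run a bd command and parse its JSON output. Assumes bd is on PATH."""
    result = subprocess.run(["bd", *args], capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def run_session(do_work, run=run_bd):
    """One pass of the loop: ready -> claim -> work -> close.

    do_work takes the issue dict and returns the close reason.
    Returns the closed issue id, or None if nothing is ready.
    """
    ready = run(["ready", "--json"])
    if not ready:
        return None  # done for now, or blockers were never modeled
    issue = ready[0]
    run(["update", issue["id"], "--claim", "--json"])
    reason = do_work(issue)
    run(["close", issue["id"], "--reason", reason, "--json"])
    return issue["id"]


# Dry run with a fake runner, so the flow can be checked without bd installed.
calls = []


def fake_run(args):
    calls.append(args)
    if args[0] == "ready":
        return [{"id": "bd-7f2a", "title": "Add refresh-token rotation"}]
    return {}


closed = run_session(lambda issue: "rotation complete; tests in auth.spec.ts", run=fake_run)
print(closed)  # bd-7f2a
```

Injecting the runner keeps the order of operations testable, and nothing here ever reaches for interactive bd edit.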
Lesson 1: “the lying graph”
I lost 45 minutes reopening a token-rotation bug that was already fixed. Here’s why: I claimed a task in session 1, shipped it, forgot to close it. Session 2 saw it still claimed. Session 3 (today) saw it still claimed and assumed it was unfinished. Opened it again. Recognized the work halfway through. Wasted time digging through chat to find where I closed it (I hadn’t).
The lesson: bd close --reason "..." is not decoration. It’s a contract with future-you that says “this is done, here’s why I closed it.” Skip it, and your task graph becomes a liar.
Good close reason:
bd close bd-7f2a --reason "token rotation done; Redis TTL synced with JWT exp; added tests in auth.spec.ts"
Bad close reason:
bd close bd-7f2a --reason "done"
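If you want the good-versus-bad distinction enforced rather than remembered, a tiny lint can run before the close. The heuristics below (the vague-word list, the five-word minimum, the “mentions a file” check) are my own assumptions, not anything Beads ships.

```python
import re

# Hypothetical heuristics; tune the word list and threshold to taste.
VAGUE = {"done", "fixed", "ok", "finished", "wip"}


def close_reason_problems(reason, min_words=5):
    """Return a list of problems with a close reason; empty means acceptable."""
    problems = []
    if reason.strip().lower().rstrip(".!") in VAGUE:
        problems.append("reason is a single vague word")
    if len(reason.split()) < min_words:
        problems.append(f"fewer than {min_words} words")
    # Look for a pointer to where the change lives: a file-like token such as auth.spec.ts.
    if not re.search(r"\b[\w/-]+\.[A-Za-z]{1,5}\b", reason):
        problems.append("no file or path mentioned")
    return problems


print(close_reason_problems("done"))
print(close_reason_problems(
    "token rotation done; Redis TTL synced with JWT exp; added tests in auth.spec.ts"
))  # []
```

The first call reports all three problems; the second passes. It is a blunt filter, but it catches the one-word closes that cost me the most.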
Why I added semantic memory on top
Beads answers what’s next. It does not magically surface why the code exists unless you wrote it down somewhere the agent can query. So I keep Git + ADRs as truth and, when the pain hits, I add one retrieval path—MCP or self-hosted memory—that can search short notes or code chunks.
Here’s the decision tree I use:
View diagram source
graph TD
Q1["Project duration<br/>less than 1 month?"]
Q2["Asked 'why is this here'<br/>more than 2 times?"]
Skip["Skip memory<br/>for now"]
Add["Add one MCP<br/>or memory layer"]
Q1 -->|Yes| Skip
Q1 -->|No| Q2
Q2 -->|Yes| Add
Q2 -->|No| Skip
style Skip fill:#1e1e2e,stroke:#84ffff,color:#ecfdf5
style Add fill:#181825,stroke:#84ffff,color:#ecfdf5
Good memory entry format:
beads_id: bd-7f2a
title: Token rotation implementation
paths: src/auth/token-handler.ts, src/cache/redis.ts
decisions:
- Refresh token TTL = 7 days (business requirement, outlives the JWT)
- Redis entry expires autonomously (SET with EX flag, not manual cleanup)
- Session invalidation on logout clears both JWT and refresh token
references: [OWASP session management](https://cheatsheetseries.owasp.org/), #bd-6c1f (previous session token bug)
Bad memory entry format:
we did the token thing and made it work and also i learned about redis ttl
and like the owasp stuff and i think we should maybe also look at the
auth flow because there's probably other issues there too
The first one is searchable. The second one is noise.
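The “good” shape is easy to enforce with a small container that renders the entry and complains when it drifts toward the “bad” one. This is a sketch under assumptions: the class and method names are hypothetical, and the 300-word cap is the same limit I use in AGENTS.md below.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    """Illustrative container; field names follow the 'good' entry above."""
    beads_id: str
    title: str
    paths: list[str]
    decisions: list[str]
    references: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the entry in the plain key/bullet layout shown above."""
        lines = [
            f"beads_id: {self.beads_id}",
            f"title: {self.title}",
            f"paths: {', '.join(self.paths)}",
            "decisions:",
            *[f"- {d}" for d in self.decisions],
        ]
        if self.references:
            lines.append(f"references: {', '.join(self.references)}")
        return "\n".join(lines)

    def problems(self, max_words: int = 300) -> list[str]:
        """Return reasons this entry would be hard to search; empty means fine."""
        found = []
        if not self.beads_id:
            found.append("missing beads_id")
        if not self.paths:
            found.append("no paths")
        if not self.decisions:
            found.append("no decisions")
        if len(self.render().split()) > max_words:
            found.append(f"over {max_words} words")
        return found


entry = MemoryEntry(
    beads_id="bd-7f2a",
    title="Token rotation implementation",
    paths=["src/auth/token-handler.ts", "src/cache/redis.ts"],
    decisions=["Redis entry expires autonomously (SET with EX flag)"],
)
print(entry.render())
print(entry.problems())  # []
```

Whatever memory tool you wire in, rendering through one function keeps every entry in the same searchable shape.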
View diagram source
graph LR
Session["Agent session"]
Beads["Beads<br/>what is ready?"]
Memory["MCP / Memory<br/>fuzzy recall"]
Git["Git + ADRs<br/>shipped truth"]
Session --> Beads
Session --> Memory
Session --> Git
Beads -.->|structured| Git
Memory -.->|approximate| Git
style Beads fill:#1e1e2e,stroke:#84ffff,color:#ecfdf5
style Memory fill:#1e1e2e,stroke:#84ffff,color:#ecfdf5
style Git fill:#181825,stroke:#84ffff,color:#ecfdf5
Retrieval is approximate. bd ready is structured. I need both labels in my head or I expect the wrong thing from each.
Lesson 2: “the Rube Goldberg memory”
I started with three MCP servers: one for code search, one for memory, one for changelog scraping. The overhead of keeping them synced ate the benefit. After two weeks, I was writing memory entries to two places and verifying in a third. Abandoned by week 3.
Checklist for adding a second MCP:
- First MCP is actually being called (check logs)
- Its results are improving decisions more than Git log does
- The new MCP doesn’t duplicate search surface (e.g., don’t add a changelog MCP if Git already has that)
- I can maintain the server config in AGENTS.md (tool names, not fake handwaving)
My rule now: one memory MCP, one code MCP, done. Git wins ties.
The AGENTS.md pattern (how tools become real)
I iterated this three times. Version 1 had memory tool names that didn’t match the server. Agents never called them. Version 2 added them correctly but without context. Version 3 added the loop and preconditions. Here’s v3:
## Task source of truth
- Before choosing work: `bd ready --json`.
- Claim: `bd update <id> --claim --json`.
- Finish: `bd close <id> --reason "..." --json`.
- Never `bd edit`; use `bd update` + flags or stdin/body-file.
## Recall (real MCP tool names from `mcp list`)
- After claim: `memory_search` with issue title, epic name, keywords.
- If editing a path: `code_search` for that path or symbol.
- If nothing useful: use `git log -- path` and `bd show <id>`.
## Write-back (short + linked)
- On close: append note with `beads_id`, paths touched, decisions, follow-ups.
- Keep entries under 300 words (noise >= signal if longer).
## Order (never skip steps)
1. bd ready
2. claim
3. recall (if first time touching this epic)
4. implement
5. close + memory write
6. optional bd dolt push
Critical lesson: If tools aren’t named in AGENTS.md, the model won’t call them even if they exist. I watched this happen. The tool is there. The MCP is running. The model has access. But without the tool name in a context block the model reads, it gets “lost” and resorts to chat.
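My version 1 failure (tool names in AGENTS.md that didn’t match the server) is checkable. The sketch below pulls backticked names out of the Recall section and compares them with the set of tools the server really exposes. How you obtain that set is up to your MCP client; here it is simply passed in, and the function name is hypothetical.

```python
import re


def unknown_tools(agents_md, available, ignore_prefixes=("bd ", "git ")):
    """Return backticked names under '## Recall' that the server does not expose."""
    # Grab the Recall section: from its heading up to the next '## ' heading or end of text.
    match = re.search(r"^## Recall.*?$(.*?)(?=^## |\Z)", agents_md, flags=re.M | re.S)
    if not match:
        return []
    missing = []
    for name in re.findall(r"`([^`]+)`", match.group(1)):
        if name.startswith(ignore_prefixes):
            continue  # CLI commands, not MCP tools
        if name not in available and name not in missing:
            missing.append(name)
    return missing


agents_md = """## Task source of truth
- Before choosing work: `bd ready --json`.
## Recall (real MCP tool names from `mcp list`)
- After claim: `memory_search` with issue title, epic name, keywords.
- If editing a path: `code_search` for that path or symbol.
- If nothing useful: use `git log -- path` and `bd show <id>`.
## Write-back (short + linked)
- On close: append note with `beads_id`.
"""
# Suppose the server actually exposes a differently named memory tool.
print(unknown_tools(agents_md, {"search_memory", "code_search"}))  # ['memory_search']
```

A non-empty result means the file is promising the model a tool it cannot call, which is exactly how my version 1 failed.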
How a session actually flows (template, not fiction)
Step 1: Ready query
$ bd ready --json | jq '.'
[
{
"id": "bd-7f2a",
"title": "Add refresh-token rotation",
"status": "ready",
"blockers": []
}
]
Step 2: Claim
$ bd update bd-7f2a --claim --json
{
"id": "bd-7f2a",
"claimed_by": "agent-session",
"updated_at": "2026-04-10T09:01:22Z"
}
Step 3: Recall (if needed)
$ memory_search "token rotation refresh jwt"
[
{
"beads_id": "bd-6c1f",
"title": "Previous session token bug (expired JWTs not cleaned)",
"excerpt": "Refresh token TTL = 7 days; set Redis TTL with EX flag..."
}
]
Step 4: Implement (work happens here)
Step 5: Close with reason
$ bd close bd-7f2a --reason "token rotation complete; added Redis TTL sync; tests passing" --json
{
"id": "bd-7f2a",
"status": "closed",
"closed_at": "2026-04-10T09:35:12Z"
}
Step 6: Memory write (if not trivial)
beads_id: bd-7f2a
title: Refresh token rotation
paths: src/auth/token-handler.ts, src/cache/redis.ts
decisions:
- Refresh TTL = 7 days (JWT + buffer)
- Redis SET with EX flag (auto-cleanup, no manual overhead)
- Logout clears both tokens atomically
references: bd-6c1f (prior token cleanup bug)
How to come back cold to a file
Session 4 opens an old feature. Here’s the pattern:
Search Beads:
bd list --filter "epic:auth" --json
Show detail:
bd show bd-7f2a
Query memory with beads_id:
memory_search "beads_id:bd-7f2a"
Fall back to Git truth:
git log --oneline -- src/auth/token-handler.ts
File a small issue with pointers (memory snippet, path:line), not a mega blob.
Anti-patterns (and how I learned them the hard way)
- Dumping the repo into memory: Tried it. Memory search returned noise. Now I write issue-scoped bullets with beads_id + paths. Takes 30 seconds more, worth 10x more.
- Skipping bd close: Forgot to close a task once. bd ready lied to the next session. Wasted an hour. Never again.
- Running Jira + Beads + markdown: Three sources of truth = zero sources of truth. Pick one. I pick Beads.
- Treating embeddings as compliance truth: Retrieval is a hypothesis. Git + ADRs are the audit trail. Verify retrieval against them.
- Memory entries longer than 300 words: Defeats the purpose. If it’s that complex, file a new Beads issue instead.
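“Verify retrieval against Git” can be as small as checking that the paths named in a memory hit have any history at all. This is a minimal sketch, assuming hits are dicts with `beads_id` and `paths` keys like the entries in this post; the labels and function names are my own. It proves only that the files were touched, not that the note is right.

```python
import subprocess


def git_commits_for(path):
    """Return one-line commit summaries touching path. Assumes cwd is inside the repo."""
    result = subprocess.run(
        ["git", "log", "--oneline", "--", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def verify_hit(hit, commits_for=git_commits_for):
    """Label a retrieval hit by whether Git has history for the paths it names."""
    paths = hit.get("paths", [])
    confirmed = [p for p in paths if commits_for(p)]
    if not paths:
        label = "unverifiable"  # nothing to check against
    elif len(confirmed) == len(paths):
        label = "confirmed"
    elif confirmed:
        label = "partial"
    else:
        label = "unconfirmed"
    return {"beads_id": hit.get("beads_id"), "label": label, "confirmed_paths": confirmed}


# Fake history for a dry run; the commit line is made up for the example.
fake_history = {"src/auth/token-handler.ts": ["a1b2c3d add refresh-token rotation"]}
hit = {"beads_id": "bd-7f2a", "paths": ["src/auth/token-handler.ts", "src/cache/redis.ts"]}
print(verify_hit(hit, commits_for=lambda p: fake_history.get(p, []))["label"])  # partial
```

Anything short of “confirmed” goes back to git log and bd show before I act on it.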
Milestones (how to know if it’s working)
| Week | Signal |
|---|---|
| 1 | Every session starts bd ready --json; every done task gets bd close with reason |
| 2 | AGENTS.md names real MCP tools (not placeholders); recall before big edits |
| 3 | Memory lines include beads_id + paths + decisions (not rambling prose) |
| 4 | Old features can be reopened via Beads + memory + Git; no re-explaining |
Three things to steal today
Move 1: Start with bd ready
Stop inferring priorities from chat. Ask the database. If bd ready is empty, you’re either done or you never modeled blockers. Knowing which one is half the battle.
bd ready --json
Move 2: Close with reason
bd close <id> --reason "..." is not decoration. It’s a contract with future-you. Good reasons are: what changed, why it changed, where to find the diff. Write it.
Move 3: One memory + one code search, nothing more
Wire them into AGENTS.md with real tool names (not placeholders). One MCP per category. Git wins ties. Treat retrieval as a hypothesis, not law.
If you take nothing else: run bd ready before you argue with the model about priorities, and treat semantic hits as hypotheses you verify in Git. Everything else—extra MCPs, compaction, fancy graphs—is seasoning. I’m still calibrating how much memory write-back is worth the maintenance. The honest status is: it helps when you keep entries short and link IDs.