Three agents, two sessions, one bug I couldn't remember fixing

How Beads (the graph-shaped task tracker) + optional semantic memory beats scrolling chat until your thumb hurts. Real patterns, honest limits.


It’s Tuesday night. I’ve scrolled the chat until my thumb hurts, and I still can’t tell if task B is unblocked or we only talked about it. Three different agent sessions have touched this codebase. One of them definitely fixed a bug. I’m pretty sure I fixed the same bug again today. This is not sophisticated—this is chaos pretending to be context.

I needed a machine-checkable answer to “what’s next?”, not a bigger context window. That led me to Beads.

Two layers of forgetting (and how they’re different)

Merging task tracking with memory retrieval feels obvious until you build it. Then you realize they solve different problems:

| Kind | What breaks | Example scenario | What helps |
| --- | --- | --- | --- |
| Workflow memory | Don’t know what’s ready, blocked, or already closed | Rewrote auth across 3 sessions; session 2 closed the token handler; session 3 reopened it anyway | Beads: structured graph with bd ready, deps |
| Semantic memory | Don’t remember why code looks this way | Opened a validation helper months later; forgot it was defensive against race conditions | Issue text, Git commit messages, ADRs, plus retrieval tools if you wire them |

I used to treat these as one problem. They’re not. Beads is how I stop turning “the plan” into archaeology in a chat buffer. ADRs and Git are how I stop re-learning why the code exists.

Beads (bd) is the graph-shaped issue tracker built on Dolt, and it is the tool I reach for first. State lives under .beads/. Docs are at gastownhall.github.io/beads. It still doesn’t answer why the code looks this way when I reopen a file months later, so I stack optional MCP or SDK retrieval on the same habits. What follows is one arc with the operational detail kept in, not a pretend war story from a job I didn’t have.

What Beads stores (facts, not vibes)

  • Engine: Dolt—versioned structured data; embedded default lives under .beads/embeddeddolt/ per upstream docs.
  • IDs: Hash-style ids (e.g. bd-a1b2) so branches and agents collide less often.
  • Graph links: relates_to, duplicates, supersedes for connecting related issues. Blocking dependencies are what bd ready actually evaluates.

Here’s how the pieces talk to each other:

graph LR
  Agent["Agent / Developer"]
  Beads["Beads<br/>(Dolt state)"]
  Git["Git / Remote"]
  
  Agent -->|"bd create/update"| Beads
  Beads -->|"auto-commit"| Git
  Beads -->|"bd ready --json"| Agent
  Agent -->|"bd dolt push"| Git
  Git -->|"bd dolt pull"| Beads
  
  style Beads fill:#1e1e2e,stroke:#84ffff,color:#ecfdf5
  style Git fill:#1e1e2e,stroke:#84ffff,color:#ecfdf5
  style Agent fill:#181825,stroke:#84ffff,color:#ecfdf5

Critical pitfall: If you never bd close a task, bd ready becomes fiction. I’ve lied to myself this way more times than I’d admit. The database doesn’t forgive it.
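
To make the pitfall concrete, here is a minimal sketch of how a ready list can be derived from issue status plus blocking dependencies. It is a toy model, not Beads' implementation: the `Issue` class, the `blocked_by` field, the `ready` function, and the status strings are all hypothetical names I made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Issue:
    """Toy issue record; field names are illustrative, not Beads' schema."""
    id: str
    status: str = "open"                       # "open" or "closed"
    blocked_by: list[str] = field(default_factory=list)


def ready(issues: dict[str, Issue]) -> list[str]:
    """Return ids of open issues whose blockers are all closed."""
    result = []
    for issue in issues.values():
        if issue.status != "open":
            continue
        if all(issues[b].status == "closed" for b in issue.blocked_by):
            result.append(issue.id)
    return sorted(result)


issues = {
    "bd-a": Issue("bd-a"),                      # shipped, but never closed
    "bd-b": Issue("bd-b", blocked_by=["bd-a"]),
}
print(ready(issues))   # bd-a still looks like open work; bd-b stays hidden

issues["bd-a"].status = "closed"
print(ready(issues))   # only now does bd-b surface
```

The point of the toy: the query is only as honest as the status field. A finished task that stays open both shows up as work to do and keeps everything behind it blocked.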

A real morning with Beads (before and after)

Without bd ready (the old way):

09:00 AM: Scroll chat. “Did we finish the token handler?” Thread goes back to yesterday’s session. Maybe it’s done. Session 1 mentions it. Session 2 says “looks good.” Session 3 (today) has me re-explaining validation logic I wrote in session 1. Thirty minutes of context rebuilding.

09:35 AM: Start coding. Make the change. Realize halfway through: this was already done. Undo. Apologize to the model. Waste confirmed.

With bd ready (the real way):

09:00 AM: Open terminal.

$ bd ready --json
[
  {
    "id": "bd-7f2a",
    "title": "Add refresh-token rotation",
    "status": "ready",
    "claimed_by": null,
    "blockers": []
  }
]

09:01 AM: Exactly one thing. Claim it.

$ bd update bd-7f2a --claim --json
{
  "id": "bd-7f2a",
  "claimed_by": "agent-session-3",
  "updated_at": "2026-04-10T09:01:22Z"
}

09:02 AM: Check memory if needed. Work. Close it.

$ bd close bd-7f2a --reason "refresh-token rotation complete; added Redis entry expiry" --json
{
  "id": "bd-7f2a",
  "status": "closed",
  "closed_at": "2026-04-10T09:35:12Z"
}

10:00 AM: Session 4 runs. Checks bd ready. That task doesn’t appear because it’s closed. No double work. No archaeology.

graph TD
  Start["START: Session begins"]
  Query["bd ready --json"]
  Empty{Empty?}
  Done["Done for now<br/>or new epic"]
  Pick["Pick top item"]
  Claim["bd update --claim"]
  Work["Ship + tests + notes"]
  Close["bd close --reason"]
  Push["bd dolt push<br/>optional"]
  End["END"]
  
  Start --> Query
  Query --> Empty
  Empty -->|Yes| Done
  Empty -->|No| Pick
  Pick --> Claim
  Claim --> Work
  Work --> Close
  Close --> Push
  Push --> End
  
  style Start fill:#1e1e2e,stroke:#84ffff,color:#ecfdf5
  style End fill:#1e1e2e,stroke:#84ffff,color:#ecfdf5
  style Claim fill:#181825,stroke:#84ffff,color:#ecfdf5
  style Close fill:#181825,stroke:#84ffff,color:#ecfdf5

Upstream AGENT_INSTRUCTIONS.md spells it out: do not use interactive bd edit in automation. Use bd update with flags or stdin. That’s the contract with your future self.
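
The loop in the diagram can be sketched as a small driver. This is an illustration under assumptions: it shells out to the three commands shown above, assumes `bd ready --json` prints a JSON array of objects with an `id` field (as in the sample output), and takes the command runner as a parameter so it can be exercised without `bd` installed. `run_bd`, `run_session`, and `do_work` are hypothetical names.

```python
import json
import subprocess
from typing import Callable, Optional

Runner = Callable[[list], str]


def run_bd(args: list) -> str:
    """Run `bd` non-interactively and return stdout; raises on a non-zero exit."""
    done = subprocess.run(["bd", *args], capture_output=True, text=True, check=True)
    return done.stdout


def run_session(do_work: Callable[[dict], str], runner: Runner = run_bd) -> Optional[str]:
    """One pass of ready -> claim -> work -> close. Returns the closed id, or None."""
    ready = json.loads(runner(["ready", "--json"]))
    if not ready:
        return None                      # nothing ready: done for now, or blockers unmodeled
    issue = ready[0]                     # "pick top item" from the diagram
    runner(["update", issue["id"], "--claim", "--json"])
    reason = do_work(issue)              # caller ships the change and returns the close reason
    runner(["close", issue["id"], "--reason", reason, "--json"])
    return issue["id"]
```

If `do_work` raises, the issue stays claimed and unclosed, which is the same state as forgetting `bd close` by hand. A real wrapper would need to decide what to do with that case.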

Lesson 1: “the lying graph”

I lost 45 minutes reopening a token-rotation bug that was already fixed. Here’s why: I claimed a task in session 1, shipped it, forgot to close it. Session 2 saw it still claimed. Session 3 (today) saw it still claimed and assumed it was unfinished. Opened it again. Recognized the work halfway through. Wasted time digging through chat to find where I closed it (I hadn’t).

The lesson: bd close --reason "..." is not decoration. It’s a contract with future-you that says “this is done, here’s why I closed it.” Skip it, and your task graph becomes a liar.

Good close reason:

bd close bd-7f2a --reason "token rotation done; Redis TTL synced with JWT exp; added tests in auth.spec.ts"

Bad close reason:

bd close bd-7f2a --reason "done"
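
One way to keep close reasons honest is a small check before calling `bd close`. The heuristic below is my own assumption, not a Beads feature: `check_close_reason` is a hypothetical helper, and the word threshold and filler list are arbitrary.

```python
import re

# Anything that looks like a file name with an extension, e.g. auth.spec.ts
PATH_HINT = re.compile(r"[\w./-]+\.\w{1,5}\b")
FILLER = {"done", "fixed", "ok", "wip", "finished"}


def check_close_reason(reason: str, min_words: int = 4) -> list:
    """Return a list of problems; an empty list means the reason passes."""
    problems = []
    text = reason.strip()
    if text.lower() in FILLER:
        problems.append("reason is a single filler word")
    if len(text.split()) < min_words:
        problems.append(f"fewer than {min_words} words")
    if not PATH_HINT.search(text):
        problems.append("no file or test path mentioned")
    return problems
```

Run against the two examples above, the good reason returns an empty list and "done" fails all three checks. It is a crude filter, and it will not tell you whether the reason is true.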

Why I added semantic memory on top

Beads answers what’s next. It does not magically surface why the code exists unless you wrote it down somewhere the agent can query. So I keep Git + ADRs as truth and, when the pain hits, I add one retrieval path—MCP or self-hosted memory—that can search short notes or code chunks.

Here’s the decision tree I use:

graph TD
  Q1["Project duration<br/>less than 1 month?"]
  Q2["Asked 'why is this here'<br/>more than 2 times?"]
  Skip["Skip memory<br/>for now"]
  Add["Add one MCP<br/>or memory layer"]
  
  Q1 -->|Yes| Skip
  Q1 -->|No| Q2
  Q2 -->|Yes| Add
  Q2 -->|No| Skip
  
  style Skip fill:#1e1e2e,stroke:#84ffff,color:#ecfdf5
  style Add fill:#181825,stroke:#84ffff,color:#ecfdf5
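
The same tree as a function, mostly to show how little logic there is in it. `should_add_memory` and its parameter names are hypothetical; the thresholds are the ones in the diagram.

```python
def should_add_memory(project_months: float, why_questions: int) -> bool:
    """Mirror the decision tree: skip for short projects, add after repeated 'why' questions."""
    if project_months < 1:
        return False          # short project: skip memory for now
    return why_questions > 2  # asked "why is this here" more than twice


print(should_add_memory(project_months=0.5, why_questions=5))  # False
print(should_add_memory(project_months=3, why_questions=3))    # True
```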

Good memory entry format:

beads_id: bd-7f2a
title: Token rotation implementation
paths: src/auth/token-handler.ts, src/cache/redis.ts
decisions:
  - Refresh token TTL = 7 days (business requirement, outlives the JWT)
  - Redis entry expires autonomously (SET with EX flag, not manual cleanup)
  - Session invalidation on logout clears both JWT and refresh token
references: [OWASP session management](https://cheatsheetseries.owasp.org/), #bd-6c1f (previous session token bug)

Bad memory entry format:

we did the token thing and made it work and also i learned about redis ttl 
and like the owasp stuff and i think we should maybe also look at the 
auth flow because there's probably other issues there too

The first one is searchable. The second one is noise.
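
A minimal sketch of checking an entry before it goes into memory, assuming the plain `key: value` layout of the good example. The required fields and the 300-word cap come from this post; `parse_entry` and `validate_entry` are hypothetical helpers, not part of any memory server.

```python
import re

REQUIRED = ("beads_id", "paths", "decisions")
MAX_WORDS = 300
KEY_LINE = re.compile(r"^([A-Za-z_]\w*):\s*(.*)$")   # unindented "key: value"


def parse_entry(text: str) -> dict:
    """Parse 'key: value' lines; indented '- item' lines extend the last empty key."""
    entry = {}
    key = None
    for line in text.splitlines():
        match = KEY_LINE.match(line)
        if match:
            key, value = match.group(1), match.group(2).strip()
            entry[key] = value if value else []
        elif key and line.strip().startswith("- ") and isinstance(entry[key], list):
            entry[key].append(line.strip()[2:])
    return entry


def validate_entry(text: str) -> list:
    """Return a list of problems; an empty list means the entry is usable."""
    entry = parse_entry(text)
    problems = [f"missing field: {name}" for name in REQUIRED if not entry.get(name)]
    if len(text.split()) > MAX_WORDS:
        problems.append(f"longer than {MAX_WORDS} words")
    return problems
```

The rambling example fails on all three required fields, which is the useful signal: there is nothing in it to link back to an issue or a file.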

graph LR
  Session["Agent session"]
  Beads["Beads<br/>what is ready?"]
  Memory["MCP / Memory<br/>fuzzy recall"]
  Git["Git + ADRs<br/>shipped truth"]
  
  Session --> Beads
  Session --> Memory
  Session --> Git
  
  Beads -.->|structured| Git
  Memory -.->|approximate| Git
  
  style Beads fill:#1e1e2e,stroke:#84ffff,color:#ecfdf5
  style Memory fill:#1e1e2e,stroke:#84ffff,color:#ecfdf5
  style Git fill:#181825,stroke:#84ffff,color:#ecfdf5

Retrieval is approximate. bd ready is structured. I need both labels in my head or I expect the wrong thing from each.

Lesson 2: “the Rube Goldberg memory”

I started with three MCP servers: one for code search, one for memory, one for changelog scraping. The overhead of keeping them synced ate the benefit. After two weeks, I was writing memory entries to two places and verifying in a third. Abandoned by week 3.

Checklist for adding a second MCP:

  • First MCP is actually being called (check logs)
  • Its results are improving decisions more than Git log does
  • The new MCP doesn’t duplicate search surface (e.g., don’t add a changelog MCP if Git already has that)
  • I can maintain the server config in AGENTS.md (tool names, not fake handwaving)

My rule now: one memory MCP, one code MCP, done. Git wins ties.

The AGENTS.md pattern (how tools become real)

I iterated this three times. Version 1 had memory tool names that didn’t match the server. Agents never called them. Version 2 added them correctly but without context. Version 3 added the loop and preconditions. Here’s v3:

## Task source of truth
- Before choosing work: `bd ready --json`.
- Claim: `bd update <id> --claim --json`.
- Finish: `bd close <id> --reason "..." --json`.
- Never `bd edit`; use `bd update` + flags or stdin/body-file.

## Recall (real MCP tool names from `mcp list`)
- After claim: `memory_search` with issue title, epic name, keywords.
- If editing a path: `code_search` for that path or symbol.
- If nothing useful: use `git log -- path` and `bd show <id>`.

## Write-back (short + linked)
- On close: append note with `beads_id`, paths touched, decisions, follow-ups.
- Keep entries under 300 words (noise >= signal if longer).

## Order (never skip steps)
1. bd ready
2. claim
3. recall (if first time touching this epic)
4. implement
5. close + memory write
6. optional bd dolt push

Critical lesson: If tools aren’t named in AGENTS.md, the model won’t call them even if they exist. I watched this happen. The tool is there. The MCP is running. The model has access. But without the tool name in a context block the model reads, it gets “lost” and resorts to chat.
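
A quick audit catches both failure modes (names that don't match the server, and tools that are running but never mentioned). This sketch assumes tool names appear in AGENTS.md as single backticked identifiers and that you can get the server's tool list from somewhere; `audit_tool_names` and its `ignore` parameter are hypothetical.

```python
import re

# A single identifier in backticks, e.g. `memory_search`; commands with spaces do not match.
BACKTICKED_NAME = re.compile(r"`([A-Za-z_][A-Za-z0-9_]*)`")


def audit_tool_names(agents_md: str, server_tools, ignore=frozenset()) -> dict:
    """Compare names mentioned in AGENTS.md with the tools the server actually exposes."""
    mentioned = set(BACKTICKED_NAME.findall(agents_md)) - set(ignore)
    available = set(server_tools)
    return {
        "unnamed": sorted(available - mentioned),  # running, but the model is never told
        "stale": sorted(mentioned - available),    # named in AGENTS.md, not served
    }
```

Field names such as `beads_id` also sit in backticks, so pass them through `ignore` or they show up as stale tools.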

How a session actually flows (template, not fiction)

Step 1: Ready query

$ bd ready --json | jq '.'
[
  {
    "id": "bd-7f2a",
    "title": "Add refresh-token rotation",
    "status": "ready",
    "blockers": []
  }
]

Step 2: Claim

$ bd update bd-7f2a --claim --json
{
  "id": "bd-7f2a",
  "claimed_by": "agent-session",
  "updated_at": "2026-04-10T09:01:22Z"
}

Step 3: Recall (if needed)

$ memory_search "token rotation refresh jwt"
[
  {
    "beads_id": "bd-6c1f",
    "title": "Previous session token bug (expired JWTs not cleaned)",
    "excerpt": "Refresh token TTL = 7 days; set Redis TTL with EX flag..."
  }
]

Step 4: Implement (work happens here)

Step 5: Close with reason

$ bd close bd-7f2a --reason "token rotation complete; added Redis TTL sync; tests passing" --json
{
  "id": "bd-7f2a",
  "status": "closed",
  "closed_at": "2026-04-10T09:35:12Z"
}

Step 6: Memory write (if not trivial)

beads_id: bd-7f2a
title: Refresh token rotation
paths: src/auth/token-handler.ts, src/cache/redis.ts
decisions:
  - Refresh TTL = 7 days (JWT + buffer)
  - Redis SET with EX flag (auto-cleanup, no manual overhead)
  - Logout clears both tokens atomically
references: bd-6c1f (prior token cleanup bug)
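
If the write-back is generated rather than typed, a small formatter keeps every entry in the same shape. A sketch, assuming the plain-text layout above is what your memory tool stores; `format_memory_entry` is a hypothetical helper.

```python
def format_memory_entry(beads_id, title, paths, decisions, references=()):
    """Render a memory entry in the key/value layout used in this post."""
    if not paths or not decisions:
        raise ValueError("an entry needs at least one path and one decision")
    lines = [
        f"beads_id: {beads_id}",
        f"title: {title}",
        f"paths: {', '.join(paths)}",
        "decisions:",
    ]
    lines += [f"  - {decision}" for decision in decisions]
    if references:
        lines.append(f"references: {', '.join(references)}")
    return "\n".join(lines)
```

Refusing to write an entry without paths or decisions is deliberate: an entry with neither is just a title, and Beads already has that.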

How to come back cold to a file

Session 4 opens an old feature. Here’s the pattern:

Search Beads:

bd list --filter "epic:auth" --json

Show detail:

bd show bd-7f2a

Query memory with beads_id:

memory_search "beads_id:bd-7f2a"

Fall back to Git truth:

git log --oneline -- src/auth/token-handler.ts

File a small issue with pointers (memory snippet, path:line), not a mega blob.
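
The cold-start pattern above can be sketched as an ordered list of lookups, each tagged with how far I trust it. Everything here is hypothetical scaffolding: `Finding`, `command`, and `cold_start` are names I made up, and the memory lookup is left to the caller because it is an MCP tool call rather than a shell command.

```python
import subprocess
from typing import Callable, NamedTuple


class Finding(NamedTuple):
    source: str   # where the text came from
    trust: str    # "structured", "approximate", or "shipped"
    text: str


def command(*argv: str) -> Callable[[], str]:
    """Wrap a CLI call as a zero-argument lookup that returns stdout."""
    def fetch() -> str:
        done = subprocess.run(list(argv), capture_output=True, text=True, check=True)
        return done.stdout
    return fetch


def cold_start(lookups) -> list:
    """Run lookups in order; keep non-empty results, skip tools that fail."""
    findings = []
    for source, trust, fetch in lookups:
        try:
            text = fetch().strip()
        except (OSError, subprocess.CalledProcessError):
            continue   # missing binary or non-zero exit: move on to the next source
        if text:
            findings.append(Finding(source, trust, text))
    return findings
```

A call might pass `("bd show", "structured", command("bd", "show", "bd-7f2a"))` followed by `("git log", "shipped", command("git", "log", "--oneline", "--", "src/auth/token-handler.ts"))`, with a caller-supplied function for the memory search in between. Keeping the trust label next to each result makes it harder to mistake a fuzzy hit for a fact.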

Anti-patterns (and how I learned them the hard way)

  • Dumping the repo into memory: Tried it. Memory search returned noise. Now I write issue-scoped bullets with beads_id + paths. Takes 30 seconds more, worth 10x more.
  • Skipping bd close: Forgot to close a task. bd ready lied to the next session. That was the 45 minutes from Lesson 1.
  • Running Jira + Beads + markdown: Three sources of truth = zero sources of truth. Pick one. I pick Beads.
  • Treating embeddings as compliance truth: Retrieval is a hypothesis. Git + ADRs are the audit trail. Verify retrieval against them.
  • Memory entries longer than 300 words: Defeats the purpose. If it’s that complex, file a new Beads issue instead.
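
For the "retrieval is a hypothesis" point, here is a minimal sketch of one possible check: confirm that the paths named in a memory hit have any history in Git before trusting the hit. It uses `git log --oneline -- <path>` as shown earlier; `git_log_oneline` and `verify_hit` are hypothetical helpers, and "has commits" is a weak proxy for "the note is still true".

```python
import subprocess
from typing import Callable


def git_log_oneline(path: str) -> str:
    """Return `git log --oneline -- <path>` output for the current repository."""
    done = subprocess.run(
        ["git", "log", "--oneline", "--", path],
        capture_output=True, text=True, check=True,
    )
    return done.stdout


def verify_hit(paths: list, log: Callable[[str], str] = git_log_oneline) -> dict:
    """Split a memory hit's paths into those Git has history for and those it does not."""
    confirmed, unconfirmed = [], []
    for path in paths:
        (confirmed if log(path).strip() else unconfirmed).append(path)
    return {"confirmed": confirmed, "unconfirmed": unconfirmed}
```

An unconfirmed path usually means the file was renamed or the note was written from memory. Either way, read the diff before acting on the entry.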

Milestones (how to know if it’s working)

| Week | Signal |
| --- | --- |
| 1 | Every session starts bd ready --json; every done task gets bd close with reason |
| 2 | AGENTS.md names real MCP tools (not placeholders); recall before big edits |
| 3 | Memory lines include beads_id + paths + decisions (not rambling prose) |
| 4 | Old features can be reopened via Beads + memory + Git; no re-explaining |

Three things to steal today

Move 1: Start with bd ready

Stop inferring priorities from chat. Ask the database. If bd ready is empty, you’re either done or you never modeled blockers. Knowing which one is half the battle.

bd ready --json

Move 2: Close with reason

bd close <id> --reason "..." is not decoration. It’s a contract with future-you. Good reasons are: what changed, why it changed, where to find the diff. Write it.

Move 3: One memory + one code search, nothing more

Wire them into AGENTS.md with real tool names (not placeholders). One MCP per category. Git wins ties. Treat retrieval as a hypothesis, not law.


If you take nothing else: run bd ready before you argue with the model about priorities, and treat semantic hits as hypotheses you verify in Git. Everything else—extra MCPs, compaction, fancy graphs—is seasoning. I’m still calibrating how much memory write-back is worth the maintenance. The honest status is: it helps when you keep entries short and link IDs.
