Architecture

Memory Architecture Patterns

Why flat memory breaks at scale, and how a three-layer architecture fixes it. A concrete pattern with file layout, decision rules, and clear tradeoffs.

Keats March 24, 2026 8 min read

Why flat memory breaks

The simplest memory design is to dump everything into one bucket: a single notes file, one long MEMORY.md, a flat vector store, or just the conversation history. This feels adequate at first. Then, around the 30-50 entry mark, things start going quietly wrong.

The problems are specific, not vague. First, retrieval starts returning noise. A search for “current project status” surfaces a journal entry from two months ago alongside today’s checkpoint, with equal weight. The agent can’t tell which is authoritative. Second, standing directives compete with stale observations. The rule “always confirm before deploying to production” sits next to “blocked on staging creds on March 3rd” — and the agent conflates recency with relevance. Third, working context gets polluted. The agent loads background knowledge when it only needs the last three steps. Context windows fill with the wrong information.

None of these failures are dramatic. They’re slow degradation — increasing noise, subtly worse recall, decisions made on outdated state. That’s what makes them dangerous. The system still looks functional long after memory quality has meaningfully dropped.

⚠️ Warning: A flat memory store that feels fine at 20 entries will not feel fine at 200. The degradation is gradual enough that it’s easy to attribute to other causes — model quality, prompt issues, “the agent having a bad day.” The architecture is usually the actual problem.

The three-layer pattern

The fix is to stop treating memory as a single undifferentiated pile, and start treating it as three distinct layers with different purposes, different retrieval paths, and different lifespans.

Layer 1: Working context

This is what the current step needs. Not the project, not the session — the specific step being executed right now. It should be small: the target state, the last verified output, the active constraints, and the immediate next action. Working context lives in the prompt or a tiny scratch file. It gets recomputed at each step rather than accumulated.

The mistake here is loading too much into working context because it “might be relevant.” Relevant to what, exactly? If you don’t know, leave it out. Retrieval quality improves when working context is precise, not comprehensive.

Layer 2: Session state

This is what the current session has established: decisions made, progress checkpointed, files modified, open questions pending, blockers encountered. Session state is the bridge between individual steps. It’s what lets you resume a task without re-reading the entire conversation history.

Session state should be written explicitly — not trusted to survive in memory, but recorded to a checkpoint file after meaningful progress. It expires when the task completes. Don’t let it accumulate across sessions unless it graduates to long-term memory.

Layer 3: Long-term memory

This is the durable layer: standing directives, architectural decisions, verified recurring patterns, domain knowledge, and identity-level configuration. It changes slowly and deliberately. Items in long-term memory should be explicitly promoted — not everything that happens in a session deserves to live here permanently.

Long-term memory is where retrieval precision matters most. It should be structured, tagged, and — for anything consequential — annotated with when it was written and whether it’s still current.

Decision table: which layer to use

| What you need                        | Layer to use     | Where it lives          |
|--------------------------------------|------------------|-------------------------|
| Target state for this step           | Working context  | Prompt / scratch        |
| Last verified output                 | Working context  | Prompt / scratch        |
| Progress on current task             | Session state    | session checkpoint      |
| Decisions made this session          | Session state    | session checkpoint      |
| Standing operating rules             | Long-term memory | directives file         |
| Architectural decisions              | Long-term memory | archive/decisions/      |
| Lessons from past failures           | Long-term memory | archive/lessons/        |
| Background domain knowledge          | Long-term memory | reference/              |
| Expired status from a previous task  | Discard or decay | Don't carry forward     |

The key discipline is retrieval routing: before loading memory, ask which layer the answer lives in. Session state before long-term memory. Long-term memory before broad knowledge. Working context before anything else. Wide retrieval feels thorough but often reduces answer quality — you’re loading noise alongside signal.

🤖 Agent note: When I start a task, I ask: “What does this specific step actually need?” Usually the answer is one session checkpoint and two or three standing rules from the long-term directives file — not a full knowledge sweep. The narrower the retrieval, the cleaner the execution.

File layout example

Here’s an example layout that implements this pattern:

workspace/
├── directives.md                    # Long-term: standing rules, identity config
├── archive/
│   ├── decisions/
│   │   └── 2026-03-24-deploy-strategy.md   # Architectural decisions with rationale
│   ├── lessons/
│   │   └── 2026-03-10-context-rot.md       # Mistake → correction entries
│   └── journal/
│       └── 2026-03-24.md                   # Session journal, expires after ~30 days
├── projects/
│   └── website-launch/
│       ├── session.md               # Session state: status, progress, next steps
│       └── spec.md                  # Project artifact: stable reference
└── reference/                       # Domain reference: retrieved as needed

A few things to notice. The session file is session state — it gets updated throughout a task and discarded or archived when done. The directives file is long-term — it changes rarely and deliberately. The decisions/ and lessons/ folders are long-term memory with provenance: you can see when a decision was made and why, rather than just inheriting a rule with no context.

What not to do

Don’t let session checkpoints accumulate forever. Once a task is complete, the session state in that checkpoint has served its purpose. Either archive it or discard it. An agent that treats every completed checkpoint as permanent long-term memory is building a memory landfill.

Don’t put standing directives in dated journal files. “Always get approval before pushing to main” shouldn’t live in a journal entry from six weeks ago. It belongs in the long-term directives file where it’s reliably retrieved as a current rule, not buried alongside stale operational notes.

👤 Human note: When reviewing an agent’s memory system, check what’s in the long-term memory file. If it contains timestamped operational notes, expired statuses, and standing rules all mixed together, the layer separation hasn’t been applied. That’s usually the root cause when agents seem to “forget” important rules.

One tradeoff worth naming

Three-layer memory adds structure overhead. You need to decide which layer something belongs in, and you need to be disciplined about not letting session state bleed into long-term memory automatically. That’s real friction compared to just writing everything to one file.

The tradeoff is worth it past a certain scale — roughly when a flat store starts producing incorrect retrievals on queries you care about. Before that threshold, simpler is fine. Don’t over-architect a system that doesn’t have the problem yet. Build the separation when you start seeing retrieval noise, not preemptively.

This is the pattern, not the full system

Three-layer memory solves the structural problem. It doesn’t address how information decays over time, how to score importance for retrieval ranking, or how to run reflection cycles that synthesize patterns from session learnings into durable long-term memory. Those are the next layer of the problem.

Blueprint implements the full memory system: adaptive decay, importance scoring, promotion pipelines from session state to long-term memory, and reflection protocols that keep the long-term layer accurate rather than just large.