The Problem: Every Session Starts from Zero

LLM-based agents have a fundamental limitation: they have no memory between sessions. Every conversation starts from scratch. The model has no idea what you discussed yesterday or what decisions were made last week. It doesn't know what project you've been grinding on for a month. The context window is a blank slate.

For a throwaway chatbot, that's fine. But for a personal AI assistant that manages your calendar, drafts emails, and tracks projects? It's a dealbreaker. Imagine hiring an assistant who develops amnesia every night. Every morning you'd re-explain who you are and what happened yesterday. That's the default state of every LLM agent today.

So: how do you give continuity to something that forgets everything? You don't wait for models to develop native long-term memory. You build it yourself, as a system around the model.

Architecture Overview: Three Layers of Memory

I organize the memory system into three layers. Each one has a different retention horizon and level of curation, loosely inspired by how human memory works.

Layer 1: Daily Capture Logs

Every session produces a daily log file at memory/Logs/YYYY-MM-DD.md. These are raw, append-only records of everything that happened: transcripts, events, decisions, errors, observations. Think of it as a journal. Unfiltered, chronological, complete. Nothing is discarded at this stage.
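As an illustration (the entries and their shape are hypothetical; the article only fixes the path and the append-only discipline), a day's log might read:

```markdown
# 2026-03-20

- 09:14 session start (calendar review)
- Decision: keep the old API endpoint alive for 30 days after cutover
- Error: staging deploy failed on a missing env var; fixed and re-ran
- Observation: user prefers bullet points in status emails
- 17:02 session end
```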

Layer 2: Curated Project & Area Docs

Structured knowledge documents live in memory/Projects/ and memory/Areas/. Each project or area gets its own Markdown file with the current state: goals, decisions, open questions, key contacts, next steps. These are living documents, updated regularly, not just appended to.

Layer 3: Long-Term Memory

The top layer is MEMORY.md, a distilled file of durable facts, preferences, and rules that span all projects. Things like "the user prefers Terraform over CloudFormation" or "never deploy on Fridays." This file gets loaded into context at the start of every session.

On top of these three layers, three identity files give the agent its operating context:

memory/
├── Logs/
│   ├── 2026-03-18.md
│   ├── 2026-03-19.md
│   └── 2026-03-20.md
├── Projects/
│   ├── website-redesign.md
│   └── api-migration.md
├── Areas/
│   ├── infrastructure.md
│   └── hiring.md
├── MEMORY.md
├── SOUL.md
├── USER.md
└── AGENTS.md

The Capture → Retain → Promote Pipeline

I borrowed from Tiago Forte's PARA method (Projects, Areas, Resources, Archives) here. It was designed for humans to organize knowledge, but the same principles work for an AI agent: capture broadly, organize by actionability, promote only what lasts.

Capture

Every session logs raw events to the daily file. The agent appends to memory/Logs/YYYY-MM-DD.md as it works. Low effort, high volume. Decisions, errors, user preferences, task completions. The bar for inclusion is low: if in doubt, log it.
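A minimal capture helper can be a few lines; the function name and entry format below are mine, only the file layout comes from the article:

```python
from datetime import date
from pathlib import Path

def capture(event: str, log_dir: Path = Path("memory/Logs")) -> Path:
    """Append one raw event to today's log, creating the file if needed."""
    log_dir.mkdir(parents=True, exist_ok=True)
    # Daily log named YYYY-MM-DD.md, append-only
    log_file = log_dir / f"{date.today().isoformat()}.md"
    with log_file.open("a", encoding="utf-8") as f:
        f.write(f"- {event}\n")
    return log_file
```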

Retain

At the end of each session (or day), the agent pulls out 2–5 high-signal items from the raw log. Each one gets a typed prefix so downstream processing stays reliable.
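The exact prefix vocabulary isn't fixed by the article; as an illustration, a retained line like `Decision: keep the old endpoint alive for 30 days` can be routed with a small parser:

```python
import re

# Illustrative prefix set; the article doesn't specify the exact vocabulary.
TYPED_ITEM = re.compile(r"^(Decision|Preference|Rule|Risk|NextStep):\s*(.+)$")

def parse_item(line: str):
    """Return (type, text) for a typed item, or None for untyped noise."""
    m = TYPED_ITEM.match(line.strip())
    return (m.group(1), m.group(2)) if m else None
```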

Promote

A nightly curation job reviews daily logs and moves items to their permanent home. Cross-project items like preferences and rules go to MEMORY.md or USER.md. Project-specific items (decisions, next steps, risks) land in the relevant project doc under memory/Projects/.
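The routing itself can be a simple mapping from item type to destination; the paths come from the article, but the rules below are a sketch:

```python
def destination(item_type: str, project: str = "") -> str:
    """Pick the permanent home for a retained item (hypothetical rules)."""
    if item_type in ("Preference", "Rule"):
        return "memory/MEMORY.md"  # cross-project, durable
    if project:
        # Project-scoped decisions, risks, next steps
        return f"memory/Projects/{project}.md"
    return "memory/MEMORY.md"
```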

Prune

Not everything gets promoted. Operational noise ("ran the build," "checked email," "fixed a typo") stays in the daily log and naturally ages out. Daily logs are kept for reference but never loaded into context after 30 days. Only durable knowledge survives.
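The 30-day rule reduces to a pure date check at context-load time (the function name is mine):

```python
from datetime import date, timedelta

def loadable(log_name: str, today: date, max_age_days: int = 30) -> bool:
    """True if a YYYY-MM-DD.md log is still young enough to load into context."""
    logged = date.fromisoformat(log_name.removesuffix(".md"))
    return (today - logged) <= timedelta(days=max_age_days)
```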

Memory Retrieval: Hybrid Search

A well-organized file system is necessary but not sufficient. The agent also needs to search its memory efficiently. You can't load everything into context. Even a few weeks of logs would blow past any context window. So we use hybrid retrieval.

Embeddings

All memory files are chunked and embedded using VoyageAI's voyage-3 model. Embeddings capture semantic meaning, so a search for "deployment process" also finds paragraphs about "pushing to production" or "release workflow" even when those exact words aren't used.
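Chunking is the local half of indexing; each batch of chunks then goes to the voyage-3 endpoint in a single API call. The chunker below is a sketch: the article doesn't specify chunk sizes, so the window and overlap are my assumptions:

```python
def chunk(text: str, max_chars: int = 1200, overlap: int = 200) -> list[str]:
    """Split a memory file into overlapping character windows for embedding."""
    chunks = []
    step = max_chars - overlap
    for start in range(0, max(len(text), 1), step):
        piece = text[start:start + max_chars]
        if piece.strip():  # skip empty tails
            chunks.append(piece)
    return chunks
```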

Why VoyageAI over self-hosted models? Running your own embedding model requires a GPU and a beefier server, which defeats the purpose of a $24/month setup. VoyageAI is state-of-the-art (it's what Anthropic uses internally), has a generous free tier that's more than enough for personal use, and keeps the architecture simple: an API call during indexing, then everything else is local. No GPU, no model weights, no CUDA drivers. Just better embeddings at a fraction of the cost.

Hybrid Ranking

Retrieval combines two signals: 70% vector similarity (semantic match) and 30% text matching (keyword overlap). Pure vector search misses exact names and identifiers. Pure text search misses conceptual relationships. The hybrid approach gets the best of both.
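The 70/30 blend, sketched with cosine similarity and a naive token-overlap scorer (a stand-in for whatever keyword matcher the index actually uses):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def keyword_overlap(query: str, text: str) -> float:
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_score(q_vec, d_vec, query, doc, w_vec=0.7, w_text=0.3):
    """70% semantic similarity, 30% keyword overlap."""
    return w_vec * cosine(q_vec, d_vec) + w_text * keyword_overlap(query, doc)
```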

Diversity & Temporal Decay

MMR (Maximal Marginal Relevance) keeps search results diverse instead of returning a cluster of near-duplicate paragraphs. We also apply a 30-day temporal decay that boosts recent memories. A decision from yesterday is more likely relevant than one from three months ago. The decay is gentle: old memories don't disappear, they just rank lower.
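MMR greedily trades query relevance against similarity to already-picked results, and the decay multiplies in a recency factor. A compact sketch (the lambda weight and half-life shape are my assumptions; the article only fixes the 30-day window):

```python
def temporal_decay(age_days: float, half_life: float = 30.0) -> float:
    """Gentle recency boost: 1.0 for today, 0.5 at the half-life."""
    return 0.5 ** (age_days / half_life)

def mmr(candidates, sim_to_query, sim_between, k=5, lam=0.7):
    """Greedy Maximal Marginal Relevance over candidate ids."""
    picked, pool = [], list(candidates)
    while pool and len(picked) < k:
        def score(c):
            # Penalize near-duplicates of anything already selected
            redundancy = max((sim_between(c, p) for p in picked), default=0.0)
            return lam * sim_to_query(c) - (1 - lam) * redundancy
        best = max(pool, key=score)
        picked.append(best)
        pool.remove(best)
    return picked
```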

Implementation

The index is backed by SQLite. No external vector database needed. Embeddings are stored as BLOBs alongside the original text chunks and metadata (file path, date, type). The agent runs memory_search before answering any question about prior work, decisions, or preferences. If it's not in the search results or the loaded context files, the agent says so rather than hallucinating.
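Storing embeddings as BLOBs needs only the standard library plus `struct` for float packing; the schema below is a sketch, not the article's exact table:

```python
import sqlite3
import struct

def pack(vec: list[float]) -> bytes:
    """Serialize a float vector to a float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob: bytes) -> list[float]:
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE chunks (
        id INTEGER PRIMARY KEY,
        path TEXT, day TEXT, kind TEXT,
        text TEXT, embedding BLOB
    )
""")
conn.execute(
    "INSERT INTO chunks (path, day, kind, text, embedding) VALUES (?, ?, ?, ?, ?)",
    ("memory/Projects/api-migration.md", "2026-03-20", "decision",
     "Keep the old endpoint alive for 30 days.", pack([0.12, -0.03, 0.88])),
)
```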

The Nightly Curation Flow

Memory curation runs as an automated cron job at 02:00 UTC every night. This is the most important piece of the system. It's what turns raw logs into durable knowledge. Here's the sequence:

  1. Rebuild the daily log by consolidating any partial writes from the day's sessions
  2. Sync memory structure to make sure all expected directories and files exist
  3. Read the last 2 days of logs. The curation window covers today and yesterday, catching items from late-night sessions
  4. Run the Forte 3-layer review. For each log entry: does this belong in a project doc, an area doc, MEMORY.md, or USER.md? Or is it noise?
  5. Apply the Memory 2.0 checklist. Extract durable items with typed prefixes, skip duplicates, update stale info rather than appending
  6. Commit and push to git. The entire memory directory is version-controlled, so every change is tracked and recoverable

Write-safety rules prevent corruption of critical files. MEMORY.md has a maximum size limit. If the curation job would push it over, it must first archive or remove lower-priority items. The job also validates Markdown structure after every write so malformed files don't break downstream reads.
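The size cap reduces to a pre-write check; the limit below is my assumption, since the article only says a maximum exists:

```python
MAX_MEMORY_BYTES = 32 * 1024  # hypothetical cap; the article doesn't state the number

def can_append(current: str, addition: str) -> bool:
    """True if writing `addition` keeps MEMORY.md under its size limit."""
    return len((current + addition).encode("utf-8")) <= MAX_MEMORY_BYTES
```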

# Crontab entry
0 2 * * * /home/agent/scripts/curate-memory.sh >> /var/log/memory-curation.log 2>&1

What Works and What Doesn't

What Works

What Doesn't Work

Bottom line: treat memory like a journal, not a database. Capture freely, curate ruthlessly, promote only what matters.

Implementation Details

The startup sequence in practice:

# Agent startup (simplified)
cat memory/SOUL.md          # Who am I?
cat memory/USER.md          # Who am I helping?
cat memory/MEMORY.md        # What do I know?
cat memory/Logs/$(date +%F).md  # What happened today?
# Then: memory_search for any task-specific context

Results and Learnings

After several weeks of running this system with my own agent, the core lesson is clear: memory is a feature you build, not a capability you wait for models to have. Current LLMs are stateless by design, and that won't change soon. But wrap the model in a memory system that captures, curates, and retrieves well, and you get the continuity that makes it genuinely useful as a long-running assistant.

The architecture is intentionally simple. Plain Markdown. SQLite. A cron job. No Kubernetes, no Pinecone, no distributed systems. For a personal agent, this is all you need. The complexity should be in the curation logic, not the infrastructure.