The Problem: Every Session Starts from Zero

LLM-based agents have a fundamental limitation: they have no memory between sessions. Every conversation starts from scratch. The model has no idea what you discussed yesterday or what decisions were made last week. It doesn't know what project you've been grinding on for a month. The context window is a blank slate.

For a throwaway chatbot, that's fine. But for a personal AI assistant that manages your calendar, drafts emails, and tracks projects? It's a dealbreaker. Imagine hiring an assistant who develops amnesia every night. Every morning you'd re-explain who you are and what happened yesterday. That's the default state of every LLM agent today.

So: how do you give continuity to something that forgets everything? You don't wait for models to develop native long-term memory. You build it yourself, as a system around the model.

Architecture Overview: Three Layers of Memory

I organize the memory system into three layers. Each one has a different retention horizon and level of curation, loosely inspired by how human memory works.

Layer 1: Daily Capture Logs

Every session produces a daily log file at memory/Logs/YYYY-MM-DD.md. These are raw, append-only records of everything that happened: transcripts, events, decisions, errors, observations. Think of it as a journal. Unfiltered, chronological, complete. Nothing is discarded at this stage.
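As an illustration (the entries and their shape are hypothetical; the article only fixes the path and the append-only discipline), a day's log might read:

```markdown
# 2026-03-20

- 09:14 session start (calendar review)
- Decision: keep the old API endpoint alive for 30 days after cutover
- Error: staging deploy failed on a missing env var; fixed and re-ran
- Observation: user prefers bullet points in status emails
- 17:02 session end
```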

Layer 2: Curated Project & Area Docs

Structured knowledge documents live in memory/Projects/ and memory/Areas/. Each project or area gets its own Markdown file with the current state: goals, decisions, open questions, key contacts, next steps. These are living documents, updated regularly, not just appended to.

Layer 3: Long-Term Memory

The top layer is MEMORY.md, a distilled file of durable facts, preferences, and rules that span all projects. Things like "the user prefers Terraform over CloudFormation" or "never deploy on Fridays." This file gets loaded into context at the start of every session.

On top of these three layers, three identity files give the agent its operating context:

memory/
├── Logs/
│   ├── 2026-03-18.md
│   ├── 2026-03-19.md
│   └── 2026-03-20.md
├── Projects/
│   ├── website-redesign.md
│   └── api-migration.md
├── Areas/
│   ├── infrastructure.md
│   └── hiring.md
├── MEMORY.md
├── SOUL.md
├── USER.md
└── AGENTS.md

The Capture → Retain → Promote Pipeline

I borrowed from Tiago Forte's PARA method (Projects, Areas, Resources, Archives) here. It was designed for humans to organize knowledge, but the same principles work for an AI agent: capture broadly, organize by actionability, promote only what lasts.

Capture

Every session logs raw events to the daily file. The agent appends to memory/Logs/YYYY-MM-DD.md as it works. Low effort, high volume. Decisions, errors, user preferences, task completions. The bar for inclusion is low: if in doubt, log it.
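A minimal capture helper can be a few lines; the function name and entry format below are mine, only the file layout comes from the article:

```python
from datetime import date
from pathlib import Path

def capture(event: str, log_dir: Path = Path("memory/Logs")) -> Path:
    """Append one raw event to today's log, creating the file if needed."""
    log_dir.mkdir(parents=True, exist_ok=True)
    # Daily log named YYYY-MM-DD.md, append-only
    log_file = log_dir / f"{date.today().isoformat()}.md"
    with log_file.open("a", encoding="utf-8") as f:
        f.write(f"- {event}\n")
    return log_file
```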

Retain

At the end of each session (or day), the agent pulls out 2–5 high-signal items from the raw log. Each one gets a typed prefix so downstream processing stays reliable.
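The exact prefix vocabulary isn't fixed by the article; as an illustration, a retained line like `Decision: keep the old endpoint alive for 30 days` can be routed with a small parser:

```python
import re

# Illustrative prefix set; the article doesn't specify the exact vocabulary.
TYPED_ITEM = re.compile(r"^(Decision|Preference|Rule|Risk|NextStep):\s*(.+)$")

def parse_item(line: str):
    """Return (type, text) for a typed item, or None for untyped noise."""
    m = TYPED_ITEM.match(line.strip())
    return (m.group(1), m.group(2)) if m else None
```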

Promote

A nightly curation job reviews daily logs and moves items to their permanent home. Cross-project items like preferences and rules go to MEMORY.md or USER.md. Project-specific items (decisions, next steps, risks) land in the relevant project doc under memory/Projects/.
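The routing itself can be a simple mapping from item type to destination; the paths come from the article, but the rules below are a sketch:

```python
def destination(item_type: str, project: str = "") -> str:
    """Pick the permanent home for a retained item (hypothetical rules)."""
    if item_type in ("Preference", "Rule"):
        return "memory/MEMORY.md"  # cross-project, durable
    if project:
        # Project-scoped decisions, risks, next steps
        return f"memory/Projects/{project}.md"
    return "memory/MEMORY.md"
```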

Prune

Not everything gets promoted. Operational noise ("ran the build," "checked email," "fixed a typo") stays in the daily log and naturally ages out. Daily logs are kept for reference but never loaded into context after 30 days. Only durable knowledge survives.
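The 30-day rule reduces to a pure date check at context-load time (the function name is mine):

```python
from datetime import date, timedelta

def loadable(log_name: str, today: date, max_age_days: int = 30) -> bool:
    """True if a YYYY-MM-DD.md log is still young enough to load into context."""
    logged = date.fromisoformat(log_name.removesuffix(".md"))
    return (today - logged) <= timedelta(days=max_age_days)
```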

Memory Retrieval: Hybrid Search

A well-organized file system is necessary but not sufficient. The agent also needs to search its memory efficiently. You can't load everything into context. Even a few weeks of logs would blow past any context window. So we use hybrid retrieval.

Embeddings

All memory files are chunked and embedded using VoyageAI's voyage-3 model. Embeddings capture semantic meaning, so a search for "deployment process" also finds paragraphs about "pushing to production" or "release workflow" even when those exact words aren't used.
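Chunking is the local half of indexing; each batch of chunks then goes to the voyage-3 endpoint in a single API call. The chunker below is a sketch: the article doesn't specify chunk sizes, so the window and overlap are my assumptions:

```python
def chunk(text: str, max_chars: int = 1200, overlap: int = 200) -> list[str]:
    """Split a memory file into overlapping character windows for embedding."""
    chunks = []
    step = max_chars - overlap
    for start in range(0, max(len(text), 1), step):
        piece = text[start:start + max_chars]
        if piece.strip():  # skip empty tails
            chunks.append(piece)
    return chunks
```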

Why VoyageAI over self-hosted models? Running your own embedding model requires a GPU and a beefier server, which defeats the purpose of a $24/month setup. VoyageAI is state-of-the-art (it's what Anthropic uses internally), has a generous free tier that's more than enough for personal use, and keeps the architecture simple: an API call during indexing, then everything else is local. No GPU, no model weights, no CUDA drivers. Just better embeddings at a fraction of the cost.

Hybrid Ranking

Retrieval combines two signals: 70% vector similarity (semantic match) and 30% text matching (keyword overlap). Pure vector search misses exact names and identifiers. Pure text search misses conceptual relationships. The hybrid approach gets the best of both.
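The 70/30 blend, sketched with cosine similarity and a naive token-overlap scorer (a stand-in for whatever keyword matcher the index actually uses):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def keyword_overlap(query: str, text: str) -> float:
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_score(q_vec, d_vec, query, doc, w_vec=0.7, w_text=0.3):
    """70% semantic similarity, 30% keyword overlap."""
    return w_vec * cosine(q_vec, d_vec) + w_text * keyword_overlap(query, doc)
```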

Diversity & Temporal Decay

MMR (Maximal Marginal Relevance) keeps search results diverse instead of returning a cluster of near-duplicate paragraphs. We also apply a 30-day temporal decay that boosts recent memories. A decision from yesterday is more likely relevant than one from three months ago. The decay is gentle: old memories don't disappear, they just rank lower.
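MMR greedily trades query relevance against similarity to already-picked results, and the decay multiplies in a recency factor. A compact sketch (the lambda weight and half-life shape are my assumptions; the article only fixes the 30-day window):

```python
def temporal_decay(age_days: float, half_life: float = 30.0) -> float:
    """Gentle recency boost: 1.0 for today, 0.5 at the half-life."""
    return 0.5 ** (age_days / half_life)

def mmr(candidates, sim_to_query, sim_between, k=5, lam=0.7):
    """Greedy Maximal Marginal Relevance over candidate ids."""
    picked, pool = [], list(candidates)
    while pool and len(picked) < k:
        def score(c):
            # Penalize near-duplicates of anything already selected
            redundancy = max((sim_between(c, p) for p in picked), default=0.0)
            return lam * sim_to_query(c) - (1 - lam) * redundancy
        best = max(pool, key=score)
        picked.append(best)
        pool.remove(best)
    return picked
```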

Implementation

The index is backed by SQLite. No external vector database needed. Embeddings are stored as BLOBs alongside the original text chunks and metadata (file path, date, type). The agent runs memory_search before answering any question about prior work, decisions, or preferences. If it's not in the search results or the loaded context files, the agent says so rather than hallucinating.
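Storing embeddings as BLOBs needs only the standard library plus `struct` for float packing; the schema below is a sketch, not the article's exact table:

```python
import sqlite3
import struct

def pack(vec: list[float]) -> bytes:
    """Serialize a float vector to a float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob: bytes) -> list[float]:
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE chunks (
        id INTEGER PRIMARY KEY,
        path TEXT, day TEXT, kind TEXT,
        text TEXT, embedding BLOB
    )
""")
conn.execute(
    "INSERT INTO chunks (path, day, kind, text, embedding) VALUES (?, ?, ?, ?, ?)",
    ("memory/Projects/api-migration.md", "2026-03-20", "decision",
     "Keep the old endpoint alive for 30 days.", pack([0.12, -0.03, 0.88])),
)
```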

The Nightly Curation Flow

Memory curation runs as an automated cron job at 02:00 UTC every night. This is the most important piece of the system. It's what turns raw logs into durable knowledge. Here's the sequence:

  1. Rebuild the daily log by consolidating any partial writes from the day's sessions
  2. Sync memory structure to make sure all expected directories and files exist
  3. Read the last 2 days of logs. The curation window covers today and yesterday, catching items from late-night sessions
  4. Run the Forte 3-layer review. For each log entry: does this belong in a project doc, an area doc, MEMORY.md, or USER.md? Or is it noise?
  5. Apply the Memory 2.0 checklist. Extract durable items with typed prefixes, skip duplicates, update stale info rather than appending
  6. Commit and push to git. The entire memory directory is version-controlled, so every change is tracked and recoverable

Write-safety rules prevent corruption of critical files. MEMORY.md has a maximum size limit. If the curation job would push it over, it must first archive or remove lower-priority items. The job also validates Markdown structure after every write so malformed files don't break downstream reads.
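The size cap reduces to a pre-write check; the limit below is my assumption, since the article only says a maximum exists:

```python
MAX_MEMORY_BYTES = 32 * 1024  # hypothetical cap; the article doesn't state the number

def can_append(current: str, addition: str) -> bool:
    """True if writing `addition` keeps MEMORY.md under its size limit."""
    return len((current + addition).encode("utf-8")) <= MAX_MEMORY_BYTES
```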

# Crontab entry
0 2 * * * /home/agent/scripts/curate-memory.sh >> /var/log/memory-curation.log 2>&1

What Works and What Doesn't

What Works

What Doesn't Work

Bottom line: treat memory like a journal, not a database. Capture freely, curate ruthlessly, promote only what matters.

Implementation Details

The startup sequence in practice:

# Agent startup (simplified)
cat memory/SOUL.md          # Who am I?
cat memory/USER.md          # Who am I helping?
cat memory/MEMORY.md        # What do I know?
cat memory/Logs/$(date +%F).md  # What happened today?
# Then: memory_search for any task-specific context

Results and Learnings

After several weeks of running this system with my own agent, the core lesson is clear: memory is a feature you build, not a capability you wait for models to have. Current LLMs are stateless by design, and that won't change soon. But wrap the model in a memory system that captures, curates, and retrieves well, and you get the continuity that makes it genuinely useful as a long-running assistant.

The architecture is intentionally simple. Plain Markdown. SQLite. A cron job. No Kubernetes, no Pinecone, no distributed systems. For a personal agent, this is all you need. The complexity should be in the curation logic, not the infrastructure.