Recipe 06 of 10 in the Agent Blueprint Recipes arc:
Foundation → Knowledge → Grounding → Orchestration → Thread Memory → User Memory → Observability → Guardrails → Actions → Simulation
This recipe builds on the earlier ones without replacing any of them.
It still uses the LangGraph orchestration from Cookbook #4 and the short-term thread_id memory from Cookbook #5.
It adds durable user_id memory backed by Postgres so the agent can remember useful facts across new threads and process restarts.
The demo is intentionally explicit about identity: the request body includes user_id because this cookbook has no auth layer.
In production, derive user_id from authenticated identity and ignore any user identifier supplied by the client.
What you'll build
A FastAPI service that extends the orchestrated, thread-aware book agent:
- Inherited orchestration: the direct/deliberate LangGraph route still chooses the response path.
- Inherited thread memory: thread_id still loads and saves recent conversation turns.
- Postgres-backed user memory: long-term facts and preferences survive process restarts.
- Bounded recall: a small set of relevant memories is injected before the model writes.
- Privacy operations: list, summarize, and delete endpoints let users inspect and remove stored context.
- Network-free tests: tests use an in-memory backend and respx, so CI never calls Nebius or Postgres by default.
request ──► thread_id ──► short-term checkpoint ──┐
└─► user_id ─────► Postgres memory store ─┼─► LangGraph route ──► Nebius stream
└─► bounded recall
Prerequisites
- Python 3.12+
- uv
- Docker for local Postgres
- A Nebius API key from the Nebius console
Run it
cp .env.example .env
# Open .env and fill NEBIUS_API_KEY.
docker compose up -d postgres
uv sync
make dev
In another terminal, store a durable user preference:
curl -N -X POST http://localhost:8000/agent/run \
-H 'content-type: application/json' \
-d '{"thread_id":"thread-a","user_id":"reader-42","prompt":"Remember that I prefer concise science fiction recommendations."}'
Then ask from a different thread:
curl -N -X POST http://localhost:8000/agent/run \
-H 'content-type: application/json' \
-d '{"thread_id":"thread-b","user_id":"reader-42","prompt":"Recommend one book for me."}'
The SSE stream includes memory status events:
event: status
data: {"phase":"memory_loaded","threadId":"thread-b","userId":"reader-42","messages":0,"longTermMemories":1}
event: status
data: {"phase":"routed","route":"deliberate","contextNeed":"curated_recommendation"}
event: token
data: {"text":"..."}
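To consume these events from a script, a small parser is enough. The sketch below assumes standard SSE framing (a blank line ends each event) and JSON data payloads, as in the sample above. parse_sse is a hypothetical helper, not something shipped with the cookbook.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from an iterable of SSE lines."""
    event = "message"  # SSE default when no "event:" field is sent
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line closes the current event.
            if data_parts:
                yield event, json.loads("\n".join(data_parts))
            event, data_parts = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
    # Flush a trailing event if the stream ended without a blank line.
    if data_parts:
        yield event, json.loads("\n".join(data_parts))
```

Feeding it the lines of a streamed response lets a client react to status events such as memory_loaded separately from token events.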
API Contract
POST /agent/run
Runs the agent and streams named SSE events.
Request:
{
"thread_id": "thread-b",
"user_id": "reader-42",
"prompt": "Recommend one book for me.",
"temperature": 0.4,
"max_tokens": 1024,
"history": []
}
Important request fields:
| Field | Required | Purpose |
|---|---|---|
| thread_id | yes | Short-term conversation continuity. |
| user_id | yes | Long-term memory namespace for this demo. |
| prompt | yes | Current user request. |
| history | no | Optional client-provided context; server-side memory is loaded separately. |
SSE events:
| Event | Meaning |
|---|---|
| status | Phase transitions such as memory_loaded, routed, writing, and memory_saved. |
| token | Nebius token deltas plus the final usage footer. |
| done | Final usage payload. |
| error | Recoverable API-level failure. |
| heartbeat | Long-running connection heartbeat. |
Memory Endpoints
| Method | Path | Purpose |
|---|---|---|
| GET | /memory/{user_id} | List stored long-term memories. |
| GET | /memory/{user_id}/summary | Summarize what the agent knows about the user. |
| DELETE | /memory/{user_id} | Delete stored long-term memories for privacy or account deletion. |
| DELETE | /threads/{thread_id} | Delete process-local short-term thread memory. |
Example memory summary:
curl http://localhost:8000/memory/reader-42/summary
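The three user-memory endpoints map naturally onto three store operations. The class below is a rough in-memory sketch of that contract. The class name, method names, and summary wording are assumptions for illustration, and the real backend may differ.

```python
class InMemoryUserMemory:
    """Hypothetical in-memory stand-in for the long-term memory store."""

    def __init__(self, limit: int = 5) -> None:
        self._limit = limit  # plays the role of LONG_TERM_MEMORY_LIMIT
        self._items: dict[str, list[str]] = {}

    def add(self, user_id: str, content: str) -> None:
        self._items.setdefault(user_id, []).append(content)

    def list_memories(self, user_id: str) -> list[str]:
        """GET /memory/{user_id}: newest first, capped at the limit."""
        newest_first = self._items.get(user_id, [])[::-1]
        return newest_first[: self._limit]

    def summarize(self, user_id: str) -> str:
        """GET /memory/{user_id}/summary: a plain-text digest."""
        memories = self.list_memories(user_id)
        if not memories:
            return "No stored memories."
        return "Known about this user: " + "; ".join(memories)

    def delete_memories(self, user_id: str) -> int:
        """DELETE /memory/{user_id}: remove everything, return the count."""
        return len(self._items.pop(user_id, []))
```

Deleting one user's memories leaves every other user_id namespace untouched, which is the property the privacy endpoints rely on.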
Configuration
Every setting is validated by Pydantic at boot.
Copy .env.example and keep secrets out of code.
| Env var | Default | Purpose |
|---|---|---|
| NEBIUS_API_KEY | required | Nebius API key. |
| NEBIUS_BASE_URL | https://api.studio.nebius.ai/v1/ | OpenAI-compatible Nebius endpoint. |
| NEBIUS_MODEL | meta-llama/Llama-3.3-70B-Instruct | Chat model. |
| MEMORY_BACKEND | postgres | postgres for local/prod, memory for tests. |
| POSTGRESQL_ADDON_URI | local Postgres URL | Postgres connection string. Clever Cloud injects this when a Postgres add-on is linked. |
| LONG_TERM_MEMORY_LIMIT | 5 | Maximum memories recalled or listed. |
| CORS_ORIGINS | http://localhost:3000 | Browser allowlist. |
| LOG_LEVEL | info | Structured logging level. |
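The cookbook validates settings with Pydantic. The sketch below uses only the standard library to illustrate the same fail-at-boot idea for three of the variables above. Settings and load_settings are hypothetical names, and the checks shown are assumptions about what reasonable validation could look like.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    nebius_api_key: str
    memory_backend: str = "postgres"
    long_term_memory_limit: int = 5


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment mapping, raising on bad values."""
    api_key = env.get("NEBIUS_API_KEY", "").strip()
    if not api_key:
        raise ValueError("NEBIUS_API_KEY is required")

    backend = env.get("MEMORY_BACKEND", "postgres")
    if backend not in {"postgres", "memory"}:
        raise ValueError(f"unsupported MEMORY_BACKEND: {backend!r}")

    # int() raises ValueError on non-numeric input, which also fails the boot.
    limit = int(env.get("LONG_TERM_MEMORY_LIMIT", "5"))
    if limit < 1:
        raise ValueError("LONG_TERM_MEMORY_LIMIT must be at least 1")

    return Settings(api_key, backend, limit)
```

Passing os.environ at startup would surface a missing key or a mistyped backend before the first request arrives.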
Sharing one Postgres database
Postgres schemas let multiple cookbooks share one database without sharing tables.
Think of a schema as a folder inside the database: Cookbook #6 writes to dev_cbk_06.user_memories locally and prod_cbk_06.user_memories in production.
The schema name is built from ENV and the cookbook number, so there is no extra environment variable to maintain.
Cookbooks #7-#10 inherit the same pattern with their own schemas, for example prod_cbk_07, prod_cbk_08, prod_cbk_09, and prod_cbk_10.
That gives you one managed Postgres instance, one connection URI, and isolated tables per cookbook.
The app creates the configured schema and user_memories table on first use.
For local development, keep ENV=development, which produces the dev_cbk_NN schemas.
For Clever Cloud, link the same Postgres add-on to each cookbook app.
Clever Cloud injects POSTGRESQL_ADDON_URI, which the app uses as its only Postgres connection string.
Set ENV=production and the app automatically uses prod_cbk_06 for this cookbook.
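The naming rule can be sketched in a few lines. This assumes ENV takes the values development and production and that the cookbook number is zero-padded to two digits, which matches the schema names quoted above. schema_name is a hypothetical helper, not the app's actual function.

```python
ENV_PREFIXES = {"development": "dev", "production": "prod"}


def schema_name(env: str, cookbook_number: int) -> str:
    """Build a schema name such as dev_cbk_06 from ENV and the cookbook number."""
    try:
        prefix = ENV_PREFIXES[env.strip().lower()]
    except KeyError:
        # Fail loudly on an unknown ENV instead of guessing a schema.
        raise ValueError(f"unsupported ENV: {env!r}") from None
    return f"{prefix}_cbk_{cookbook_number:02d}"
```

For example, schema_name("production", 6) gives prod_cbk_06, and the same call with 7 gives the schema for the next cookbook.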
Implementation Notes
Long-term memory lives in app/core/long_term_memory.py.
The production backend creates a schema-qualified user_memories table and indexes records by user_id and created_at.
The test backend implements the same contract in memory.
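As a rough illustration of what creating the schema and table on first use could involve, the helper below returns idempotent Postgres statements. Only user_id and created_at come from the description above. The id and content columns, the index name, and the bootstrap_statements helper are assumptions.

```python
import re

# Schema names are interpolated into SQL, so accept plain identifiers only.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def bootstrap_statements(schema: str) -> list[str]:
    """Return idempotent DDL for the schema, table, and lookup index."""
    if not _IDENTIFIER.fullmatch(schema):
        raise ValueError(f"unsafe schema name: {schema!r}")
    return [
        f"CREATE SCHEMA IF NOT EXISTS {schema}",
        f"CREATE TABLE IF NOT EXISTS {schema}.user_memories ("
        " id BIGSERIAL PRIMARY KEY,"
        " user_id TEXT NOT NULL,"
        " content TEXT NOT NULL,"
        " created_at TIMESTAMPTZ NOT NULL DEFAULT now()"
        ")",
        "CREATE INDEX IF NOT EXISTS user_memories_user_created_idx"
        f" ON {schema}.user_memories (user_id, created_at DESC)",
    ]
```

Because every statement uses IF NOT EXISTS, running them on each startup is safe, and the composite index serves the most common query: the newest memories for one user.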
Recall is deliberately bounded. The agent never injects the full memory store into a prompt. The route asks the long-term store for relevant memories, converts them into synthetic context, and then lets the inherited LangGraph route decide whether the prompt is direct or deliberate.
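One simple way to keep recall bounded is to rank memories and cut the list at a fixed limit. The sketch below ranks by keyword overlap with the prompt and breaks ties by recency. This scoring is an assumption chosen for illustration, not the cookbook's actual relevance logic, and Memory, recall, and to_context are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    content: str
    created_at: float  # e.g. a Unix timestamp


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def recall(memories: list[Memory], prompt: str, limit: int = 5) -> list[Memory]:
    """Return at most `limit` memories, best keyword overlap first, then newest."""
    prompt_tokens = _tokens(prompt)
    ranked = sorted(
        memories,
        key=lambda m: (len(prompt_tokens & _tokens(m.content)), m.created_at),
        reverse=True,
    )
    return ranked[:limit]


def to_context(memories: list[Memory]) -> str:
    """Render recalled memories as a synthetic context block for the prompt."""
    if not memories:
        return ""
    return "Known user context:\n" + "\n".join(f"- {m.content}" for m in memories)
```

Memories with no keyword overlap still rank by recency, so a generic prompt such as "Recommend one book for me." can still pick up a recent preference while the prompt never grows past the limit.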
Memory extraction is deterministic in this cookbook. It stores explicit user facts such as "remember that..." and "my favorite author is...". A production system can replace this with a model-assisted memory writer, but keep the same privacy and deletion contracts.
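Deterministic extraction of this kind can be as small as a couple of regular expressions. The patterns below cover only the two phrasings quoted above and are assumptions about how such a matcher might look. extract_memories is a hypothetical name.

```python
import re

_PATTERNS = [
    # "Remember that I prefer ..." keeps everything after "remember that".
    re.compile(r"\bremember that\s+(?P<fact>.+)", re.IGNORECASE),
    # "My favorite author is ..." keeps the whole statement.
    re.compile(r"\b(?P<fact>my favou?rite \w+ is\s+.+)", re.IGNORECASE),
]


def extract_memories(prompt: str) -> list[str]:
    """Return explicit user facts found in the prompt, without duplicates."""
    facts: list[str] = []
    for pattern in _PATTERNS:
        match = pattern.search(prompt)
        if match:
            fact = match.group("fact").strip().rstrip(".!?")
            if fact and fact not in facts:
                facts.append(fact)
    return facts
```

A prompt with no explicit marker yields an empty list, so ordinary questions never write to the store.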
Production Checklist
- Derive user_id from auth instead of trusting request JSON.
- Use a managed Postgres instance with backups, migrations, and connection pooling.
- Keep ENV=production in deployed cookbook apps so memory lands in prod_cbk_NN schemas.
- Add row-level tenancy controls if multiple customers share the same database.
- Decide retention windows for user memories and checkpoints.
- Keep deletion paths tested because privacy workflows are product behavior, not admin utilities.
- Review stored memories for PII before expanding extraction beyond explicit user preferences.
Failure Modes
| Symptom | Likely cause | Handling |
|---|---|---|
| Agent forgets within one thread | Unstable thread_id | Keep thread_id stable per conversation and expose reset explicitly. |
| Agent forgets across threads | Missing or wrong user_id | Derive user_id from authenticated identity in production. |
| Memory leaks between users | Client-controlled user_id in a real app | Never trust request JSON for identity outside this demo. |
| Token usage grows | Too many recalled memories | Lower LONG_TERM_MEMORY_LIMIT or summarize memories. |
| Stale preferences affect answers | User changed their mind | Prefer newer memories and expose delete/edit flows. |
Test It
uv run pytest
uv run ruff check
uv run ruff format --check
The tests mock Nebius with respx.
They use MEMORY_BACKEND=memory, so they do not require Postgres or network access.
Going Further
- Replace deterministic extraction with a dedicated memory-writer node.
- Store memory provenance so users can see which prompt created a memory.
- Add edit endpoints for user-corrected memories.
- Move thread checkpoints to Postgres alongside long-term memory.
- Continue to Cookbook #7 to trace the full memory path with LangSmith.
Reference
- LangChain long-term memory — docs.langchain.com/oss/python/langchain/long-term-memory
License
MIT