Practice: Personal Assistant Agent with Memory
Combining the knowledge from the previous four sections, we'll build a personal assistant that truly "remembers": one that retains your preferences and continuously learns across sessions. The project is a comprehensive application of everything in this chapter, bringing together short-term memory (a sliding window over recent turns), long-term memory (ChromaDB vector retrieval), and automatic memory extraction to build an assistant that genuinely "knows you".
Core Design
The personal assistant's memory system operates in three coordinated layers:
- Short-term memory (sliding window): maintains coherence within the current session, allowing the assistant to track conversation context
- Long-term memory (ChromaDB): persists across sessions, remembering the user's identity, preferences, skills, and other long-term information
- Automatic extraction (LLM analysis): after each conversation turn, automatically identifies information worth remembering
How these three layers collaborate: each time the user sends a message, the system first retrieves relevant information from long-term memory, then sends it along with recent conversation history to the LLM. After the LLM replies, the extractor analyzes whether the turn produced any new information worth storing. The entire process is transparent to the user.
System Architecture
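The per-message data flow across the three layers can be sketched as follows (method names refer to the implementation below):

```text
user message
   │
   ├─► recall_memories() ─────────────── ChromaDB (long-term memory)
   │
   ▼
system prompt (+ relevant memories) + sliding-window history + user message
   │
   ▼
LLM ──► reply to user
   │
   ├─► append turn to conversation_history (short-term memory)
   └─► _auto_extract_memories() ─► save_memory() ─► ChromaDB
```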
Complete Implementation
The PersonalAssistant class below is the core of the entire system. Pay attention to the processing flow in the chat method; it is exactly the working sequence of the three memory layers: retrieve long-term memories → build the messages (sliding-window history plus the relevant long-term memories) → call the LLM → update short-term memory → auto-extract new memories.
```python
# personal_assistant.py
import json
import datetime
import uuid

import chromadb
from openai import OpenAI
from dotenv import load_dotenv
from rich.console import Console
from rich.panel import Panel
from rich.table import Table

load_dotenv()
client = OpenAI()
console = Console()
class PersonalAssistant:
    """
    Personal assistant with a complete memory system.
    - Short-term memory: conversation history (last 10 turns)
    - Long-term memory: ChromaDB vector storage
    - Automatic memory extraction and storage
    """

    def __init__(self, user_id: str, assistant_name: str = "Aria"):
        self.user_id = user_id
        self.assistant_name = assistant_name

        # Initialize the vector database (persisted per user)
        self.chroma = chromadb.PersistentClient(path=f"./data/memory_{user_id}")
        self.memory_collection = self.chroma.get_or_create_collection(
            name="long_term_memory",
            metadata={"hnsw:space": "cosine"}
        )

        # Short-term memory: conversation history
        self.conversation_history: list[dict] = []
        self.max_history_turns = 10

        # Session ID (for tracking)
        self.session_id = str(uuid.uuid4())[:8]

        count = self.memory_collection.count()
        console.print(f"[dim]{assistant_name} loaded, long-term memory: {count} entries[/dim]")

    # =====================
    # Memory Operations
    # =====================

    def _embed(self, text: str) -> list[float]:
        """Generate an embedding vector for the given text"""
        response = client.embeddings.create(
            input=text,
            model="text-embedding-3-small"
        )
        return response.data[0].embedding

    def save_memory(self, content: str, memory_type: str = "general", importance: int = 5):
        """Save a long-term memory entry"""
        memory_id = str(uuid.uuid4())
        self.memory_collection.add(
            ids=[memory_id],
            embeddings=[self._embed(content)],
            documents=[content],
            metadatas=[{
                "type": memory_type,
                "importance": importance,
                "user_id": self.user_id,
                "session_id": self.session_id,
                "created_at": datetime.datetime.now().isoformat()
            }]
        )

    def recall_memories(self, query: str, n: int = 5) -> list[dict]:
        """Retrieve memories relevant to the query"""
        if self.memory_collection.count() == 0:
            return []
        results = self.memory_collection.query(
            query_embeddings=[self._embed(query)],
            n_results=min(n, self.memory_collection.count()),
            where={"user_id": self.user_id},
            include=["documents", "metadatas", "distances"]
        )
        memories = []
        if results["documents"] and results["documents"][0]:
            for doc, meta, dist in zip(
                results["documents"][0],
                results["metadatas"][0],
                results["distances"][0]
            ):
                relevance = 1 - dist  # cosine distance -> similarity
                if relevance > 0.4:  # filter out low-relevance memories
                    memories.append({
                        "content": doc,
                        "type": meta.get("type", "general"),
                        "importance": meta.get("importance", 5),
                        "relevance": relevance
                    })
        return sorted(memories, key=lambda x: x["relevance"], reverse=True)
    def _auto_extract_memories(self, user_msg: str, assistant_reply: str):
        """Automatically extract memorable information from a conversation turn"""
        # JSON mode requires a top-level object, so we ask for {"memories": [...]}
        prompt = f"""Extract information worth storing as long-term memory from the following conversation.

User said: {user_msg}
Assistant replied: {assistant_reply[:200]}

Extraction rules:
✅ Extract: user's personal info, preferences, work, skills, ongoing projects, explicitly stated needs
❌ Skip: greetings, temporary queries, what the assistant said, content with no lasting value

Return a JSON object with a "memories" array (use an empty array if nothing is worth extracting):
{{"memories": [{{"content": "concise statement", "type": "fact|preference|task|skill", "importance": 1-10}}]}}"""
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}],
                response_format={"type": "json_object"},
                max_tokens=300
            )
            result = json.loads(response.choices[0].message.content)
            # JSON mode always yields an object, but keep a fallback just in case
            memories = result if isinstance(result, list) else result.get("memories", [])
            for m in memories:
                if isinstance(m, dict) and m.get("content"):
                    self.save_memory(
                        m["content"],
                        m.get("type", "general"),
                        m.get("importance", 5)
                    )
                    console.print(f"[dim]💾 Memory: {m['content'][:60]}[/dim]")
        except Exception:
            pass  # A failed extraction must not break the main conversation flow

    def _get_window_history(self) -> list[dict]:
        """Get the conversation history within the sliding window"""
        return self.conversation_history[-(self.max_history_turns * 2):]
    # =====================
    # Conversation
    # =====================

    def chat(self, user_message: str) -> str:
        """Chat with the assistant"""
        # 1. Retrieve relevant information from long-term memory
        relevant_memories = self.recall_memories(user_message, n=5)

        # 2. Build messages
        system_content = f"""You are {self.assistant_name}, the personal assistant for user {self.user_id}.
You can help the user with various tasks: answering questions, writing, analysis, coding, and more.
Personalize your responses: use what you know about the user to provide targeted help.
"""
        if relevant_memories:
            memory_text = "\n".join([
                f"- [{m['type']}] {m['content']}"
                for m in relevant_memories[:3]  # at most the 3 most relevant
            ])
            system_content += f"\n[Memories About the User]\n{memory_text}\n"

        messages = [
            {"role": "system", "content": system_content}
        ] + self._get_window_history() + [
            {"role": "user", "content": user_message}
        ]

        # 3. Call the LLM
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            max_tokens=800
        )
        reply = response.choices[0].message.content

        # 4. Update conversation history (short-term memory)
        self.conversation_history.append({"role": "user", "content": user_message})
        self.conversation_history.append({"role": "assistant", "content": reply})

        # 5. Extract new memories (synchronous here; failures are silently ignored)
        self._auto_extract_memories(user_message, reply)

        return reply
    def show_memories(self):
        """Display all stored long-term memories"""
        if self.memory_collection.count() == 0:
            console.print("[dim]No long-term memories yet[/dim]")
            return
        results = self.memory_collection.get(
            where={"user_id": self.user_id},
            include=["documents", "metadatas"]
        )
        table = Table(title=f"📚 {self.user_id}'s Long-Term Memory Store")
        table.add_column("Type", style="cyan", width=10)
        table.add_column("Importance", style="yellow", width=10)
        table.add_column("Content", style="white")

        entries = list(zip(results["documents"], results["metadatas"]))
        entries.sort(key=lambda x: x[1].get("importance", 0), reverse=True)

        for doc, meta in entries[:20]:  # show at most 20 entries
            importance = meta.get("importance", 5)
            table.add_row(
                meta.get("type", "general"),
                "★" * min(importance // 2, 5),
                doc[:80]
            )
        console.print(table)
# ============================
# Main Program
# ============================

def main():
    console.print(Panel(
        "[bold]🤖 Personal Assistant Agent[/bold]\n"
        "I'll remember information about you and provide personalized service\n\n"
        "Commands:\n"
        "  memory → View what I remember about you\n"
        "  clear  → Clear conversation history\n"
        "  quit   → Exit",
        title="Startup",
        border_style="green"
    ))

    user_id = input("\nPlease enter your username: ").strip() or "default_user"
    assistant = PersonalAssistant(
        user_id=user_id,
        assistant_name="Aria"
    )
    console.print(f"\n[bold green]Aria:[/bold green] Hello, {user_id}! How can I help you today?")

    while True:
        user_input = input(f"\n[{user_id}]: ").strip()
        if not user_input:
            continue
        if user_input.lower() == "quit":
            console.print("[bold]Goodbye! I'll remember today's conversation 😊[/bold]")
            break
        if user_input.lower() == "memory":
            assistant.show_memories()
            continue
        if user_input.lower() == "clear":
            assistant.conversation_history.clear()
            console.print("[dim]Conversation history cleared (long-term memory preserved)[/dim]")
            continue

        reply = assistant.chat(user_input)
        console.print(f"\n[bold green]Aria:[/bold green] {reply}")


if __name__ == "__main__":
    main()
```
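To try it yourself: save the file as `personal_assistant.py`, install the dependencies (`openai`, `chromadb`, `python-dotenv`, `rich`), put your `OPENAI_API_KEY` in a `.env` file (picked up by `load_dotenv`), and start it with `python personal_assistant.py`. Long-term memories are persisted under `./data/memory_<user_id>`, so they survive restarts.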
Key Implementation Details
Let's dive deeper into several important design decisions in the code:
Relevance filtering in memory retrieval: The recall_memories method sets a relevance > 0.4 threshold. This is because vector retrieval always returns the "most similar" results, but those results aren't necessarily truly relevant. For example, if the user asks "what should I eat today", ChromaDB will still return some results even if there's no food-related information in the memory store (just with very low similarity). Setting a threshold filters out this noise.
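To make the effect concrete, here is a tiny standalone illustration; the distances are made up, not taken from a real query:

```python
# Hypothetical cosine distances for "what should I eat today" when the
# store holds no food-related memories (smaller distance = more similar).
distances = [0.85, 1.02, 1.30]

relevances = [1 - d for d in distances]    # ≈ [0.15, -0.02, -0.30]
kept = [r for r in relevances if r > 0.4]  # nothing clears the threshold
print(kept)  # []
```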
Injecting memories into the System Prompt: In the chat method, retrieved memories are injected into the system prompt (the [Memories About the User] section) rather than appended as user messages. The model treats system-prompt content as background knowledge, so it tends to weave memories into its answer naturally instead of quoting them back verbatim, which makes the interaction feel more personal and less mechanical.
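For a concrete picture, the assembled `messages` payload might look like this (contents abridged, values hypothetical):

```python
messages = [
    {"role": "system", "content":
        "You are Aria, the personal assistant for user alex.\n"
        "...\n"
        "[Memories About the User]\n"
        "- [fact] User's name is Alex\n"
        "- [skill] User is a Python engineer\n"},
    # sliding-window history (short-term memory)
    {"role": "user", "content": "Help me write a Fibonacci function"},
    {"role": "assistant", "content": "def fibonacci(n: int) -> list[int]: ..."},
    # the new message
    {"role": "user", "content": "Can it be optimized?"},
]
```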
Silent memory extraction: The _auto_extract_memories method is wrapped in try...except, so a failure never interrupts the main conversation flow. Memory extraction is a "nice-to-have" feature: even if it occasionally fails, it shouldn't disrupt the user's normal conversation experience.
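The call is synchronous in the listing above; if extraction latency ever becomes noticeable, a minimal sketch of a fire-and-forget variant (assuming slightly delayed memory writes are acceptable) looks like this:

```python
import threading

# In PersonalAssistant.chat, step 5 becomes a background thread instead of
#     self._auto_extract_memories(user_message, reply)
threading.Thread(
    target=self._auto_extract_memories,
    args=(user_message, reply),
    daemon=True,  # don't keep the process alive just for extraction
).start()
```

One caveat: a daemon thread can be killed at interpreter shutdown, so an extraction triggered by the very last turn may be lost; a small worker queue fixes that if it matters.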
Demo Output
```text
[Alex]: My name is Alex, I'm a Python engineer working on an AI project

💾 Memory: User's name is Alex
💾 Memory: User is a Python engineer
💾 Memory: User is working on an AI project

Aria: Hi Alex! Great to meet you. As a Python engineer working on an AI project,
what direction are you focusing on? Agent development, model training, or something else?

[Alex]: Help me write a Python function to compute the Fibonacci sequence

Aria: (Based on the "Python engineer" memory, provides professional-grade code)
def fibonacci(n: int) -> list[int]:
    """Return the first n Fibonacci numbers"""
    ...

-- Next session --

[Alex]: Can that Fibonacci function from yesterday be optimized?

Aria: (Knows from memory that Alex is a Python engineer, gives targeted answer)
You can optimize it with a generator or memoization...
```
Summary
This chapter built a complete memory system:
| Component | Technology | Purpose |
|---|---|---|
| Short-term memory | Sliding window | In-session coherence |
| Long-term memory | ChromaDB | Cross-session personalization |
| Memory extraction | LLM analysis | Auto-identify important information |
| Semantic retrieval | Vector similarity | Precise recall of relevant memories |
This framework can serve as the foundation for building personalized Agent applications.
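As a starting point, the class can be embedded directly in other programs; a minimal reuse sketch (module and user names are hypothetical):

```python
from personal_assistant import PersonalAssistant

assistant = PersonalAssistant(user_id="alex")
print(assistant.chat("I prefer concise answers with code examples."))

# A later run in a fresh process still sees the stored preference,
# because memories persist on disk under ./data/memory_alex
assistant = PersonalAssistant(user_id="alex")
print(assistant.chat("Explain Python generators."))
```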
Next chapter: Chapter 6: Planning and Reasoning