How to Build AI Agents with Long-Term Memory Using LangGraph and Mem0
Conventional AI agents usually depend on short-term context, meaning the active conversation window, and they often lose earlier information once a chat session finishes. A more advanced approach is to equip agents with long-term memory. When an agent can retain user preferences, facts, and past interactions, it becomes more personalized and more effective. This can be achieved by combining LangGraph, a framework for stateful graph-based agents, with Mem0, a dedicated memory layer. With this setup, an LLM-based agent can retain prior information and make use of it later.
By using LangGraph together with Mem0, you can create agents that respond with awareness of past interactions. Because Mem0 stores and retrieves memories, every new LangGraph session can include relevant facts from earlier exchanges in the prompt. This makes it possible to build agents that maintain more consistent, personal, and coherent conversations over time. This article explains the primary memory categories, walks through the LangGraph and Mem0 workflow, shares code examples, compares memory approaches such as RAG and persistent memory, and covers important scaling considerations including vector databases, privacy, and cost.
Key Takeaways
- Persistent memory improves agents: LangGraph agents can keep memory across conversations, allowing interactions to be tailored from one session to the next. Over time, the agent can retain details about the user and build a stronger understanding.
- Memory versus context window: The context window offers temporary, short-lived memory that ends with the current session. Persistent memory through Mem0 stores user-specific facts over time. RAG complements both by retrieving outside knowledge when needed.
- LangGraph structure: LangGraph’s graph-based design makes it simple to introduce memory-related nodes. You can define a State that includes a `mem0_user_id` and then build a chatbot node that searches and writes memories on each turn.
- Mem0 capabilities: Mem0 can extract semantic memory and supports flexible persistent storage. It works with different LLMs and allows developers to shape memory behavior themselves, unlike closed memory systems.
- Memory system design: Use semantic retrieval for relevant facts, merge or filter memory entries to avoid duplication, and balance detailed storage with summarized memory for efficiency. The choice of vector database and indexing strategy matters.
- Production concerns: Think through privacy, retention rules, and scalability. Memory can reduce token usage and improve relevance, but it also introduces additional storage and compute requirements.
AI Memory: Short-Term, Retrieval, and Long-Term
AI agents rely on different kinds of memory depending on the task and scope:
Short-Term Session Memory
Short-term memory, often called window memory, refers to the active chat history inside one conversation thread. This session-scoped state is handled automatically by LangGraph. Once the conversation is over, however, that memory window closes. If you ask an agent to list previously saved documents, it can only reference documents shared during the same session unless data has been stored elsewhere. When an agent depends directly on raw message history, it is limited by the LLM context window, which can lead to larger prompts and increased cost.
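Window memory can be pictured as a bounded buffer over the message list. The plain-Python sketch below is only an illustration, not LangGraph's own mechanism. The `(role, content)` tuple format, the `window` helper, and the window size are assumptions. It keeps a leading system message plus the most recent messages, which caps prompt size and is also why older details disappear.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content): an assumed format for this sketch


def window(messages: List[Message], max_messages: int) -> List[Message]:
    """Return the system prompt (if any) plus the last `max_messages` messages."""
    system = [m for m in messages[:1] if m[0] == "system"]
    rest = messages[len(system):]
    # Guard against max_messages == 0, since rest[-0:] would return everything
    return system + rest[-max_messages:] if max_messages > 0 else system


history = [
    ("system", "You are a helpful assistant."),
    ("user", "My name is Ana."),
    ("assistant", "Nice to meet you, Ana."),
    ("user", "I prefer vegetarian recipes."),
    ("assistant", "Noted."),
    ("user", "Suggest a dinner."),
]

trimmed = window(history, max_messages=3)
```

In this example, the user's name falls outside the three-message window. That is exactly the kind of gap persistent memory is meant to fill.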
Retrieval Memory with RAG
Retrieval memory refers to bringing in information from external sources such as files or databases. Retrieval-Augmented Generation uses a vector database to fetch related information dynamically based on the current query. In practice, RAG allows an agent to consult external materials whenever needed.
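The retrieve-then-augment step can be sketched in a few lines. This is a toy that assumes an in-memory list of documents. A bag-of-words cosine similarity stands in for a real embedding model and vector database. `embed`, `retrieve`, and `build_prompt` are illustrative names, not a library API.

```python
import math
import re
from collections import Counter
from typing import List

DOCS = [
    "Refunds are issued within 14 days of a return being received.",
    "Standard shipping takes 3 to 5 business days.",
    "Premium members get free express shipping.",
]


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts instead of a learned vector
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    # Rank every document by similarity to the query and keep the top k
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: List[str]) -> str:
    # Augment the prompt with the retrieved passages
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"


prompt = build_prompt("How many business days does standard shipping take?", DOCS)
```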
Long-Term Persistent Memory
Long-term memory is a durable, user-specific layer that remains available across sessions. It makes it possible to preserve distilled facts, preferences, and experiences about a user and reuse them in future interactions. Unlike RAG, which brings in general outside information, long-term memory is focused on personalized context tied to the user.
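This layer is defined by durability plus user scoping. Facts written in one session can be read back in the next, and only for the user they belong to. The following is a minimal sketch using Python's built-in `sqlite3`. It assumes a hypothetical two-column `memories` table and the helper names `remember` and `recall`. A memory layer such as Mem0 adds fact extraction, embeddings, and semantic search on top of this basic idea.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from typing import List

# A throwaway database file; in practice this would be a stable path or a server
DB_PATH = os.path.join(tempfile.mkdtemp(), "memories.db")


def _connect() -> sqlite3.Connection:
    conn = sqlite3.connect(DB_PATH)
    conn.execute("CREATE TABLE IF NOT EXISTS memories (user_id TEXT, fact TEXT)")
    return conn


def remember(user_id: str, fact: str) -> None:
    """Persist one fact for one user."""
    with closing(_connect()) as conn:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO memories VALUES (?, ?)", (user_id, fact))


def recall(user_id: str) -> List[str]:
    """Read back only this user's facts, oldest first."""
    with closing(_connect()) as conn:
        rows = conn.execute(
            "SELECT fact FROM memories WHERE user_id = ? ORDER BY rowid", (user_id,)
        ).fetchall()
    return [fact for (fact,) in rows]


# Session 1: the agent learns facts about two users
remember("alice", "Prefers window seats")
remember("bob", "Allergic to peanuts")

# Session 2: each call opens a fresh connection, yet the facts persist
alice_facts = recall("alice")
```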
Put simply, short-term memory manages the active conversation, RAG enriches the interaction with external information, and long-term memory through Mem0 provides ongoing user-specific continuity.
Overview of LangGraph
LangGraph is a framework for building stateful, graph-based agents. Rather than using a simple linear chain, LangGraph lets you create nodes and edges that define the agent’s workflow. Nodes are responsible for specific tasks such as calling an LLM, running calculations, or retrieving information from memory, and they return an updated state. Edges determine how the workflow moves from one node to another based on the current state. At the center is a StateGraph object that keeps track of shared state throughout the workflow.
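To make the node, edge, and state vocabulary concrete, here is a deliberately tiny plain-Python executor. It is not LangGraph's implementation or API. The `run_graph` helper, the `END` sentinel (which only mirrors the role of LangGraph's own `END`), and the node names are assumptions for illustration. Each node reads the shared state and returns a partial update, and an edge map decides which node runs next.

```python
from typing import Callable, Dict

State = Dict[str, object]
Node = Callable[[State], State]

END = "__end__"  # sentinel marking the end of the workflow in this toy


def run_graph(nodes: Dict[str, Node], edges: Dict[str, str], entry: str, state: State) -> State:
    current = entry
    while current != END:
        update = nodes[current](state)  # a node returns only the keys it changes
        state = {**state, **update}     # merge the update into the shared state
        current = edges[current]        # follow the outgoing edge
    return state


def retrieve_memory(state: State) -> State:
    return {"context": f"{state['user']} likes tea"}


def call_llm(state: State) -> State:
    return {"reply": f"Reply using context: {state['context']}"}


final = run_graph(
    nodes={"memory": retrieve_memory, "llm": call_llm},
    edges={"memory": "llm", "llm": END},
    entry="memory",
    state={"user": "dana"},
)
```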
State Management
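LangGraph stores conversation state in a State object that is passed from node to node through the graph. This State can contain the full message history along with any metadata associated with the user. When the graph is compiled with a checkpointer, the state is saved after each step and restored on later invocations that use the same thread ID. Without a checkpointer, it lasts only for a single invocation. In both cases, state is scoped to a thread rather than shared across sessions.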
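The difference between per-run state and checkpointed state can be sketched with a toy checkpointer keyed by thread ID. This illustrates the idea only and is not LangGraph's checkpointer interface. `ToyCheckpointer` and `invoke` are hypothetical names. A checkpoint only helps the same thread: a new thread ID starts empty, which is why cross-session facts need a separate memory layer.

```python
from typing import Dict, List


class ToyCheckpointer:
    """Keeps the latest message list for each thread_id (in memory only)."""

    def __init__(self) -> None:
        self._saved: Dict[str, List[str]] = {}

    def load(self, thread_id: str) -> List[str]:
        return list(self._saved.get(thread_id, []))

    def save(self, thread_id: str, messages: List[str]) -> None:
        self._saved[thread_id] = list(messages)


def invoke(checkpointer: ToyCheckpointer, thread_id: str, user_msg: str) -> List[str]:
    messages = checkpointer.load(thread_id)  # restore earlier turns of this thread
    messages.append(f"user: {user_msg}")
    messages.append(f"bot: seen {len(messages)} message(s)")
    checkpointer.save(thread_id, messages)   # persist state after the run
    return messages


cp = ToyCheckpointer()
invoke(cp, "thread-1", "hi")
second = invoke(cp, "thread-1", "remember me?")
other = invoke(cp, "thread-2", "hello")
```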
Conditional Edges
Edges can be conditional, so the workflow does not need to remain strictly linear. A LangGraph can branch or loop as needed. For instance, different tools can be selected based on user intent.
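A conditional edge is driven by a routing function that inspects the state and returns the name of the next node. The sketch below shows only that function, in plain Python. The intent keywords, the simplified `last_message` state key, and the node names (`"search_memory"`, `"calculator"`, `"chatbot"`) are hypothetical.

```python
from typing import TypedDict


class RouterState(TypedDict):
    last_message: str  # simplified: a real state would read messages[-1]


def route_by_intent(state: RouterState) -> str:
    """Pick the next node from a crude keyword check on the latest message."""
    text = state["last_message"].lower()
    if any(phrase in text for phrase in ("remember", "last time", "my preference")):
        return "search_memory"
    if any(ch.isdigit() for ch in text) and any(op in text for op in "+-*/"):
        return "calculator"
    return "chatbot"
```

In LangGraph, a function like this would be passed as the `path` argument of `graph.add_conditional_edges(source, path)`, so each returned string selects the next node. A production router would more likely use an LLM or classifier than keywords.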
Extensibility
LangGraph works with multiple LLM providers, making it flexible for a wide range of deployments. It is designed with production use in mind and supports features such as streaming and error handling.
Session Scope
Out of the box, a LangGraph agent only sees the context of the current session. Once the chat ends, the state is removed unless it has been written to external storage.
What Mem0 Adds
Mem0 is a persistent memory solution designed for AI agents. It functions as a semantic memory layer by extracting, storing, and retrieving information from conversations and user-related facts. Mem0 is not itself an LLM. Instead, it acts as a dedicated database and search layer built specifically for AI memory use cases.
Semantic Memory
Mem0 extracts factual knowledge from raw chat messages and stores it in compact memory phrases. For example, if a user says, “I love pizza,” the stored memory might become “Loves pizza.” This keeps memory concise and manageable.
Multi-Level Memory
Mem0 supports several namespace levels, including user-level, session-level, and agent-level memory. This allows you to separate one user’s memories from another’s or share broader facts across an agent.
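The scoping idea can be sketched with a toy in-memory store in which each entry carries optional `user_id`, `agent_id`, and `run_id` tags, the same identifiers Mem0 accepts for scoping. The store, its matching rule, and the sample data are assumptions for illustration, not Mem0's internals. Under this rule, an entry is visible when every tag it sets matches the current request, so agent-wide facts are visible to every user of that agent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    text: str
    user_id: Optional[str] = None   # user-level memory
    agent_id: Optional[str] = None  # agent-level (shared) memory
    run_id: Optional[str] = None    # session/run-level memory


def visible(entries: List[Entry], user_id: str, agent_id: str, run_id: str) -> List[str]:
    """Return entries whose set tags all match the current request."""
    request = {"user_id": user_id, "agent_id": agent_id, "run_id": run_id}
    out = []
    for e in entries:
        tags = {"user_id": e.user_id, "agent_id": e.agent_id, "run_id": e.run_id}
        if all(v is None or v == request[k] for k, v in tags.items()):
            out.append(e.text)
    return out


STORE = [
    Entry("Prefers metric units", user_id="u1"),
    Entry("Prefers imperial units", user_id="u2"),
    Entry("Support hours are 9-5 UTC", agent_id="support-bot"),
    Entry("Currently debugging a login error", user_id="u1", run_id="s42"),
]
```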
Smart Retrieval
When given a query, such as the latest user message, Mem0 uses vector similarity to return the most relevant stored memories. It usually scopes retrieval to a user ID so that only that user’s history is accessed.
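Scoped retrieval comes down to two steps: restrict candidates to the requesting user, then rank them by similarity to the query. The plain-Python sketch below assumes memories already carry embedding vectors. The three-dimensional vectors are made up for the example, whereas real embeddings come from an embedding model and have far more dimensions. The `search` helper here is illustrative, not Mem0's method.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


MEMORIES: List[Dict] = [
    {"user_id": "u1", "memory": "Loves pizza", "vector": [0.9, 0.1, 0.0]},
    {"user_id": "u1", "memory": "Works night shifts", "vector": [0.0, 0.2, 0.9]},
    {"user_id": "u2", "memory": "Loves sushi", "vector": [0.8, 0.2, 0.1]},
]


def search(query_vector: Sequence[float], user_id: str, limit: int = 2) -> List[Dict]:
    candidates = [m for m in MEMORIES if m["user_id"] == user_id]  # scope first
    scored = [{"memory": m["memory"], "score": cosine(query_vector, m["vector"])} for m in candidates]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:limit]


# A query vector that (we pretend) encodes "What food should I order?"
results = search([1.0, 0.0, 0.0], user_id="u1")
```

In this example, "Loves sushi" is never considered for `u1`, even though its vector is close to the query, because filtering happens before ranking.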
Flexible Storage
Mem0 can connect to different storage backends. SQLite works well for local experimentation, while vector databases such as Qdrant, Pinecone, Weaviate, and others can be used for more advanced deployments. In hosted versions, storage management can be handled for you.
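With the open-source library, the backend is selected through a configuration dictionary passed to `Memory.from_config`. The sketch below assumes the self-hosted `mem0ai` package and a Qdrant server reachable at `localhost:6333`. The collection name is a placeholder, and the accepted keys vary by provider and library version, so check the Mem0 documentation for your backend.

```python
# Vector store settings for a self-hosted Mem0 instance (values are placeholders)
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "collection_name": "agent_memories",
            "host": "localhost",
            "port": 6333,
        },
    },
}

# With mem0ai installed and Qdrant running:
# from mem0 import Memory
# memory = Memory.from_config(config)
```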
Open Source and Cloud Availability
Mem0 is available as an open-source client library for self-hosted use and also as a cloud platform for easier setup.
Integration Architecture
When these two systems are combined, the integration follows a straightforward flow:
- Message reception: The agent receives a user message through a LangGraph node such as a chatbot node.
- Memory search: The node calls `mem0.search()` with the latest user message and the associated user ID. This returns a ranked list of relevant memories based on vector similarity.
- Context construction: The retrieved memories are formatted into a readable context string. That string is embedded in a system message, which is placed before the conversation history. This helps the LLM take earlier information into account when producing a reply.
- LLM invocation: The agent sends the system message and conversation history to the chosen LLM provider. The model responds using both the current user input and the supplied memories.
- Memory update: After the reply is generated, the agent calls `mem0.add()` with the user ID to store the interaction so it can be retrieved later. The sketch below makes this call synchronously for simplicity, but it can also be moved off the response path.
LangGraph handles state during the interaction loop, while Mem0 preserves long-term information between sessions. The following sketch shows the concept in code:
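```python
import logging

from langchain_core.messages import SystemMessage

# Assumes `State`, `llm` (a LangChain chat model), and `mem0` (a Mem0
# MemoryClient) are defined as in the setup step later in this article.
logger = logging.getLogger(__name__)


def chatbot(state: State):
    messages = state["messages"]
    user_id = state["mem0_user_id"]
    query = messages[-1].content

    # 1. Retrieve relevant memories, scoped to this user
    try:
        memories = mem0.search(query, filters={"user_id": user_id}, version="v2")
        # Accept either a {"results": [...]} payload or a bare list
        memory_list = memories.get("results", []) if isinstance(memories, dict) else memories
    except Exception:
        logger.exception("Memory search failed; continuing without memories")
        memory_list = []

    # 2. Build the context string from the retrieved memories
    context = "Relevant information from previous conversations:\n"
    for memory in memory_list:
        context += f"- {memory['memory']}\n"

    # 3. Prepend a system message that carries the context
    system_message = SystemMessage(content=(
        "You are a helpful assistant. Use the provided context to "
        f"personalize your response.\n{context}"
    ))
    full_messages = [system_message] + messages

    # 4. Generate the response
    response = llm.invoke(full_messages)

    # 5. Store the interaction under this user's ID; a failed write
    #    should not discard a reply that was already generated
    interaction = [
        {"role": "user", "content": query},
        {"role": "assistant", "content": response.content},
    ]
    try:
        mem0.add(interaction, user_id=user_id)
    except Exception:
        logger.exception("Memory write failed")

    return {"messages": [response]}
```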
Memory Extraction, Filtering, and Summarization Strategies
A dependable long-term memory system relies on three controls: deciding what should be stored, determining how existing memory should change over time, and filtering writes so the stored information remains accurate and useful.
Define What Counts as Memory
Mem0 lets you supply a custom fact extraction prompt that specifies exactly which facts should be written to memory. This is especially helpful when you want order details, preferences, support history, or task requirements stored permanently while casual conversation stays out of long-term memory. Broad prompts can easily create noisy memory.
Define How Memory Evolves
Mem0 also supports a configurable custom_update_memory_prompt that tells the LLM to choose between ADD, UPDATE, DELETE, or NONE when reconciling new information with existing memory. Without this kind of instruction, corrected details, changed preferences, or revoked instructions may simply accumulate as conflicting memory entries.
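Both prompts are supplied through the open-source configuration. The sketch below assumes the `custom_fact_extraction_prompt` and `custom_update_memory_prompt` keys described in Mem0's documentation. The prompt wording, the customer-support domain, and the few-shot examples are illustrative placeholders. The extraction prompt follows a few-shot pattern that returns a JSON object with a `facts` list, so small talk maps to an empty list.

```python
FACT_EXTRACTION_PROMPT = """
Extract only order details, product preferences, and support history.
Ignore greetings and small talk. Return JSON with a "facts" list.

Input: Hi, how are you?
Output: {"facts": []}

Input: My order #12345 still hasn't arrived.
Output: {"facts": ["Order #12345 not yet delivered"]}

Input: I prefer email over phone calls.
Output: {"facts": ["Prefers email contact"]}
"""

UPDATE_MEMORY_PROMPT = """
Compare each new fact with the existing memories and choose one operation:
ADD if it is new, UPDATE if it corrects or refines an existing memory,
DELETE if it revokes an existing memory, NONE if it is already stored.
"""

config = {
    "custom_fact_extraction_prompt": FACT_EXTRACTION_PROMPT,
    "custom_update_memory_prompt": UPDATE_MEMORY_PROMPT,
}

# With mem0ai installed (plus LLM and vector store settings as needed):
# from mem0 import Memory
# memory = Memory.from_config(config)
```

Mem0 expects the update step to produce structured output that it can parse. In practice, it is likely safer to start from the default update prompt in Mem0's documentation and adapt it than to rely on a short instruction like the one above.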
Control Ingestion Quality
Unfiltered memory writing can turn speculation into stored fact. If an assistant records every user message without review, temporary questions, incomplete information, or misunderstandings may become permanent memory. That can lead to poor assumptions in future replies. In production settings, a healthier pattern is to store only important preferences and confirmed facts immediately, while processing less important conversational material asynchronously.
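One way to implement this pattern is a small gate in front of the memory write. The plain-Python sketch below assumes the `confirmed` and `important` flags come from elsewhere, such as an LLM classifier or an explicit user confirmation. `MemoryWriteGate` and `write_now` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class MemoryWriteGate:
    write_now: Callable[[str, str], None]  # e.g. a wrapper around mem0.add
    deferred: List[Tuple[str, str]] = field(default_factory=list)

    def submit(self, user_id: str, text: str, confirmed: bool, important: bool) -> str:
        if text.strip().endswith("?"):
            return "dropped"                    # questions are not facts
        if confirmed and important:
            self.write_now(user_id, text)       # store immediately
            return "written"
        self.deferred.append((user_id, text))   # review or process later
        return "deferred"


written: List[Tuple[str, str]] = []
gate = MemoryWriteGate(write_now=lambda uid, t: written.append((uid, t)))
```

In the chatbot node, `write_now` could wrap `mem0.add(..., user_id=user_id)`, and a background job could drain the deferred queue.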
Trade-Offs Between Memory Approaches
Adding long-term memory to an agent introduces several trade-offs:
Storage Versus Latency
Keeping full conversations provides maximum recall, but it increases storage needs and may slow retrieval. Summarizing memory can reduce storage requirements and improve retrieval speed, though this may come at the cost of precision.
Privacy Versus Personalization
Any memory system must handle user privacy carefully. Mem0 separates memories by user ID, but you should still implement retention policies and give users the ability to delete stored information through the API.
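A retention policy can be enforced with a periodic job that drops memories older than a cutoff, plus a handler for deletion requests. The minimal sketch below works over an in-memory list. The record fields (`id`, `user_id`, `memory`, `created_at`) and the 90-day window are assumptions. Against Mem0 you would delete the affected memories through its API instead; the hosted client, for instance, exposes `delete(memory_id)` and `delete_all(user_id=...)`.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

RETENTION = timedelta(days=90)  # assumed policy


def split_expired(records: List[Dict], now: datetime) -> Tuple[List[Dict], List[Dict]]:
    """Partition records into (kept, expired) by age."""
    cutoff = now - RETENTION
    kept = [r for r in records if r["created_at"] >= cutoff]
    expired = [r for r in records if r["created_at"] < cutoff]
    return kept, expired


def forget_user(records: List[Dict], user_id: str) -> List[Dict]:
    """Honor a deletion request by removing every record for one user."""
    return [r for r in records if r["user_id"] != user_id]


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
records = [
    {"id": "m1", "user_id": "u1", "memory": "Prefers dark mode", "created_at": now - timedelta(days=10)},
    {"id": "m2", "user_id": "u1", "memory": "Was shopping for a tent", "created_at": now - timedelta(days=200)},
]
kept, expired = split_expired(records, now)
```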
Accuracy Versus Cost
Retrieving too many memories may overwhelm the LLM, while retrieving too few may omit essential information. You will likely need to adjust parameters such as maximum memory count and relevance thresholds for your use case.
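Both knobs can also be applied after retrieval, on the ranked results. The sketch below assumes each search result is a dict with `memory` and `score` keys, where a higher score means more relevant. Check your backend, because some report distances, where lower is better. The `select_memories` helper, the threshold, and the limit are placeholders to tune.

```python
from typing import Dict, List


def select_memories(results: List[Dict], min_score: float = 0.5, max_items: int = 3) -> List[str]:
    """Keep the highest-scoring memories above a relevance threshold."""
    relevant = [r for r in results if r["score"] >= min_score]
    relevant.sort(key=lambda r: r["score"], reverse=True)
    return [r["memory"] for r in relevant[:max_items]]


results = [
    {"memory": "Loves pizza", "score": 0.91},
    {"memory": "Lives in Lisbon", "score": 0.42},
    {"memory": "Vegetarian since 2020", "score": 0.77},
    {"memory": "Allergic to nuts", "score": 0.66},
    {"memory": "Prefers thin crust", "score": 0.58},
]
```

Raising `min_score` trades recall for precision, while `max_items` caps how many tokens memories can add to the prompt.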
Database Choice
Vector databases such as pgvector, Pinecone, and Weaviate differ in scaling behavior, pricing, and features. Mem0 uses pgvector in its reference implementation, but other backends or managed services can be used depending on requirements.
Understanding these trade-offs makes it easier to design a memory system that balances user experience, performance, and cost.
A Step-by-Step Overview of the Mem0 and LangGraph Integration
Here is a quick-start walkthrough for connecting Mem0 with LangGraph. This section summarizes the official approach and includes practical optimization tips.
1. Install Dependencies
Install the necessary libraries:
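```bash
pip install langgraph langchain-openai mem0ai python-dotenv
```
Create a .env file and add your API keys:
```text
OPENAI_API_KEY=sk-your-openai-key
MEM0_API_KEY=your-mem0-key
```
The hosted Mem0 client used below only needs these keys. If you self-host Mem0 with the open-source `Memory` class instead, also configure the embedding provider, model, and vector dimensions so they match your vector store.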
2. Initialize LangGraph and Mem0
```python
from typing import Annotated, List, TypedDict

from dotenv import load_dotenv
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import START, StateGraph
from langgraph.graph.message import add_messages
from mem0 import MemoryClient

load_dotenv()  # loads OPENAI_API_KEY and MEM0_API_KEY from .env


class State(TypedDict):
    # add_messages appends new messages instead of overwriting the list
    messages: Annotated[List[HumanMessage | AIMessage], add_messages]
    mem0_user_id: str


llm = ChatOpenAI(model="gpt-4o")
mem0 = MemoryClient()  # Hosted Mem0 Platform client; reads MEM0_API_KEY from the environment
graph = StateGraph(State)
```
The code above performs the following tasks:
- It imports the packages required for state handling, message management, chat interactions, and Mem0-based memory.
- It loads environment variables from the `.env` file.
- It defines a State object containing both the conversation history and a Mem0 user ID.
- It initializes a GPT-4o chat model together with a Mem0 client.
- It creates a LangGraph state graph that will later be used to define the agent workflow.
You would then define the chatbot function, as shown earlier, to retrieve memories, build context, generate a response, and store the interaction.
3. Build the Conversation Graph
Add the chatbot node and connect the graph edges:
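```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END

graph.add_node("chatbot", chatbot)
graph.add_edge(START, "chatbot")
graph.add_edge("chatbot", END)  # one reply per invocation

# The checkpointer keeps each thread's message history between invocations
compiled_graph = graph.compile(checkpointer=MemorySaver())
```
This code creates a simple LangGraph workflow in which the chatbot node is both the entry point and the last step. Each invocation handles one user turn and then ends, and the next message starts a new invocation. Calling `graph.compile()` turns the graph definition into an executable application. Passing a checkpointer keeps each thread's message history in memory, so the `thread_id` supplied at run time gives the agent short-term context within a conversation, while Mem0 provides memory across sessions.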
4. Create a Conversation Runner
Write a run_conversation function that streams events from the compiled graph:
```python
def run_conversation(user_input: str, mem0_user_id: str):
    # The user ID doubles as the thread ID, so each user has one conversation thread
    config = {"configurable": {"thread_id": mem0_user_id}}
    state = {"messages": [HumanMessage(content=user_input)], "mem0_user_id": mem0_user_id}
    # stream_mode="values" yields the full state after each step
    for event in compiled_graph.stream(state, config, stream_mode="values"):
        last_message = event["messages"][-1]
        if isinstance(last_message, AIMessage):
            return last_message.content
    return None  # no reply was produced


# Main interaction loop
def main():
    user_id = input("Enter your user ID: ")
    print("Chatbot ready! Type 'quit' to exit.")
    while True:
        user_input = input("\nYou: ")
        if user_input.lower() == "quit":
            break
        response = run_conversation(user_input, user_id)
        print(f"Bot: {response}")


if __name__ == "__main__":
    main()
```
This code runs the chatbot by sending in the user’s message, building the initial state for the turn, and streaming events through the compiled LangGraph workflow until the AI response is returned. The main() function provides a basic command-line chat loop in which the user enters messages and the chatbot keeps responding until the user types quit.
5. Deploy and Monitor
Deploy the agent in the environment that best fits your needs. Store memories inside a vector database such as pgvector, Pinecone, or Weaviate. Monitor how memory grows over time, adjust cleanup intervals, and fine-tune retrieval settings so personalization, relevance, and system performance stay balanced.
Production Considerations
There are several important aspects to evaluate when running a LangGraph and Mem0 agent in production:
| Topic | Main idea | Practical notes |
|---|---|---|
| Vector Database | Mem0 uses SQLite by default for quick testing, but production environments usually require a vector database. | Make sure the database has an index on user_id. Managed options can simplify operations, while self-hosting remains possible. The chosen database, such as Qdrant or Pinecone, affects speed, features, and cost. |
| Data Privacy & Retention | Because memory systems store user information, privacy and retention need careful planning. | Encrypt sensitive fields when appropriate, remove stored memories after a defined time, and obtain user consent before storing personal data. Mem0 APIs can support exporting or deleting data. Private networking can add another layer of protection for the vector store. |
| Cost & Performance | Adding memory can reduce LLM token usage by keeping prompts smaller, but it also introduces database lookups. | Semantic search is often fast and can be batched efficiently. Mem0 reports approximately 90% token savings and 91% lower p95 latency compared with a full-context method. You should benchmark your own stack to verify real-world latency. |
| Reliability | The memory database and LangGraph state layer should be designed to handle failures gracefully. | Use LangGraph checkpoints to recover from crashes and maintain backups for memory storage. As the vector database expands, monitor consumption and prepare for scaling. |
| Security | The Mem0 API key and the database infrastructure must be secured properly. | Limit write permissions so only the agent can change memory. In multi-agent or multi-tenant systems, isolate namespaces to strengthen separation and security. |
FAQ
What is long-term memory in AI agents?
Long-term memory is where an agent keeps important facts learned from interactions. Short-term memory is bounded by the context window and usually resets when a session ends, whereas long-term memory persists across sessions.
How is Mem0 different from RAG?
Retrieval-Augmented Generation uses external documents to expand the LLM’s knowledge. Mem0, by contrast, stores information derived from conversation history. It extracts facts from user interactions so the agent can remember details about the user and respond in a more personalized way. With RAG, you might ask for the capital of France. With Mem0, the agent could remember that you bought a laptop last month.
Can LangGraph agents remember previous conversations?
Yes. By combining Mem0 with LangGraph, agents can retain details from earlier interactions. After each turn, the chatbot node stores new memory snippets once the LLM responds. On later turns, it searches for relevant memories and inserts them into the system prompt.
Do I need a vector database for Mem0?
Mem0 depends on a vector store to perform similarity searches over embeddings. Although the reference implementation uses pgvector, you can configure another managed service if needed. pgvector is often sufficient for smaller projects, while large deployments may benefit from services such as Pinecone or Weaviate.
What are common use cases for long-term memory in agents?
Typical long-term memory use cases include personal assistants, customer support bots, tutoring systems, and internal help desks. Whenever an agent interacts repeatedly with the same user, memory can help tailor answers, reduce repetition, and strengthen continuity. Long-term memory can also support analysis of user preferences and behavior.
Conclusion
Combining LangGraph with Mem0 is one practical way to move from session-based agents to agents with persistent, user-scoped long-term memory. LangGraph provides structured orchestration together with short-lived conversation state management, while Mem0 adds persistent semantic memory that can be retrieved across sessions to improve continuity, personalization, and relevance. When designed carefully through selective extraction, retention rules, privacy controls, and tuned retrieval settings, this approach enables developers to build stronger agents that remain efficient at scale without depending on oversized chat histories or generic document retrieval alone.
Beyond local examples, a production-ready memory architecture also depends on deployment infrastructure. Integrations between LangChain-based workflows and scalable AI platforms make it possible to connect these workflows to broader inference environments. This gives developers access to multiple models through GPU-accelerated serverless inference and offers a route to scale AI applications beyond the prototype stage.


