LangMem SDK for AI Agents: Long-Term Memory, Architecture, Integration, Performance, and Alternatives

AI agents powered by large language models have long faced a major limitation: their memory is very restricted. In their default form, an LLM can only retain what fits inside the context window, such as the current conversation or chat history. As a result, previously learned details are quickly lost once a session ends or the token limit is reached, and the model remains stateless between interactions. LangMem SDK is designed to solve this issue by adding persistent long-term memory.

With LangMem, agents can improve over time by retaining earlier interactions, important facts, user preferences, and other relevant information across multiple sessions. This article explains what LangMem is, why long-term memory matters, how LangMem functions, and how it can be applied in your own projects. It also looks at performance and compares LangMem with other options. By the end, you will understand how to use LangMem SDK to build more capable AI agents with meaningful memory.

Key Takeaways

  • LangMem gives agents state: It turns context-limited, stateless LLM agents into systems that preserve knowledge across sessions.
  • Several memory categories are supported: LangMem works with semantic memory (facts), episodic memory (previous interactions), and procedural memory (behavioral rules) through one unified API.
  • Memory handling is driven by an LLM: A Memory Manager reviews conversations, decides what should be stored, updated, or removed, and refines knowledge over time.
  • Storage is flexible: LangMem can work with vector databases, key-value stores, Postgres, and other backends through a configurable store interface.
  • Production deployments need planning: Namespacing, pruning, retrieval tuning, and cost management are all important for scalable long-term memory systems.

What Is LangMem SDK?

LangMem SDK is an open-source software development kit created by the LangChain team that brings long-term memory to AI agents. Put simply, it provides an AI agent with a persistent memory store, along with the logic needed to save, update, and retrieve knowledge as the agent interacts with users over time. Used alongside any language model and agent framework, LangMem can pull useful information from conversations or other experiences and insert the right memories back into the agent’s context whenever they are needed. LangMem SDK functions as a lightweight Python wrapper library, making it easy to integrate with different agent frameworks and storage backends.

Using LangMem, an agent can retain facts, preferences, previous events, or even adapt its own behavior based on feedback. For example, if you tell a virtual assistant your name or mention a preference such as “I like dark mode,” LangMem can store that information in long-term memory. Later, even during a different session, the agent can retrieve it and use it naturally in its response, such as greeting you by name and remembering your preference.

Internally, LangMem defines several categories of memories that an agent can use:

  • Semantic memory – facts and data, such as important information, user details, and knowledge triples. This becomes the agent’s growing factual knowledge base learned through interaction.
  • Episodic memory – previous experiences or events. These are often stored as summaries of past interactions and help the agent learn from earlier conversations.
  • Procedural memory – learned behavior, instructions, or policies that influence how the agent responds. This can help adjust the agent’s persona or teach it new rules over time.

LangMem offers a unified API for working with these memory types. You define which types of memory the agent should use, and LangMem extracts the relevant information from conversations, stores it, and makes it available again in future interactions.

Why Long-Term Memory Matters for AI Agents

Giving agents persistent memory unlocks several important capabilities:

  • Continuous context: A standard conversational agent cannot remember earlier sessions. With long-term memory, agents can keep extended context across conversations, so users do not need to repeat past information. A support assistant, for example, could remember a customer’s last problem.
  • Personalization: Agents can recall user preferences and profile details to tailor answers. If an AI tutor remembers which topics a learner struggled with previously, it can adapt its teaching style accordingly.
  • Learning from experience: Memory allows agents to improve based on prior actions. By retaining successes and failures, an autonomous system can refine its strategy. Agents with memory are not just reactive tools; they can adapt and improve through use.
  • Task continuity: An AI agent can preserve details about an ongoing task. For example, if a coding assistant works on bug fixes over multiple days, memory allows it to resume from the previous state instead of requiring a recap or starting over.
  • Reduced prompt size: Rather than stuffing entire conversation histories into prompts, which increases cost and eventually hits context limits, an agent with memory can pull only the information it truly needs from long-term storage. This makes context usage more efficient.

Architecture and Technical Overview

LangMem’s architecture can be viewed as a layered system that sits beside the main agent logic.

a. Agent Framework Layer

This is the agent itself, whether built with LangChain or another framework, that interacts with the language model. The agent needs to be configured to use LangMem’s memory tools as part of its available tools. For example, in a LangChain-based setup, you would create an agent and supply manage_memory and search_memory as tools the agent can call. The decision loop can then make use of those tools whenever appropriate. LangMem is not limited to LangChain, however. If you use another framework, you can call LangMem’s API directly from your own agent logic.

b. Memory Manager Core (LLM-Powered)

At the center of LangMem is the Memory Manager. This component is effectively an LLM that takes conversation transcripts or similar data as input and outputs memory entries. Behind the scenes, prompt templates and structured instructions evaluate the transcript and decide what should be stored, updated, or deleted. For example, if a new fact appears in a conversation, such as a change in someone’s role, LangMem’s Memory Manager can interpret that update and generate a corresponding memory record. It can also review existing memories and determine when older information is no longer accurate and should be replaced or removed. This process is referred to as consolidation logic.

c. Memory Storage Layer

LangMem does not require a specific storage format. Instead, it expects a backend memory store that can persist and retrieve memory entries. This backend is often a vector database or another key-value system capable of handling embeddings and semantic search. In the overall architecture, this acts as the long-term memory database. LangMem only requires a store object that follows the expected interface, including methods for saving memory and querying by embedding, so developers can connect a wide range of alternatives through lightweight adapters.

d. LangGraph Integration (Optional)

When used together with LangGraph, you can also rely on services such as Checkpointer and BaseStore. These provide checkpointing for short-term memory, such as chat history logging, and BaseStore services for long-term vector storage. LangMem adds the higher-level logic on top, deciding what should be written into the store and how stored memory should be updated over time.

Data Flow

Here is a simplified explanation of how the components work together while the agent is running:

a. During a Conversation

The agent receives user input and processes it. As part of its normal reasoning or tool usage, it may decide to call the manage_memory tool and pass in the current conversation content.

LangMem’s Memory Manager, through its LLMNode, evaluates that content, determines what information is valuable enough to retain, and returns one or more memory entries for persistence. Those memory entries are then written to the storage layer, along with an embedding index that will later support retrieval.

b. Later in the Same Conversation or in a Future Session

When the agent later receives a query or needs supporting context, it may call the search_memory tool. LangMem then takes a query, which could be the user’s current question or a broader topic, and runs a similarity search against the stored memories. It returns any memory entries that appear relevant.

The agent can then place those retrieved memory snippets into its prompt, often by appending them to the system or user context, and let the LLM generate a response based on that recovered information. In this way, knowledge from earlier interactions is dynamically reintroduced into the present conversation.

c. Background Maintenance

Separately, a background thread, process, or scheduled job can periodically call LangMem’s consolidation routine if that feature is enabled. This routine works on groups of memory entries, or possibly the full memory store, and uses the LLM-based Memory Manager to produce cleaner output. That can include merging similar memories, summarizing older conversations, removing flagged entries, and more. The cleaned results are then written back to the store, replacing or updating older records.

Integration Guide

Next, we will go through how to use LangMem SDK with an AI agent. This example uses Python together with LangChain utilities, although the overall workflow is similar with other languages and tools.

1) Python Packages

You will need:

  • langmem (memory tools)
  • langchain (agent API)
  • langgraph (stores and runtime wiring)
  • provider packages (for example, OpenAI)

Install:

pip install -U langmem langchain langgraph langchain-openai openai

If you want to persist memory with Postgres later:

pip install -U "psycopg[binary,pool]"

2) Provider Credentials

LangMem does not include its own LLM. You need to configure a provider such as OpenAI or Anthropic. For OpenAI:

export OPENAI_API_KEY="sk-..."

You can also set it before creating the agent:

import os, getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("Paste OPENAI_API_KEY: ").strip()
assert os.environ["OPENAI_API_KEY"], "Empty key."

Step 1 — Import the Updated Components

You will use create_agent from LangChain and a memory store from LangGraph:

from langchain.agents import create_agent
from langgraph.store.memory import InMemoryStore
from langmem import create_manage_memory_tool, create_search_memory_tool

In the code above:

  • create_agent is the modern factory for tool-using agents.
  • InMemoryStore represents a vector-backed store for long-term memory in development or demo scenarios.
  • LangMem tools handle memory writing, updating, and retrieval.

Step 2 — Create a Memory Store (Demo Mode)

For development work, you can begin with an in-memory vector index:

store = InMemoryStore(
   index={
       "dims": 1536,
       "embed": "openai:text-embedding-3-small",
   }
)

Operationally, this means:

  • Each memory item is converted into a 1536-dimensional vector.
  • Retrieval is based on semantic similarity search.
  • Memory is cleared when the process restarts, which is acceptable for demonstrations.

Step 3 — Define Memory Tools With User-Scoped Namespaces

Namespaces are the simplest and most effective way to prevent memory leakage between users. They allow you to separate memory entries when your system supports multiple agents or multiple users. This ensures that memories from one user or agent are not mixed with another. In a multi-user system, you can assign a namespace dynamically per user, such as namespace=(user_id, “memories”). The LangMem tools will then retrieve memories only from the relevant scope. A common recommendation is to scope memory by a runtime user_id.

Memory write tool (manage memory):

manage_memory = create_manage_memory_tool(
   namespace=("memories", "{user_id}"),
   instructions=(
       "Store stable user facts and preferences (name, role, long-running projects, UI preferences). "
       "Avoid storing sensitive data unless the user explicitly requests it."
   ),
)

Memory read tool (search memory):

search_memory = create_search_memory_tool(
   namespace=("memories", "{user_id}"),
   instructions=(
       "When questions depend on prior info (preferences, identity, previous tasks), search memory first "
       "and use the results in the response."
   ),
)

Why this matters:

  • ("memories", "{user_id}") tells LangMem to store memories in a user-specific partition.
  • When the agent runs, you provide the user_id in the call configuration, and LangMem fills in the template automatically.

Step 4 — Create the Agent

Now connect the components:

agent = create_agent(
   model="gpt-4o-mini",                 # choose your model
   tools=[manage_memory, search_memory],
   store=store,
)

At this stage, you have:

  • an LLM-driven agent,
  • tools for long-term memory writing and retrieval,
  • a store that semantically persists and retrieves memories.

As shown here, the LangMem integration happens behind the scenes, so you do not need to manually invoke the memory functions yourself. That is the core integration process. With only a few lines of code, long-term memory can be added to an AI agent.

Production Upgrade — Persistent Memory With Postgres

One important consideration is how to make memory truly persistent through a backend that survives restarts. In the earlier example, InMemoryStore would lose all data when the process stops. In a production system, you would typically use something like this:

from langgraph.store.postgres import PostgresStore
store = PostgresStore.from_conn_string("postgresql://user:password@host:5432/dbname")
store.setup()  # run once

In this example, Postgres is used to persist memory so the agent’s knowledge remains available even after the application restarts. LangChain provides PostgresStore for storing text and embeddings in an SQL table. The same principle can also be applied to any other vector database, provided it is wrapped in the store interface LangMem expects. Once persistence is in place, the agent can retain information indefinitely, or at least until the stored data is intentionally cleaned up.

Performance and Scaling Considerations

Adding long-term memory greatly expands what an agent can do. At the same time, it introduces new performance and scaling considerations that should be planned for carefully. The following guidelines are useful when working with LangMem:

Consideration Risk / Challenge Practical Mitigations (LangMem-Focused)
Memory Growth and Pruning As time passes, the agent can accumulate many memory entries, which may slow retrieval and increase the likelihood of irrelevant recall. Apply pruning and compression strategies: summarize older memories into fewer records, retain only the most important facts such as the latest 100 entries, and use time-based decay for items that are rarely referenced unless they are marked as permanent. Use LangMem’s background manager for periodic consolidation.
Retrieval Efficiency As the memory store grows into the thousands of entries or more, vector search latency can increase and reduce responsiveness. Use an indexed vector database and monitor retrieval latency. Narrow searches with namespaces or sharding by memory category, such as separating preferences from general memories. Tune top-k retrieval values and embedding model selection to balance accuracy and speed.
Context Window Usage Retrieved memories still consume tokens when added to the prompt. Large entries can push context limits and raise costs. Store concise, distilled facts instead of long conversation transcripts. Summarize at write time, extract only the most useful sentences, and use tool instructions that enforce brevity. Limit the number of memories injected into each response.
Memory Scope and Privacy In multi-user systems, poorly scoped memory may leak across users or contain sensitive information that should not have been stored. Use user- or tenant-scoped namespaces, and add role- or mode-specific namespaces where appropriate. Filter what gets stored and avoid raw transcripts when they are unnecessary. For sensitive material, consider encryption at rest and strict retention policies that preserve only non-sensitive insights.
Scaling the LLM for Memory Operations The quality and cost of memory extraction and consolidation depend on the chosen model. Stronger models cost more, while weaker ones may produce poor memory quality. Use model tiering: rely on a smaller model for routine fact extraction and a stronger model for periodic summarization. Control how often extraction runs, avoid triggering it on every turn unless something important occurred, cache where possible, and track total LLM costs as usage increases.

Comparison With Alternatives

Below is a quick comparison of common approaches teams use to add long-term memory to AI agents. The comparison contrasts three main paths: building a custom RAG-style memory layer from scratch, adopting another memory-focused SDK, or choosing LangMem as an all-in-one solution.

Approach / Option What It Looks Like in Practice Key Trade-Offs vs LangMem
Custom Memory Solutions (DIY RAG-Style Memory) Building memory yourself with a vector database typically means choosing which messages matter, embedding them, storing them, retrieving the closest matches at query time, and prepending them to the prompt. You also need to build your own summarization and extraction prompts, update and delete logic, deduplication, and retention rules. Pros: Maximum flexibility and full control for niche requirements; highly customizable schemas and pipelines. Cons: Significant engineering and prompt-engineering overhead; more difficult to maintain; inconsistent behavior is easier to introduce; pruning, consolidation, and versioning become your responsibility. LangMem advantage: a standardized and tested memory-management pipeline that reduces custom plumbing and speeds up delivery.
Other Memory SDKs / Tools Use a dedicated memory toolkit or framework that offers user-centric memory and retrieval features and may include templates for memory schemas and storage backends. Some teams also build their own internal memory modules. In many agent stacks, memory still has to be inserted manually. Pros: May be simpler if it already fits your existing stack; some tools may offer specialized workflows for user memory. Cons: Feature depth and maturity vary widely; some solutions are tightly coupled to a specific service or backend; integration depth may differ. LangMem advantage: strong LangChain integration together with broader support for tooling, background processing, multiple memory types, and backend-agnostic design.
LangMem SDK (LangChain-First Memory Layer) Memory writing and searching are exposed as tools inside the agent loop and backed by a LangGraph store. It supports structured memory management and background consolidation and is designed to work across multiple storage backends. Pros: Quick integration, consistent behavior, support for multiple memory types, flexible storage, and a clean separation between memory logic and agent logic; less custom plumbing is required. Cons: Familiarity with LangChain and LangGraph patterns is helpful; stateful systems naturally introduce more debugging and monitoring complexity, which applies to any long-term memory strategy.

FAQs

Does LangMem include its own language model?

No. LangMem does not ship with a built-in language model. It works as a memory layer alongside external LLM providers such as OpenAI or Anthropic. To use it effectively, you must configure and connect your preferred model.

Can LangMem work without LangChain?

Yes. LangMem is not strictly tied to LangChain. Although it integrates smoothly with LangChain-based workflows, it can also be used with custom agent systems, which makes it suitable for developers who want more architectural control.

Is memory automatically persistent?

Memory becomes persistent only if you configure a persistent storage backend such as PostgreSQL or a vector database. If you use an in-memory store, the data disappears when the session ends or the process restarts. Persistence therefore depends entirely on the selected storage configuration.

How does LangMem prevent memory leakage between users?

LangMem uses namespaces to isolate data between users or tenants. Each user’s memory is stored within its own scope so information does not spill across sessions. This is important for both privacy and overall system reliability.

Does long-term memory increase costs?

Yes. Long-term memory can raise costs because it adds extra steps for storage, retrieval, and processing. These operations may involve additional LLM calls and storage usage. Costs can be controlled by pruning unnecessary memory and optimizing retrieval behavior.

Conclusion

LangMem makes it possible to turn stateless, context-window-limited LLM agents into stateful systems that retain user facts, preferences, and task history across interactions. It combines an LLM-driven memory manager with an extensible storage layer and practical tools such as manage_memory and search_memory to bring persistence into real-world applications. Instead of building a custom memory pipeline from scratch, LangMem offers a faster route toward production-ready persistent agent behavior. When memory is scoped correctly through namespaces, paired with an appropriate storage backend, and supported by thoughtful pruning and retrieval strategies, LangMem provides a strong foundation for building agents that improve over time while remaining scalable and manageable.

Source: digitalocean.com

Create a Free Account

Register now and get access to our Cloud Services.

Posts you might be interested in: