AI & Automation · 14 min read

LangChain: Build Intelligent AI Agents Using Python

Master LangChain for production Python AI agents. Learn agent types, tool integration, memory systems, and RAG implementation with real code examples and best practices.

How LangChain Helps You Build Production-Ready AI Agents with Python

This article is part of our 5-part series on AI Agent & Workflow Development Tools where we explore the leading platforms and frameworks for building production-ready AI solutions.

📚 Series: Tools We Use for AI Development

  1. Azure AI Foundry - How Azure AI Foundry helps you build secure enterprise AI solutions
  2. LangChain (this article) - How LangChain helps you build production-ready AI agents with Python
  3. Semantic Kernel - How Semantic Kernel helps you build multi-agent AI systems in .NET
  4. n8n - How n8n democratizes AI automation with low-code workflows
  5. Microsoft Agent Framework - How Microsoft Agent Framework enables scalable multi-agent workflows

What is LangChain?

LangChain is the most popular open-source Python framework for building AI applications powered by large language models (LLMs). It transforms simple LLM API calls into sophisticated AI agents capable of reasoning, using tools, maintaining memory, and executing complex workflows.

LangChain solves the critical challenge of LLM orchestration: connecting language models to external data sources, APIs, and tools while managing context, memory, and error handling. Instead of writing custom prompt engineering logic and tool calling code, LangChain provides battle-tested abstractions that handle the complexity for you.

The framework is designed for production-grade AI systems, not just prototypes. With over 100k GitHub stars and adoption by companies like Robinhood, Notion, and Zapier, LangChain has become the de facto standard for Python AI development.

Why LangChain for AI Agents?

Traditional LLM applications are stateless and reactive—they respond to prompts but can’t plan, remember, or interact with external systems. AI Agents built with LangChain overcome these limitations:

  • Autonomous reasoning: Agents decide which actions to take based on context
  • Tool usage: Connect to databases, APIs, search engines, and custom functions
  • Memory systems: Maintain conversation history and long-term knowledge
  • Error recovery: Retry failed operations and handle exceptions gracefully
  • Multi-step workflows: Break complex tasks into manageable steps

LangChain is particularly powerful for:

  • Retrieval-Augmented Generation (RAG): Ground LLM responses in your data
  • Conversational AI: Build chatbots with context and memory
  • Data analysis agents: Query databases and visualize results
  • Automation workflows: Replace manual tasks with intelligent agents

Core LangChain Architecture

LangChain is organized into modular components that you compose together. Understanding this architecture is essential for building robust agents.

The Component Hierarchy

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# 1. Model: The LLM (OpenAI, Anthropic, local models, etc.)
model = ChatOpenAI(
    model="gpt-4o",
    temperature=0.7,
    api_key="your-api-key"
)

# 2. Prompt: Template for LLM input
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant specialized in {domain}."),
    ("user", "{question}")
])

# 3. Output Parser: Structure the LLM response
parser = StrOutputParser()

# 4. Chain: Connect components with LCEL (LangChain Expression Language)
chain = prompt | model | parser

# Execute the chain
result = chain.invoke({
    "domain": "Python development",
    "question": "How do I optimize database queries?"
})

Key Concepts:

  • Runnables: Every component implements the Runnable interface (.invoke(), .stream(), .batch())
  • LCEL (LangChain Expression Language): The | operator chains components together
  • Type safety: Pydantic models ensure data validation at runtime
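
Because every component implements the Runnable interface, the chain built above can be invoked, streamed, or batched through the same methods. A quick illustration (the questions are arbitrary):

# Single call
result = chain.invoke({"domain": "Python development", "question": "What is a generator?"})

# Token-by-token streaming (StrOutputParser yields string chunks)
for token in chain.stream({"domain": "Python development", "question": "What is asyncio?"}):
    print(token, end="", flush=True)

# Parallel execution over multiple inputs
results = chain.batch([
    {"domain": "Python development", "question": "What is a decorator?"},
    {"domain": "Python development", "question": "What is a context manager?"},
])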

LangChain vs LangGraph

LangChain provides linear chains (step-by-step execution), while LangGraph enables cyclic workflows (loops, conditionals, human-in-the-loop). Use LangGraph for:

  • Multi-agent collaboration
  • Iterative refinement (agent tries, evaluates, retries)
  • Complex state machines

We’ll cover both in this guide.

Building Your First LangChain Agent

Agents are autonomous systems that use LLMs to decide which tools to call. Unlike chains (predefined steps), agents reason about the best action dynamically.

Agent Types in LangChain

| Agent Type | Best For | Tools | Memory |
|---|---|---|---|
| ReAct | General-purpose reasoning | Any | Optional |
| OpenAI Functions | Structured tool calling | OpenAI function schema | Built-in |
| Conversational | Chatbots with history | Any | Required |
| Plan-and-Execute | Multi-step tasks | Any | Task list |

Creating a ReAct Agent with Tools

The ReAct pattern (Reasoning + Acting) is the most versatile agent architecture. The agent alternates between thinking and tool usage.

from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_core.prompts import PromptTemplate
from langchain.tools import Tool

# 1. Define tools the agent can use
search = DuckDuckGoSearchRun()

def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # eval with no builtins and no names limits what can run, but it is
        # still not fully safe for untrusted input - prefer a math parser in production
        result = eval(expression, {"__builtins__": {}}, {})
        return f"Result: {result}"
    except Exception as e:
        return f"Error: {str(e)}"

tools = [
    Tool(
        name="Search",
        func=search.run,
        description="Search the internet for current information. Input should be a search query."
    ),
    Tool(
        name="Calculate",
        func=calculate,
        description="Perform mathematical calculations. Input should be a valid Python expression (e.g., '2 + 2', '10 * 5')."
    )
]

# 2. Create the agent with a ReAct prompt
prompt = PromptTemplate.from_template("""
You are an intelligent agent capable of reasoning and using tools.

Tools available:
{tools}

Tool names: {tool_names}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Question: {input}
Thought: {agent_scratchpad}
""")

llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_react_agent(llm, tools, prompt)

# 3. Create executor (handles tool calling logic)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,  # Print reasoning steps
    max_iterations=10,  # Prevent infinite loops
    handle_parsing_errors=True  # Graceful error handling
)

# 4. Execute the agent
response = agent_executor.invoke({
    "input": "What is the current price of Bitcoin multiplied by 100?"
})

print(response["output"])

What happens under the hood:

  1. Agent receives the question
  2. Thought: “I need to search for Bitcoin’s current price”
  3. Action: Calls the Search tool with “current Bitcoin price”
  4. Observation: Gets the search result (e.g., “$45,000”)
  5. Thought: “Now I need to multiply by 100”
  6. Action: Calls the Calculate tool with “45000 * 100”
  7. Observation: Gets “4,500,000”
  8. Final Answer: Returns the result to the user

LangChain Tools: Connecting Agents to the Real World

Tools are functions that agents call to interact with external systems. LangChain provides hundreds of pre-built tools and makes it easy to create custom ones.

Using Pre-Built Tools

from langchain_community.tools import WikipediaQueryRun, ShellTool
from langchain_community.tools.file_management import ReadFileTool
from langchain_community.utilities import WikipediaAPIWrapper
from langchain_experimental.tools import PythonREPLTool  # pip install langchain-experimental

# Wikipedia search
wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

# Execute Python code (use with caution!)
python_repl = PythonREPLTool()

# Shell commands (production: restrict to safe commands)
shell = ShellTool()

# File operations
file_reader = ReadFileTool()

tools = [wikipedia, python_repl, shell, file_reader]

Production Warning: PythonREPLTool and ShellTool execute arbitrary code. Use them only in sandboxed environments or with strict input validation.
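
One mitigation is to wrap command execution in your own allow-listed tool rather than exposing the raw ShellTool. A minimal sketch (the allow-list and tool name are illustrative, not part of LangChain):

import shlex
import subprocess
from langchain.tools import Tool

# Hypothetical allow-list: adjust to the commands your agent actually needs
ALLOWED_COMMANDS = {"ls", "cat", "grep"}

def safe_shell(command: str) -> str:
    """Run a shell command only if its executable is on the allow-list."""
    parts = shlex.split(command)
    if not parts or parts[0] not in ALLOWED_COMMANDS:
        return "Error: command not permitted"
    try:
        completed = subprocess.run(parts, capture_output=True, text=True, timeout=10)
        return completed.stdout or completed.stderr
    except Exception as e:
        return f"Error: {e}"

restricted_shell = Tool(
    name="RestrictedShell",
    func=safe_shell,
    description="Run a read-only shell command. Only ls, cat, and grep are allowed."
)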

Creating Custom Tools

For production systems, you’ll need custom tools that integrate with your business logic.

from langchain.tools import StructuredTool
from pydantic import BaseModel, Field
from typing import List
import requests

# 1. Define input schema with Pydantic
class CustomerLookup(BaseModel):
    customer_id: str = Field(description="The unique customer ID")
    include_orders: bool = Field(
        default=False,
        description="Whether to include order history"
    )

# 2. Implement the tool function
def lookup_customer(customer_id: str, include_orders: bool = False) -> dict:
    """
    Query customer database and return customer details.
    Production: Replace with actual database call.
    """
    # Simulated API call
    response = requests.get(
        f"https://api.example.com/customers/{customer_id}",
        params={"include_orders": include_orders}
    )

    if response.status_code == 200:
        return response.json()
    else:
        return {"error": f"Customer {customer_id} not found"}

# 3. Create the tool with structured schema
customer_tool = StructuredTool.from_function(
    func=lookup_customer,
    name="CustomerLookup",
    description="Retrieve customer information from the CRM system. Use this when you need details about a specific customer.",
    args_schema=CustomerLookup
)

# 4. Use in an agent
tools = [customer_tool]
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

response = agent_executor.invoke({
    "input": "Find details for customer ID 12345 including their order history"
})

Best Practices:

  • Descriptive names: Help the LLM understand when to use the tool
  • Clear descriptions: Explain what the tool does and when to use it
  • Type safety: Use Pydantic schemas for complex inputs
  • Error handling: Return meaningful error messages, not exceptions

LangChain Memory: Building Stateful Agents

LLMs are stateless—they don’t remember previous interactions. Memory systems solve this by storing and retrieving conversation history.

Memory Types

| Memory Type | Use Case | Retention | Storage |
|---|---|---|---|
| ConversationBufferMemory | Short chats | All messages | In-memory |
| ConversationBufferWindowMemory | Limit context | Last N messages | In-memory |
| ConversationSummaryMemory | Long conversations | Summarized | LLM-compressed |
| VectorStoreMemory | Semantic retrieval | Relevant context | Vector DB |
| EntityMemory | Track facts about entities | Structured facts | Dictionary |

Implementing Conversation Memory

from langchain.memory import ConversationBufferMemory
from langchain.agents import initialize_agent, AgentType

# 1. Create memory that stores chat history
memory = ConversationBufferMemory(
    memory_key="chat_history",  # Key for prompt template
    return_messages=True  # Return as ChatMessage objects
)

# 2. Initialize agent with memory
agent = initialize_agent(
    tools=tools,
    llm=ChatOpenAI(model="gpt-4o", temperature=0),
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    verbose=True
)

# 3. Conversation with context
agent.invoke({"input": "My name is Alice and I work at TechCorp."})
agent.invoke({"input": "What's my name?"})  # Agent remembers: "Alice"
agent.invoke({"input": "Where do I work?"})  # Agent remembers: "TechCorp"

Window Memory for Long Conversations

To prevent exceeding context limits, use sliding window memory:

from langchain.memory import ConversationBufferWindowMemory

# Only keep last 5 message pairs (10 messages total)
memory = ConversationBufferWindowMemory(
    k=5,  # Number of exchanges to remember
    memory_key="chat_history",
    return_messages=True
)

Summary Memory for Token Efficiency

For very long conversations, summarize old messages to save tokens:

from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(
    llm=ChatOpenAI(model="gpt-4o-mini"),  # Use cheaper model for summaries
    memory_key="chat_history",
    return_messages=True
)

# As conversation grows, old messages are summarized:
# "User discussed Q4 sales targets and marketing budget constraints."

Retrieval-Augmented Generation (RAG) with LangChain

RAG grounds LLM responses in your proprietary data. Instead of relying on the model’s training data, you retrieve relevant documents and inject them into the prompt.

RAG Architecture

User Query → Embedding → Vector Search → Retrieve Docs → LLM + Context → Response

Building a Production RAG System

from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import DirectoryLoader, TextLoader

# 1. Load documents
loader = DirectoryLoader(
    "./docs",
    glob="**/*.md",
    loader_cls=TextLoader
)
documents = loader.load()

# 2. Split into chunks (critical for retrieval quality)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # Characters per chunk
    chunk_overlap=200,  # Overlap to preserve context
    separators=["\n\n", "\n", " ", ""]  # Split on paragraphs, then sentences
)
chunks = text_splitter.split_documents(documents)

# 3. Create embeddings and vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"  # Persist to disk
)

# 4. Create retrieval chain
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}  # Retrieve top 4 chunks
)

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o", temperature=0),
    chain_type="stuff",  # "stuff" = inject all docs into prompt
    retriever=retriever,
    return_source_documents=True  # Include sources in response
)

# 5. Query the knowledge base
result = qa_chain.invoke({"query": "How do I configure authentication?"})

print(result["result"])
print("\nSources:")
for doc in result["source_documents"]:
    print(f"- {doc.metadata['source']}")

Advanced RAG: Multi-Query Retrieval

Generate multiple query variations to improve recall:

from langchain.retrievers import MultiQueryRetriever

# Automatically generates 3 variations of the user query
retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=ChatOpenAI(model="gpt-4o-mini")
)

# User asks: "How do I deploy?"
# LLM generates:
# 1. "What are the deployment steps?"
# 2. "How to configure production deployment?"
# 3. "Deployment guide and instructions"
# → Retrieves results for all 3, deduplicates

RAG with Re-Ranking

Improve relevance by re-scoring retrieved documents:

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# 1. Initial retrieval (fast, may include irrelevant docs)
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# 2. Re-rank with LLM (slow, but accurate)
compressor = LLMChainExtractor.from_llm(ChatOpenAI(model="gpt-4o-mini"))
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=base_retriever
)

# Retrieves 10 chunks, filters to most relevant 3-4

LangGraph: Building Multi-Agent Systems

LangGraph is LangChain’s framework for building stateful, cyclic workflows. Unlike linear chains, LangGraph supports loops, conditionals, and multi-agent collaboration.

LangGraph Core Concepts

  • Nodes: Functions that process state
  • Edges: Transitions between nodes
  • State: Shared data passed through the graph
  • Conditional edges: Dynamic routing based on state

Creating a Research Agent with LangGraph

from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated, Sequence
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
import operator

# 1. Define state (shared across all nodes)
class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]  # reducer appends new messages
    research_results: str
    should_continue: bool

# 2. Define nodes (agent actions) - each returns a partial state update
def researcher(state: AgentState) -> dict:
    """Research the topic using search tools."""
    query = state["messages"][-1].content
    # Reuse the search tool and llm defined in the earlier sections
    search_agent = initialize_agent(
        tools=[search_tool], llm=llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION
    )
    result = search_agent.invoke({"input": f"Research: {query}"})

    return {"research_results": result["output"], "should_continue": True}

def writer(state: AgentState) -> dict:
    """Write a report based on research."""
    research = state["research_results"]

    prompt = f"Write a comprehensive report based on this research:\n\n{research}"
    response = llm.invoke(prompt)

    return {"messages": [response], "should_continue": False}

def reviewer(state: AgentState) -> dict:
    """Review the report quality."""
    report = state["messages"][-1].content

    prompt = (
        f"Review this report for accuracy and completeness:\n\n{report}\n\n"
        "Is it ready to publish? Reply 'APPROVED' or 'NEEDS_REVISION'"
    )
    review = llm.invoke(prompt).content

    if "APPROVED" in review:
        return {"should_continue": False}
    return {
        "messages": [AIMessage(content=f"Revision needed: {review}")],
        "should_continue": True,
    }

# 3. Build the graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("researcher", researcher)
workflow.add_node("writer", writer)
workflow.add_node("reviewer", reviewer)

# Add edges
workflow.set_entry_point("researcher")
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", "reviewer")

# Conditional edge: loop if revision needed
workflow.add_conditional_edges(
    "reviewer",
    lambda state: "writer" if state["should_continue"] else END
)

# 4. Compile and execute
app = workflow.compile()

result = app.invoke({
    "messages": [HumanMessage(content="Research the impact of AI on healthcare")],
    "research_results": "",
    "should_continue": True
})

print(result["messages"][-1].content)

Flow:

  1. Researcher gathers information
  2. Writer creates a report
  3. Reviewer checks quality
  4. If approved → END
  5. If needs revision → loop back to Writer

Human-in-the-Loop with LangGraph

Add manual approval steps:

from langgraph.checkpoint.memory import MemorySaver

# Add checkpointing so graph state survives between runs
memory = MemorySaver()

# Pause execution just before the reviewer node runs
app = workflow.compile(checkpointer=memory, interrupt_before=["reviewer"])

# Each thread_id keeps its own checkpointed conversation state
config = {"configurable": {"thread_id": "1"}}

input_data = {
    "messages": [HumanMessage(content="Research the impact of AI on healthcare")],
    "research_results": "",
    "should_continue": True,
}

# Run until the interrupt fires, then ask a human before resuming
for output in app.stream(input_data, config):
    print(output)

if input("Approve? (yes/no): ").lower() == "yes":
    # Passing None resumes the graph from the saved checkpoint
    for output in app.stream(None, config):
        print(output)

Production LangChain: Best Practices

1. Error Handling and Retries

# Fallback to a cheaper model if the primary model fails
primary_chain = prompt | ChatOpenAI(model="gpt-4o")
fallback_chain = prompt | ChatOpenAI(model="gpt-4o-mini")

chain_with_fallback = primary_chain.with_fallbacks([fallback_chain])

# Automatic retry with exponential backoff
chain_with_retry = chain.with_retry(
    stop_after_attempt=3,
    wait_exponential_jitter=True
)

2. Streaming Responses

For better UX, stream LLM outputs token-by-token:

for chunk in chain.stream({"question": "Explain quantum computing"}):
    print(chunk, end="", flush=True)

3. Batch Processing

Process multiple inputs efficiently:

questions = [
    {"question": "What is Python?"},
    {"question": "What is Java?"},
    {"question": "What is JavaScript?"}
]

# Parallel execution
results = chain.batch(questions)

4. Observability with LangSmith

import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-api-key"

# All chains automatically log to LangSmith
# View traces at: https://smith.langchain.com

5. Prompt Management

from langchain.prompts import load_prompt

# Store prompts in JSON/YAML files
prompt = load_prompt("prompts/customer_support.json")

# Version control your prompts
# Track performance of different prompt versions in LangSmith
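
For example, you can define a prompt once, save it to disk, and load it back anywhere (the path and template here are illustrative):

from langchain_core.prompts import PromptTemplate

# Define a prompt once and persist it as JSON
support_prompt = PromptTemplate.from_template(
    "You are a support agent for {product}. Answer the question: {question}"
)
support_prompt.save("prompts/customer_support.json")

# Later (or in another service), load it back
reloaded = load_prompt("prompts/customer_support.json")
print(reloaded.format(product="Acme CRM", question="How do I reset my password?"))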

Conclusion: Building Production AI with LangChain

LangChain has evolved from a simple prompt wrapper to a comprehensive ecosystem for building production AI systems. Key takeaways:

  1. Start with chains, graduate to agents: Use simple chains for predictable workflows, agents for autonomous tasks
  2. Tools are critical: The value of agents comes from tool integration—invest in building robust custom tools
  3. Memory matters: Conversational agents need memory; choose the right type for your use case
  4. RAG is essential: For enterprise AI, RAG grounds responses in your data and reduces hallucinations
  5. Use LangGraph for complexity: Multi-step reasoning, human-in-the-loop, and multi-agent systems require LangGraph
  6. Production patterns:
    • Streaming for UX
    • Fallbacks for reliability
    • LangSmith for observability
    • Structured outputs with Pydantic
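
The last point, structured outputs, is worth a quick illustration. A minimal sketch using with_structured_output (the SupportTicket schema is hypothetical):

from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

# Hypothetical schema: the model's reply is parsed and validated into this object
class SupportTicket(BaseModel):
    summary: str = Field(description="One-sentence summary of the issue")
    severity: str = Field(description="low, medium, or high")
    needs_human: bool = Field(description="Whether a human agent should follow up")

llm = ChatOpenAI(model="gpt-4o", temperature=0)
structured_llm = llm.with_structured_output(SupportTicket)

ticket = structured_llm.invoke("My invoice export has been failing since yesterday.")
print(ticket.severity, ticket.needs_human)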

The future of LangChain includes:

  • LangGraph Studio: Visual graph builder
  • LangServe: Deploy chains as REST APIs
  • Deeper integrations: More pre-built tools and vector stores

LangChain is the Python equivalent of Semantic Kernel (.NET) and provides the most mature tooling for AI agents in the Python ecosystem.


Frequently Asked Questions (FAQ)

What is LangChain used for?

LangChain is used to build AI agents and applications powered by large language models (LLMs). It provides tools for prompt engineering, tool calling, memory management, RAG (Retrieval-Augmented Generation), and multi-agent workflows in Python.

Is LangChain free to use?

Yes, LangChain is open-source and free under the MIT license. However, you’ll need API keys for LLM providers (OpenAI, Anthropic, etc.) which have their own pricing. You can also use free local models with LangChain.

What’s the difference between LangChain and LangGraph?

LangChain provides linear chains and basic agents. LangGraph enables cyclic workflows with loops, conditionals, and multi-agent collaboration. Use LangGraph for complex, stateful systems that need iterative refinement.

Can I use LangChain with local LLMs?

Yes! LangChain supports Ollama, Hugging Face models, LlamaCpp, and other local LLM providers. You’re not locked into paid APIs like OpenAI.
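
For example, a local model served by Ollama can be swapped in wherever ChatOpenAI is used (this sketch assumes a running Ollama server with the llama3 model pulled):

from langchain_community.chat_models import ChatOllama

# Drop-in replacement for ChatOpenAI, pointing at a local Ollama server
local_llm = ChatOllama(model="llama3", temperature=0)
print(local_llm.invoke("Summarize what LangChain does in one sentence.").content)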

How does LangChain RAG work?

LangChain RAG:

  1. Splits documents into chunks
  2. Converts chunks to vector embeddings
  3. Stores in a vector database (Chroma, Pinecone, Weaviate)
  4. At query time, retrieves relevant chunks
  5. Injects chunks into the LLM prompt as context

What is the difference between LangChain and Semantic Kernel?

LangChain is Python-first with a massive ecosystem of integrations. Semantic Kernel is .NET-focused with strong typing and enterprise patterns. LangChain has more community tools; Semantic Kernel has better Azure integration.

How do I debug LangChain agents?

Enable verbose mode (verbose=True) to see agent reasoning steps. Use LangSmith for detailed tracing, including token usage, latency, and errors. Add logging to custom tools.
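
For the last point, a lightweight pattern is to log inputs and outputs inside the tool function itself (the names here are illustrative):

import logging
from langchain.tools import Tool

logger = logging.getLogger("agent.tools")

def lookup_order(order_id: str) -> str:
    """Illustrative tool that logs its inputs and results."""
    logger.info("lookup_order called with order_id=%s", order_id)
    result = f"Order {order_id}: shipped"  # replace with a real lookup
    logger.info("lookup_order returning: %s", result)
    return result

order_tool = Tool(
    name="OrderLookup",
    func=lookup_order,
    description="Look up the status of an order by its ID."
)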


Next Steps: Master LangChain

Coming Next in the Tools We Use Series:

  • AutoGen: Microsoft’s Multi-Agent Framework
  • CrewAI: Role-Based Multi-Agent Systems
  • LlamaIndex: Advanced RAG and Knowledge Graphs