LangGraph for Complex Agent Workflows

Graph-based orchestration for multi-step LLM workflows

Sequential chains were sufficient when LLM applications consisted of simple prompt-response patterns. You had a user query, you generated a response, you returned it. LangChain's LCEL made this easy: chain together a prompt template, an LLM call, and an output parser. The flow was linear, predictable, and covered 80% of early LLM use cases.

But production systems rarely stay simple. A RAG pipeline that starts with "retrieve and answer" quickly becomes "retrieve, check relevance, re-query if needed, synthesize, validate citations, retry on failure." A research assistant that begins with "summarize this paper" evolves into "extract claims, verify against sources, flag contradictions, generate questions for unclear sections." The moment you need conditional logic, loops, or stateful multi-step reasoning, sequential chains break down.

This is where LangGraph enters. It's a framework for building LLM applications as directed graphs, where nodes represent computational steps and edges represent transitions between them. State flows through the graph, accumulating context as it moves. You define not just what happens at each step, but how the system decides which step comes next.

When Sequential Chains Aren't Enough

Consider a customer support agent that needs to: identify the issue, check if it's in the knowledge base, escalate to a human if needed, attempt a solution, verify the solution worked, and log the interaction. A sequential chain assumes each step always happens in order. But what if the knowledge base search returns nothing? What if the proposed solution fails? What if the user asks a follow-up question mid-process?

You could hack this with branching logic in your chain, but that quickly becomes unmaintainable. Each new conditional requires rewiring the entire flow. Error handling becomes a mess of try-except blocks scattered throughout your code. State management turns into global variables or awkward context passing. The control flow is implicit, hidden in the logic of your chain functions. LangChain's RunnablePassthrough and RunnableLambda can handle some branching, but they force you to encode graph structure as nested function calls. It works, but it's not readable at scale.

LangGraph makes the control flow explicit. You define nodes for each operation: identify_issue, search_kb, attempt_solution, verify_success. You define edges that specify transitions: "if knowledge base returns results, go to attempt_solution; if it fails, go to escalate_to_human." State is a typed object that flows through the graph, accumulating information at each step. The entire workflow is visible as a graph, not hidden in imperative code.

Graph-Based Orchestration: Concepts

LangGraph models LLM workflows as state graphs. A state graph consists of nodes (computational steps), edges (transitions), and a state schema (the data structure that flows through the graph). Each node is a function that takes the current state and returns an updated state. Edges can be conditional, allowing the graph to branch based on state values.

The state schema is defined using Python's type system. For a RAG pipeline, you might have:

from typing import TypedDict, List

class RAGState(TypedDict):
    query: str
    retrieved_docs: List[str]
    relevance_scores: List[float]
    answer: str
    needs_requery: bool
    error: str | None

Nodes are functions that modify this state. A retrieval node fetches documents and updates retrieved_docs. A relevance check node scores them and sets needs_requery if scores are too low. A generation node produces answer. Each node returns a partial state update, which LangGraph merges into the current state.
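
To make that concrete, here's a minimal sketch of a retrieval node. The module-level DOCUMENT_STORE is a stand-in for a real vector store and is purely illustrative:

# A stand-in corpus; a real pipeline would query a vector store instead.
DOCUMENT_STORE = [
    "LangGraph models LLM workflows as state graphs.",
    "Reducers control how state updates are merged.",
]

def retrieve(state: RAGState) -> dict:
    # Naive keyword match standing in for semantic retrieval.
    docs = [d for d in DOCUMENT_STORE if state["query"].lower() in d.lower()]
    # Return only the fields this node changes; LangGraph merges the rest.
    return {"retrieved_docs": docs}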

Edges connect nodes. A normal edge is a fixed transition: after retrieval, always go to relevance check. A conditional edge branches based on state: if needs_requery is true, go back to retrieval with a refined query; otherwise, proceed to generation. This is where graph-based orchestration shines: the control flow is declarative, not hidden in if-statements.
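
Wiring this together might look like the following sketch, assuming check_relevance and generate nodes written in the same style as retrieve above:

from langgraph.graph import StateGraph, START, END

builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("check_relevance", check_relevance)
builder.add_node("generate", generate)

builder.add_edge(START, "retrieve")              # normal edge: always taken
builder.add_edge("retrieve", "check_relevance")  # normal edge

def route_after_check(state: RAGState) -> str:
    # Conditional edge: branch on state rather than on hidden control flow.
    return "retrieve" if state["needs_requery"] else "generate"

builder.add_conditional_edges("check_relevance", route_after_check)
builder.add_edge("generate", END)

app = builder.compile()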

State Management in Multi-Step Workflows

State management is the core of LangGraph. Unlike sequential chains, where each step is a black box, LangGraph maintains an explicit state object that every node can read and modify. This solves several problems at once: no global variables, no awkward context passing between functions, and no guessing about what data a step depends on. Here is a state reducer example:

from typing import Annotated, List, TypedDict

from langgraph.graph import StateGraph

def append_messages(existing: List[str], new: List[str]) -> List[str]:
    # Reducer: merge an update into the current value by appending.
    return existing + new

class ChatState(TypedDict):
    # The reducer is declared on the field it governs.
    messages: Annotated[List[str], append_messages]

graph = StateGraph(ChatState)
graph.add_node("chat", chat_node)  # chat_node returns {"messages": [...]}

The reducer is attached to the field it governs: whenever a node returns a messages update, it's appended to the existing list rather than overwriting it, preventing nodes from accidentally clobbering critical state.

LangGraph supports state reducers, functions that control how state updates are merged. By default, new values overwrite old ones. But for lists, you might want to append. For counters, you might want to increment. For nested objects, you might want to deep merge. Reducers make this explicit and prevent subtle bugs where nodes accidentally clobber each other's state.
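
As a sketch, both behaviors can be declared directly on the state schema: operator.add concatenates list updates, and the custom increment function below accumulates a counter (field names are illustrative):

import operator
from typing import Annotated, List, TypedDict

def increment(current: int, delta: int) -> int:
    # Custom reducer: add the update to the current value.
    return current + delta

class AgentState(TypedDict):
    messages: Annotated[List[str], operator.add]  # list updates are appended
    retry_count: Annotated[int, increment]        # counter updates accumulate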

Conditional Branching and Cycles

The power of LangGraph lies in conditional branching and cycles. Sequential chains assume a DAG: a directed acyclic graph where each step happens once. But real workflows loop. A code generation agent tries to generate code, runs tests, and if tests fail, regenerates. A planning agent proposes a plan, simulates it, and if the simulation fails, revises the plan. These are cycles, not DAGs.

LangGraph handles this with conditional edges. You define a routing function that inspects state and returns the next node name. For example:

def route_after_generation(state):
    if state["tests_passed"]:
        return "finalize"
    elif state["retry_count"] < 3:
        return "regenerate"
    else:
        return "fallback"

# The mapping ties routing labels to node names; "regenerate" sends
# control back to the generate_code node itself.
graph.add_conditional_edges(
    "generate_code",
    route_after_generation,
    {
        "finalize": "finalize",
        "regenerate": "generate_code",
        "fallback": "fallback",
    },
)

This creates a cycle: generate_code can loop back to itself via regenerate. The cycle terminates when tests pass or retries are exhausted. Without this, you'd need to manually manage a while loop, check conditions, and track state across iterations. LangGraph makes the cycle part of the graph definition.

Cycles enable powerful patterns. A reflection pattern: generate output, critique it, regenerate based on critique, repeat until quality threshold is met. A search pattern: propose hypothesis, gather evidence, update hypothesis, repeat until convergence. A planning pattern: plan, execute, observe, replan, repeat until goal is achieved. These are not exotic; they're how complex agents work in practice.
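
The reflection pattern, for example, reduces to a two-node cycle plus a routing function. The node names, quality_ok flag, and iteration budget below are illustrative:

def route_after_critique(state) -> str:
    # Exit when the critique passes or the loop budget is spent.
    if state["quality_ok"] or state["iterations"] >= 3:
        return "finish"
    return "generate"

builder.add_edge("generate", "critique")  # every draft gets critiqued
builder.add_conditional_edges("critique", route_after_critique)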

Error Handling and Retries

Production LLM systems fail. The model times out. The API rate limit is hit. The retrieval returns empty results. The generated output is malformed. You need error handling that doesn't halt the entire pipeline.

LangGraph's approach is to treat errors as state transitions. A node that might fail sets state["error"] if something goes wrong. A conditional edge checks this and routes to a recovery node. The recovery node might retry with exponential backoff, fall back to a simpler model, or return a cached response. The key is that error handling is part of the graph, not scattered in try-except blocks.
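
A sketch of the pattern, with llm standing in for whatever client the node calls:

def call_model(state) -> dict:
    try:
        answer = llm.invoke(state["query"])
        return {"answer": answer, "error": None}
    except Exception as exc:
        # Record the failure in state instead of letting it halt the graph.
        return {"error": str(exc)}

def route_on_error(state) -> str:
    # Errors are just another branch condition.
    return "recover" if state.get("error") else "continue"

graph.add_conditional_edges("call_model", route_on_error)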

For retries, the pattern is straightforward: track retry_count in state, increment it on failure, route back to the failing node if retries remain, and route to a fallback if they're exhausted. This is more robust than a naive retry loop because state is preserved across retries: you can adjust parameters, switch models, or add context on each attempt. A retry-with-backoff node looks like this:

import time

def retry_with_backoff(state):
    retry_count = state.get("retry_count", 0)
    if retry_count > 0:
        # Exponential backoff: wait 2, 4, 8... seconds before retrying.
        time.sleep(2 ** retry_count)
    # Return a partial update; LangGraph merges it into the full state.
    return {"retry_count": retry_count + 1}

graph.add_node("retry", retry_with_backoff)
Each retry waits longer, preventing immediate re-requests that might hit rate limits.

Another pattern is checkpointing: persist state after each node, and if the pipeline crashes, resume from the last checkpoint. LangGraph supports this natively via MemorySaver or database-backed persistence. For long-running workflows, this is essential. You don't want to re-run 10 minutes of retrieval because the final synthesis step failed.
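
A minimal sketch with the in-memory checkpointer (the import path reflects recent LangGraph releases; database-backed savers follow the same interface):

from langgraph.checkpoint.memory import MemorySaver

app = builder.compile(checkpointer=MemorySaver())

# The thread_id names a resumable run: invoking again with the same id
# resumes from the last checkpoint instead of starting over.
result = app.invoke(
    {"query": "What is LangGraph?"},
    config={"configurable": {"thread_id": "run-42"}},
)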

Real-World Use Cases

Compliance Document Analysis

A financial institution needs to analyze adverse media articles for AML compliance. The workflow: extract entities, classify article relevance, retrieve historical data on entities, cross-reference with watchlists, generate a risk report, and flag for human review if confidence is low.

With sequential chains, you'd struggle with conditional logic. What if entity extraction fails? What if the watchlist lookup times out? What if multiple articles reference the same entity and you need to deduplicate? LangGraph handles this: nodes for each step, conditional edges to retry on failure, state accumulation to track all entities and articles, and a final decision node that routes to human review or automated approval.
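
The final routing step, for instance, is a single conditional edge; the node names and confidence threshold below are illustrative:

def route_after_report(state) -> str:
    # Low-confidence reports go to a human; the rest are auto-approved.
    return "human_review" if state["confidence"] < 0.8 else "auto_approve"

graph.add_conditional_edges("generate_report", route_after_report)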

Code Generation with Validation

An agent generates Python code based on a natural language spec. It must generate, validate syntax, run tests, and if tests fail, regenerate with error feedback. This is inherently cyclic. A sequential chain can't loop. LangGraph can: generate → validate_syntax → run_tests → conditional edge back to generate if tests fail, or proceed to finalize if they pass.

The state accumulates the spec, generated code, syntax errors, test results, and retry count. The generation node sees all previous errors, so it can avoid repeating mistakes. This context-aware retry is impossible with naive loops.
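
A state schema for this workflow might look like the following sketch (field names are illustrative, but tests_passed and retry_count match the routing function shown earlier):

from typing import TypedDict, List

class CodeGenState(TypedDict):
    spec: str
    code: str
    syntax_errors: List[str]
    test_results: List[str]   # accumulated across attempts
    tests_passed: bool
    retry_count: int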

Research Assistant

A research agent answers complex questions by: decomposing the question into sub-questions, retrieving evidence for each, verifying consistency across sources, synthesizing an answer, and generating follow-up questions for gaps.

This requires branching (different sub-questions go to different retrieval nodes), cycles (if evidence is insufficient, refine the sub-question and retrieve again), and state management (track all evidence, sources, and confidence scores). LangGraph's graph structure maps directly to this workflow. Each sub-question spawns a subgraph, results are aggregated in state, and the synthesis node has full visibility into all evidence.

Comparison with Alternatives

CrewAI

CrewAI focuses on multi-agent collaboration. You define agents with roles, assign them tasks, and they coordinate via a shared message bus. It's higher-level than LangGraph: you don't define nodes and edges, you define agents and tasks. CrewAI handles delegation automatically.

The tradeoff: less control. CrewAI's abstraction is powerful for straightforward multi-agent scenarios, but if you need fine-grained control over state transitions, error handling, or conditional logic, you'll fight the framework. LangGraph is lower-level, but that gives you precision. You define the graph explicitly, so there's no hidden magic.

Use CrewAI if you want a team of agents to collaborate on a task and you're okay with delegating orchestration to the framework. Use LangGraph if you need to define the exact control flow and state management logic.

AutoGen

AutoGen, from Microsoft, emphasizes conversational multi-agent systems. Agents communicate via messages, and the framework manages turn-taking and group chat dynamics. It's designed for scenarios where agents debate, critique each other, or collaboratively solve problems.

AutoGen's strength is agent interaction patterns: reflection, debate, critique. But it's less focused on DAGs or graph-based workflows. If your use case is "multiple agents discussing a problem," AutoGen is a good fit. If your use case is "a complex pipeline with conditional branching and state management," LangGraph is better suited.

LangChain vs. LangGraph

LangChain and LangGraph are complementary. LangChain provides chains, prompts, and tool integrations. LangGraph provides orchestration. In practice, you use both: LangChain for the nodes (each node might be a LangChain chain), LangGraph for the graph structure.
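
For example, a generation node can wrap an LCEL chain directly; llm here stands in for any LangChain chat model:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "Answer the question using these documents:\n{docs}\n\nQuestion: {query}"
)
chain = prompt | llm | StrOutputParser()  # LangChain supplies the node's logic

def generate(state: RAGState) -> dict:
    # LangGraph handles orchestration; the node body is one chain call.
    answer = chain.invoke({"docs": state["retrieved_docs"], "query": state["query"]})
    return {"answer": answer}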

If your workflow is linear, stick with LangChain's LCEL. If you need cycles, conditional branching, or stateful multi-step logic, add LangGraph. Don't overcomplicate simple tasks, but don't underestimate the complexity of production workflows.

Practical Considerations

LangGraph is not magic. It adds complexity. You now have a graph to maintain, state schemas to design, and conditional logic to test. For simple tasks, this overhead isn't worth it. But for production systems where workflows evolve, requirements change, and robustness matters, the upfront investment pays off.

Debugging is easier because state is explicit. You can visualize the graph, inspect state at each node, and trace execution. This is harder with imperative code where control flow is implicit. Testing is easier because each node is a pure function from state to state. You can unit test nodes in isolation, then integration test the graph.

The main challenge is designing the state schema. If you make it too granular, every node has to handle dozens of fields. If you make it too coarse, you lose type safety and clarity. The sweet spot is a state schema that reflects your domain: for RAG, queries and documents; for code generation, specs and tests; for compliance, entities and risk scores. Keep it as simple as possible, but no simpler.

LangGraph is still evolving. The API changes, the documentation improves, and the ecosystem grows. It's not as stable as LangChain, but it's production-ready if you're comfortable with the abstraction. Expect to read source code, adapt examples, and iterate on your graph design. This is the cost of working at the frontier of LLM orchestration.

Conclusion

Sequential chains are sufficient until they're not. When your LLM workflow needs conditional logic, cycles, or stateful multi-step reasoning, LangGraph provides a structured way to manage that complexity. It makes control flow explicit, state management robust, and error handling systematic.

The alternatives—CrewAI, AutoGen—solve different problems. CrewAI is for multi-agent collaboration with minimal manual orchestration. AutoGen is for conversational agent interactions. LangGraph is for building production workflows where you need precision and control.

If you're building LLM systems that matter—systems where reliability, observability, and maintainability are non-negotiable—graph-based orchestration is worth learning. Start simple: build a linear workflow as a graph, then add a conditional edge, then add a cycle. The patterns will click, and you'll wonder how you ever managed complex workflows without explicit state graphs.