Stateful agents with LangGraph

Most AI agents have the memory of a goldfish. Every run starts from zero.

You can’t pause and resume complex workflows, and debugging failures means starting from scratch. There’s no way to explore alternative execution paths, and human intervention means rebuilding the entire context.

Try explaining that to your users when their 3-hour analysis crashes at step 47.

Making agents remember everything #

LangGraph fixes this with checkpointing. Every single step your agent takes gets saved. You end up with a complete execution history you can actually navigate through.

Here's how to enable checkpointing:
from langgraph.graph import StateGraph, START
from langgraph.checkpoint.memory import MemorySaver

# Create a checkpointer (use SqliteSaver or PostgresSaver for production)
checkpointer = MemorySaver()

# Build your state graph
workflow = StateGraph(InvestmentResearchState)
workflow.add_node("gather_company_info", gather_company_info)
workflow.add_node("technical_analysis", perform_technical_analysis)
# ... add more nodes

# Wire up the execution order (the graph needs an entry point)
workflow.add_edge(START, "gather_company_info")
workflow.add_edge("gather_company_info", "technical_analysis")
# ... add more edges

# Compile with checkpointing enabled
graph = workflow.compile(checkpointer=checkpointer)

# Execute with a thread ID for state tracking
config = {"configurable": {"thread_id": "investment_001"}}
result = graph.invoke({"ticker": "AAPL"}, config)

And that’s literally all the setup. The agent now saves its state at every step.
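
You can verify it's working by reading the saved state straight back. A minimal sketch, assuming the graph and config objects from above:

Checking the latest checkpoint:
snapshot = graph.get_state(config)

print(snapshot.values.get("ticker"))   # state as of the last completed step
print(snapshot.next)                   # nodes that would run next, if any
print(snapshot.config["configurable"]["checkpoint_id"])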

Rewinding and replaying executions #

Here’s where it gets interesting. Say your agent just finished analyzing a stock, but you realize you set the wrong risk tolerance. Instead of starting over, you can jump back to any earlier step and try again with different parameters.

Time-travel implementation:
from langchain_core.messages import HumanMessage

def demonstrate_time_travel(graph, config):
    # Get the complete execution history (newest checkpoint first)
    history = list(graph.get_state_history(config))

    # Find the most recent checkpoint after technical analysis
    target_checkpoint = None
    for checkpoint in history:
        if "technical_analysis" in checkpoint.values.get("steps_completed", []):
            target_checkpoint = checkpoint
            break

    if target_checkpoint is None:
        raise ValueError("No checkpoint found after technical analysis")

    # Modify the state at that checkpoint
    modified_state = {
        **target_checkpoint.values,
        "risk_tolerance": "conservative",  # Was "moderate"
        "messages": target_checkpoint.values.get("messages", []) + [
            HumanMessage(content="Modified risk tolerance during time-travel")
        ]
    }

    # Create a new thread for the modified execution
    new_config = {"configurable": {"thread_id": "investment_001_modified"}}

    # Seed the new thread with the modified state
    graph.update_state(new_config, modified_state)

    # Continue execution with modified parameters
    result = graph.invoke(None, new_config)

    return result

So now you have two different execution paths from the same starting point. Pretty useful when you’re trying to figure out what went wrong or what could go better.
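
For instance, you can run the helper above and put the two recommendations next to each other. A quick sketch; recommendation is the field from the state definition shown in the next section:

# Re-run from the technical-analysis checkpoint with conservative settings
modified_result = demonstrate_time_travel(graph, config)

original = graph.get_state(config).values.get("recommendation")
print(f"Original run: {original}")
print(f"Modified run: {modified_result.get('recommendation')}")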

The investment research agent #

I tested this with an agent that analyzes stocks. Nothing fancy: it goes through the usual steps, but each one gets saved:

State definition with complete tracking:
from typing import Annotated, List, Optional, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class InvestmentResearchState(TypedDict):
    # Core identification
    ticker: str
    thread_id: str

    # Analysis results (each stage checkpointed)
    company_info: Optional[CompanyInfo]
    technical_analysis: Optional[TechnicalAnalysis]
    fundamental_analysis: Optional[FundamentalAnalysis]
    risk_assessment: Optional[RiskAssessment]
    recommendation: Optional[Recommendation]

    # Process tracking
    messages: Annotated[List[BaseMessage], add_messages]
    current_step: str
    steps_completed: List[str]

    # Human-in-the-loop controls
    human_approval_required: bool
    approved_by_human: bool

    # Configuration
    risk_tolerance: str  # "conservative", "moderate", "aggressive"
    analysis_depth: str  # "quick", "standard", "comprehensive"

Company info, technical analysis, fundamentals, risk scoring, and a final recommendation. Standard stuff, but now every step is replayable.
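
Each node only needs to return the fields it touched, plus its own progress markers, and the checkpointer handles the rest. A minimal sketch of one node, where fetch_company_data is a hypothetical helper standing in for your actual data source:

def gather_company_info(state: InvestmentResearchState) -> dict:
    # fetch_company_data is a stand-in for your real data source
    info = fetch_company_data(state["ticker"])

    # Return only the fields this node changed; LangGraph merges them in
    return {
        "company_info": info,
        "current_step": "gather_company_info",
        "steps_completed": state["steps_completed"] + ["gather_company_info"],
    }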

Pausing for human review #

When the agent finds something risky, it can pause and wait for you to take a look. The whole state just sits there, perfectly preserved, until you come back.

Implementing human review interrupts:
from langgraph.errors import NodeInterrupt

def assess_risks(state: InvestmentResearchState) -> InvestmentResearchState:
    # Perform risk analysis
    risk_score = calculate_risk_score(state)
    risk_level = determine_risk_level(risk_score)

    state["risk_assessment"] = RiskAssessment(
        risk_level=risk_level,
        risk_score=risk_score,
        risk_factors=identify_risk_factors(state),
        mitigation_strategies=suggest_mitigations(state)
    )

    # Require human approval for high-risk investments
    if risk_level in [RiskLevel.HIGH, RiskLevel.CRITICAL]:
        state["human_approval_required"] = True

    return state

def human_review(state: InvestmentResearchState) -> InvestmentResearchState:
    # This creates an interrupt that pauses execution
    review_summary = f"""
    Investment Review Required for {state['ticker']}
    Risk Level: {state['risk_assessment'].risk_level.value}
    Risk Score: {state['risk_assessment'].risk_score:.2f}

    Key Risk Factors:
    {format_risk_factors(state['risk_assessment'].risk_factors)}
    """

    # Interrupt execution for human input
    raise NodeInterrupt(review_summary)

The agent just stops and waits here. And when you come back, it picks up right where it left off.
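
Resuming follows the same pattern as everything else here: patch the state, then invoke with None. A sketch, assuming the human_review node checks the approval flag before raising the interrupt again:

# Record the approval in the paused thread's state
graph.update_state(config, {
    "approved_by_human": True,
    "human_approval_required": False,
})

# invoke(None, ...) resumes the paused thread; the interrupted node
# re-runs on resume, so it should check the flag before raising again
result = graph.invoke(None, config)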

Running parallel what-if scenarios #

Sometimes you want to try different approaches. Just fork the state and run both:

Creating execution forks:
def create_analysis_fork(graph, original_config):
    # Get current state
    current_state = graph.get_state(original_config)

    # Create two different analysis paths
    conservative_config = {"configurable": {"thread_id": "analysis_conservative"}}
    aggressive_config = {"configurable": {"thread_id": "analysis_aggressive"}}

    # Fork 1: Conservative analysis
    conservative_state = {
        **current_state.values,
        "risk_tolerance": "conservative",
        "analysis_depth": "comprehensive"
    }
    graph.update_state(conservative_config, conservative_state)

    # Fork 2: Aggressive analysis
    aggressive_state = {
        **current_state.values,
        "risk_tolerance": "aggressive",
        "analysis_depth": "quick"
    }
    graph.update_state(aggressive_config, aggressive_state)

    # Run both forks
    conservative_result = graph.invoke(None, conservative_config)
    aggressive_result = graph.invoke(None, aggressive_config)

    return compare_results(conservative_result, aggressive_result)

Now you can compare both results side by side. Each path has its own complete history too.
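
compare_results isn't shown above; a hypothetical version just lines up the fields you care about:

def compare_results(conservative: dict, aggressive: dict) -> dict:
    # Pull the headline fields out of each fork's final state
    return {
        strategy: {
            "recommendation": result.get("recommendation"),
            "risk_assessment": result.get("risk_assessment"),
        }
        for strategy, result in [
            ("conservative", conservative),
            ("aggressive", aggressive),
        ]
    }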

Recovering from failures #

When something breaks (and it will), you don’t lose everything. LangGraph keeps what worked, so you just fix the problem and continue:

Handling and recovering from failures:
def simulate_failure_and_recovery(graph, config):
    try:
        # Simulate a failure during fundamental analysis
        return graph.invoke({"ticker": "AAPL", "simulate_error": True}, config)
    except Exception as e:
        print(f"Execution failed: {e}")

        # Get state at failure point
        failed_state = graph.get_state(config)

        # Fix the issue (remove error flag, update data source, etc.)
        recovery_state = {
            **failed_state.values,
            "simulate_error": False,
            "recovery_attempts": failed_state.values.get("recovery_attempts", 0) + 1,
            "last_error": str(e)
        }

        # Resume from the failed step
        graph.update_state(config, recovery_state)
        return graph.invoke(None, config)

It picks up from the failure point with whatever results it had already calculated. No starting from scratch.
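
If you'd rather not do this by hand, the same idea wraps neatly into a bounded retry loop using the recovery_attempts counter from the state. A sketch, not production-grade error handling:

MAX_RETRIES = 3

def invoke_with_recovery(graph, initial_input, config):
    payload = initial_input
    for _ in range(MAX_RETRIES):
        try:
            return graph.invoke(payload, config)
        except Exception as e:
            failed_state = graph.get_state(config)
            graph.update_state(config, {
                "recovery_attempts": failed_state.values.get("recovery_attempts", 0) + 1,
                "last_error": str(e),
            })
            payload = None  # resume from the failed step, not from scratch
    raise RuntimeError(f"Still failing after {MAX_RETRIES} attempts")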

Persistence options #

The examples use MemorySaver because it’s simple, but for real deployments you’ll want something more robust:

# SQLite for single-machine deployments
from langgraph.checkpoint.sqlite import SqliteSaver
checkpointer = SqliteSaver.from_conn_string("agents.db")

# PostgreSQL for distributed systems
from langgraph.checkpoint.postgres import PostgresSaver
checkpointer = PostgresSaver.from_conn_string(
    "postgresql://user:pass@localhost/agents"
)

# Note: in recent langgraph-checkpoint releases, from_conn_string is a
# context manager (use `with ... as checkpointer:`), so check the docs
# for the version you have installed

Same interface either way, so switching later is painless.
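
One way to take advantage of that: pick the backend at startup so local dev and production share the same code. A sketch, assuming a CHECKPOINT_DB environment variable (the name is made up; adapt it to your config setup):

import os
import sqlite3

from langgraph.checkpoint.memory import MemorySaver
from langgraph.checkpoint.sqlite import SqliteSaver

def make_checkpointer():
    # CHECKPOINT_DB is an assumed variable name, not a LangGraph convention
    db_path = os.environ.get("CHECKPOINT_DB")
    if db_path:
        conn = sqlite3.connect(db_path, check_same_thread=False)
        return SqliteSaver(conn)
    return MemorySaver()  # in-memory fallback for local development

graph = workflow.compile(checkpointer=make_checkpointer())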

Digging through the history #

Every checkpoint saves metadata too, which is handy for debugging:

Viewing execution history:
def inspect_checkpoint_history(graph, config):
    history = list(graph.get_state_history(config))

    for i, checkpoint in enumerate(history):
        print(f"\nCheckpoint {i}:")
        print(f"  ID: {checkpoint.config['configurable']['checkpoint_id']}")
        print(f"  Step: {checkpoint.values.get('current_step', 'N/A')}")
        print(f"  Completed: {checkpoint.values.get('steps_completed', [])}")
        print(f"  Messages: {len(checkpoint.values.get('messages', []))}")
        print(f"  Timestamp: {checkpoint.values.get('analysis_timestamp', 'N/A')}")

        # Show any errors
        if checkpoint.values.get('last_error'):
            print(f"  Error: {checkpoint.values['last_error']}")

You can see exactly what your agent did and when.
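
Those checkpoint IDs aren't just for display. Passing one back in the config resumes execution from exactly that point, which is the other half of the time-travel story. A sketch, reusing the history list from the function above:

# Pick the checkpoint you want to replay from (e.g. the third-newest)
checkpoint = history[2]

replay_config = {
    "configurable": {
        "thread_id": "investment_001",
        "checkpoint_id": checkpoint.config["configurable"]["checkpoint_id"],
    }
}

# invoke(None, ...) re-runs everything after that checkpoint
result = graph.invoke(None, replay_config)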

Try it yourself #

Grab the code from GitHub here.

Then run:

uv sync

uv run python main.py

The demo runs through everything automatically: checkpointing, rewinding, forking, recovery.

When you run the full demo, here’s roughly what happens:

Analyzing AAPL...
- Got company info
- Did technical analysis
- Checked fundamentals
- Calculated risk score
- Generated recommendation: HOLD

Found 6 checkpoints saved

Rewinding to technical analysis checkpoint...
Changing risk tolerance to conservative
Resuming from there...
New recommendation: SELL

Trying aggressive settings on a fork...
That one says: BUY

Simulating a network failure...
Recovered and finished anyway

So what’s the big deal? #

This changes how you can build agents. During development, you can step through checkpoints to debug weird behavior. In production, crashes don’t mean starting over.

You can create test scenarios from any checkpoint. Want to try different strategies? Just fork the state and compare. Need an audit trail? Every decision is already saved.

I used an investment agent as the example, but this works for anything with multiple steps where you need to track state.

Next time you’re building an agent that takes more than a few steps, try adding checkpointing. It makes debugging so much easier.