57% of Companies Have AI Agents in Production. Most Are Struggling With the Same Thing.

According to LangChain's State of Agent Engineering report, 57% of respondents now have AI agents running in production. That number would have seemed ambitious two years ago. Today it just sounds like Tuesday.

But buried in the same data is a number that should get more attention: 32% of engineering teams cite output quality as their top barrier to production AI. Not cost. Not latency. Not the models themselves. The quality of what the agents actually produce when real users start throwing real inputs at them.

The Demo-to-Production Gap Is Real

There is a specific kind of pain that comes from watching an agent nail every eval you throw at it, shipping it, and then having it start hallucinating tool parameters on day three. It is not a model problem. The model is the same. What changed is everything around it.

Production inputs are messier than eval inputs. Context windows fill up in ways you did not anticipate. Tool calls chain together and surface edge cases that never appeared in your test suite. A user phrases something slightly differently than your prompt was tuned for, and the whole thing quietly goes sideways.

Quality in Production Means Something Specific

Consistency across diverse inputs — the agent handles the clean case well but degrades on anything outside the distribution it was tuned on
Graceful degradation when tools fail — instead of recovering, the agent loops, hallucinates a result, or returns something confidently wrong
Hallucinated tool parameters — the agent invents arguments for a function call rather than acknowledging it doesn't have the information it needs
Context loss on long-running tasks — multi-step tasks that work fine at step three start losing coherence by step eight as the context window fills
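
Of the failures above, hallucinated tool parameters are among the easier ones to catch mechanically, because every tool call can be checked against the parameters the tool actually declares. The sketch below is one minimal way to do that using only the Python standard library. The `issue_refund` tool, its `REFUND_SCHEMA`, and the `validate_tool_args` helper are hypothetical names invented for illustration, not part of any framework mentioned in this post.

```python
from typing import Any

# Hypothetical schema for an illustrative `issue_refund` tool:
# parameter name -> (expected type, required?)
REFUND_SCHEMA: dict[str, tuple[type, bool]] = {
    "order_id": (str, True),
    "amount_cents": (int, True),
    "reason": (str, False),
}


def validate_tool_args(
    args: dict[str, Any], schema: dict[str, tuple[type, bool]]
) -> list[str]:
    """Return a list of problems; an empty list means the call may proceed."""
    problems: list[str] = []

    # Arguments the tool never declared are a strong hint of invention.
    for name in args:
        if name not in schema:
            problems.append(f"unexpected parameter: {name}")

    for name, (expected_type, required) in schema.items():
        if name not in args:
            if required:
                problems.append(f"missing required parameter: {name}")
            continue
        value = args[name]
        # bool is a subclass of int in Python, so reject it for non-bool fields.
        wrong_bool = isinstance(value, bool) and expected_type is not bool
        if wrong_bool or not isinstance(value, expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


# A call where the model invented `amount` and omitted `amount_cents`.
print(validate_tool_args({"order_id": "A100", "amount": 12.5}, REFUND_SCHEMA))
```

A check like this only catches malformed calls. It will not notice a well-typed but made-up `order_id`, so it complements rather than replaces checks against real data.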

Rate Limits Are Also a Production Reality

Datadog's State of AI Engineering found that in February 2026, roughly 5% of all LLM call spans reported errors, and 60% of those errors were rate limits. Put together, that is about 3% of all LLM call spans failing on a rate limit alone: a significant share of production failures that has nothing to do with prompt quality. These are infrastructure problems dressed up as AI problems.

Production readiness for agents is not just about getting the outputs right. It is about building systems that degrade gracefully when the infrastructure underneath them hiccups. Retry logic, fallback routing, and rate limit awareness are table stakes.
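
To make "retry logic and fallback routing" concrete, here is a minimal sketch of one way to combine them: retry a rate-limited call with exponential backoff and jitter, then move on to the next provider once retries are exhausted. Everything named here is an assumption for illustration. `RateLimitError` stands in for whatever exception your client raises on an HTTP 429, and each provider is simply a callable that takes a prompt and returns text.

```python
import random
import time
from typing import Callable, Optional, Sequence


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider client's HTTP 429 error."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[Callable[[str], str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order, retrying rate-limited calls with backoff."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(max_attempts):
            try:
                return provider(prompt)
            except RateLimitError as exc:
                last_error = exc
                if attempt < max_attempts - 1:
                    # Exponential backoff: base, 2x base, 4x base, ...
                    delay = base_delay * (2 ** attempt)
                    # Jitter spreads out retries from many concurrent callers.
                    sleep(delay + random.uniform(0, delay))
        # Retries exhausted for this provider; fall through to the next one.
    raise RuntimeError("all providers were rate limited") from last_error
```

Only rate limit errors are retried here; any other exception propagates immediately, on the assumption that a malformed request will not succeed on a second attempt. The `sleep` parameter is injectable so the behaviour can be tested without waiting.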

The Framework Moment

Framework adoption for agent development nearly doubled year-over-year — from around 9% of organisations in early 2025 to roughly 18% by early 2026. LangChain, LangGraph, Pydantic AI, and Vercel AI SDK are all gaining ground.

The Core Insight

The quality problem in production AI is not a model problem. It is a systems problem. Production quality is determined by how well your surrounding system handles the inputs the model wasn't trained on, the tools that don't behave as expected, and the edge cases your evals never surfaced.

What Teams Doing This Well Are Actually Doing

Building evaluation sets from real production traffic, not synthetic examples — so evals reflect the actual distribution of inputs the agent will see
Using structured outputs with validation at every tool call boundary — if the model can't produce a valid structured response, treat it as a failure and handle it explicitly
Adding human-in-the-loop checkpoints for high-stakes decisions — as a circuit breaker while you build confidence in the agent's behaviour
Shipping in shadow mode before full rollout — running the agent on real traffic, logging outputs, but not acting on them until quality is verified
Treating context management as a first-class engineering concern — designing tasks so the information the agent needs is available when it needs it
Building explicit failure modes into tool definitions — so when a tool call can't be completed, the agent returns a structured 'I cannot do this' rather than inventing an answer
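
The last practice in that list, explicit failure modes in tool definitions, can be sketched as a tool that always returns a structured result, so that "I cannot do this" is a normal return value and not an exception or a gap the model fills in. This is an illustrative sketch only: `ToolResult`, `lookup_order`, the error codes, and the in-memory `ORDERS` table are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolResult:
    """Structured tool output: either data, or a machine-readable failure."""
    ok: bool
    data: Optional[dict] = None
    error_code: Optional[str] = None
    message: Optional[str] = None


# Hypothetical in-memory stand-in for a real order service.
ORDERS = {"A100": {"status": "shipped"}}


def lookup_order(order_id: Optional[str]) -> ToolResult:
    """Look up an order, returning an explicit failure when it cannot."""
    if not order_id:
        # The agent lacks required information; tell it so explicitly.
        return ToolResult(
            ok=False,
            error_code="missing_argument",
            message="order_id is required; ask the user for it",
        )
    order = ORDERS.get(order_id)
    if order is None:
        return ToolResult(
            ok=False,
            error_code="not_found",
            message=f"no order with id {order_id}",
        )
    return ToolResult(ok=True, data=order)


print(lookup_order("Z999"))
```

The design choice is that the calling code, not the model, decides what happens on `ok=False`: retry, ask the user, or escalate to a human checkpoint.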

The technology is ready enough. The engineering practices are catching up. The gap is closeable — but only if you take it seriously as an engineering problem rather than a model problem.
