REST, GraphQL, and the Age of Multi-Agent Traffic for APIs
Real patterns, failure modes, and the simple changes that stop N+1 storms, protect SLAs, and cut bills.
The first time we opened our APIs to an internal pricing agent, traffic looked normal at 9:00 a.m. By 9:03, read QPS was up 12x, cache hit rate fell by half, and our latency S-curve grew a second tail. No one “DDoS’d” us. We’d simply connected a tireless client that never gets bored, never takes lunch, and happily retries until your error budget cries uncle. That day changed how I think about software architecture in a world of agentic and multi-agent (agent-to-agent) systems.
This piece is the playbook I wish I had then: what agent traffic really does to REST and GraphQL services, what it means for SLAs, and how both small and large teams can get ahead of it without drowning in rework.
Humans browse. Agents enumerate. A human session might fire 30–80 requests over several minutes. A single agent run can burst hundreds or thousands of calls in seconds, repeat similar queries with tiny variations, and retry aggressively on any non-200 hint. In multi-agent setups—planner → researcher → executor → validator—each “step” fans out, producing a cascade of reads and writes with tight feedback loops. Three things follow:
Chattiness becomes your bottleneck. N+1 patterns you’ve lived with for years suddenly dominate cost and latency.
Retries amplify small mistakes. A bad 429 or vague 5xx without backoff hints can trigger a thundering herd (a client-side retry sketch follows below).
Determinism matters. Agents deal poorly with flaky read-after-write behavior, fuzzy pagination, or undocumented side effects.
If you design for people, you optimize for happy paths and pixels. If you design for agents, you design for discipline—contracts, idempotency, and backpressure.
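To make the retry point concrete, here is roughly what disciplined client behavior looks like. A minimal sketch, assuming the requests library; the retryable status set, delay caps, and URL handling are illustrative defaults, not a prescription.

```python
import random
import time

import requests

RETRYABLE = {429, 502, 503, 504}

def get_with_backoff(url: str, max_attempts: int = 5,
                     base_delay: float = 0.5, max_delay: float = 30.0) -> requests.Response:
    """GET with capped exponential backoff and full jitter; honors Retry-After."""
    for attempt in range(max_attempts):
        resp = requests.get(url, timeout=10)
        if resp.status_code not in RETRYABLE:
            resp.raise_for_status()  # non-retryable errors surface immediately
            return resp
        # Prefer the server's explicit hint; assumes delta-seconds, not a date.
        retry_after = resp.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else min(max_delay, base_delay * 2 ** attempt)
        # Full jitter keeps a fleet of agents from retrying in lockstep.
        time.sleep(random.uniform(0, delay))
    raise RuntimeError(f"gave up on {url} after {max_attempts} attempts")
```

The point isn't this exact helper; it's that every agent SDK you ship should have something like it baked in, so "retry until it works" never means "retry immediately, forever."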
REST vs. GraphQL when the caller is an agent
I’ve run both at scale. Each can win; each can hurt you in different ways.
REST: boring, predictable, and easier to cache
The upside: stable shapes, straightforward caching, and clear verbs. Agents love predictability; CDNs love stable URLs.
The catch: REST endpoints often grow “accidental” chattiness—list, then fetch by id, then join by another list—and agents will hammer that path. If you stay with REST:
Batch by default. Offer bulk GETs (/items?ids=…), bulk writes with idempotency keys, and “since=” snapshot endpoints to avoid pagination storms.
Stabilize reads. ETags and If-None-Match aren’t optional anymore. A 304 is the cheapest request you’ll ever serve (a client-side sketch follows this list).
Make long work asynchronous. Return 202 Accepted with a job resource (/jobs/{id}) and a clear Retry-After. Agents will poll; give them a sane target.
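On the stabilized-reads point, a minimal client-side sketch, assuming the requests library and an endpoint that emits ETags; the in-memory cache is a stand-in for whatever store your SDK actually uses.

```python
import requests

# Tiny in-memory cache keyed by URL: {url: (etag, body)}.
_cache: dict[str, tuple[str, bytes]] = {}

def cached_get(url: str) -> bytes:
    """Conditional GET: send If-None-Match and reuse the cached body on a 304."""
    headers = {}
    cached = _cache.get(url)
    if cached:
        headers["If-None-Match"] = cached[0]
    resp = requests.get(url, headers=headers, timeout=10)
    if resp.status_code == 304 and cached:
        return cached[1]  # the cheapest response the server ever produced
    resp.raise_for_status()
    etag = resp.headers.get("ETag")
    if etag:
        _cache[url] = (etag, resp.content)
    return resp.content
```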
GraphQL: flexible, introspectable, and easy to abuse
The upside: agents can express exactly what they want in one call, which cuts chattiness. The downside: without guardrails, they’ll craft deep, expensive queries that melt resolvers.
If you go GraphQL:
Turn on query cost and depth limits. Treat them like seatbelts, not “nice to haves” (a minimal depth check is sketched after this list).
Use persisted queries. Lock down the shapes that are allowed in production. APQ or similar keeps the graph fast and cacheable.
Resolve the N+1 problem. DataLoader-style batching at every boundary, or your “one request” becomes 1,000 downstream calls.
Control introspection in prod. Keep it off or scoped; ship a schema registry and docs instead.
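Depth limits are the cheapest of these guardrails to add. A minimal sketch of the check referenced above, assuming the graphql-core package; the six-level limit and the example query are illustrative, not a recommendation.

```python
from graphql import parse  # graphql-core

MAX_DEPTH = 6

def query_depth(node, depth: int = 0) -> int:
    """Return the deepest selection-set nesting under an AST node."""
    selection_set = getattr(node, "selection_set", None)
    if selection_set is None:
        return depth
    return max((query_depth(sel, depth + 1) for sel in selection_set.selections),
               default=depth)

def enforce_depth_limit(query: str) -> None:
    document = parse(query)
    depth = max(query_depth(defn) for defn in document.definitions)
    if depth > MAX_DEPTH:
        raise ValueError(f"query depth {depth} exceeds limit {MAX_DEPTH}")

try:
    # A deliberately nested query a guardrail like this should refuse.
    enforce_depth_limit(
        "{ user { friends { friends { friends { friends { friends { id } } } } } } }"
    )
except ValueError as err:
    print(err)  # query depth 7 exceeds limit 6
```

Cost limits work the same way, just weighted per field; the important part is rejecting the query before any resolver runs.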
REST rewards discipline in endpoint design. GraphQL rewards discipline in governance. Either way, you need a plan.
SLAs and SLOs: separate humans from agents
Throwing all traffic into one bucket is how you lose both. I now define and measure separate SLOs:
Human traffic SLOs: P95 < 300 ms for reads, P99 < 800 ms for writes, 99.9% availability.
Agent traffic SLOs: often a notch looser on latency, but far tighter on correctness and backpressure signals. I’d rather give an agent a fast, explicit 429 with Retry-After: 2 than struggle at P99.
Two hard rules that saved us:
Fairness by design. Token buckets per client, per route, plus concurrency caps. Agents get clear budgets. Humans get protected lanes.
Make backoff machine-readable. Standardize 429/503 responses with Retry-After, X-RateLimit-Remaining, and an opaque x-request-id for tracing (a server-side sketch follows).
Your SLA isn’t just uptime; it’s contract clarity under load.
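Here is what machine-readable backoff can look like at the app tier. A minimal sketch, assuming Flask, with an in-memory token bucket standing in for a shared store such as Redis; the rate numbers and the X-Client-Id header are placeholders.

```python
import time
import uuid

from flask import Flask, g, jsonify, request

app = Flask(__name__)

# One token bucket per client id; in production this lives in Redis, not memory.
BUCKETS: dict[str, dict] = {}
RATE, BURST = 10, 20  # tokens per second, bucket capacity

def take_token(client_id: str) -> tuple[bool, float]:
    now = time.monotonic()
    b = BUCKETS.setdefault(client_id, {"tokens": BURST, "ts": now})
    b["tokens"] = min(BURST, b["tokens"] + (now - b["ts"]) * RATE)
    b["ts"] = now
    if b["tokens"] >= 1:
        b["tokens"] -= 1
        return True, b["tokens"]
    return False, 0.0

@app.before_request
def enforce_budget():
    g.request_id = str(uuid.uuid4())
    client = request.headers.get("X-Client-Id", "anonymous")
    ok, remaining = take_token(client)
    if not ok:
        resp = jsonify(error="rate_limited", code="RATE_LIMITED")
        resp.status_code = 429
        resp.headers["Retry-After"] = "2"  # explicit, machine-readable backoff
        resp.headers["X-RateLimit-Remaining"] = "0"
        return resp
    g.remaining = int(remaining)

@app.after_request
def annotate(resp):
    resp.headers["x-request-id"] = g.get("request_id", "")
    if hasattr(g, "remaining"):
        resp.headers["X-RateLimit-Remaining"] = str(g.remaining)
    return resp
```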
Patterns that pay off under agent load
These are the changes that moved the needle for us:
Idempotency everywhere. Any non-GET write accepts Idempotency-Key and stores the result for a short window. Agents retry; you shouldn’t double-charge or double-create (a server-side sketch follows this list).
Job resource for long work. If it takes >300–500 ms, consider a job. Polling is fine when you control the cadence.
Snapshots and deltas. “Give me everything since cursor X” beats walking paginated lists. Agents often need all the changes—help them ask for it once.
Cache like your budget depends on it. Because it does. Strong Cache-Control, s-maxage, stale-while-revalidate, and careful key design. GraphQL? Cache persisted operations at the edge.
First-class batching. Don’t make callers discover hidden batch endpoints. Document them and show examples in your SDKs.
Deterministic errors. Structured error codes, not essays. Agents can learn rules; they can’t parse your poetry.
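For the idempotency item above, a minimal server-side sketch, assuming Flask; the /charges endpoint and response shape are hypothetical, and the in-memory store stands in for Redis or similar with a real TTL.

```python
import time

from flask import Flask, jsonify, request

app = Flask(__name__)

# Replayed responses keyed by Idempotency-Key; production wants a shared store
# (Redis, DynamoDB) with the same 24-72 h window, not process memory.
SEEN: dict[str, tuple[float, dict]] = {}
TTL_SECONDS = 24 * 3600

@app.route("/charges", methods=["POST"])
def create_charge():
    key = request.headers.get("Idempotency-Key")
    if not key:
        return jsonify(error="missing_idempotency_key", code="IDEMPOTENCY_KEY_REQUIRED"), 422
    hit = SEEN.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        # Same key inside the window: replay the stored result, do no new work.
        return jsonify(hit[1]), 200
    charge = {"id": f"ch_{key[:8]}", "amount": request.json["amount"], "status": "created"}
    SEEN[key] = (time.time(), charge)
    return jsonify(charge), 201
```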
Security and safety when clients are software
Malice isn’t required to cause damage. Well-meaning agents can spiral.
Separate identities. Service accounts for agents with narrow scopes. Short-lived tokens. Rotate often.
Declare intent. Require an X-Agent-Intent or similar tag tied to scopes. It helps with policy and forensics (a small policy-check sketch follows this list).
Guardrails at the edge. WAF rules for obvious floods, route-level quotas, and anomaly detection on “query cost per second.”
Audit you can read. “Who did what, why, and on behalf of whom?” needs to be a single query, not a weeklong excavation.
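For the intent declaration above, a small policy-check sketch; the X-Agent-Intent header is the pattern described here, while the intent names and scope mapping are hypothetical.

```python
# Map each declared intent to the scopes a token must carry to use it.
INTENT_SCOPES: dict[str, set[str]] = {
    "price-refresh": {"catalog:read"},
    "order-sync": {"orders:read", "orders:write"},
}

def authorize(intent: str | None, token_scopes: set[str]) -> None:
    """Reject agent requests whose declared intent exceeds the caller's scopes."""
    if intent is None:
        raise PermissionError("X-Agent-Intent header is required for agent principals")
    required = INTENT_SCOPES.get(intent)
    if required is None:
        raise PermissionError(f"unknown intent: {intent}")
    missing = required - token_scopes
    if missing:
        raise PermissionError(f"intent {intent!r} requires scopes {sorted(missing)}")

try:
    # A planner agent declaring "order-sync" with read-only credentials is refused.
    authorize("order-sync", {"orders:read"})
except PermissionError as err:
    print(err)
```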
Observability for agent ecosystems
If you can’t see it, you can’t fix it. Minimum bar:
Trace every hop. Propagate correlation ids end-to-end across gateway, app, and data stores.
Tag by principal. Distinguish human vs. agent, service account id, and the agent “run id.” Your dashboards should break this down by default (a tracing sketch follows this list).
Watch P95 and request shapes. A flat P95 can hide a handful of ultra-expensive GraphQL operations. Track “top 10 query shapes by cost.”
Run canaries that behave like agents. Synthetic jobs that batch, retry with backoff, and exercise job endpoints, not just trivial pings.
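For the tag-by-principal item above, a minimal sketch, assuming the opentelemetry-api package; the attribute names are ours, not an OpenTelemetry semantic convention.

```python
from opentelemetry import trace

tracer = trace.get_tracer("api.gateway")

def handle_request(principal_type: str, service_account: str, agent_run_id: str | None):
    # Every span carries the principal, so dashboards can split humans from agents.
    with tracer.start_as_current_span("handle_request") as span:
        span.set_attribute("principal.type", principal_type)  # "human" | "agent"
        span.set_attribute("principal.service_account", service_account)
        if agent_run_id:
            span.set_attribute("agent.run_id", agent_run_id)
        ...  # route to the real handler; child spans inherit this context
```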
Cost control
Agents turn small inefficiencies into monthly invoices. Three levers matter:
Edge caching first. Push stable reads out to the CDN and cache persisted GraphQL ops.
Connection discipline. HTTP/2 or HTTP/3 keep-alives, tuned connection pools, and backpressure on the server (a pooling sketch follows this list).
Right-size autoscaling. Scale on work, not just CPU. KEDA/HPA on QPS, queue depth, or in-flight jobs prevents thrash.
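For the connection-discipline item above, a minimal sketch, assuming the requests library; requests speaks HTTP/1.1 keep-alive, so the same idea carries over to HTTP/2 clients such as httpx.

```python
import requests
from requests.adapters import HTTPAdapter

# One long-lived session per agent process: reuses TCP/TLS connections instead
# of paying the handshake cost on every call.
session = requests.Session()
session.mount("https://", HTTPAdapter(pool_connections=4, pool_maxsize=32))

def fetch(url: str) -> bytes:
    resp = session.get(url, timeout=10)
    resp.raise_for_status()
    return resp.content
```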
And yes, measure egress. Agents love to download “just to be safe.” Your bill doesn’t.
Preparing small teams: the paved footpath
If you’ve got a handful of engineers, you don’t need a platform org to survive agent traffic. You need a short list of non-negotiables and the discipline to ship them.
Start with an API gateway (e.g., AWS API Gateway) to enforce rate limits, quotas, and mTLS or OAuth. Add idempotency keys on writes and convert anything slow into a job resource with clear polling rules. Publish OpenAPI or GraphQL SDL and ship a minimal SDK with baked-in backoff and batch helpers. Turn on edge caching with sane TTLs. Add two dashboards: one for human traffic, one for agents, both with P95/P99, 429s, and top request shapes. You can do all of this in weeks, not quarters.
What I learned the hard way: don’t chase the “perfect” architecture. Ship the guardrails first. Most outages aren’t exotic; they’re missing backoff, missing batching, or missing cache headers.
Preparing large teams: the paved road
At scale, coordination beats heroics. The point isn’t more rules; it’s fewer surprises.
Stand up a small API platform with teeth: an internal gateway that applies global policies (quotas, auth, headers) and a schema registry (OpenAPI/GraphQL) with breaking-change checks in CI. Adopt contract testing between services so multi-agent flows don’t snap on deploy day. Centralize query cost governance for GraphQL—persisted operations, cost budgets per tenant, and a review path for new shapes. Introduce priority lanes: humans first, partners second, exploratory agents in a sandbox with tight limits. Run an agent simulator in pre-prod that generates fan-out, retries, and mixed REST/GraphQL patterns; promote only if the simulator stays green.
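The simulator doesn’t need to be elaborate. A minimal sketch, assuming aiohttp plus a hypothetical staging host and endpoints; it fans out bulk reads and retries an idempotent write with jittered backoff.

```python
import asyncio
import random

import aiohttp

BASE = "https://staging.example.internal"  # hypothetical pre-prod host

async def agent_run(session: aiohttp.ClientSession, run_id: int) -> None:
    """One simulated agent run: fan out a bulk read, then retry a flaky write."""
    item_ids = random.sample(range(1_000), 25)
    # Fan-out as a single bulk read instead of 25 individual GETs.
    async with session.get(f"{BASE}/items",
                           params={"ids": ",".join(map(str, item_ids))}) as resp:
        await resp.read()
    for attempt in range(3):
        async with session.post(f"{BASE}/jobs", json={"run": run_id},
                                headers={"Idempotency-Key": f"sim-{run_id}"}) as resp:
            if resp.status < 500:
                return
        await asyncio.sleep(0.5 * 2 ** attempt + random.random())  # backoff with jitter

async def main(concurrency: int = 50) -> None:
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(agent_run(session, i) for i in range(concurrency)))

asyncio.run(main())
```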
And don’t bury this under process. The best platform groups I’ve worked with ship paved-road libraries so teams get batching, idempotency, and backoff by default, with zero ceremony.
Agentic Architecture Checklist
I don’t believe in silver bullets, but I do believe in checklists. Here’s the one I use before inviting agents in:
Add idempotency keys to all non-GET writes; store responses for 24–72 hours.
Expose a job resource for anything >300–500 ms and return 202 with Retry-After.
Caching: Cache-Control, ETag, s-maxage, stale-while-revalidate.
.Provide batch endpoints and “since cursor” deltas; document them with examples.
For GraphQL: enforce depth/cost limits and persisted queries; cache at the edge.
Enforce per-client rate limits and quotas at the gateway; publish remaining budget headers.
Ship structured errors with machine-readable codes and clear retry hints.
Tag and trace: human vs. agent, service account id, agent run id in every span.
Split SLOs and dashboards for humans vs. agents; alert on 429 rate and top costly shapes.
Require short-lived creds for agents; rotate automatically; log “who/what/why” for every action.
What I’ve learned is simple: agentic systems don’t require a new religion. They require grown-up engineering. Clear contracts. Predictable backpressure. Caching that actually caches. And a bit of humility about how machines behave when we give them the keys. If you put those pieces in place, REST or GraphQL can carry you a long way. If you don’t, multi-agent traffic will teach you the lesson during business hours—with interest.