limitabl

rate limits, quotas, and spend controls for llm and agentic apps

once a request is identified, transformed, and authorized, the next governance question is:

how much can this user (or org, or agent) consume?

limitabl answers that question.

it enforces per-user, per-tenant, per-model, and per-provider limits, ensuring predictable costs, fair usage, and controlled spend across your entire llm ecosystem.

at a glance

limitabl is the usage, rate-limit, and spend-governance layer of gatewaystack.
it lets you enforce per-user, per-tenant, per-model, and per-provider rate limits, quotas, and budgets.

📦 implementation: ai-rate-limit-gateway + ai-cost-gateway (roadmap)

why now?

as llm adoption increases, so do cost overruns, unexpected spikes, and unbounded agent behavior.

organizations need granular control over usage and spend. limitabl delivers it.

designing the usage & spend governance layer

within the shared requestcontext

all gatewaystack modules operate on a shared RequestContext object.
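as a rough illustration, the shared context could look like the following sketch (the field names here are assumptions, not gatewaystack's actual API):

```python
from dataclasses import dataclass, field

# hypothetical sketch of the shared RequestContext; the real
# gatewaystack object may use different fields and names
@dataclass
class RequestContext:
    user_id: str            # set by identifiabl
    org_id: str
    model: str              # target model, e.g. "gpt-4"
    provider: str           # e.g. "openai"
    estimated_tokens: int   # pre-flight estimate used by limitabl
    actual_tokens: int = 0  # filled in after the provider responds
    cost_usd: float = 0.0   # actual spend, recorded post-execution
    metadata: dict = field(default_factory=dict)

ctx = RequestContext(user_id="user_doctor_123", org_id="org_healthcare",
                     model="gpt-4", provider="openai", estimated_tokens=1200)
```

pre-flight checks read the estimated fields; post-execution accounting fills in the actuals.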

limitabl operates in two phases:

phase 1 (pre-flight): runs immediately before llm execution, checking limits before the request is routed.

phase 2 (post-execution): runs after the provider responds, recording actual usage.

pre-flight evaluation

in the pre-flight phase, limitabl evaluates:

if a limit is exceeded, limitabl returns a structured governance decision (deny, throttle, fallback, or degrade).
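the decision types named above can be sketched as a simple pre-flight evaluator (the function name and signature are illustrative, not limitabl's real interface):

```python
from enum import Enum

# decision names mirror the prose (deny, throttle, fallback, degrade);
# the actual gatewaystack types may differ
class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    THROTTLE = "throttle"
    FALLBACK = "fallback"
    DEGRADE = "degrade"

def preflight(rate_ok: bool, quota_ok: bool, budget_ok: bool) -> Decision:
    if not rate_ok:
        return Decision.THROTTLE   # transient: retry after a delay
    if not quota_ok:
        return Decision.DENY       # hard usage ceiling reached
    if not budget_ok:
        return Decision.FALLBACK   # reroute to a cheaper model
    return Decision.ALLOW
```

returning a structured decision (rather than a bare error) lets downstream modules choose between rejecting, delaying, or degrading the request.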

post-execution accounting

in the post-execution phase, limitabl records actual usage and updates internal counters and budgets.
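a minimal accounting sketch, assuming counters keyed by scope; a real deployment would persist these in shared storage rather than process memory:

```python
from collections import defaultdict

# per-scope counters; scope keys here are illustrative
usage = defaultdict(lambda: {"tokens": 0, "cost_usd": 0.0})

def record_usage(scope: str, tokens: int, cost_usd: float) -> None:
    usage[scope]["tokens"] += tokens
    usage[scope]["cost_usd"] += cost_usd

# after the provider responds, deduct actual usage at every level
for scope in ("global", "org_healthcare", "user_doctor_123"):
    record_usage(scope, tokens=850, cost_usd=0.034)
```

updating every level of the hierarchy at once keeps user, org, and global budgets consistent with each other.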

the core functions

1. checkRateLimit: enforce per-user and per-tenant rates
requests per second/minute/hour, sliding windows, token buckets.

2. checkQuota: validate remaining quota
daily, weekly, monthly usage ceilings.

3. checkBudget: enforce cost ceilings
budget ceilings per user, tenant, environment, or provider.

4. checkRiskCost: risk-based spend rules
for example, only certain scopes can call expensive reasoning models.

5. applyThrottling: delay or reject when limits are hit
adaptive or static throttling policies.

6. fallbackProvider: reroute to cheaper models
automatic degradation modes when quotas or budgets are reached.

7. emitUsageEvents: log usage for observability
produces structured records for explicabl's audit pipeline.
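as an illustration of one strategy behind checkRateLimit, here is a minimal token-bucket sketch (class and parameter names are hypothetical):

```python
import time

# classic token bucket: refills continuously at `rate` tokens/second,
# allows bursts up to `capacity`
class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# ~100 requests/min with a burst allowance of 10
bucket = TokenBucket(rate=100 / 60, capacity=10)
```

token buckets tolerate short bursts while enforcing the average rate, which suits chat-style traffic better than a rigid fixed window.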

what limitabl does

enforce rate limits, quotas, budgets, and throttling; reroute to fallback providers; emit usage events for auditing.

what limitabl does not do

it does not identify callers (identifiabl), validate requests (validatabl), route traffic (proxyabl), or build the audit trail itself (explicabl).

two-phase limit enforcement

limitabl's two-phase design ensures both preventive controls and accurate accounting:

phase 1: pre-flight checks (before routing) evaluate estimated usage against rate limits, quotas, and budgets.

phase 2: usage accounting (after execution) records actual usage and updates counters and budgets.

limit configuration

limits are defined hierarchically:

limits:
  global:
    rate: 10000/min
    budget: $5000/day

  organizations:
    org_healthcare:
      rate: 1000/min
      budget: $500/day
      models:
        gpt-4:
          quota: 100000 tokens/day
          budget: $200/day

  users:
    user_doctor_123:
      rate: 100/min
      budget: $50/day

precedence: user limits override org limits, which override global limits.
enforcement: when multiple limits apply to a request, the most restrictive one wins.
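the "most restrictive limit wins" rule can be sketched as taking the minimum over every applicable level (the function name is illustrative; the values are borrowed from the yaml example above):

```python
# resolve an effective budget across the hierarchy: any level may be
# unset (None); the tightest configured limit applies
def effective_budget(global_limit, org_limit, user_limit):
    applicable = [l for l in (global_limit, org_limit, user_limit)
                  if l is not None]
    return min(applicable)

# user_doctor_123 in org_healthcare, from the config above
assert effective_budget(5000, 500, 50) == 50     # user budget binds
assert effective_budget(5000, 500, None) == 500  # no user override: org binds
```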

agent loop protection

limitabl prevents runaway agentic behavior:

agent_protection:
  max_tool_calls_per_workflow: 20
  max_recursion_depth: 5
  max_workflow_cost: $2.00
  max_workflow_duration: 120s
  duplicate_tool_detection:
    enabled: true
    threshold: 3  # same tool with same params

example: an agent enters an infinite web_search loop. after 3 identical calls (the duplicate_tool_detection threshold), limitabl terminates the workflow and returns an error.
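duplicate detection can be sketched by fingerprinting (tool, params) pairs and tripping once a fingerprint repeats past the configured threshold (the class name is hypothetical):

```python
import json

# counts identical (tool, params) calls within one workflow and trips
# once the configured threshold is exceeded
class LoopGuard:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.seen: dict = {}

    def check(self, tool: str, params: dict) -> bool:
        """returns False when the workflow should be terminated."""
        key = tool + ":" + json.dumps(params, sort_keys=True)
        self.seen[key] = self.seen.get(key, 0) + 1
        return self.seen[key] <= self.threshold

guard = LoopGuard(threshold=3)
calls = [guard.check("web_search", {"q": "same query"}) for _ in range(4)]
# first three identical calls pass, the fourth trips the guard
```

serializing params with sorted keys makes the fingerprint stable regardless of argument order.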

distributed rate limiting

in multi-instance deployments, limitabl uses shared state:

storage:
  backend: "redis"
  cluster:
    - "redis://primary:6379"
    - "redis://replica-1:6379"
  consistency: "strong"
  ttl: "3600s"

rate limits are enforced across all gatewaystack instances with strong consistency guarantees.
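a minimal sketch of the shared-counter idea, with an in-memory dict standing in for redis (in production this would be INCR plus EXPIRE against the cluster; all names here are illustrative):

```python
import time

# fixed-window counter over a shared key-value store; every gateway
# instance increments the same per-window key
class WindowLimiter:
    def __init__(self, store: dict, limit: int, window_s: int = 60):
        self.store = store
        self.limit = limit
        self.window_s = window_s

    def allow(self, key: str) -> bool:
        window = int(time.time() // self.window_s)
        slot = f"{key}:{window}"          # keys expire with the window
        count = self.store.get(slot, 0) + 1
        self.store[slot] = count
        return count <= self.limit

shared = {}  # stands in for the redis cluster shared by all instances
instance_a = WindowLimiter(shared, limit=3)
instance_b = WindowLimiter(shared, limit=3)
# both instances count against the same window
hits = [instance_a.allow("org_healthcare"), instance_b.allow("org_healthcare"),
        instance_a.allow("org_healthcare"), instance_b.allow("org_healthcare")]
```

because both instances share one store, the fourth request is rejected no matter which instance receives it.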

end to end flow

user
   → identifiabl       (who is calling?)
   → transformabl      (prepare, clean, classify, anonymize)
   → validatabl        (is this allowed?)
   → limitabl          (how much can they use? pre-flight constraints)
   → proxyabl          (where does it go? execute)
   → llm provider      (model call)
   → [limitabl]        (deduct actual usage, update quotas/budgets)
   → explicabl         (what happened?)
   → response

limitabl enforces predictable usage, stable cost, and controlled access, preventing abuse, overspend, and runaway agents.

integrates with your existing stack

limitabl plugs into gatewaystack and your existing llm stack without requiring application-level changes, exposing http middleware and sdk hooks for limit enforcement.

getting started

for limit configuration examples:
→ rate limit patterns
→ budget configuration guide
→ agent loop protection

for implementation:
→ integration guide

want to explore the full gatewaystack architecture?
→ view the gatewaystack github repo

want to contact us for enterprise deployments?
→ reducibl applied ai studio

app / agent
chat ui Β· internal tool Β· agent runtime
→
gatewaystack
user-scoped trust & governance gateway
identifiabl transformabl validatabl limitabl proxyabl explicabl
→
llm providers
openai Β· anthropic Β· internal models

every request flows from your app through gatewaystack's modules before it reaches an llm provider: identified, transformed, validated, constrained, routed, and audited.