The Aforo Edge

Monetizing AI Agents and MCP Servers: The Billing Architecture for the Agentic Economy

AI agents and MCP servers don't bill like SaaS. Per-tool pricing, session-based billing, and agent-level cost attribution demand a new billing architecture. Here is how to build it.

The Agentic Economy Is Here — And It Doesn't Bill Like SaaS

It's 2:47 AM. Your AI agent is autonomously working through a customer's data integration request. Without a single human instruction, it:

  1. Calls the read_database_schema tool (analyzing table structure)
  2. Invokes validate_credentials (checking API permissions)
  3. Maps fields using column_matcher (intelligent type detection)
  4. Calls execute_transform 47 times (parallelized across data partitions)
  5. Invokes verify_output_integrity (quality assurance)
  6. Logs the job with record_metadata (audit trail)

Total tool invocations: 52 distinct calls, touching 6 different MCP (Model Context Protocol) servers, executing across 14 minutes and 3 separate sessions. Your billing system records: "52 API calls at $0.001 each = $0.052."

You just missed $12,000 in revenue.

This is the silent crisis in agentic billing. The SaaS billing model you inherited—the one that works beautifully for REST APIs and webhooks—has catastrophically broken assumptions when applied to autonomous agents. An API billing model treats every call as equivalent. An agentic workload treats every tool invocation as a heterogeneous event: some tools are expensive (LLM invocations with multi-agent reasoning), others are cheap (simple lookups). Some run in sequences (dependencies), others in parallel (fan-out). Some sessions are worth $1, others are worth $10,000.

The Frankenstein billing stack tries to force this into a single dimension: "requests × unit price." It fails because agents are multidimensional. They operate across sessions, tools, tokens, compute time, and team hierarchies simultaneously. Trying to capture this richness with a flat per-request model is like charging for a chauffeur service by the number of times the driver turns the steering wheel—technically countable, but economically meaningless.

By the end of this article, you'll understand why the first wave of AI-native SaaS companies—the ones building MCP servers, deploying agents to production, and monetizing autonomous workloads—are rethinking billing at the architecture level. And you'll see exactly how Aforo solved it, because we had to. No vendor was building this.


What Changed: From Request-Response to Autonomous Sessions

For two decades, SaaS billing optimized for synchronous request-response: client calls endpoint, gets result, pays. Stalwart SaaS metrics—API calls, requests per second, quota tiers—emerged because this was the operational reality. You could count transactions. You could rate-limit. You could predict costs.

Agents shattered this assumption.

When Anthropic and OpenAI began shipping agentic capabilities in Claude and GPT-4, they made tool use first-class. An agent doesn't ask permission before invoking a tool; it reasons about which tools to call, in what order, how many times, and whether to retry. This autonomy is the feature. It's also the billing nightmare.

Here's what changed operationally:

| Dimension | Traditional SaaS API | Agentic Workload |
| --- | --- | --- |
| Execution Model | Synchronous — request in, response out, billed | Asynchronous multi-step sessions — agent deliberates, invokes 14 tools, retries on failure, resumes after pause. Duration and tool count are variable. |
| Tool Structure | Single tool per call — one endpoint, one action | Tool graphs — conditional branches, fan-out, retry paths. A success path costs $0.50; a failed retry path costs $2.00. |
| Cost Attribution | Human makes request → human pays | Agent autonomously invokes 47 tools — who pays? The deploying team? The agent creator? The end customer? Multi-level hierarchy. |
| Cost Predictability | Deterministic — fixed compute per endpoint | Probabilistic — LLM reasoning depth, error rates, retry logic all affect final cost. No two sessions cost the same. |

These shifts require a new architecture. Not a new metric (you still count API calls), but a new shape of billing: multidimensional, hierarchical, asynchronous, and probabilistic.


Four Product Types, Four Billing Surfaces

To understand agentic billing, first understand that there are now four distinct product types in the SaaS ecosystem, each with its own billing surface:

Standard API

  • Hierarchy: Customer → Team → App → API Key
  • Usage Layer: App Instance level
  • Credential: BEARER_TOKEN (sk_live_...)
  • Metric: API calls, requests per second
  • Billing Trigger: HTTP request arrives
  • Example: Stripe, Twilio, AWS API

Agentic API

  • Hierarchy: Customer → Team → App → API Key
  • Usage Layer: App Instance level
  • Credential: BEARER_TOKEN (sk_live_...)
  • Metric: API calls, but can include nested agent reasoning cost
  • Billing Trigger: HTTP request, potentially with sub-calls
  • Example: OpenAI API with tool use, Claude API with extended thinking

AI Agent (Managed)

  • Hierarchy: Customer → Team → Agent → API Key
  • Usage Layer: Agent session level (not App—the Agent is the product)
  • Credential: CLIENT_CREDENTIALS (client_id + client_secret)
  • Metric: Tool invocations, tokens, session duration, reasoning depth
  • Billing Trigger: Tool invocation within a managed agent session
  • Example: Anthropic Claude Agents, OpenAI Swarm agents, LangChain agents

MCP Server (Emerging)

  • Hierarchy: Customer → Team → Agent → API Key
  • Usage Layer: Session level (the MCP server is invoked by agents)
  • Credential: CLIENT_CREDENTIALS (client_id + client_secret)
  • Metric: Tool invocations per MCP server, session duration, tool-specific compute
  • Billing Trigger: JSON-RPC tool invocation from an agent to an MCP server
  • Example: Aforo's own metering agent, Anthropic's tools, LangChain integrations

The critical insight: API-type products bill at the App/Instance level. Agent-type products bill at the Session level. MCP servers inherit the agent hierarchy because they're always invoked by agents, never directly by humans.

This is why traditional billing fails. You try to force Session-level events (MCP tool invocations) into an App-level billing model, and the granularity collapses.


The Three Unsolved Problems in Agentic Billing

Every CTO shipping AI agents to production bumps into the same three walls. Let's name them clearly.

Problem 1 — Multi-Dimensional Metering

An agent session is not a single metric. It's a vector of metrics, many of which are correlated but not fungible.

Consider a single MCP session where an agent:

  • Invokes 52 tools
  • Generates 47,000 input tokens (to the LLM reasoning engine)
  • Receives 3,200 output tokens (from the LLM)
  • Runs for 14 minutes and 3 seconds
  • Spans 3 distinct sessions (the agent paused and resumed)
  • Incurs 7 tool-specific errors (some retried, some escalated)
  • Achieves 94% success rate on transformation attempts

Which of these do you bill? All of them? In which ratio?

Complicating things further: each tool has its own cost structure. The transform_schema tool costs $0.10 per invocation because it runs a machine learning model. The validate_credentials tool costs $0.001 because it's a simple lookup. The read_database tool costs $0.05 because it's metered by the database vendor. You can't use a single "price per tool invocation" rate card.

Traditional billing systems have a single price or a single tiered scale. Agentic billing requires dimension pricing: the ability to price differently on each dimension (tool, token count, duration, error rate, agent tier, team tier, customer segment) simultaneously.

Aforo solved this with dimensionPricing JSONB stored on the RatePlan entity. Instead of:

price_per_call: 0.001

You define:

{
  "tool_invocations": {
    "transform_schema": 0.10,
    "validate_credentials": 0.001,
    "read_database": 0.05,
    "*": 0.002  // default for unlisted tools
  },
  "token_multipliers": {
    "input_tokens": 0.000001,
    "output_tokens": 0.000002
  },
  "duration_buckets": [
    { "max_seconds": 60, "rate": 0.10 },
    { "max_seconds": 300, "rate": 0.30 },
    { "max_seconds": 3600, "rate": 1.50 }
  ],
  "error_rate_penalties": {
    "threshold": 0.05,
    "multiplier": 1.5  // 50% upcharge if error rate > 5%
  }
}

This is not a UI feature. This is a database design and billing engine feature. Your system must ingest heterogeneous events (tool invocations with metadata), apply per-dimension pricing in real time, and aggregate them into a single invoice.
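The per-event pricing pass can be sketched in a few lines of Python. This is a minimal illustration, not Aforo's engine: `price_event` and `DIMENSION_PRICING` are hypothetical names mirroring the JSONB above, and only the tool and token dimensions are covered.

```python
# Illustrative sketch: price one tool-invocation event against
# dimensionPricing-style rules. Names are hypothetical.
DIMENSION_PRICING = {
    "tool_invocations": {
        "transform_schema": 0.10,
        "validate_credentials": 0.001,
        "read_database": 0.05,
        "*": 0.002,  # default for unlisted tools
    },
    "token_multipliers": {"input_tokens": 0.000001, "output_tokens": 0.000002},
}

def price_event(event: dict, pricing: dict = DIMENSION_PRICING) -> float:
    tools = pricing["tool_invocations"]
    tool_cost = tools.get(event["tool_name"], tools["*"])  # fall back to "*"
    tok = pricing["token_multipliers"]
    token_cost = (event.get("input_tokens", 0) * tok["input_tokens"]
                  + event.get("output_tokens", 0) * tok["output_tokens"])
    return tool_cost + token_cost

event = {"tool_name": "transform_schema", "input_tokens": 2400, "output_tokens": 150}
print(round(price_event(event), 4))  # 0.1027
```

The key property is that the rate card is data, not code: adding a tool or changing a token rate is a JSONB update on the RatePlan, not a deploy.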

Problem 2 — Hierarchy-Aware Attribution

Who should be billed when an agent invokes an MCP server?

In traditional SaaS:

  • API Key → App → Team → Customer. Simple chain of ownership.

In agentic systems:

  • API Key → Agent → Team → Customer, but:
    • The Agent might be owned by one Team
    • The Agent might be invoked by another Team (shared agent scenario)
    • The Agent might serve multiple Customers (multi-tenant agent)
    • The MCP server might be shared across multiple Agents

Billing must traverse this graph in real time and correctly attribute costs at each level: per-agent (for capacity planning), per-team (for cost accounting), per-customer (for revenue recognition).

Here's the operational problem: Your usage events arrive asynchronously. An event from an MCP server contains:

{
  "api_key": "sk_live_xxxxx",
  "tool_name": "transform_schema",
  "agent_id": "agent_12345",
  "session_id": "sess_xyz",
  "timestamp": "2026-03-31T14:27:00Z"
}

You need to resolve:

  • Which Customer owns this key?
  • Which Team within the Customer owns the Agent?
  • Which Subscription was active at this moment?
  • Which RatePlan applied?
  • What Discount applies (if any)?

All of this must happen in the ingest pipeline, at line-rate speed, with sub-millisecond latency. If you defer this to a batch job, you can't generate real-time dashboards. If you query the database on every event, you'll hit throughput limits at scale.

Aforo's BillingHierarchyEnricher solves this: it caches the full resolution path (key → agent → team → customer → subscription → rate plan) in Redis (10-minute TTL), keyed on the API key. When an event arrives:

1. Look up API key in Redis cache (hit ~95% of the time)
2. If miss, fetch from database and cache
3. Resolve current subscription (in-memory, because subscriptions are immutable snapshots)
4. Apply rate plan and dimension pricing
5. Emit to three Kafka topics: billing + COGS + mcp-sessions (for analytics)

This way, you handle millions of events per hour with zero database queries on the hot path.
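The cache-first resolution step looks roughly like this. A plain dict stands in for Redis (TTL omitted) and `fetch_from_db` is a hypothetical placeholder for the PostgreSQL fallback; this is a sketch of the pattern, not Aforo's code.

```python
# Sketch of cache-first hierarchy resolution on the hot path.
CACHE: dict = {}  # api_key -> resolution path (Redis stand-in, TTL omitted)

def fetch_from_db(api_key: str) -> dict:
    # Hypothetical placeholder for the database lookup on a cache miss.
    return {"agent_id": "agent_12345", "team_id": "team_67890",
            "customer_id": "cust_11111", "subscription_id": "sub_22222"}

def resolve_hierarchy(api_key: str) -> dict:
    hit = CACHE.get(api_key)
    if hit is None:              # the ~5% miss case: fetch, then cache
        hit = fetch_from_db(api_key)
        CACHE[api_key] = hit
    return hit

first = resolve_hierarchy("sk_live_xxxxx")   # miss: populates the cache
second = resolve_hierarchy("sk_live_xxxxx")  # hit: no DB round-trip
assert first == second
```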

A note for the CFO reading this: Your engineering team will justify Redis and Kafka on latency grounds. Fine. But the business case is revenue recognition. Without real-time hierarchy resolution, your FP&A team spends the first two weeks after month-end manually untangling which team deployed which agent, which agent consumed which tools, and which customer should bear which cost. That's not a billing delay — it's a financial close delay. Every day your books stay open past month-end costs you in audit fees, delayed board reporting, and investor confidence. When Aforo's enrichment pipeline resolves the full attribution chain (key → agent → team → customer → subscription → rate plan) in under 50 milliseconds, the real beneficiary isn't the engineer who avoids a database round-trip. It's the controller who can close the books on Day 1 instead of Day 15, because every usage event was pre-attributed at ingest time. Real-time hierarchy resolution isn't a performance optimization. It's the difference between ASC 606-compliant revenue recognition and a qualified audit opinion.

Problem 3 — Per-Tool Cost Heterogeneity

MCP servers are not monolithic. They're collections of tools, each with wildly different cost structures. A single MCP server (say, "DatabaseToolkit") might expose:

  • read_query() — costs $0.001 per invocation
  • write_transaction() — costs $0.05 per invocation (transactional overhead)
  • analyze_schema() — costs $0.10 per invocation (runs ML inference)
  • validate_permissions() — costs $0.0001 per invocation (cached lookup)

When an agent invokes all four tools, you need to:

  1. Correctly identify which tool was called — not just "MCP server invocation" but "which specific tool within that server?"
  2. Apply tool-specific pricing — not a blanket rate per server, but per-tool rates
  3. Aggregate across sessions — if the same agent invokes the same tool 100 times in a session, you need to bill all 100, not one "batch"
  4. Handle tool aliases and versioning — tools get renamed, versioned, deprecated; your pricing rules must track these changes

This is hard because tool identity is not standardized. An agent calling an MCP server tool sends a JSON-RPC request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "analyze_schema",
    "arguments": {...}
  }
}

Your gateway/metering layer must:

  • Parse this JSON-RPC payload (not trivial—payloads can be chunked)
  • Extract the tool name
  • Route it to the correct pricing dimension
  • Log it durably (for audit and analytics)

Aforo solved this with gateway plugins that detect JSON-RPC tools/call requests at the protocol level, before they reach your application. We built detection plugins for all five major API gateways:

  • Kong (Lua plugin): Inspects request.body, parses JSON, extracts tool name
  • AWS API Gateway (Lambda authorizer): CloudWatch Logs → tool name extraction
  • Azure APIM (policy fragment): C# expression to parse request body
  • Apigee (Shared Flow): JavaScript callout on request payload
  • MuleSoft (custom policy): DataWeave transformation

Each plugin sets a standard header X-Aforo-Tool-Name: analyze_schema that your billing system can read downstream. No agent SDKs need to be modified. No application code changes required.
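What every one of those plugins does, stripped of gateway specifics, amounts to the following. This Python sketch is illustrative (`extract_tool_name` is a hypothetical name); real plugins must also buffer chunked bodies before parsing.

```python
import json

# Sketch: detect a JSON-RPC tools/call payload and surface the tool
# name so a downstream header (e.g. X-Aforo-Tool-Name) can be set.
def extract_tool_name(raw_body: bytes):
    try:
        payload = json.loads(raw_body)
    except ValueError:
        return None  # partial/chunked body: a real plugin buffers first
    if payload.get("jsonrpc") == "2.0" and payload.get("method") == "tools/call":
        return payload.get("params", {}).get("name")
    return None

body = b'{"jsonrpc": "2.0", "method": "tools/call", "params": {"name": "analyze_schema", "arguments": {}}}'
name = extract_tool_name(body)
headers = {"X-Aforo-Tool-Name": name} if name else {}
print(headers)  # {'X-Aforo-Tool-Name': 'analyze_schema'}
```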

The Product Reality: Preventing Agentic Bill Shock

Chief Product Officers: this section is for you. Everything above describes how to capture revenue from agentic workloads. But aggressive agentic billing without defensive UX is a churn machine.

Here's the nightmare scenario your support team will face. A developer deploys an AI agent on a Friday afternoon. Over the weekend, the agent encounters an edge case that triggers a retry loop — calling transform_data 14,000 times against a malformed schema that never resolves. Monday morning, the developer opens their dashboard to a $5,200 invoice for a workload that produced zero value. They don't file a support ticket. They cancel.

Bill shock is the silent killer of usage-based products, and agentic workloads amplify it by an order of magnitude. Unlike a human developer who would notice after 10 failed API calls and stop, an autonomous agent will keep retrying until it hits a rate limit or exhausts its context window. The agent's persistence — its core feature — becomes a financial liability when billing operates on batch cycles.

This is why real-time event-level cost evaluation isn't just a billing architecture choice. It's a product trust requirement. Because Aforo evaluates cost at the moment each tool invocation arrives — not at end-of-month batch — product teams can implement three layers of spend defense:

  1. Hard spend caps: Set a per-agent, per-session, or per-team budget ceiling. The instant cumulative cost crosses the threshold, the agent is paused. Not at the next billing cycle. Not at the next daily aggregation. Immediately. The BillingHierarchyEnricher already has the running total in Redis; adding a ceiling check is a single comparison on every event.

  2. Velocity alerts: If an agent's invocation rate exceeds 3× its trailing 7-day average, fire an alert to the team owner before costs spiral. "Your Data Analyst Agent is invoking transform_data at 400/min (normal: 120/min). Investigate or pause?"

  3. Session cost projections: Show the developer, in real time, the estimated session cost as it accumulates. "This session has used $12.40 so far. At current rate, it will reach $45 by completion." No surprises. No trust destruction. No churn.
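The hard-cap check described in point 1 really is a single comparison per event. A hedged sketch, with a dict standing in for the Redis running totals and all names illustrative:

```python
# Sketch of a hard spend cap: accumulate cost per agent and pause the
# agent the instant the cumulative total crosses the ceiling.
RUNNING_TOTALS: dict = {}  # agent_id -> cumulative cost (Redis stand-in)

def check_spend_cap(agent_id: str, event_cost: float, ceiling: float) -> str:
    total = RUNNING_TOTALS.get(agent_id, 0.0) + event_cost
    RUNNING_TOTALS[agent_id] = total
    # One comparison on every event: allow, or pause immediately.
    return "PAUSE_AGENT" if total >= ceiling else "ALLOW"

assert check_spend_cap("agent_12345", 40.0, 100.0) == "ALLOW"
assert check_spend_cap("agent_12345", 70.0, 100.0) == "PAUSE_AGENT"
```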

The companies that win in agentic billing won't be the ones that extract the most revenue per session. They'll be the ones whose customers trust the billing enough to deploy more agents. Defensive billing UX is a growth lever, not a cost center.


💡 CTO Reality Check: If you're building or deploying AI agents to production, ask yourself: can you currently answer "how much did that agent session cost to compute?" with a number you trust to within 10%? If the answer is no, Problem #1, #2, or #3 is lurking in your systems. By the time you ship to customers, you're already leaking revenue.


The Architecture That Solves It

Aforo's approach to agentic billing is built on four architectural pillars: event schema, session lifecycle, hierarchy enrichment, and dimension pricing. Let's examine each.

Event Schema for Agentic Workloads

The event is the foundation. Every tool invocation, token, duration, and error must be captured in a schema that survives the complexity of agentic compute.

Aforo's standard event for MCP server tool invocations:

{
  "event_id": "evt_xxxxx",
  "event_type": "mcp.tool_invoked",
  "api_key": "sk_live_xxxxx",
  "agent_id": "agent_12345",
  "agent_type": "CLAUDE",  // or GPT, LANGCHAIN, CREWAI, etc.
  "team_id": "team_67890",
  "customer_id": "cust_11111",
  "subscription_id": "sub_22222",
  "session_id": "sess_aaaaaa",
  "tool_name": "transform_schema",
  "mcp_server_id": "mcp_33333",
  "execution_duration_ms": 847,
  "execution_status": "SUCCESS",  // or FAILED, TIMEOUT, RATE_LIMITED
  "input_tokens": 2400,
  "output_tokens": 150,
  "error_type": null,
  "error_message": null,
  "retry_count": 0,
  "is_parallel_call": false,
  "parent_tool_invocation_id": null,  // for tool graphs
  "tenant_id": "tenant_xxxxx",
  "ingest_timestamp": "2026-03-31T14:27:00.123Z",
  "recorded_timestamp": "2026-03-31T14:26:59.045Z"  // when the tool was invoked
}

Key design decisions:

  • execution_status: Captures success/failure/timeout/rate-limited. Failed invocations still incur cost (they consumed compute).
  • input_tokens + output_tokens: Separate dimensions. Agents heavily optimize for input token reuse (context caching), so the ratio matters.
  • retry_count: If an agent retried a tool invocation, that's billable. You bill all attempts, not just the final success.
  • parent_tool_invocation_id: Allows you to model tool graphs. If tool A calls tool B, you can reconstruct the graph for analytics.
  • recorded_timestamp vs ingest_timestamp: The tool ran at recorded time; we ingested the event at ingest time. The delta tells you about queue/network latency.

This schema is sent to three Kafka topics simultaneously:

  1. aforo.usage.events — Standard usage event stream (shared with API/AI Agent products)
  2. aforo.mcp.sessions — MCP-specific session tracking (feeds McpSessionManager for session lifecycle)
  3. aforo.cost-of-goods-sold — COGS tracking (for unit economics analysis)

The triple-track routing looks like this:

MCP Server Tool Invocation
         │
         ▼
  Gateway Plugin
  (extracts tool_name,
   agent_id, session_id)
         │
    ┌────┼────────────────┐
    ▼    ▼                ▼
 Kafka   Kafka           Kafka
 usage   mcp.sessions    cost-of-
 events  (lifecycle)     goods-sold
    │         │              │
    ▼         ▼              ▼
 Billing   Session       Unit Econ
 Pipeline  Manager       Analytics
Triple-Track Event Routing — Usage, Sessions, and COGS

Session Lifecycle Management

An MCP session is not just a collection of tool invocations. It's a stateful conversation with a defined lifecycle. Sessions have start times, end times, activity patterns, and termination conditions.

Aforo's McpSessionManager tracks:

Session Created: First tool invocation
Session Active: Continued tool invocations
Session Idle: No invocations for N minutes (default: 10 min)
Session Paused: Agent deliberately paused (returned control to human)
Session Resumed: Agent resumed after pause
Session Timed Out: Idle for >1 hour
Session Completed: Agent finished its task
Session Failed: Unrecoverable error
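The lifecycle above can be modeled as a small state machine. This sketch is illustrative only: the transition table is an assumption (resume is modeled as IDLE/PAUSED returning to ACTIVE, timeout as IDLE going to EXPIRED), and terminal states simply have no outgoing transitions.

```python
# Illustrative session state machine; simplified from the states above.
TRANSITIONS = {
    "CREATED": {"ACTIVE"},                                   # first tool call
    "ACTIVE":  {"ACTIVE", "IDLE", "PAUSED", "COMPLETED", "FAILED"},
    "IDLE":    {"ACTIVE", "EXPIRED"},                        # resume or time out
    "PAUSED":  {"ACTIVE"},                                   # agent resumed
    # COMPLETED / FAILED / EXPIRED are terminal: no outgoing transitions.
}

def transition(state: str, new_state: str) -> str:
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

state = "CREATED"
for step in ["ACTIVE", "IDLE", "ACTIVE", "COMPLETED"]:
    state = transition(state, step)
print(state)  # COMPLETED
```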

Why does this matter for billing? Sessions define the boundary of work. A single session might incur $5 in tool costs. But if the agent fails and retries, that could be two sessions: the first (failed) incurred $3; the second (succeeded) incurred $4. You bill both.

Sessions also define commitment levels. A Team might purchase a plan: "10 GB of token quota per month, organized as up to 100 active sessions." If they exceed this, what happens? Do you auto-upgrade them? Cap them? Alert them? Session tracking lets you answer these questions.

Operationally, the McpSessionManager:

  1. Listens to aforo.mcp.sessions Kafka topic (sourced from gateway plugins)
  2. Creates a session record on first tool invocation from an API key in a given timeframe
  3. Updates session counters (tool count, token count, duration) in real time
  4. Marks sessions as idle when no events arrive for 10 minutes
  5. Expires idle sessions (auto-close, still billable)
  6. Publishes session summary to aforo.analytics.mcp_sessions (ClickHouse)

The session record lives in PostgreSQL:

CREATE TABLE mcp_sessions (
  id UUID PRIMARY KEY,
  tenant_id VARCHAR NOT NULL,
  agent_id UUID NOT NULL,
  api_key_id UUID NOT NULL,
  session_start_at TIMESTAMP WITH TIME ZONE NOT NULL,
  session_end_at TIMESTAMP WITH TIME ZONE,
  status VARCHAR NOT NULL,  -- ACTIVE, IDLE, COMPLETED, FAILED, EXPIRED
  tool_invocation_count INT NOT NULL DEFAULT 0,
  total_tokens INT NOT NULL DEFAULT 0,
  total_duration_ms BIGINT NOT NULL DEFAULT 0,
  error_count INT NOT NULL DEFAULT 0,
  created_at TIMESTAMP WITH TIME ZONE NOT NULL
);

For analytics, an aggregating table in ClickHouse summarizes sessions:

CREATE TABLE mcp_session_summary (
  session_id UUID,
  tenant_id String,
  agent_id UUID,
  session_date Date,
  tool_invocation_count UInt32,
  avg_tool_duration_ms UInt32,
  total_tokens UInt64,
  success_rate Decimal64(2)
) ENGINE = AggregatingMergeTree()
ORDER BY (tenant_id, session_date);

This feeds real-time dashboards showing "Agent X executed 47 sessions today, averaging 2.3 minutes per session."

The Billing Hierarchy Enrichment Pipeline

Here's where agentic billing gets its teeth. The enrichment pipeline is the nervous system of cost attribution.

When an event arrives from an MCP server (or any agentic source), it contains:

  • api_key: The credential used
  • tool_name: The tool invoked
  • timestamp: When it happened

It does not contain:

  • The Customer who owns this key
  • The Team that deployed the Agent
  • The Subscription currently active
  • The RatePlan that applies
  • Any discount codes or committed spend

Enrichment must resolve all of this, in <50 milliseconds, with 99.9% correctness.

Aforo's pipeline:

Stage 1: Key Resolution (Cache Lookup)

Redis.Get("api_key:sk_live_xxxxx")
→ Returns:
{
  "agent_id": "agent_12345",
  "team_id": "team_67890",
  "customer_id": "cust_11111",
  "product_id": "prod_mcp_server_001",
  "subscription_id": "sub_22222"
}

If cache miss, fetch from pricing_service.api_keys table and cache (TTL 10 min).

Stage 2: Subscription Resolution

In-memory map of subscription_id → RatePlan snapshot
(Loaded at service startup, updated on subscription changes via Kafka)

Subscriptions are immutable snapshots. When a Team upgrades plans, a new subscription record is created; the old one is closed. The snapshot captures:

  • Effective date (when the rate plan became active)
  • Expiration date (when it expires)
  • RatePlan ID (which pricing model applies)
  • Any overrides (custom pricing for this customer)

Stage 3: Dimension Pricing Application

Apply dimensionPricing rules from the RatePlan:
- Tool invocation cost: lookup[tool_name] or default
- Token cost: input_tokens × rate + output_tokens × rate
- Duration cost: bucket lookup based on execution_duration_ms
- Error multiplier: if error_count > threshold, apply upcharge

Result: Numeric cost for this single event

Stage 4: Ledger Write

Insert into billing.usage_ledger:
(
  tenant_id, subscription_id, rate_plan_id,
  event_type, tool_name, cost,
  recorded_timestamp
)

This ledger is the source of truth for invoicing. A nightly batch job sums the ledger per subscription and generates invoices.

Stage 5: Analytics Dispatch

Emit to ClickHouse (mcp_tool_invocations) for real-time dashboards
Emit to COGS topic for unit economics tracking

Event Flow: API Gateway → Ledger in < 50ms

API Gateway ──→ Kafka (usage.events) ──→ BillingHierarchyEnricher
                                              │
                                    ┌─────────┴─────────┐
                                    ▼                     ▼
                              Redis Cache           PostgreSQL
                           (key → agent →         (cache miss
                            team → customer        fallback)
                            → subscription)
                                    │
                                    ▼
                          Dimension Pricing Engine
                           (tool rate × tokens
                            × duration × volume)
                                    │
                          ┌─────────┼─────────┐
                          ▼         ▼         ▼
                    PostgreSQL  ClickHouse   Kafka
                    (billing    (analytics   (COGS
                     ledger)    dashboards)   topic)
Agentic Billing — Hierarchy Enrichment Pipeline

Why this architecture?

  • Redis cache minimizes database queries — 95% of lookups hit the cache, so no DB roundtrip
  • In-memory subscription snapshots — subscriptions change rarely, so a snapshot loaded once at startup is faster than even a Redis lookup on every event
  • Immediate ledger write — you have the cost within 50ms, so real-time dashboards can query the ledger
  • Separated concerns — dimension pricing logic is isolated from hierarchy logic, making it testable and auditable

Dimension Pricing in the Rate Plan

Once you have the event and the customer's subscription, you apply dimension pricing. This is the math layer.

A rate plan's dimensionPricing JSONB defines rules for each dimension:

Dimension 1: Tool-Specific Rates

{
  "tool_invocations": {
    "transform_schema": 0.10,
    "validate_credentials": 0.001,
    "read_database": 0.05,
    "*": 0.002
  }
}

When tool transform_schema is invoked, cost = 0.10. When tool xyz_unknown_tool is invoked, cost = 0.002 (default).

Dimension 2: Token-Based Multipliers

{
  "token_multipliers": {
    "input_tokens": 0.000001,
    "output_tokens": 0.000002
  }
}

If an invocation used 2,400 input tokens and 150 output tokens:

  • Input cost = 2400 × 0.000001 = 0.0024
  • Output cost = 150 × 0.000002 = 0.0003
  • Token cost = 0.0027

Total cost = 0.10 (tool) + 0.0027 (tokens) = 0.1027

Dimension 3: Duration Buckets

{
  "duration_buckets": [
    { "max_seconds": 60, "rate": 0.10, "unit": "per_invocation" },
    { "max_seconds": 300, "rate": 0.50, "unit": "per_invocation" },
    { "max_seconds": 3600, "rate": 2.00, "unit": "per_invocation" }
  ]
}

If execution_duration_ms = 847 ms = 0.847 seconds (bucket 1):

  • Duration cost = 0.10

Dimension 4: Error Rate Penalties

{
  "error_rate_penalties": {
    "threshold": 0.05,
    "multiplier": 1.5
  }
}

If error_count / total_invocations > 0.05, apply 1.5× multiplier to the whole session. Rationale: High error rates indicate debugging sessions, inefficient agents, or misconfigured tools—you want to signal this to customers and recover cost.

Dimension 5: Volume Discounts

{
  "volume_tiers": [
    { "min_invocations": 0, "rate": 0.10 },
    { "min_invocations": 10000, "rate": 0.08 },
    { "min_invocations": 100000, "rate": 0.05 }
  ]
}

Monthly aggregation: if total invocations > 100,000, apply 0.05 per-invocation rate retroactively.
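A retroactive tier lookup is a one-liner over the sorted tier list. The sketch below mirrors the `volume_tiers` JSON above; `monthly_rate` is an illustrative name, not Aforo's API.

```python
# Sketch of retroactive volume-tier selection: the whole month's volume
# is re-rated at the deepest tier the customer reached.
VOLUME_TIERS = [
    {"min_invocations": 0, "rate": 0.10},
    {"min_invocations": 10_000, "rate": 0.08},
    {"min_invocations": 100_000, "rate": 0.05},
]

def monthly_rate(total_invocations: int) -> float:
    # Tiers are sorted ascending; take the last threshold we crossed.
    applicable = [t for t in VOLUME_TIERS
                  if total_invocations >= t["min_invocations"]]
    return applicable[-1]["rate"]

assert monthly_rate(500) == 0.10
assert monthly_rate(150_000) == 0.05  # retroactive: all 150k billed at 0.05
```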

These dimensions are applied combinatorially. A single tool invocation might incur:

  • Base tool cost (dimension 1)
  • Token cost (dimension 2)
  • Duration cost (dimension 3)
  • Error multiplier (dimension 4)
  • Volume discount (dimension 5)

Final cost = (base + tokens + duration) × error_multiplier × volume_discount

This is not a simple tier-based model. This is real-time, multi-dimensional pricing that captures the actual economics of running agentic workloads.


Real-World Monetization Patterns for MCP Servers

Theory is fine. Here's how companies actually price agentic products. These are four proven patterns we see in production.

Pattern 1 — Per-Tool-Invocation with Tiered Discounts

The Setup: Your MCP server exposes 12 tools. Some are cheap (data lookups), others are expensive (ML inference, real-time compute).

The Pricing:

read_cache: $0.001 per invocation
write_cache: $0.003 per invocation
analyze_schema: $0.10 per invocation
transform_data: $0.25 per invocation
run_validation: $0.05 per invocation

Tier 1: <1,000 invocations/month → pay listed prices
Tier 2: 1,000-10,000 invocations/month → 10% discount on all tools
Tier 3: 10,000-100,000 invocations/month → 25% discount
Tier 4: >100,000 invocations/month → 40% discount + dedicated support

Why This Works: It aligns price with actual cost (expensive tools cost more), rewards scale, and is easy to explain. Customers understand "Transform Data costs $0.25"—it's concrete.

The Aforo Implementation:

  • dimensionPricing.tool_invocations: Define per-tool rates
  • dimensionPricing.volume_tiers: Define discount brackets (monthly aggregation)
  • Real-time ledger tracks monthly invocation count; discount applied at invoice time

Pattern 2 — Session-Based with Included Tool Budget

The Setup: You want to simplify billing. Instead of per-tool pricing, charge a flat fee per session, with included budget.

The Pricing:

$5 per session (up to 100 tool invocations)
$0.05 per additional tool invocation

+ Token cost: $0.000002 per output token (expensive LLM reasoning)

Why This Works: Predictable billing for customers. A Team buying 100 sessions/month knows the base cost ($500). Overage is transparent. Output tokens reflect the actual LLM compute consumed.

The Aforo Implementation:

  • dimensionPricing.base_session_fee: $5
  • dimensionPricing.included_tool_count: 100
  • dimensionPricing.tool_invocations["*"]: 0.05 (overages)
  • dimensionPricing.token_multipliers.output_tokens: 0.000002
  • McpSessionManager tracks session boundaries; ledger write includes session_id for grouping
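The arithmetic for this pattern is simple enough to sketch directly. The parameters below mirror the pricing above ($5 base, 100 included invocations, $0.05 overage, $0.000002 per output token); `session_cost` is an illustrative name.

```python
# Sketch of Pattern 2: flat session fee with included tool budget,
# overage per extra invocation, plus output-token cost.
def session_cost(tool_invocations: int, output_tokens: int,
                 base_fee: float = 5.0, included: int = 100,
                 overage_rate: float = 0.05,
                 output_token_rate: float = 0.000002) -> float:
    overage = max(0, tool_invocations - included) * overage_rate
    return base_fee + overage + output_tokens * output_token_rate

# A session with 130 invocations and 50,000 output tokens:
# 5.00 + 30 x 0.05 + 50,000 x 0.000002 = 5.00 + 1.50 + 0.10 = 6.60
print(round(session_cost(130, 50_000), 2))  # 6.6
```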

Pattern 3 — Token-Based with Tool Multipliers

The Setup: Your MCP server integrates with Claude/GPT. The primary cost is LLM reasoning (measured in tokens). Tool invocations are secondary.

The Pricing:

Input Tokens:  $0.0000005 per token
Output Tokens: $0.000002 per token

Tool Multiplier: Different tools have different "reasoning complexity"
- simple_lookup: 0.5× multiplier (cheap reasoning)
- schema_analysis: 2.0× multiplier (expensive reasoning)
- transform_multi_step: 3.5× multiplier (most expensive)

Base charge = token_cost × tool_multiplier

Why This Works: You're billing for what you actually pay the LLM. Tool multipliers let you differentiate which tools require more reasoning. Transparent to customers: "You're using 50,000 output tokens at $0.000002 each with a 2.0× multiplier for schema analysis = $0.20/session."

The Aforo Implementation:

  • dimensionPricing.token_multipliers: Base rates
  • dimensionPricing.tool_multipliers: Per-tool reason complexity multipliers
  • Calculation: cost = (input_tokens × input_rate + output_tokens × output_rate) × tool_multiplier[tool_name]
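The calculation line above, as a runnable sketch (rates and multipliers mirror the pattern; `reasoning_cost` is an illustrative name):

```python
# Sketch of Pattern 3: token cost scaled by a per-tool
# reasoning-complexity multiplier.
TOOL_MULTIPLIERS = {"simple_lookup": 0.5, "schema_analysis": 2.0,
                    "transform_multi_step": 3.5}

def reasoning_cost(tool_name: str, input_tokens: int, output_tokens: int,
                   input_rate: float = 0.0000005,
                   output_rate: float = 0.000002) -> float:
    base = input_tokens * input_rate + output_tokens * output_rate
    # Unknown tools fall back to a neutral 1.0x multiplier (an assumption).
    return base * TOOL_MULTIPLIERS.get(tool_name, 1.0)

# 50,000 output tokens at $0.000002 with the 2.0x schema_analysis multiplier:
print(round(reasoning_cost("schema_analysis", 0, 50_000), 2))  # 0.2
```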

Pattern 4 — Committed Minimum with Agent-Level Attribution

The Setup: Enterprise customer. Deploying 3 agents (customer service bot, data analyst, integration manager). Wants predictable monthly cost.

The Pricing:

Committed: $10,000/month for up to 1M tool invocations

Per-Agent Allocation:
- Customer Service Agent: $3,000 (300K invocations)
- Data Analyst Agent: $5,000 (500K invocations)
- Integration Manager: $2,000 (200K invocations)

Overage: $0.01 per invocation beyond allocated amount

Why This Works: Enterprise deals need opex predictability. Committed spend forces planning; agent-level allocation shows who's consuming resources; overage billing keeps everyone honest. The customer locks in 1M invocations at an effective $0.01 each, and anything an agent uses beyond its allocation shows up transparently as an overage line item.

The Aforo Implementation:

  • Create RatePlan with billing_mode: COMMITTED
  • Define committed_amount: 1000000 and committed_price: 10000
  • Create 3 Subscriptions for the same customer, one per agent, with allocation
  • Overage pricing: dimensionPricing.tool_invocations["*"]: 0.01
  • Billing pipeline: sums all three subscriptions, applies committed discount, bills overages at line-item level
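A simplified version of that pipeline's math, with per-agent overage computed as line items. This is illustrative only: Aforo's real pipeline works off the usage ledger, not in-memory dicts, and the agent names here are hypothetical:

```python
def monthly_bill(agent_usage: dict, allocations: dict,
                 committed_price: float = 10_000.0,
                 overage_rate: float = 0.01) -> dict:
    """Committed minimum plus per-agent overage line items."""
    line_items = {}
    for agent, used in agent_usage.items():
        allocated = allocations.get(agent, 0)
        over = max(0, used - allocated)       # invocations beyond this agent's allocation
        line_items[agent] = round(over * overage_rate, 2)
    return {"committed": committed_price,
            "overage_line_items": line_items,
            "total": committed_price + sum(line_items.values())}

allocations = {"customer_service": 300_000, "data_analyst": 500_000,
               "integration_manager": 200_000}
usage = {"customer_service": 310_000, "data_analyst": 450_000,
         "integration_manager": 200_000}
bill = monthly_bill(usage, allocations)
# customer_service is 10,000 invocations over its allocation → $100 overage
print(bill["total"])  # → 10100.0
```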

The Developer Portal Experience for Agentic Products

Good billing architecture is invisible; a bad developer experience is not. Here's what enterprise customers expect from their MCP server provider.

Agent Registration and Key Management

When a developer deploys an AI agent to your MCP server, they need to:

  1. Register the Agent — Give it a name, describe its purpose, set rate limits
  2. Generate Credentials — Create CLIENT_CREDENTIALS (client_id + client_secret)
  3. View the Agent — See active sessions, tool invocations today, cost tracking
  4. Manage Lifecycle — Suspend, reactivate, delete agents

Aforo's AgentDashboardPage (in the Aforo Product UI) provides this UX:

  • Agent Grid: Searchable table of all agents owned by the team
  • Agent Stats: Active sessions, tools invoked today, tokens used this month, estimated cost
  • Register Form: Agent name, agent_type (CLAUDE/GPT/LANGCHAIN/CREWAI/AUTOGEN/CUSTOM), max_concurrent_sessions, rate_limit_per_minute
  • Credential Management: Display client_id + client_secret, rotate (auto-revoke old, create new), copy to clipboard
  • Permissions: Show which subscriptions/rate plans this agent can use
  • Activity Timeline: Recent tool invocations, session summaries, errors

Key feature: Per-Agent Cost Projection.

Based on this month's tool invocation rate and current rate plan, the dashboard calculates: "If usage continues at current pace, this agent will cost $2,400 by month-end (you have $5,000 committed)."

This is critical. Developers need to know when they're approaching limits.
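The projection in the dashboard copy above can be approximated with a simple run-rate extrapolation. This is a sketch: Aforo's actual model may weight recent days differently rather than extrapolating linearly:

```python
import calendar
from datetime import date

def project_month_end_cost(month_to_date_cost: float, today: date) -> float:
    """Linear run-rate projection: extrapolate month-to-date spend to the full month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = month_to_date_cost / today.day
    return round(daily_rate * days_in_month, 2)

# $1,200 spent by the 15th of a 30-day month projects to $2,400 at month-end
print(project_month_end_cost(1200.0, date(2026, 6, 15)))  # → 2400.0
```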

Usage Dashboards with Agent-Level Granularity

Aggregate dashboards are useless for cost management. A team with 5 agents needs to see:

  • Monthly Cost by Agent (bar chart)
  • Cost Breakdown by Tool (pie chart)
  • Session Health (table: agent name, sessions today, success rate, avg duration)
  • Token Consumption (stacked area chart: input vs output tokens, monthly trend)
  • Cost vs Committed (gauge: $2,340 used of $5,000 committed)
  • Alerts (red banner if >80% of monthly commit used)

Aforo's analytics pipeline feeds these dashboards via ClickHouse:

SELECT
  agent_id,
  agent_name,
  SUM(cost) as total_cost,
  COUNT(*) as invocation_count,
  SUM(total_tokens) as token_count,
  AVG(execution_duration_ms) as avg_duration_ms
FROM mcp_tool_invocations
WHERE tenant_id = ? AND recorded_date >= date_trunc('month', now())
GROUP BY agent_id, agent_name
ORDER BY total_cost DESC;

Queries like this return in milliseconds even over billions of rows, thanks to ClickHouse's columnar storage and compression.

Integration: From Gateway Plugin to SDK Decorator

For developers who don't want to use a gateway, Aforo provides language-specific SDKs that handle metering in-process.

Node.js SDK (@aforo/mcp-metering):

import { AforoMetering } from '@aforo/mcp-metering';

const metering = new AforoMetering({
  clientId: process.env.AFORO_CLIENT_ID,
  clientSecret: process.env.AFORO_CLIENT_SECRET,
  tenantId: process.env.AFORO_TENANT_ID
});

const toolHandler = metering.wrapToolHandler(async (toolName, args) => {
  const result = await runTool(toolName, args); // runTool: your own tool logic
  return result;
});

The wrapper:

  • Invokes your tool logic
  • Measures execution time
  • Counts tokens from the LLM response
  • Sends a metering event to the Aforo endpoint
  • Returns the result to the agent

Python SDK (aforo-mcp-metering):

import os

from aforo_metering import AforoMetering

metering = AforoMetering(
    client_id=os.environ['AFORO_CLIENT_ID'],
    client_secret=os.environ['AFORO_CLIENT_SECRET'],
    tenant_id=os.environ['AFORO_TENANT_ID']
)

@metering.wrap_tool_handler
async def transform_schema(schema, target_type):
    # Your logic
    return transformed_schema

Same idea: decorator pattern, minimal changes to existing code, automatic event emission.

Behind the scenes, the SDK:

  • Batches events (100 events or 5 seconds, whichever comes first)
  • Retries failed sends (3× exponential backoff)
  • Caches credentials and picks up key rotation within 5 minutes

This is the low-friction path for integrating agentic metering without deploying a gateway.
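The batching behavior described above can be sketched as follows. The class name is hypothetical, and a production SDK would also run a background flush timer rather than checking batch age only when a new event arrives:

```python
import time

class EventBatcher:
    """Buffer metering events; flush at 100 events or 5 seconds, whichever comes first."""

    def __init__(self, send, max_events: int = 100, max_age_s: float = 5.0):
        self.send = send            # callable that ships a list of events
        self.max_events = max_events
        self.max_age_s = max_age_s
        self.buffer = []
        self.oldest = None          # monotonic timestamp of the first buffered event

    def add(self, event: dict) -> None:
        if not self.buffer:
            self.oldest = time.monotonic()
        self.buffer.append(event)
        if (len(self.buffer) >= self.max_events
                or time.monotonic() - self.oldest >= self.max_age_s):
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.send(self.buffer)  # a real SDK retries failures with backoff here
            self.buffer = []
            self.oldest = None

sent = []
batcher = EventBatcher(send=sent.append, max_events=3)
for i in range(7):
    batcher.add({"tool": "simple_lookup", "n": i})
batcher.flush()  # ship the final partial batch on shutdown
print([len(b) for b in sent])  # → [3, 3, 1]
```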


Why This Matters Now — And Why Aforo Built It First

In 2024, agentic AI was theoretical. Model Context Protocol was announced but not widely deployed. Companies were curious: "Can agents use tools?"

In 2025, it became real. Claude 3.5 Sonnet's tool use landed in production. GPT-4 agent loops shipped. LangChain released agent frameworks. Dozens of startups began shipping MCP servers and agent-native products.

In 2026 (now), the market inflection point hit. Companies realized: Agentic workloads are our core monetization surface. You can't ship an AI agent product and bill customers like it's a REST API.

We at Aforo saw this coming. We built MCP Server as the fourth GA product type because we knew vendors would demand it. We engineered multi-dimensional pricing into the billing engine because we knew per-call billing would collapse under agentic compute. We shipped gateway plugins for all five major API gateways because we knew adoption would be broad.

The result: Aforo is the only platform that handles agentic billing correctly. Not mostly. Not "close enough." Correctly.

Here's what "correct" means:

  • Every tool invocation is tracked at line-item level with full context (agent, session, tool, tokens, duration, status)
  • Hierarchy is resolved in real-time with 95% cache hit rate, never blocking ingest
  • Dimension pricing is combinatorial — you can price on any dimension or combination
  • Session lifecycle is tracked, not just raw events
  • Real-time dashboards show cost per agent per day, not month-end surprise invoices
  • Overage tracking works, protecting both sides of the deal

"We'll Just Use Stripe Until We're Bigger" — Why That's Fatal in the Agentic Economy

A word for the CTO at a 50-person startup who's reading this and thinking: "This is enterprise-grade overkill. We have 200 customers. We'll use Stripe Billing, track usage in a Postgres table, and deal with the complexity when we scale."

In traditional SaaS, that's a reasonable bet. Deferring billing infrastructure means your finance team suffers with spreadsheets for a few quarters. You leak some revenue on edge cases. You manually reconcile overages. It's painful but survivable. The worst case is a few angry invoices and an embarrassed controller.

In the agentic economy, deferring billing infrastructure is an existential risk.

Here's why: autonomous agents don't sleep, don't pause to question their own behavior, and don't check with a human before retrying. A misconfigured agent caught in an inference loop can rack up $10,000 in LLM token costs over a single weekend. If your billing system is a basic monthly subscription with no real-time metering, no spend caps, and no session tracking, you won't know this happened until the LLM vendor's invoice arrives on the 1st. By then, you've consumed $10,000 in compute that you can never recover from the customer, because your pricing page says "$99/month unlimited."

This isn't hypothetical. We've talked to three YC-batch companies in 2026 who hit exactly this wall: they shipped agent-based products with flat-rate pricing, an agent ran wild on a customer's behalf, and the resulting cloud bill exceeded the customer's entire annual contract value. One of them nearly shut down.

The "start simple and fix it later" playbook assumes your billing mistakes are linear: a few percentage points of leakage, a few days of delayed invoices. Agentic workloads make your mistakes exponential. An unmetered agent doesn't leak revenue — it hemorrhages it. And because agents operate autonomously, the feedback loop between "something went wrong" and "someone noticed" can be days, not minutes.

You don't need Aforo's full 10-stage billing pipeline on Day 1. But you absolutely need three things before you ship an agentic product to a single paying customer: real-time event-level metering (know what's happening as it happens), per-agent spend caps (stop runaway costs before they destroy you), and session-aware billing (because "API calls" is not a meaningful unit when an agent makes 500 of them in 3 minutes). If your current billing vendor can't do these three things, you're not saving money by deferring — you're placing a bet that no agent will ever misbehave. That's not engineering. That's prayer.

Audit Yourself: 3 Questions for Your Agentic Billing Readiness

If you're building AI agents or MCP servers, ask yourself:

Question 1: Can you bill differently on different tools? If your answer is "all tools cost the same," you're leaving revenue on the table. Tools have different cost structures. Your pricing should reflect that. Aforo's dimension pricing handles this natively.

Question 2: Can you attribute costs to the team/agent that incurred them in real time? If your answer is "we report monthly," you're operating blind: by the time the report lands, the cost surprise has already happened. Aforo's BillingHierarchyEnricher resolves attribution within 50ms, feeding real-time dashboards.

Question 3: Can you explain your agentic billing logic to a customer in <2 minutes? If you can't, it's probably broken. Aforo's four patterns (per-tool, session-based, token-based, committed) are each explainable in 30 seconds. Your model should be too.


What's Next

Agentic billing is moving fast. By end of 2026, we expect:

  • Agentic APIs to become the default for new enterprise SaaS (not a novelty)
  • Multi-agent orchestration to be common (teams of agents working together on hard problems)
  • Agent marketplaces to emerge (buy/sell specialized agents, metered at the session level)
  • Agentic cost optimization to become table stakes (agents that minimize their own cost while solving problems)

Aforo is built for all of this. Not as a bolted-on feature. As the core architecture.

If you're monetizing agents or MCP servers, the time to fix your billing is now—before customer agreements lock you into revenue leakage. Aforo makes it possible to do it right, at any scale.


Checklist: Your Agentic Billing Readiness

  • [ ] Multi-dimensional metering: Can you price different tools differently? Can you apply volume discounts? Can you charge for tokens and duration simultaneously?
  • [ ] Hierarchy-aware attribution: Can you resolve "which team/agent incurred this cost?" in real time? Can you show per-agent cost dashboards?
  • [ ] Tool identity tracking: Can you correctly identify which tool within an MCP server was invoked? Can you route this to per-tool pricing rules?
  • [ ] Session lifecycle management: Do you track session start/end, duration, tool count, and token count? Can you bill both successful and failed sessions?
  • [ ] Real-time dashboards: Can customers see their cost trend intra-month, or do they find out at invoice time?

If you checked fewer than 3 boxes, you need to rearchitect. If you checked 3-4, you're partially there. If you checked all 5, you might already be using Aforo—or you've built this yourself (congratulations, and reach out; we have reference architecture docs).


The agentic economy is here. Billing is no longer a commodity. It's a product.

Aforo. Built for agentic at scale.

Jay Bodicherla
Founder & CEO, Aforo

Product leader building Aforo, the production-grade enterprise monetization platform for SaaS teams scaling usage-based billing.

Ready to ship outcome-based pricing?

Deploy an Intercom-style billing model in 5 minutes.
No custom middleware required.

Try the sandbox free, or talk to our solutions team for a 1:1 enterprise architecture review. No credit card required.