The $3.2M Problem Nobody Talks About
It's Tuesday at 2:47 PM. Customer #247—a fintech company that represents 8% of your ARR—opens your billing dashboard to check this month's charges. They see $47,200. It matches what they expected based on their usage logs. They pay the invoice.
What they don't know is that your system actually measured $51,800 in billable usage. The difference—$4,600—was lost somewhere between their API gateway and your invoice PDF. It didn't get double-billed (your CFO would catch that). It simply vanished. Your system wrote it down in three different places with conflicting timestamps, applied a discount to the wrong product tier, closed the billing window before late-arriving events could be counted, and then failed to sum the results correctly.
This happens every single month. For a mid-market SaaS company with $25M ARR built on usage-based pricing, that $4,600 leakage on a single customer is a pattern playing out across 300 accounts. Some months you lose $15K to dropped events. Other months, $22K slips away because your wallet reconciliation process runs before the nightly batch, not after. Your CFO sees the invoicing total and accepts it as correct. Your engineers see event logs and assume everything made it through. Nobody connects the dots.
This is revenue leakage—and it's not fraud, double-counting, or customer non-payment. It's the natural entropy of a billing system that was designed for humans to manage, not for a continuous flow of billions of metered events.
The Leakage Math
For a mid-market SaaS company with $25M ARR, 2-5% invisible leakage means $500K-$1.25M in revenue evaporating annually between your meter and your invoice. Nobody notices because each individual gap is small. The sum is not.
What "Revenue Leakage" Actually Means in Usage-Based Billing
In traditional recurring subscription billing, "leakage" is straightforward: a customer on a $500/month plan has their billing scheduled but the payment fails, and you don't retry hard enough. You lose $500.
In usage-based billing, leakage is far subtler. You have multiple independent systems that must agree on a single number: how much the customer actually used. Your API gateway logs it one way. Your metering service normalizes it. Your rating engine calculates pricing from it. Your billing pipeline applies discounts and taxes to it. Your invoicing system renders it. Your accounting system records it.
Each system is designed, deployed, and scaled independently. Each has its own database, its own cache, its own failure modes. Each one can lose track of a few events, or misinterpret a timestamp, or apply a rule slightly differently than the system before it. When they all work in isolation and then converge in the final invoice, the errors don't cancel out—they accumulate.
Revenue leakage is the sum of all these small errors, across all your customers, across all your billing periods. It's not a bug in any single system. It's the natural result of treating billing as an afterthought and stitching together tools designed for different problems.
Why Traditional Billing Audits Miss It
Your finance team probably runs a monthly reconciliation. They compare the total invoiced amount to the total revenue recorded in your accounting system. They match. That pass-fail binary check creates a false sense of security.
What they're not doing—because it's almost impossible to do manually—is verifying that each individual customer's usage, as measured at the gateway, as normalized by the metering service, as rated by the pricing engine, as adjusted by the billing pipeline, matches the amount that appears on their invoice. That would require instrumenting six independent systems and comparing their outputs event by event for months. No spreadsheet can do that.
This is why leakage goes undetected. It's invisible to standard audits. A customer who doesn't aggressively scrutinize their bill won't complain. A customer who does won't have evidence of leakage on their side—they'll just see a lower invoice than they expected and rationalize it (maybe we didn't use as much as we thought, or our committed minimum covered more than I calculated).
Meanwhile, revenue that should have been yours is silently written off as "adjustment" or "variance" or simply accepted as the friction cost of running a modern billing system.
For a company with $25M ARR on usage-based pricing, the audit patterns we've seen start at roughly 0.5% of billings leaking per customer per month as a conservative baseline—and the concentrated losses on high-volume accounts, unenforced commitments, and misapplied credits stack on top of that. Summed, this entropy is how a company this size ends up losing on the order of $3.2M a year.
Leakage Point #1 — Dropped Events During Ingestion Spikes
The Anatomy of a Dropped Event
Picture a Friday at 6 PM. Your largest customer is running their weekly batch job—a data processing task that fires 50,000 API calls in rapid succession. Each call should be metered: one unit for the API call itself, plus usage on three other dimensions (CPU time, data volume, storage ops).
Your ingestion service is provisioned to handle 5,000 events per second. Your customer fires 8,000 events per second for just over six seconds. The system drops the last 18,000 events (events #32,000–#49,999 in the batch) because the message queue filled faster than the consumer could drain it.
Your queue has retry logic, but it only works for messages that made it to the queue. If the ingestion service's HTTP handler dropped the request before publishing to Kafka, there's nothing to retry. No error response goes back to the customer (the request timed out), so they can't tell you it failed. No error log is created (the request was dropped at connection level). The customer's metering client sees the timeout and implements its own retry with exponential backoff, but by then it's tried three times and given up.
From your perspective: the daily usage report shows 32,000 metered events. That's what got into your Kafka topics and persisted in ClickHouse. The customer's logs show they ran 50,000 API calls and should have been charged for all of them. The difference—18,000 events, worth potentially $3,200 to $7,400 depending on your pricing model—is invisible to both parties.
The customer doesn't complain because they don't have direct access to your metering pipeline. You don't compensate them because you have no evidence of dropped events. Both parties accept the invoice as correct.
This is especially dangerous because dropped-event leakage tends to happen to your highest-volume customers, because they're the ones stressing your system. It's concentrated leakage, not evenly distributed. Instead of losing 0.2% from 500 customers, you might lose 2% from your top 10. The financial impact is concentrated where visibility is lowest.
The Architectural Fix: Transactional Event Guarantees
The fix is to guarantee that if an event was generated by the customer, it will be counted—or you will know it wasn't, and can reconcile the gap explicitly. This requires three architectural layers:
Layer 1: At-Least-Once Delivery on the Ingestion Path. Aforo's usage-ingestor service publishes every received event to Kafka with transactional guarantees. If the event made it to Kafka, it will be processed. If Kafka publishing fails, the HTTP request fails visibly, and the customer's SDK retries (as designed). There are no silent drops between ingestion and Kafka. This is achieved through Kafka transactions and the producer acks=all configuration—expensive in latency terms (it adds roughly 50ms to ingestion), but the cost is paid once per event, not once per month per customer.
Layer 2: Idempotent Event Processing. Every event carries a customer-generated idempotency_key (UUID) that prevents double-counting if the same event is processed twice. ClickHouse and PostgreSQL both store this key and deduplicate on ingest. If a customer retries the same event three times due to network timeouts, your billing system sees it once. If a customer replays historical events (e.g., "we found logs from last Tuesday, here's what was really used"), your system detects the replay via the key and reconciles correctly rather than double-billing.
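A minimal sketch of what idempotent ingest can look like, assuming a PostgreSQL usage_events table with a unique index on idempotency_key (the table, column, and type names here are illustrative, not Aforo's schema):

import java.sql.Connection;
import java.sql.PreparedStatement;

// Idempotent-insert sketch: the unique index on idempotency_key makes a
// replayed event a no-op instead of a double-count.
public final class IdempotentIngest {
    // Assumes: CREATE UNIQUE INDEX ON usage_events (idempotency_key);
    private static final String UPSERT =
        "INSERT INTO usage_events (idempotency_key, customer_id, metric, quantity, event_ts) " +
        "VALUES (?, ?, ?, ?, ?) ON CONFLICT (idempotency_key) DO NOTHING";

    /** Returns true if the event was new, false if it was a replay. */
    public boolean ingest(Connection db, UsageEvent e) throws Exception {
        try (PreparedStatement ps = db.prepareStatement(UPSERT)) {
            ps.setObject(1, e.idempotencyKey());
            ps.setString(2, e.customerId());
            ps.setString(3, e.metric());
            ps.setLong(4, e.quantity());
            ps.setObject(5, e.eventTimestamp());
            return ps.executeUpdate() == 1; // 0 rows updated => duplicate, silently skipped
        }
    }

    public record UsageEvent(java.util.UUID idempotencyKey, String customerId,
                             String metric, long quantity, java.time.OffsetDateTime eventTimestamp) {}
}

The point of this shape is that the database's unique index, not application logic, makes replays safe: a second insert of the same key is a no-op, and the return value tells you whether you saw a new event or a retry.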
Layer 3: Reconciliation Audit Trail. At the end of each billing period, the pricing service queries usage-ingestor for a cryptographic hash of all events for the customer in that period, filtered by product. Aforo compares this hash to the hash of the events that actually made it into the billing pipeline. If they don't match, a reconciliation workflow is triggered: investigate the discrepancy, determine if leakage occurred, and credit the customer automatically.
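The hash comparison only works if both sides compute it the same way. A sketch of one deterministic construction—SHA-256 over sorted event keys, an illustrative choice rather than necessarily Aforo's:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Period digest sketch: both sides hash the *sorted* idempotency keys for a
// (customer, product, period) tuple, so the digests match iff the event sets match.
public final class PeriodDigest {
    static String digest(List<String> idempotencyKeys) throws Exception {
        List<String> sorted = new ArrayList<>(idempotencyKeys);
        Collections.sort(sorted); // order-independent: arrival order must not matter
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        for (String key : sorted) sha.update(key.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : sha.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }
}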
The cost of this is primarily infrastructure (Kafka clusters are not cheap) and operational (reconciliation workflows need to be monitored). But the ROI is straightforward: if it prevents even 0.1% of your ARR from leaking due to dropped events, it pays for itself immediately on a $10M+ ARR company.
Leakage Point #2 — Late-Arriving Events After Billing Window Close
The 11:59 PM Problem
Your billing window closes on the last day of the month at 11:59:59 PM UTC. At 11:59:47 PM, a customer's distributed system generates a usage event and queues it for delivery to your metering service. The event is timestamped (correctly) at 11:59:47 PM UTC.
The event doesn't reach your metering service until 12:03:22 AM UTC the next day—a 3 minute, 35 second delay caused by transient network latency in the customer's ISP and a brief congestion spike in your ingestion service.
Your billing pipeline runs at 12:05 AM UTC on the first of the month. It looks for all events timestamped between the 1st at 00:00:00 and 31st at 23:59:59, summing them into the monthly charge. The event with the 11:59:47 PM timestamp should be included. But the event arrives at 12:03:22 AM, after the billing pipeline has already closed the month and calculated the total.
What happens next depends on your system's design:
- Naive System: The event is ignored. It was late. The monthly bill is already calculated. The next time this customer is billed, the event counts (if your system even remembers it). You've just lost that revenue from Month 1 and will apply it to Month 2's bill, which might distort Month 2's pricing if the customer has volume discounts or tiered rates.
- Better System: The event is counted in a monthly "reconciliation" report that gets sent to accounting. The event is noted but the invoice is not reissued. It accrues as a deferred charge to be applied to next month's invoice. But what if next month the customer upgrades tiers? Now the deferred charge is rated under a different tier structure. The math becomes opaque.
- Good System (Aforo): Late-arriving events within a grace period (Aforo defaults to 72 hours) trigger an automatic retroactive adjustment. The billing pipeline is re-run with the new event included, the customer is sent a corrected invoice (or a credit memo if they already paid), and an audit trail records the adjustment.
The underlying issue is that in usage-based billing, events are not synchronized with invoicing windows. Your customer's clock may be slightly off. Network transit times vary. A distributed system might buffer events and flush them periodically, introducing variability in when they arrive. The later an event arrives relative to the billing window close, the more assumptions your system must make about whether to include it.
The Architectural Fix: Grace Periods and Retroactive Adjustments
Aforo's UsageEventValidator handles this systematically:
- Events timestamped >90 days in the past are rejected outright (either fraud or a clock-skew issue that can't be resolved). The customer is notified to re-send with corrected timestamps.
- Events timestamped >5 minutes in the future are flagged as clock-skew concerns: Aforo assumes the customer's clock is simply ahead of ours, counts the usage, and marks the event for audit review.
- Events that arrive >24 hours after their timestamp are logged as "late arrivals" and counted immediately, but flagged for a monthly reconciliation process.
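Expressed as code, these three rules are a small pure function. A sketch with thresholds mirroring the defaults above (the enum and method names are illustrative):

import java.time.Duration;
import java.time.Instant;

// Sketch of the three timestamp rules: reject ancient events, accept-but-flag
// future-dated and late-arriving ones.
public final class UsageEventTimestampPolicy {
    enum Verdict { REJECT, ACCEPT_FLAG_CLOCK_SKEW, ACCEPT_FLAG_LATE, ACCEPT }

    static Verdict classify(Instant eventTimestamp, Instant receivedAt) {
        if (eventTimestamp.isBefore(receivedAt.minus(Duration.ofDays(90))))
            return Verdict.REJECT;                     // >90 days in the past
        if (eventTimestamp.isAfter(receivedAt.plus(Duration.ofMinutes(5))))
            return Verdict.ACCEPT_FLAG_CLOCK_SKEW;     // >5 minutes in the future
        if (Duration.between(eventTimestamp, receivedAt).compareTo(Duration.ofHours(24)) > 0)
            return Verdict.ACCEPT_FLAG_LATE;           // arrived >24h after its timestamp
        return Verdict.ACCEPT;
    }
}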
When billing window close approaches (12 hours before, 6 hours before, 1 hour before, 5 minutes before), your system can query the usage service for events that arrived late in the previous billing period. If any are found, a reconciliation invoice adjustment is triggered:
- Recalculate the previous month's charges including the late events.
- Compare to the already-issued invoice.
- If the difference is material (e.g., >$100 or >2%), issue a corrected invoice or credit memo.
- Update the customer-facing billing record to show the adjustment and its reason.
- Create an audit log entry recording the adjustment, the events that triggered it, and the responsible system component.
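The materiality test in the middle of that workflow is the part worth pinning down precisely, since it decides whether a customer sees a corrected invoice. A sketch using the >$100-or->2% example thresholds (names are illustrative):

import java.math.BigDecimal;
import java.math.RoundingMode;

// Materiality sketch: recompute the period with late events included and
// decide whether the delta justifies a corrected invoice or credit memo.
public final class LateEventReconciliation {
    static boolean isMaterial(BigDecimal issuedTotal, BigDecimal recomputedTotal) {
        BigDecimal delta = recomputedTotal.subtract(issuedTotal).abs();
        if (delta.compareTo(new BigDecimal("100")) > 0) return true;  // > $100 absolute
        if (issuedTotal.signum() == 0) return delta.signum() > 0;     // avoid divide-by-zero
        BigDecimal pct = delta.divide(issuedTotal, 6, RoundingMode.HALF_UP);
        return pct.compareTo(new BigDecimal("0.02")) > 0;             // > 2% relative
    }
}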
This is more operationally complex than simply ignoring late events, but it eliminates the hidden revenue loss. The cost is paid in infrastructure (you need to store historical events long enough to re-run billing calculations) and in customer education (some customers will be confused by "corrected" invoices and need explanation). But the alternative—silently losing revenue—is worse.
Leakage Point #3 — Dimension Mismatch Between Metering and Rating
WHAT THE METER COUNTS          WHAT THE RATE PLAN PRICES
┌────────────────────┐         ┌────────────────────┐
│ 2xx     ✓ counted  │         │ 2xx    ✓ billable  │
│ 4xx     ✓ counted  │         │ 4xx    ✗ excluded  │
│ 5xx     ✓ counted  │         │ 5xx    ✗ excluded  │
│ dupes   ✓ counted  │         │ dupes  ✗ deduped   │
│ health  ✓ counted  │         │ health ✗ excluded  │
└────────────────────┘         └────────────────────┘
          │                              │
          └───────── GAP = $$$ ──────────┘

AFORO FIX:
  Metric Catalog defines filter criteria:
    status_code IN (2xx) + deduplicate(session_id)
  Rate Plan references metric by ID, not name
When "Requests" ≠ "Billable Requests"
Your customer's API generates three types of requests: reads, writes, and deletes. Your metering service dutifully logs all three. Your pricing model (negotiated during sales) charges for reads and writes, but not deletes.
This seems straightforward. Your rating engine should apply the pricing rule: "Calculate charges = (reads + writes) × $0.001 per request, exclude deletes."
But here's the problem: your metering service logs all three request types in a single metric called "api_requests". Your pricing engine receives a total count of 1.2M API requests for the month. The pricing rule says "multiply by $0.001". The resulting charge is $1,200.
Except the actual charge should be lower, because 15% of those requests are deletes. The correct charge is 0.85 × 1.2M × $0.001 = $1,020. You've overbilled by $180.
Now add tiers: your customer has negotiated a volume discount. "The first 100K requests are at $0.001, requests 100K–500K are at $0.0008, and anything above 500K is at $0.0005."
Your metering service logs the total (1.2M). Your pricing engine applies the tiered rate. But it applies the tiers assuming all 1.2M requests are billable. In reality, only 1.02M are (after excluding deletes). The correct calculation is:
- 100K @ $0.001 = $100
- 400K @ $0.0008 = $320
- 520K @ $0.0005 = $260
- Total: $680
But your system calculated:
- 100K @ $0.001 = $100
- 400K @ $0.0008 = $320
- 700K @ $0.0005 = $350
- Total: $770
You've overbilled by $90. The customer might catch this if they audit carefully. Or they might not, especially if their request volume varies month-to-month. Over a year, that's $1,080. Over a contract lifetime, it's thousands.
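The tier arithmetic above is easy to get wrong in prose and easy to pin down in code. A minimal graduated-pricing sketch that reproduces both totals (a sketch for illustration, not Aforo's rating engine):

import java.util.List;

// Graduated-tier calculator: each tier charges only the units that fall in it.
// Reproduces the $680 (deletes excluded) vs $770 (deletes included) example.
public final class GraduatedPricing {
    record Tier(long from, Long to, double unitPrice) {} // to == null => unbounded

    static double charge(long units, List<Tier> tiers) {
        double total = 0;
        for (Tier t : tiers) {
            long upper = (t.to() == null) ? units : Math.min(units, t.to());
            long inTier = Math.max(0, upper - t.from());
            total += inTier * t.unitPrice();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Tier> tiers = List.of(
            new Tier(0, 100_000L, 0.001),
            new Tier(100_000, 500_000L, 0.0008),
            new Tier(500_000, null, 0.0005));
        System.out.println(charge(1_020_000, tiers)); // ≈ 680.0 — deletes excluded
        System.out.println(charge(1_200_000, tiers)); // ≈ 770.0 — deletes included
    }
}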
This pattern is called dimension mismatch—when the metric reported by metering doesn't match the dimension expected by pricing. It's especially common when:
- Multiple meter integrations report the same dimension inconsistently (one gateway includes retries, another doesn't)
- Pricing rules are negotiated by non-technical stakeholders who say "charge per request" without specifying which request types count
- You have country-level pricing (EUR charges different from USD) but your metering service doesn't tag events with geography
- You have custom pricing per customer, and the custom rule uses a dimension not logged by the standard metering instrumentation
The Architectural Fix: Explicit Metric-to-Rate-Plan Binding
Aforo solves this with an explicit metric configuration per rate plan. When a product is added to a rate plan, the product's metrics are not inherited implicitly. Instead, the rate plan manager explicitly configures each metric:
Product: API Gateway
Metrics:
  - metric_id: api_requests
    dimension_filter: request_type NOT IN ('delete')
    pricing_model: GRADUATED
    tiers: [
      {tier_start: 0, tier_end: 100000, unit_price: 0.001},
      {tier_start: 100001, tier_end: 500000, unit_price: 0.0008},
      {tier_start: 500001, unit_price: 0.0005}
    ]
    included_free: 0
    overage_behavior: BILLABLE
    billing_timing: POSTPAID
The dimension_filter is a SQL where-clause applied at query time. When the billing pipeline runs, it doesn't ask "how many api_requests this month?" It asks "how many api_requests this month WHERE request_type NOT IN ('delete')?" The filter is version-controlled, part of the rate plan definition, and auditable.
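Concretely, the composed query might look like this (table and column names are illustrative; the named parameters are in the style of e.g. Spring's NamedParameterJdbcTemplate):

// The rate plan's dimension_filter composes directly into the aggregation
// query; the filter string is stored and versioned with the rate plan, so
// the same invoice can be recomputed later with the same filter.
public final class BillableUsageQuery {
    static final String BILLABLE_API_REQUESTS = """
        SELECT count(*) AS billable_requests
        FROM usage_events
        WHERE customer_id = :customerId
          AND metric = 'api_requests'
          AND request_type NOT IN ('delete')   -- dimension_filter from the rate plan
          AND event_timestamp_utc >= :periodStart
          AND event_timestamp_utc <  :periodEnd
        """;
}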
If the customer's negotiated agreement changes (e.g., "now charge for deletes too"), you create a new version of the rate plan, update the dimension_filter, and future billing runs use the new filter. Old invoices remain unchanged and auditable.
This is operationally more expensive because the pricing team must manually configure filters instead of relying on defaults. But it forces clarity. Dimension mismatches become visible during the configuration step, not discovered by angry customers weeks later.
Leakage Point #4 — Timezone Misalignment on Billing Period Boundaries
When "Monthly" Means Different Things
Your customer is in Tokyo (UTC+9). You are in San Francisco (UTC-8 in January). Your billing system runs every month at midnight UTC on the 1st.
As the clock strikes midnight UTC at the end of January 31st, your billing pipeline closes the January billing period and sums all usage events timestamped between January 1st 00:00:00 UTC and January 31st 23:59:59 UTC.
But your customer's operational team thinks in Tokyo time. For them, January 31st ends at 23:59:59 JST, which is 14:59:59 UTC on January 31st. When they say "our January usage", they mean everything up to 14:59:59 UTC. Usage from 15:00:00 UTC on January 31st (which is 00:00:00 JST on February 1st for them) should be in February's bill.
Your system is using UTC boundaries. Your customer is thinking in local-time boundaries. The mismatch means the first nine hours of their local February (January 31st 15:00:00–23:59:59 UTC) land in your January bill, just as the first nine hours of their local January (December 31st 15:00:00–23:59:59 UTC) landed in your December bill.
If their usage is evenly distributed across the day, this might wash out (each period gains nine hours on one end and loses nine on the other). But if their usage has a temporal pattern—a spike early in the morning Tokyo time, low usage late in the evening—then the period boundaries cut across their spike. One month gets part of the spike, the next month gets the rest. Depending on how steeply your pricing escalates (volume discounts, max-spend caps, tiered rates), the charge can vary significantly.
Moreover, if a customer has a timezone-specific SLA ("we guarantee 99.5% uptime between 6 AM and 6 PM Tokyo time"), and you bill based on UTC boundaries, the SLA window no longer aligns with the usage you're measuring. The customer might claim an SLA violation occurred on January 31st UTC 18:00 (which is February 1st JST 03:00, outside their SLA window), but your billing shows that time in the previous period, creating confusion about which month's SLA applies.
The Architectural Fix: Timestamps with Time Zone, Always
Aforo solves this by storing every event timestamp with its timezone offset, and allowing customers to specify a billing-period timezone.
Every usage event carries two timestamps:
- event_timestamp: the time the event occurred, stored as TIMESTAMP WITH TIME ZONE (PostgreSQL / ISO 8601). This captures both the moment in time and the local timezone context in which the event was generated.
- event_timestamp_utc: the same moment, converted to UTC for indexing and standard processing.
When a customer onboards, they specify their billing timezone (e.g., "Asia/Tokyo"). The pricing service stores this preference. When the billing pipeline calculates monthly charges, it:
- Determines the customer's billing period boundaries in their local timezone: January 1st 00:00:00 JST to January 31st 23:59:59 JST.
- Converts these boundaries to UTC: December 31st 15:00:00 UTC (previous month) to January 31st 14:59:59 UTC.
- Queries all events with event_timestamp_utc within these UTC boundaries.
- Calculates charges for the period.
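With java.time, that boundary conversion is a few lines. A sketch using the dates and zone from this example:

import java.time.*;

// Derive a customer's January billing window in their declared timezone,
// then convert to UTC instants for querying event_timestamp_utc.
public final class BillingWindow {
    public static void main(String[] args) {
        ZoneId billingZone = ZoneId.of("Asia/Tokyo"); // customer's declared zone
        ZonedDateTime localStart = ZonedDateTime.of(2026, 1, 1, 0, 0, 0, 0, billingZone);
        ZonedDateTime localEnd   = localStart.plusMonths(1); // exclusive upper bound

        Instant utcStart = localStart.toInstant(); // 2025-12-31T15:00:00Z
        Instant utcEnd   = localEnd.toInstant();   // 2026-01-31T15:00:00Z

        // Query: WHERE event_timestamp_utc >= utcStart AND event_timestamp_utc < utcEnd
        System.out.println(utcStart + " .. " + utcEnd);
    }
}

Using a half-open interval (inclusive start, exclusive end) sidesteps the classic 23:59:59 off-by-one at the boundary.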
Now when the customer reviews their invoice, the period dates listed match their local-timezone expectations. If they ask "why does an event from the evening of January 31st UTC appear on my February invoice," you can explain: "Because January 31st 15:00:00 UTC is February 1st 00:00:00 JST, which is the start of your February billing period."
The key is that this conversion is deterministic and auditable. Every invoice shows both the local-timezone dates (for the customer's understanding) and the UTC boundaries (for your system's determinism). An audit trail records the customer's declared timezone and the conversion logic used.
This prevents surprise invoices and eliminates the "I thought that was in a different month" arguments.
Leakage Point #5 — Stale Entitlement Caches
The Race Condition That Costs You Money
Your customer upgrades their subscription on the 15th of the month. They move from the "Starter" tier (1,000 requests/day included) to the "Professional" tier (10,000 requests/day included).
Your pricing service updates the subscription record in PostgreSQL immediately and broadcasts a subscription.upgraded Kafka event. All downstream systems listen for this event and should invalidate their cached entitlements for this customer.
But here's the race:
- 3:14:22 PM: Subscription updated in PostgreSQL. subscription.upgraded event published to Kafka.
- 3:14:23 PM: Customer's API gateway calls your quota-check endpoint: "Can I make a request? How much usage have I accrued so far?"
- 3:14:23 PM (same microsecond): The quota-check service's Redis cache hasn't been invalidated yet (the Kafka event is still in flight, hasn't been consumed). The service returns the old entitlement: 1,000 requests/day, and the old usage counter: 998 accrued.
- 3:14:24 PM: The Kafka event is consumed. The Redis cache for this customer is invalidated.
- 3:14:25 PM: A new quota-check request comes in, and the service returns the new entitlement: 10,000 requests/day, and the fresh usage counter: 998 accrued.
From the customer's perspective, they made 3 API calls between 3:14:23 and 3:14:25 (while the cache was stale). All 3 were evaluated against the old "Starter" entitlement—pushing them past its 1,000-request limit and into billable overage—even though they should have been free under the Professional tier's 10,000 included requests.
This is a silent overbilling, and the mirror-image race on a downgrade is a silent underbilling: the stale cache keeps serving the old, higher entitlement, and usage that should be billed flows through free. Either way, the customer is charged something other than what their subscription says, and they may never notice. If you batch reconciliation into a monthly process, this type of error goes undetected for weeks. If you don't reconcile at all, it goes undetected forever.
The race condition is exacerbated because cache invalidation is asynchronous. The Kafka event is published, but there's a delay (typically 100–500ms, sometimes seconds) before consumers process it and invalidate the cache. During this window, any quota-check request returns stale data.
The Architectural Fix: Event-Driven Cache Invalidation
Aforo eliminates this race through two mechanisms:
Mechanism 1: Synchronous Cache Invalidation on Writes. When a subscription upgrade is written to PostgreSQL, the same code path deletes the Redis cache key before the method returns—no waiting on an async consumer. The subscription.upgraded event is still published to Kafka for downstream systems, but cache invalidation no longer depends on it:
@Transactional
public void upgradeSubscription(UUID subscriptionId, RatePlanId newPlan) {
    // Load and update the subscription inside the transaction
    Subscription subscription = subscriptionRepository.findById(subscriptionId).orElseThrow();
    subscription.upgrade(newPlan);
    subscriptionRepository.save(subscription);
    // Synchronous cache eviction: the next quota check misses and reads PostgreSQL
    redisTemplate.delete("entitlements:" + subscription.getCustomerId());
    // Then publish the Kafka event for downstream consumers (cache is already cleared)
    kafkaTemplate.send("pricing.subscription.upgraded", subscriptionEvent(subscription)); // payload builder elided
}
After this method returns, the Redis cache is guaranteed to be cleared. The next quota-check call misses the cache and fetches fresh data from PostgreSQL. There's a temporary performance cost (quota checks are slightly slower until the cache re-warms), but data consistency is guaranteed.
Mechanism 2: Versioned Cache Keys. Each cached entitlement carries a version number tied to the subscription's version. When a subscription is upgraded, its version increments. The cache key changes. Old cache entries remain in Redis but are never accessed again (they have a different key pattern). A weekly cleanup job deletes orphaned cache entries.
This approach ensures that even if a Kafka consumer crashes and never processes the invalidation event, the cache becomes inconsistent temporarily but self-heals within one billing cycle when the subscription is refreshed from PostgreSQL.
The cost is higher Redis churn (cache keys are ephemeral, not stable) and slightly more complex cache-key generation logic. The benefit is that race conditions between upgrades and quota-checks are impossible. Every quota-check gets either fresh data from PostgreSQL or a cache entry that's guaranteed to be current.
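A read-through sketch of the versioned-key pattern (the store interface and JSON payload are illustrative stand-ins for the PostgreSQL-backed pieces, not Aforo's API):

import java.time.Duration;
import java.util.UUID;
import org.springframework.data.redis.core.StringRedisTemplate;

// Versioned-key read-through cache: the subscription version is baked into the
// key, so an upgrade changes the key and stale entries are simply never read again.
public final class VersionedEntitlementCache {
    interface EntitlementStore {
        long currentVersion(UUID customerId);                       // cheap indexed read
        String loadEntitlementsJson(UUID customerId, long version); // authoritative source
    }

    private final StringRedisTemplate redis;
    private final EntitlementStore store;

    VersionedEntitlementCache(StringRedisTemplate redis, EntitlementStore store) {
        this.redis = redis;
        this.store = store;
    }

    String getEntitlements(UUID customerId) {
        long version = store.currentVersion(customerId);
        String key = "entitlements:" + customerId + ":v" + version;
        String cached = redis.opsForValue().get(key);
        if (cached != null) return cached;                          // hit: version-current by construction
        String fresh = store.loadEntitlementsJson(customerId, version);
        redis.opsForValue().set(key, fresh, Duration.ofHours(1));   // TTL caps orphaned-entry lifetime
        return fresh;
    }
}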
Leakage Point #6 — Minimum-Spend Commitments Not Enforced
The "True-Up" That Never Happens
Your customer commits to spending $10,000 per month. You've negotiated a 15% volume discount on their rates in exchange for this commitment. They're expected to use enough to naturally hit $10,000. If they don't, you'll invoice them for the shortfall at the end of the month.
In January, they use less than expected. The metered usage charges to $8,200. You should add a $1,800 true-up charge to bring them to the minimum. The final invoice should be $10,000.
But here's what actually happens:
- Your billing pipeline calculates the metered charge: $8,200.
- The configuration for "minimum monthly spend" exists in your contracts database, but it's not integrated with your billing pipeline.
- Your invoicing system generates the invoice for $8,200.
- The true-up should be added as a separate line item, but only if someone manually kicks off a post-processing step in your contracts system.
- No one remembers to run that step, or they run it on the wrong date (after the invoice has already been sent), or the contract record doesn't specify which invoice it should apply to.
- The invoice goes out for $8,200. The customer pays. The month closes.
- A month later, someone notices the discrepancy during reconciliation and manually adds a credit memo or a separate invoice for $1,800. It's confusing and creates customer friction.
Over a year, if 3 out of your 50 committed-spend customers undershoot their commitment 5 months out of the year, and the average shortfall is $2,000, you've just left $30,000 on the table because commitment enforcement fell through the cracks—from just three accounts.
This is especially common for companies with complex pricing models (tiered commitments, multi-year deals with annual minimums, overage discounts that apply or don't based on commitment tiers). The more complex the commitment terms, the more likely they are to be encoded in the contracts database but forgotten in the billing system.
The Architectural Fix: Commit Stage in the Billing Pipeline
Aforo's 10-stage billing pipeline explicitly includes a Commit Stage (stage 6 of 10), which runs after the metered usage has been rated and discounts applied, but before final settlement.
The Commit Stage retrieves the customer's subscription record and checks for commitment terms:
Commitment Terms:
- min_monthly_spend: $10,000 USD
- overage_rule: BILL_AT_FULL_RATE (discount does not apply to overages)
- catch_up: ALLOWED (overage in future months can be credited toward the commitment)
The stage compares the post-discount charge to the minimum:
- If charge ($8,200) < minimum ($10,000): Calculate true-up = $10,000 - $8,200 = $1,800.
- Apply the true-up as a separate line item on the invoice, labeled "Minimum Monthly Commitment".
- If catch-up is allowed, don't invoice the true-up immediately: carry the $1,800 shortfall forward, and let any usage above next month's minimum offset it before the remainder is finally billed.
If the commitment is annual (not monthly), the stage tracks it across 12 billing cycles, summing the charges and true-ups only at the year-end invoice.
The key is that commitment enforcement is part of the billing pipeline, not a separate post-processing step. It's deterministic, testable, and versioned. If commitment terms change (e.g., the customer negotiates a lower minimum in February), you create a new subscription version and the pipeline uses the new terms going forward.
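The shortfall calculation itself is deliberately small—small enough to unit-test exhaustively. A sketch matching the $8,200-vs-$10,000 example (names are illustrative):

import java.math.BigDecimal;

// Commit-stage sketch: compare the post-discount charge to the committed
// minimum and emit a true-up line item if there is a shortfall.
public final class CommitStage {
    record TrueUp(BigDecimal amount, String label) {}

    static TrueUp apply(BigDecimal postDiscountCharge, BigDecimal minMonthlySpend) {
        BigDecimal shortfall = minMonthlySpend.subtract(postDiscountCharge);
        if (shortfall.signum() <= 0) return null; // commitment met, nothing to add
        return new TrueUp(shortfall, "Minimum Monthly Commitment");
    }

    public static void main(String[] args) {
        TrueUp t = apply(new BigDecimal("8200"), new BigDecimal("10000"));
        System.out.println(t.amount() + " " + t.label()); // 1800 Minimum Monthly Commitment
    }
}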
Leakage Point #7 — Wallet and Credit Misapplication
When Credits Apply to the Wrong Product
Your customer has a committed credit of $5,000 from a previous overpayment. They have two subscriptions:
- API Gateway subscription ($7,000 this month)
- Data Processing subscription ($3,000 this month)
Your billing system needs to apply the $5,000 credit across both subscriptions. The question is: how?
Option A: Apply the entire $5,000 to the API Gateway subscription first. API Gateway's charge drops to $2,000, the credit is exhausted, and Data Processing charges the full $3,000. Total charge to customer: $2,000 (API Gateway) + $3,000 (Data Processing) = $5,000.
Option B: Apply the credit proportionally across both subscriptions. API Gateway gets $3,500 of credit (charges $3,500), Data Processing gets $1,500 (charges $1,500). Total charge: $3,500 + $1,500 = $5,000.
Option C: Apply the credit to the highest-priority product first (e.g., "API Gateway is essential, Data Processing is optional"). Numerically this lands where Option A did—$2,000 + $3,000 = $5,000—but for a different, contract-driven reason.
All three options produce the same total here, but they differ in which product the credit offsets—and the moment a credit is restricted to a specific product, they stop producing the same total. If your customer negotiated that their credit should apply to Data Processing only (because the credit came from a Data Processing overbilling issue), then Options A and C are simply wrong: they burn the credit against the wrong product.
Now add complexity: the customer also has a "prepaid wallet" with a balance of $2,000 (they prepaid cash at the start of the contract). The wallet should be used before any credits from overpayment are applied. So the correct sequence is:
- Use the prepaid wallet balance ($2,000) first: total remaining charge = $10,000 - $2,000 = $8,000.
- Apply the overpayment credit ($5,000), but only to Data Processing: Data Processing's $3,000 charge is fully covered, and $2,000 of the credit goes unused because there's nothing left in Data Processing to apply it to.
- Final invoice: $5,000 (API Gateway, after the wallet) + $0 (Data Processing) = $5,000, with $2,000 of product-restricted credit stranded on the account.
If your system doesn't carefully model the order of precedence (wallet first, then credits, then product-specific credits), and doesn't track which credit was for which product, you'll either leave money on the table or apply credits to the wrong place and create customer complaints.
The Architectural Fix: Route Stage with Explicit Settlement Rules
Aforo's Route Stage (stage 9 of 10) handles the complete settlement logic.
The Route Stage receives the post-discount, post-tax charge and asks: "How should this be settled?"
Settlement rules are part of the subscription's offering configuration:
Settlement Rules for This Subscription:
- payment_model: HYBRID
- wallet_priority: PREPAID (use prepaid wallet first)
- credit_priority: [
"credit.overpayment.data_processing",
"credit.overpayment.general",
"credit.promo"
]
- fallback: POSTPAID (any remaining amount goes to invoice)
The Route Stage processes each charge (API Gateway: $7,000; Data Processing: $3,000) according to these rules:
API Gateway Charge ($7,000):
- Check if there's a product-specific prepaid wallet for API Gateway: No.
- Check if there's a general prepaid wallet and wallet_priority is PREPAID: Yes, $2,000 available.
- Deduct $2,000 from general wallet. Remaining charge: $5,000.
- Check for product-specific credits (credit.overpayment.api_gateway): None.
- Check for general credits in priority order: overpayment.general ($5,000 available).
- Deduct $5,000 from general credit. Remaining charge: $0.
- Route the $0 charge to POSTPAID (no invoice line item).
Data Processing Charge ($3,000):
- Check prepaid wallet: Already exhausted by API Gateway.
- Check for product-specific credits (credit.overpayment.data_processing): None found in this example.
- Check for general credits: Already exhausted by API Gateway.
- Route the $3,000 charge to POSTPAID (invoice line item).
Final invoice: $0 (API Gateway) + $3,000 (Data Processing) = $3,000. The customer's prepaid wallet of $2,000 and overpayment credit of $5,000 were applied in the correct order, to the correct products, and the settlement is auditable.
Each routing decision is logged with the subscription version, the credit IDs used, and the timestamp. If a customer disputes the settlement months later, you can replay the exact routing logic that was used.
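The precedence logic at the heart of the Route Stage can be compact. A sketch that reproduces the walkthrough above, assuming balances keyed by source name (the names and shapes are illustrative, not Aforo's API):

import java.math.BigDecimal;
import java.util.*;

// Route-stage sketch: settle one product's charge against the prepaid wallet,
// then credits in declared priority order; whatever remains goes postpaid.
public final class RouteStage {
    static BigDecimal settle(BigDecimal charge,
                             Map<String, BigDecimal> balances, // mutated as sources are drawn down
                             List<String> priority) {
        BigDecimal remaining = charge;
        for (String source : priority) {
            BigDecimal available = balances.getOrDefault(source, BigDecimal.ZERO);
            if (available.signum() <= 0 || remaining.signum() <= 0) continue;
            BigDecimal used = available.min(remaining);
            balances.put(source, available.subtract(used)); // debit the source
            remaining = remaining.subtract(used);           // reduce the charge
        }
        return remaining; // routed to POSTPAID (invoice line item)
    }

    public static void main(String[] args) {
        Map<String, BigDecimal> balances = new HashMap<>(Map.of(
            "wallet.prepaid", new BigDecimal("2000"),
            "credit.overpayment.general", new BigDecimal("5000")));
        List<String> priority = List.of("wallet.prepaid",
            "credit.overpayment.data_processing", "credit.overpayment.general");
        System.out.println(settle(new BigDecimal("7000"), balances, priority)); // 0    (API Gateway)
        System.out.println(settle(new BigDecimal("3000"), balances, priority)); // 3000 (Data Processing)
    }
}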
Closing the Loop — From 7 Gaps to 10 Pipeline Stages
Why Point Fixes Don't Work
At this point, you might be thinking: "We can fix Point #1 by adding idempotency keys. We can fix Point #4 by using UTC everywhere. We can fix Point #7 by implementing a wallet system."
This approach—solving each gap independently—is seductive because each fix is surgically targeted and easy to explain. But it doesn't work.
Here's why: billing is a system, not a collection of isolated steps. Every step depends on the output of the previous step. When you fix one step in isolation, you create new failure modes downstream.
Example: You implement idempotent event processing (the Point #1 fix) to eliminate dropped events. Now events are never lost. But your billing pipeline still closes instantly at midnight UTC (the Point #4 problem), so late-arriving events get pushed onto next month's bill. Your delivery guarantees are now being undone by your window-close policy.
Or: You implement commit-stage enforcement (Point #6 fix) to calculate true-ups. But your credit application logic (Point #7) doesn't know about minimum-spend commitments. A customer with a true-up charge disputes the invoice because a credit was applied to the wrong product tier, and the true-up amount becomes a tangled mess.
The only way to close all 7 gaps is to view billing as a deterministic pipeline with explicit stages, where each stage receives a well-defined input, applies a well-defined transformation, and produces a well-defined output that feeds the next stage.
The Pipeline Approach: Aforo's 10-Stage Architecture
💡 CFO Reality Check: If you can't run your billing calculation twice and get the same number, you have an architectural problem. If you can't explain to a customer exactly which events contributed to which charges on their invoice, you're bleeding money. If your true-up calculations are handled by a spreadsheet instead of your billing system, you're not actually billing—you're guessing.
Aforo's billing pipeline has 10 explicit, versioned stages:
MetricUsage ──→ Each stage transforms the charge in sequence:

┌───────────┐   ┌──────────┐   ┌───────────┐   ┌───────────┐   ┌──────┐
│1 QuotaChk │ → │2 Rollover│ → │3 Aggregate│ → │4 Allowance│ → │5 Rate│
└───────────┘   └──────────┘   └───────────┘   └───────────┘   └──┬───┘
                                                                  │
┌──────────┐   ┌─────────┐   ┌───────┐   ┌──────────┐   ┌─────────┴┐
│10 Settle │ ← │9 Route  │ ← │8 Tax  │ ← │7 Discount│ ← │6 Commit  │
└──────────┘   └────┬────┘   └───────┘   └──────────┘   └──────────┘
                    │
           ┌────────┼────────┐
           ▼        ▼        ▼
        Invoice   Wallet   Hybrid
     (postpaid) (prepaid) (split)
QuotaCheck: Are there any quotas (hard limits) that prevent the customer from incurring charges? If yes, block or throttle usage. Output: MetricUsage with flagged events.
Rollover: If the customer had unused quota from the previous period and the plan allows rollover, subtract the rollover amount from this period's usage. Output: MetricUsage after rollover adjustment.
Aggregate: Sum the usage across all dimensions and filter conditions for each metric. Output: per-metric usage totals (e.g., 1.2M API requests, 45GB data transfer).
Allowance: Subtract any included-free usage from the aggregated total. Output: per-metric chargeable usage (e.g., 1.2M requests - 100K free = 1.1M chargeable).
Rate: Apply the pricing model (PER_UNIT, FLAT_RATE, PERCENTAGE, INCLUDED_QUOTA, GRADUATED, VOLUME_TIERED) to calculate the charge for each metric. Output: MetricCharge objects with per-metric subtotals.
Commit: Check for minimum-spend commitments. If the sum of all charges is below the minimum, add a true-up charge. Output: MetricCharge + CommitAdjustment (if applicable).
Discount: Apply any negotiated discounts (percentage or fixed-amount). Output: MetricCharge - DiscountAmount.
Tax: Calculate tax (if applicable based on customer location and product category). Output: TaxAmount (usually $0 for B2B SaaS, but calculated for completeness).
Route: Determine how to settle the charge: POSTPAID (invoice), PREPAID (wallet), or HYBRID (split). Apply credits and prepaid balances. Output: InvoiceAmount (amount to invoice), WalletDebit (amount to deduct from prepaid wallet).
Settle: Create the invoice, debit the wallet, record all transactions, dispatch webhooks, publish Kafka events. Output: Invoice and WalletTransaction objects in PostgreSQL.
Every stage is deterministic. Every stage is versioned (if pricing rules change, the version increments). Every stage produces output that's stored in PostgreSQL, not ephemeral. At any point, you can replay the calculation and get the same number.
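Structurally, this is just function composition: each stage is a pure transformation over a charge context, which is what makes replay possible. A sketch of the shape (ChargeContext is an illustrative placeholder, not Aforo's type):

import java.util.List;
import java.util.function.UnaryOperator;

// Pipeline-shape sketch: each stage is a pure function of the charge context,
// so re-running the same input under the same stage versions yields the same invoice.
public final class BillingPipeline {
    record ChargeContext(String customerId, long ratePlanVersion /* usage, charges, adjustments... */) {}

    interface BillingStage extends UnaryOperator<ChargeContext> {}

    private final List<BillingStage> stages; // quotaCheck, rollover, ..., route, settle — in order

    BillingPipeline(List<BillingStage> stages) { this.stages = stages; }

    ChargeContext run(ChargeContext input) {
        ChargeContext ctx = input;
        for (BillingStage stage : stages) ctx = stage.apply(ctx); // stage N output feeds stage N+1
        return ctx;
    }
}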
The 7 leakage points map to specific stages:
- Point #1 (Dropped events): Handled before stage 1 (guaranteed delivery at the ingestion layer).
- Point #2 (Late-arriving events): Handled in stage 3 (Aggregate accepts events up to grace period).
- Point #3 (Dimension mismatch): Handled in stage 3 (Aggregate applies dimension filters from rate plan config).
- Point #4 (Timezone misalignment): Handled in stage 2 (Rollover respects customer's local timezone for period boundaries).
- Point #5 (Stale entitlement caches): Handled before stage 1 (QuotaCheck uses synchronously-invalidated cache).
- Point #6 (Commitment enforcement): Handled in stage 6 (Commit stage is explicit).
- Point #7 (Credit misapplication): Handled in stage 9 (Route stage applies settlement rules in defined order).
This is not a collection of fixes. It's a holistic system where each stage depends on the previous one and feeds the next one. Leakage in any stage becomes visible because the stage outputs are auditable and version-controlled.
Audit Yourself: 3 Questions to Quantify Your Leakage
If you're operating a usage-based billing system on a non-integrated stack (metering service + separate pricing calculator + separate invoicing tool, all stitched together with cron jobs and manual reconciliation), you're almost certainly experiencing revenue leakage. Here's how to find out:
Question 1: Can you reproduce the same invoice if you re-run your billing calculation?
Pick a random customer and a past billing period. Manually extract their raw usage events from your metering system (API calls, data transferred, compute hours, whatever the metric is). Run your pricing calculation on those events. Does the charge match the invoice that was issued?
If yes, good. If no, investigate where the mismatch is. Is it because of a late-arriving event that was included in the invoice but not in your extraction? Is it because the discount rules changed between when the invoice was issued and now, and your calculation is using the new rules? Is it because the true-up wasn't calculated when the invoice was issued, but your calculation includes it now?
Document the mismatch. This is your leakage.
Question 2: For your top 10 customers, what's the total invoiced amount vs. the total usage-based charge (before credits/discounts)?
This is your "gross charge before adjustments". Now ask your finance team: "How much did we actually apply in credits, discounts, and true-ups to these customers last month?"
If the credit/discount/true-up amount is highly variable month-to-month (e.g., $8,200 one month, $1,400 the next), that's a sign that commitments aren't being enforced consistently, or credits are being applied retroactively instead of at billing time.
Question 3: Do you have an audit trail that shows every event that contributed to every charge, at the customer level, down to the dimension and the pricing model applied?
Not a report that summarizes by dimension. An actual audit trail: "Event #2,847,192 (api_request, timestamp 2026-01-15T14:23:08Z, customer #247, dimension=write) was included in the January invoice because [reason]. It was rated at $0.001 because [pricing model + version] with [dimension filters] applied."
If you can't produce this audit trail, you can't defend a dispute. And you have no visibility into where leakage is happening.
If you can't answer "yes" to all three questions, you have leakage. The amount of leakage is the sum of:
- (Invoiced charge) - (reproducible charge from raw events) = Error from system uncertainty
- (Unpaid true-ups) = Error from commitment enforcement gaps
- (Credits applied outside the billing pipeline) = Error from settlement gaps
For a $25M ARR company with 300 customers, these three error classes compound across every account and every billing period; together they're how you arrive at the roughly $3.2M annual figure from the opening—or the equivalent of a fully-staffed finance team that spends all year chasing discrepancies instead of focusing on growth.
Conclusion: The Cost of Entropy
Revenue leakage in usage-based billing is not a feature bug or a single point of failure. It's the inevitable result of treating billing as a series of independent steps that happen to touch the same data.
The fix is not to patch each step. It's to design billing as a deterministic pipeline with explicit stages, versioned configurations, and auditable outputs. This is more complex operationally than a simple "measure, rate, invoice" flow. But for companies with usage-based pricing and committed minimums, the ROI is immediate.
For a company with $25M ARR, closing even a 0.5% leakage gap is $125K per year. For companies with $100M ARR and more complex pricing models, it's often $500K to $2M per year.
The alternative—continuing to operate a Frankenstein billing stack, reconciling discrepancies monthly, and hoping customers don't audit too carefully—gets exponentially more expensive as you scale.
Aforo's 10-stage pipeline isn't a feature. It's a recognition that usage-based billing has fundamental complexity that must be addressed architecturally, not hoped away.
Appendix: Revenue Leakage Diagnostic Checklist
Use this checklist to assess your current billing system:
- [ ] Dropped Events: Can you definitively prove that every usage event generated by your customer made it into your invoice? (Do you have an idempotency key on every event?)
- [ ] Late Events: Do you have a policy for events that arrive after the billing period closes? Is it documented and applied consistently? (Do you have a grace period, or do you reject them silently?)
- [ ] Dimension Filtering: Does your pricing model exactly match your metering model, or are you applying filters in post-processing? (Are filters version-controlled and part of the rate plan config?)
- [ ] Timezone Handling: Are all event timestamps stored with timezone information, or do you assume UTC and hope for the best? (Can a customer in Tokyo reproduce their invoice using their local timezone boundaries?)
- [ ] Cache Coherence: How often are your entitlement caches stale relative to subscription changes? Do you synchronously invalidate caches on writes? (Or do you rely on async Kafka events?)
- [ ] Commitment Enforcement: Is minimum-spend enforcement part of your billing pipeline, or is it a post-processing step? (Can you recreate a customer's true-up charge from the pipeline alone, or do you need external tools?)
- [ ] Credit Settlement: Do you have a defined order of precedence for applying prepaid wallets, credits, and discounts? Is it versioned and auditable? (Or is it a manual process that varies by customer?)
For each "no" answer, you have a potential leakage point. For each leakage point, ask: "If we fixed this, how much revenue would we recover?"
That answer is your business case.