The Dollar Value of a Dropped Event
A single API call generates a billing event: a usage record with a tenant ID, a metric identifier, a timestamp, a quantity, and a set of dimension tags. It is 400 bytes. It exists for less than 50 milliseconds in the ingestion buffer before it is acknowledged, persisted, and forwarded for enrichment.
When your platform processes 10 billion of these events per month across 1,000 tenants, the margin for error is not a percentage. It is measured in dollars per dropped event. If your average event is worth $0.003 in billable revenue and your pipeline drops 0.1% of events, you are losing $30,000 per month — $360,000 per year — in revenue that was earned, consumed by the customer, and never billed. At a 0.5% drop rate, that number is $1.8 million.
These are not hypothetical losses. They are the silent, structural revenue leakage that occurs whenever a billing pipeline treats event delivery with the same tolerance that an analytics pipeline does. In analytics, you can afford to lose events. In billing, every lost event is money that vanishes from your revenue line and never appears in any report, because no system knows it was supposed to be there.
Aforo's ingestion pipeline was designed around a single invariant: "we lost some usage data" is an architectural impossibility, not a monitoring alert. This article explains how.
The Aforo Invariant
At 10 billion events/month and $0.003 per event, a 0.1% drop rate costs $360K/year. Aforo's pipeline guarantees zero silent data loss — every billable event is captured, deduplicated, and rated exactly once.
A Leaky Pipeline Is Embezzlement from Your Own Company
In analytics engineering, a 0.5% event drop rate is a rounding error. Your dashboards are directionally correct. Your A/B tests still converge. Your product manager can still identify the top feature by usage volume. Nobody files a bug report for a 0.5% variance in a Mixpanel funnel chart.
In billing engineering, a 0.5% drop rate is hundreds of thousands of dollars evaporating annually. It is the financial equivalent of embezzlement from your own company — except there is no perpetrator to catch, because the money was never recorded as owed. The event was generated by the customer's API call, consumed your compute, traversed your infrastructure, and then disappeared into a Kafka consumer lag spike, an out-of-memory crash, or a silent deserialization failure that nobody noticed because the consumer did not emit a metric for events it could not parse.
The distinction between analytics-grade and billing-grade data pipelines is not one of scale. It is one of correctness invariants. An analytics pipeline optimizes for throughput and latency, tolerating occasional data loss. A billing pipeline optimizes for completeness and accuracy, tolerating increased latency if necessary to guarantee that every billable event is captured, deduplicated, and rated exactly once.
If your billing pipeline shares infrastructure, delivery guarantees, or operational practices with your analytics pipeline, you are running a billing system that is architecturally incapable of guaranteeing revenue accuracy. Every batch processing gap, every silent consumer failure, every unmonitored dead-letter topic is a hole through which your revenue leaks.
Architecture Reality Check
If your cloud provider experiences a 5-minute network partition, does your billing pipeline silently drop the usage events generated during that window, or does it mathematically guarantee they are captured, deduplicated, and rated once the network recovers?
If you cannot answer this question by pointing to a specific architectural mechanism — not a monitoring alert, not a manual reconciliation process, but a structural guarantee in the pipeline design — then your billing pipeline is analytics-grade, not billing-grade.
The Dual-Path Architecture: Hot Path and Cold Path
Aforo's ingestion pipeline separates event processing into two distinct paths with different latency targets, consistency guarantees, and failure modes. This separation is the foundational architectural decision that enables both sub-50ms acknowledgement latency and exactly-once billing accuracy.
Hot Path vs. Cold Path — Architectural Comparison
| Dimension | Hot Path (Ingest + Ack) | Cold Path (Enrich + Rate) |
|---|---|---|
| Latency target | < 50ms p99 (accept) | < 5s p99 (enrich + rate) |
| Primary concern | Never lose an event | Correctness of enrichment |
| Storage | Kafka (durable, replicated) | PostgreSQL + ClickHouse |
| Delivery guarantee | At-least-once (Kafka acks=all) | Exactly-once (idempotent writes) |
| Deduplication | Client-generated event ID | Upsert on (tenant_id, event_id) |
| Ordering | Per-partition (tenant-affine) | Not required (idempotent) |
| Failure mode | Backpressure (producer blocks) | Dead-letter + retry (3x exp backoff) |
| Scaling axis | Kafka partitions (horizontal) | Consumer group parallelism |
The Hot Path: Accept, Validate, Acknowledge
The hot path has one job: accept the event from the producer, validate its schema and timestamp, persist it to durable storage, and return an acknowledgement to the caller. The target latency is under 50ms at p99. The target availability is 99.99%. The target data loss rate is zero.
The implementation is deliberately minimal. The ingest endpoint receives the event over HTTPS, performs structural validation (required fields, valid JSON, known metric identifier, tenant ID present), runs timestamp validation (described below), and produces the event to a Kafka topic with acks=all. The acks=all configuration means the Kafka producer does not consider the write complete until the message has been written to the leader partition and all in-sync replicas. Only after Kafka confirms durability does the ingest endpoint return HTTP 202 Accepted to the caller.
This design guarantees that if the caller receives a 202, the event is durably stored in Kafka with replication factor 3. If the caller does not receive a 202 (network timeout, 5xx error), the caller retries with the same client-generated event ID. The hot path is idempotent by design: duplicate submissions with the same event ID are deduplicated downstream, so at-least-once delivery from the client converges to exactly-once processing in the billing pipeline.
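As a concrete illustration, here is a minimal sketch of the hot-path write using the Apache Kafka Java client: the producer is configured with acks=all, the record is keyed by tenant ID so a tenant's events stay on one partition, and the caller is acknowledged only after Kafka confirms durability. Class, method, and topic names (usage-events) are illustrative, and the HTTP layer and full schema validation are elided.

```java
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

/** Hot-path ingest sketch: validate, persist to Kafka with acks=all, then acknowledge. */
public class HotPathIngest {

    private final KafkaProducer<String, String> producer;

    public HotPathIngest(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Durability first: the send is not complete until the leader partition
        // and all in-sync replicas have the record.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        this.producer = new KafkaProducer<>(props);
    }

    /**
     * Returns true only after Kafka confirms durability; the HTTP layer maps
     * true to 202 Accepted and any exception to a retryable 5xx.
     */
    public boolean accept(String tenantId, String eventJson)
            throws ExecutionException, InterruptedException {
        // Structural and timestamp validation would run here; enrichment is
        // deliberately deferred to the cold path. Keying by tenant ID keeps a
        // tenant's events on one partition, preserving per-tenant ordering.
        ProducerRecord<String, String> record =
                new ProducerRecord<>("usage-events", tenantId, eventJson);
        RecordMetadata meta = producer.send(record).get(); // blocks until acks=all is satisfied
        return meta.hasOffset();
    }
}
```

Blocking on the send future keeps the acknowledgement honest: a 202 is never returned for an event that Kafka has not yet replicated.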
The hot path performs no enrichment. It does not look up the customer's subscription. It does not resolve the rate plan. It does not compute the billable amount. These operations require cross-service calls that would add latency and introduce failure modes into the acknowledgement path. By deferring enrichment to the cold path, the hot path remains fast, simple, and resilient to downstream failures.
The Cold Path: Enrich, Deduplicate, Rate
The cold path is a Kafka consumer group that reads events from the durable topic and performs the computationally expensive work: product-type detection, billing hierarchy resolution, entitlement validation, metric enrichment, deduplication, and — for prepaid and hybrid models — wallet hold creation.
The processing stages are:
Product-Type Detection: The consumer reads the raw event and identifies the product type (API, Agentic API, AI Agent, MCP Server) based on the metric identifier and event payload.
Billing Hierarchy Resolution: The consumer resolves the API key to a customer, team, and subscription using a Redis-cached hierarchy (10-minute TTL). This enrichment adds the billing context that the hot path deliberately omitted.
Entitlement Validation: The consumer verifies that the customer's subscription is active, the metric is within the subscription's entitlements, and any quota limits have not been exceeded. Quota enforcement uses a Redis counter with a 30-second sync interval.
Deduplication: The consumer performs an upsert on the (tenant_id, event_id) composite key. If the event ID already exists, the write is a no-op. This is the exactly-once guarantee: at-least-once delivery from Kafka, combined with idempotent writes, produces exactly-once processing semantics (a minimal upsert sketch follows these stages).
Rating and Routing: For events that route to the billing pipeline (as opposed to analytics-only events), the consumer creates wallet holds for prepaid customers and records the enriched event for inclusion in the next invoice run.
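The deduplication stage reduces, in essence, to a single idempotent write. A minimal sketch, assuming a PostgreSQL table with a unique constraint on (tenant_id, event_id); table and column names are illustrative:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;

/** Cold-path deduplication sketch: an idempotent upsert keyed on (tenant_id, event_id). */
public class BillableEventWriter {

    // ON CONFLICT DO NOTHING makes redelivered events a no-op, so at-least-once
    // delivery from Kafka converges to exactly-once processing.
    private static final String UPSERT_SQL =
            "INSERT INTO billable_events (tenant_id, event_id, metric_id, quantity, occurred_at) "
          + "VALUES (?, ?, ?, ?, ?) "
          + "ON CONFLICT (tenant_id, event_id) DO NOTHING";

    /** Returns true the first time the event is seen, false for a duplicate. */
    public boolean writeOnce(Connection conn, String tenantId, String eventId,
                             String metricId, long quantity, Instant occurredAt) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPSERT_SQL)) {
            ps.setString(1, tenantId);
            ps.setString(2, eventId);
            ps.setString(3, metricId);
            ps.setLong(4, quantity);
            ps.setTimestamp(5, Timestamp.from(occurredAt));
            return ps.executeUpdate() == 1; // 0 rows affected => duplicate, safely ignored
        }
    }
}
```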
Events that fail processing — deserialization errors, unknown metric IDs, subscription lookup failures — are routed to a dead-letter topic with 3-retry exponential backoff (initial delay 1 second, multiplied by 2 per retry). Non-retryable errors (malformed JSON, invalid event structure) are immediately dead-lettered without retry. The dead-letter topic is monitored by a DeadLetterTopicMonitor that persists failed events to a PostgreSQL table, enabling manual inspection, replay, and audit.
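A minimal sketch of that retry-then-dead-letter behavior, treating the processing step and the dead-letter publisher as injected callbacks (both hypothetical); the real consumer would wire these to the enrichment stages and the dead-letter topic producer:

```java
import java.util.function.Consumer;

/** Retry-then-dead-letter sketch: up to 3 retries, 1s initial delay doubled per retry. */
public class RetryThenDeadLetter {

    /** Thrown for errors that will never succeed on retry (malformed JSON, invalid structure). */
    public static class NonRetryableException extends RuntimeException {
        public NonRetryableException(String message) { super(message); }
    }

    private static final int MAX_RETRIES = 3;
    private static final long INITIAL_DELAY_MS = 1_000;

    /**
     * Runs the processing step; retryable failures back off at 1s, 2s, 4s, and once
     * retries are exhausted (or immediately for non-retryable errors) the raw event
     * is handed to the dead-letter publisher for persistence, inspection, and replay.
     */
    public void process(String rawEvent,
                        Consumer<String> processingStep,
                        Consumer<String> deadLetterPublisher) throws InterruptedException {
        long delayMs = INITIAL_DELAY_MS;
        for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
            try {
                processingStep.accept(rawEvent);
                return;                                   // success
            } catch (NonRetryableException e) {
                deadLetterPublisher.accept(rawEvent);     // retrying cannot help
                return;
            } catch (RuntimeException e) {
                if (attempt == MAX_RETRIES) {
                    deadLetterPublisher.accept(rawEvent); // retries exhausted
                    return;
                }
                Thread.sleep(delayMs);
                delayMs *= 2;
            }
        }
    }
}
```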
Timestamp Validation: The Subtle Art of Temporal Correctness
Timestamp handling in a billing pipeline is more nuanced than it appears. Events arrive with a client-supplied timestamp that represents when the billable action occurred, not when the event arrived at the ingest endpoint. These two times can diverge significantly: network latency, client-side batching, clock skew, and retry logic all introduce gaps between occurrence and arrival.
Aforo's timestamp validation applies a four-rule classification:
Timestamp Validation Rules
| Condition | Threshold | Action | Rationale |
|---|---|---|---|
| Future timestamp | > 5 min ahead of server | Reject (HTTP 422) | Clock skew / replay attack |
| Stale event | > 90 days old | Reject (HTTP 422) | Outside billing window |
| Late arrival | > 24 hours old | Accept + flag for review | Valid but needs reconciliation |
| On-time event | ≤ 24 hours old | Accept (standard path) | Normal ingestion |
The 5-minute future threshold accommodates reasonable clock skew between the client and the server without accepting events from the future, which could be used to manipulate billing periods. The 90-day stale threshold ensures that events from long-past billing periods — which may have already been invoiced and closed — do not silently appear in the pipeline. The 24-hour late-arrival flag is the critical middle ground: the event is valid and billable, but it arrived after the standard aggregation window, so it is flagged for inclusion in the next billing period or for a catch-up adjustment if the original period has already been invoiced.
All thresholds are configurable per tenant via aforo.ingestion.* properties. Tenants with high-latency event sources (IoT devices, batch-processing systems) can extend the late-arrival window. Tenants with strict real-time requirements can tighten it.
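A minimal sketch of the four-rule classification, with the default thresholds above passed in as configuration (in the real service these would come from the per-tenant aforo.ingestion.* properties); class and enum names are illustrative:

```java
import java.time.Duration;
import java.time.Instant;

/** Timestamp classification sketch implementing the four rules above. */
public class TimestampValidator {

    public enum Verdict { REJECT_FUTURE, REJECT_STALE, ACCEPT_LATE, ACCEPT }

    private final Duration maxFutureSkew;   // default 5 minutes
    private final Duration staleCutoff;     // default 90 days
    private final Duration lateThreshold;   // default 24 hours

    public TimestampValidator(Duration maxFutureSkew, Duration staleCutoff, Duration lateThreshold) {
        this.maxFutureSkew = maxFutureSkew;
        this.staleCutoff = staleCutoff;
        this.lateThreshold = lateThreshold;
    }

    public static TimestampValidator withDefaults() {
        return new TimestampValidator(Duration.ofMinutes(5), Duration.ofDays(90), Duration.ofHours(24));
    }

    public Verdict classify(Instant eventTime, Instant serverNow) {
        if (eventTime.isAfter(serverNow.plus(maxFutureSkew))) {
            return Verdict.REJECT_FUTURE;      // clock skew beyond tolerance or replay -> HTTP 422
        }
        Duration age = Duration.between(eventTime, serverNow);
        if (age.compareTo(staleCutoff) > 0) {
            return Verdict.REJECT_STALE;       // outside the 90-day billing window -> HTTP 422
        }
        if (age.compareTo(lateThreshold) > 0) {
            return Verdict.ACCEPT_LATE;        // valid but flagged for reconciliation
        }
        return Verdict.ACCEPT;                 // normal ingestion path
    }
}
```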
The Dual-Engine Database: Analytical and Transactional Isolation
A billing pipeline that processes 10 billion events per month must serve two fundamentally different query patterns: high-cardinality analytical queries ("Show me API call volume by tenant, by product, by hour, for the last 90 days") and low-latency transactional writes ("Debit this customer's wallet by $4.23 and create an invoice line item"). These patterns have incompatible performance characteristics. Running both on the same database engine forces a choice between analytical throughput and transactional consistency — and in billing, you cannot sacrifice either.
Aforo's architecture uses two purpose-built engines, each optimized for its workload, with Kafka as the bridge between them.
Dual-Engine Database Architecture
| Dimension | ClickHouse (Analytical) | PostgreSQL (Transactional) |
|---|---|---|
| Engine type | Column-oriented OLAP | Row-oriented OLTP |
| Primary use | Analytics, dashboards, anomaly detection | Billing transactions, wallet debits, invoices |
| Consistency | Eventual (async from Kafka) | Strong (ACID transactions) |
| Query pattern | Aggregate 10B events by tenant, metric, time | Point lookups, FK joins, atomic writes |
| Cardinality | High (millions of distinct metric values) | Moderate (thousands of subscriptions) |
| Retention | 365 days, auto-TTL | Indefinite (audit requirement) |
| Write pattern | Batch insert (100 events / 5s) | Single-row transactional |
| Isolation | Tenant-partitioned (MergeTree) | Tenant-scoped RLS (row-level security) |
ClickHouse: The Analytical Engine
ClickHouse is a column-oriented OLAP database designed for high-throughput analytical queries on large datasets. In Aforo's architecture, it serves as the analytical layer for usage data: real-time dashboards, tenant-level consumption reports, anomaly detection, and the usage velocity metrics that feed the FP&A forecasting framework described in Article 11.
The schema is optimized for time-series aggregation. The primary table (usage_events) uses a MergeTree engine ordered by (tenant_id, created_at) with a 365-day TTL. Materialized views pre-aggregate daily statistics (event_daily_stats using SummingMergeTree) and MCP-specific metrics (mcp_daily_stats and mcp_session_summary using AggregatingMergeTree). These materialized views enable sub-second dashboard queries on datasets that would require minutes to scan in a row-oriented database.
Events flow into ClickHouse via a Kafka consumer that batch-inserts 100 events (or flushes every 5 seconds, whichever comes first). The batch insertion pattern is critical for ClickHouse performance: the engine is optimized for large, infrequent inserts rather than small, frequent ones. The consumer maintains an in-memory buffer and flushes to ClickHouse in a single batch INSERT, achieving write throughput of 500,000+ events per second per consumer instance.
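A minimal sketch of that buffering logic, with the actual ClickHouse batch INSERT abstracted behind a callback; the batch size and flush interval follow the figures above, everything else is illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Batch-buffer sketch: flush at 100 events or every 5 seconds, whichever comes first. */
public class BatchBuffer {

    private static final int MAX_BATCH_SIZE = 100;
    private static final long MAX_BATCH_AGE_MS = 5_000;

    private final Consumer<List<String>> clickHouseInsert; // wraps a single batch INSERT
    private final List<String> buffer = new ArrayList<>();
    private long oldestEventMs = Long.MAX_VALUE;

    public BatchBuffer(Consumer<List<String>> clickHouseInsert) {
        this.clickHouseInsert = clickHouseInsert;
    }

    /** Called by the Kafka consumer loop for each event destined for ClickHouse. */
    public synchronized void add(String eventRow) {
        if (buffer.isEmpty()) {
            oldestEventMs = System.currentTimeMillis();
        }
        buffer.add(eventRow);
        if (buffer.size() >= MAX_BATCH_SIZE) {
            flush();
        }
    }

    /** Called periodically from the poll loop to honor the 5-second bound. */
    public synchronized void flushIfStale() {
        if (!buffer.isEmpty() && System.currentTimeMillis() - oldestEventMs >= MAX_BATCH_AGE_MS) {
            flush();
        }
    }

    private void flush() {
        // One large INSERT per flush: ClickHouse favors big, infrequent writes
        // over many small ones.
        clickHouseInsert.accept(new ArrayList<>(buffer));
        buffer.clear();
        oldestEventMs = Long.MAX_VALUE;
    }
}
```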
PostgreSQL: The Transactional Engine
PostgreSQL serves as the transactional backbone for all billing-critical operations: wallet debits and holds (with SELECT FOR UPDATE pessimistic locking), invoice line items, subscription state transitions, payment records, and credit notes. These operations require ACID guarantees that a column-oriented OLAP engine cannot provide.
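A minimal sketch of a wallet debit under pessimistic locking, using plain JDBC; table and column names are illustrative:

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Wallet debit sketch: SELECT FOR UPDATE and the balance update in one transaction. */
public class WalletDebit {

    /** Debits the wallet atomically; returns false for an unknown wallet or insufficient funds. */
    public boolean debit(Connection conn, String tenantId, String walletId, BigDecimal amount)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement lock = conn.prepareStatement(
                 "SELECT balance FROM wallets WHERE tenant_id = ? AND wallet_id = ? FOR UPDATE");
             PreparedStatement update = conn.prepareStatement(
                 "UPDATE wallets SET balance = balance - ? WHERE tenant_id = ? AND wallet_id = ?")) {

            // Every query is tenant-scoped; FOR UPDATE holds the row lock until commit.
            lock.setString(1, tenantId);
            lock.setString(2, walletId);
            try (ResultSet rs = lock.executeQuery()) {
                if (!rs.next() || rs.getBigDecimal(1).compareTo(amount) < 0) {
                    conn.rollback();
                    return false;
                }
            }
            update.setBigDecimal(1, amount);
            update.setString(2, tenantId);
            update.setString(3, walletId);
            update.executeUpdate();
            conn.commit();                    // row lock released here
            return true;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Because the read and the write happen under the same row lock, two concurrent debits against the same wallet cannot both observe the same starting balance.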
The key architectural constraint is that analytical queries never touch the transactional database. Dashboard queries, consumption reports, and anomaly detection run exclusively against ClickHouse. This ensures that a heavy analytical query — scanning 90 days of usage data across all tenants for an internal report — cannot degrade the latency of a wallet debit that is blocking a customer's API call or a subscription state transition that is closing a billing period.
Multi-tenancy in PostgreSQL is enforced at two levels: application-level filtering (every query includes a tenant_id predicate, enforced by a Spring Security filter) and row-level security policies that serve as a defense-in-depth layer preventing cross-tenant data access even if the application logic contains a bug. Connection pooling is tuned per workload: the usage-ingestor service (which handles the highest write volume) is configured with 60 max connections and 20 minimum idle, while lower-throughput services use 40/10.
Kafka: The Bridge
Kafka is not merely a message bus in this architecture. It is the durable event log that serves as the source of truth between the hot path and both database engines. Events are written to Kafka once (in the hot path) and consumed independently by the ClickHouse analytics consumer and the PostgreSQL billing consumer. This fan-out pattern means the two engines can evolve independently: ClickHouse can be rebuilt from the Kafka log if schema changes require a backfill. PostgreSQL can replay events from a specific offset if a billing bug is discovered and needs correction.
Producer configuration enforces durability: acks: all, retries: 3, linger.ms: 5 (batching for throughput), and compression.type: lz4 (reducing network bandwidth by ~60% with minimal CPU overhead). Consumer configuration uses auto-offset-reset: latest to avoid reprocessing the entire log on consumer group restart, with manual offset management for billing-critical consumers that require at-least-once semantics.
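Expressed as Kafka client properties, those settings look roughly like this (bootstrap servers, serializers, and group IDs are omitted; the constants come from the Kafka Java client):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;

/** The durability and throughput settings above, as Kafka client properties. */
public class KafkaSettings {

    public static Properties producerProps() {
        Properties p = new Properties();
        p.put(ProducerConfig.ACKS_CONFIG, "all");             // leader + all in-sync replicas
        p.put(ProducerConfig.RETRIES_CONFIG, 3);
        p.put(ProducerConfig.LINGER_MS_CONFIG, 5);            // small batching window for throughput
        p.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // ~60% bandwidth reduction, low CPU cost
        return p;
    }

    public static Properties billingConsumerProps() {
        Properties p = new Properties();
        p.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest"); // no full-log replay on restart
        p.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);   // offsets committed manually,
                                                                  // e.g. after the idempotent write succeeds
        return p;
    }
}
```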
Multi-Tenant Isolation at Ingestion Scale
At 10 billion events per month across 1,000+ tenants, a single tenant's traffic spike cannot be allowed to degrade the ingestion latency or billing accuracy for other tenants. Aforo enforces tenant isolation at three layers:
Kafka partitioning: Events are partitioned by tenant ID, ensuring that all events for a given tenant land on the same Kafka partition. This provides per-tenant ordering guarantees and enables consumer-side parallelism without cross-tenant coordination.
Redis-cached entitlements: Entitlement checks, billing hierarchy lookups, and quota counters are cached in Redis with tenant-scoped keys (pattern: {feature}:{tenant_id}:{entity}:{id}). A cache miss for Tenant A does not invalidate the cache for Tenant B. A key-construction sketch follows this list.
Database-level RLS: Every PostgreSQL query includes a tenant_id predicate. Row-level security policies enforce this at the database level. Composite indexes on (tenant_id, status) ensure that tenant-scoped queries use index scans, not table scans, regardless of table size.
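A minimal sketch of the tenant-scoped key pattern; the feature, entity, and ID values in the usage example are hypothetical:

```java
/** Tenant-scoped cache key sketch following the {feature}:{tenant_id}:{entity}:{id} pattern. */
public final class CacheKeys {

    private CacheKeys() {}

    /** Builds a key such as "entitlement:tenant-a:subscription:sub_42"; keys never collide across tenants. */
    public static String key(String feature, String tenantId, String entity, String id) {
        return feature + ":" + tenantId + ":" + entity + ":" + id;
    }

    public static void main(String[] args) {
        // Evicting or missing one tenant's entry leaves every other tenant's cache untouched.
        System.out.println(key("entitlement", "tenant-a", "subscription", "sub_42"));
    }
}
```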
Rate limiting is also tenant-aware. Each tenant's ingestion rate is governed by a configurable throttle (events per second) enforced at the API gateway layer. A tenant that attempts to exceed their ingestion rate receives HTTP 429 responses, while other tenants continue to ingest at full speed. The throttle is not a hard cap — bursts are accommodated by Kafka's internal buffering — but sustained overload triggers backpressure rather than data loss.
Observability: Proving the Pipeline's Guarantees
Architectural guarantees are only as strong as your ability to verify them in production. Aforo's ingestion pipeline exposes metrics at every stage, enabling real-time verification that the completeness invariant holds:
Ingest endpoint: Events accepted, events rejected (by reason), p50/p95/p99 accept latency, per-tenant ingestion rate
Kafka: Producer success/failure rate, partition lag per consumer group, consumer processing rate, dead-letter topic depth
Cold path: Deduplication hit rate (percentage of events that were duplicates), enrichment failure rate, entitlement rejection rate
ClickHouse: Batch insert latency, materialized view refresh lag, query p95 latency
PostgreSQL: Connection pool utilization, transaction commit rate, wallet debit latency
All metrics are exposed via Prometheus endpoints (/actuator/metrics/prometheus) and visualized in Grafana dashboards with per-service, per-tenant granularity. Distributed tracing via OpenTelemetry (Micrometer → Jaeger) allows an operator to trace a single event from the ingest endpoint through Kafka, through the cold path consumer, and into both database engines, verifying that it was captured, deduplicated, enriched, and persisted correctly.
The most critical alert is consumer lag. If the billing consumer falls behind the Kafka log by more than 60 seconds, an alert fires. This is not a convenience metric — it is a leading indicator of a pipeline that is falling behind the event production rate and will eventually either cause billing delays or, worse, trigger an out-of-memory condition that drops events. Consumer lag is the canary in the billing pipeline's coal mine.
The "Audit Yourself" Checklist
Before your next architecture review, your engineering team should be able to answer these three questions with specific, verifiable mechanisms. If the answers reference monitoring alerts or manual reconciliation processes instead of structural guarantees, your billing pipeline is not billing-grade.
1. The Delivery Guarantee Test
What is the delivery guarantee of your billing event pipeline? If the answer is "at-least-once," describe the specific mechanism that converts at-least-once delivery into exactly-once processing. If that mechanism is "we deduplicate in the database on insert," what is the deduplication key? Is it client-generated (correct) or server-generated (incorrect, because retries create new IDs)? If you cannot trace the path from producer retry to consumer deduplication to idempotent database write, your pipeline has a gap where events can be either lost or double-counted.
2. The Deduplication Audit
Query your billing events database and count the number of events with duplicate (tenant_id, event_id) pairs over the last 30 days. If the answer is zero and your pipeline claims at-least-once delivery, either your deduplication is working perfectly (verify this independently) or your pipeline is actually at-most-once (events are being silently dropped before they reach the database). Now count the number of events in your dead-letter topic. If that number is higher than 0.01% of total events, investigate every category of failure. Each one is either a lost billing event or a bug in your validation logic.
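A sketch of that duplicate-pair audit as a single query over a hypothetical billable_events table (adjust the table, column, and timestamp names to match your schema):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Deduplication audit sketch: counts (tenant_id, event_id) pairs stored more than once in the last 30 days. */
public class DeduplicationAudit {

    private static final String DUPLICATE_COUNT_SQL =
            "SELECT count(*) FROM ("
          + "  SELECT tenant_id, event_id"
          + "  FROM billable_events"
          + "  WHERE occurred_at >= now() - interval '30 days'"
          + "  GROUP BY tenant_id, event_id"
          + "  HAVING count(*) > 1"
          + ") AS duplicates";

    public long duplicatePairs(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(DUPLICATE_COUNT_SQL)) {
            rs.next();
            return rs.getLong(1); // anything above zero means the upsert key is not doing its job
        }
    }
}
```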
3. The Late-Arrival Reconciliation Test
Artificially inject a batch of 1,000 usage events with timestamps from 48 hours ago into your ingest endpoint. Verify that all 1,000 events (a) are accepted by the pipeline, (b) are flagged as late arrivals, (c) are correctly attributed to the billing period in which they occurred (not the period in which they arrived), and (d) are included in the invoice if the original period has not yet been closed, or queued for a catch-up adjustment if it has. If your pipeline cannot handle late arrivals without manual intervention, it is not billing-grade.
The Bottom Line
A billing pipeline is not an analytics pipeline with higher stakes. It is a fundamentally different class of system with different invariants, different failure modes, and different correctness requirements. The analytics pipeline asks: "Did we capture enough events to draw a statistically valid conclusion?" The billing pipeline asks: "Did we capture every event, deduplicate every retry, rate every event exactly once, and persist the result to both the analytical and transactional stores without data loss?"
At 10 billion events per month, the engineering required to answer "yes" to every part of that question is substantial. It requires a hot path that acknowledges events in under 50ms with durable Kafka persistence. It requires a cold path that enriches, deduplicates, and rates events with exactly-once semantics. It requires timestamp validation that handles clock skew, late arrivals, and stale events without losing billable data. It requires a dual-engine database architecture where analytical queries and billing transactions never compete for the same resources. And it requires an observability layer that proves — continuously, in production — that the guarantees hold.
Building this from scratch takes 12-18 months of senior distributed-systems engineering. Maintaining it is a permanent line item. The companies that will scale usage-based billing to billions of events without revenue leakage are the ones that recognize this pipeline as core billing infrastructure — not a side project for the data engineering team.
Ready to see a billing-grade ingestion pipeline?