Case Study
Payment orchestration with Temporal, Kafka, and Stripe
This case describes a backend service for payment workflows where the hard part was not charging a card, but keeping state correct when requests were retried, downstream systems were slow, and billing steps had to stay consistent across multiple services.
Context
What problem needed to be solved
The platform needed a reliable payment workflow around billing and subscription operations. A single user action could trigger several dependent steps: validate intent, create or confirm a charge, persist local state, publish events, and reconcile the final outcome for internal systems.
The system could not rely on simple request-response logic because payment providers and internal consumers fail in different ways. Some failures are transient, some leave an operation partially completed, and some produce ambiguous outcomes that must be reconciled later.
Product-level context for this system is described on the MyAutoData technical context page.
MyAutoData Context
What kind of product this workflow lived in
MyAutoData is a multi-domain vehicle data platform combining user vehicle data, analytics, marketplace-style flows, and payment-related operations. That matters because the payment service was not isolated: it had to fit into a broader system with internal APIs, asynchronous processing, and user/account state that needed to stay consistent.
In practice, the payment workflows sat next to other business processes and had to handle retries, provider uncertainty, and downstream event consumers without breaking the rest of the platform.
Constraints
Why the naive approach was not enough
- Duplicate execution could cause duplicate charges or inconsistent internal state.
- Network failures could happen after Stripe accepted a request but before the service received the response, leaving the outcome unknown.
- Publishing events directly from application code risked losing messages or producing state transitions out of order.
- Retrying from cron jobs or ad-hoc workers made failure handling implicit and hard to reason about.
- Operations needed a clear place to inspect workflow state instead of reconstructing it from scattered logs.
Solution
Architecture and engineering decisions
1. Temporal owned workflow progression
Instead of encoding retries and compensation logic across handlers and cron tasks, the payment flow was modeled as an explicit workflow. That made step ordering, retry policy, and timeout behavior visible in one place.
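As a rough illustration of what "visible in one place" means, the sketch below declares step order and retry limits as data and runs them in a single loop. It is plain Python, not Temporal SDK code, and it provides no durability or timeouts. `Step`, `RetryPolicy`, `TransientError`, and `run_workflow` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class TransientError(Exception):
    """Failure that is assumed safe to retry (hypothetical classification)."""


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 3


@dataclass
class Step:
    name: str
    action: Callable[[Dict], None]
    retry: RetryPolicy = field(default_factory=RetryPolicy)


def run_workflow(steps: List[Step], state: Dict) -> List[str]:
    """Run steps in declared order and return an inspectable history."""
    history: List[str] = []
    for step in steps:
        for attempt in range(1, step.retry.max_attempts + 1):
            try:
                step.action(state)
                history.append(f"{step.name}:ok:attempt={attempt}")
                break
            except TransientError:
                history.append(f"{step.name}:retry:attempt={attempt}")
        else:
            # The inner loop never hit `break`, so every attempt failed.
            raise RuntimeError(f"step {step.name} exhausted retries")
    return history
```

The returned history is a small stand-in for the execution history that a workflow engine records, which is what makes a run inspectable after the fact.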
2. Idempotency was enforced at command boundaries
Each externally visible payment action used stable identifiers and repeat-safe commands so the same request could be replayed without creating duplicate charges. This mattered for both client retries and worker restarts.
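A minimal sketch of a repeat-safe command, assuming the key is derived from business identifiers and the first result is stored and returned on replay. The names `idempotency_key` and `ChargeCommandHandler`, the key fields, and the in-memory dictionary are illustrative assumptions. The case study does not describe the actual key scheme or storage. Stripe's API does accept an idempotency key on POST requests, so the same key could also be forwarded to the provider.

```python
import hashlib
from typing import Callable, Dict


def idempotency_key(account_id: str, invoice_id: str, action: str) -> str:
    """Derive a stable key from business identifiers, not timestamps or random values."""
    raw = f"{account_id}:{invoice_id}:{action}".encode()
    return hashlib.sha256(raw).hexdigest()


class ChargeCommandHandler:
    def __init__(self, create_charge: Callable[[str, int], str]) -> None:
        # Hypothetical provider call: (idempotency key, amount in cents) -> charge id.
        self._create_charge = create_charge
        # In-memory stand-in for a durable table keyed by idempotency key.
        self._results: Dict[str, str] = {}

    def handle(self, key: str, amount_cents: int) -> str:
        # A replayed command returns the stored result instead of charging again.
        if key in self._results:
            return self._results[key]
        charge_id = self._create_charge(key, amount_cents)
        self._results[key] = charge_id
        return charge_id
```

Deriving the key from the business operation rather than from the HTTP request means a client retry and a worker restart both map to the same command.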
3. Kafka publication went through Outbox
Instead of writing domain state and publishing events in separate non-atomic operations, each state change was written to PostgreSQL in the same transaction as an outbox record. Events were then published to Kafka asynchronously from the outbox, which reduced the risk of lost or premature messages.
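The outbox idea can be sketched as follows. To keep the example self-contained it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL and an injected `publish` callable as a stand-in for a Kafka producer. The table layout, the topic name, and the function names are assumptions made for illustration, not the service's real schema.

```python
import json
import sqlite3
from typing import Callable


def init_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT NOT NULL, "
        "payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)"
    )


def record_payment_status(conn: sqlite3.Connection, payment_id: str, status: str) -> None:
    """Write the state change and its event in one transaction."""
    payload = json.dumps({"payment_id": payment_id, "status": status})
    with conn:  # commits both rows together, or rolls both back on error
        # INSERT OR REPLACE is SQLite syntax; PostgreSQL would use ON CONFLICT.
        conn.execute(
            "INSERT OR REPLACE INTO payments (id, status) VALUES (?, ?)",
            (payment_id, status),
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("payment.status_changed", payload),
        )


def relay_outbox(conn: sqlite3.Connection, publish: Callable[[str, str], None]) -> int:
    """Publish pending outbox rows in insertion order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # if this raises, the row stays pending for the next run
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because a row is marked as published only after `publish` returns, a crash between the two steps resends the event on the next run. Delivery is therefore at-least-once, and consumers are assumed to tolerate duplicates.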
4. Reconciliation handled ambiguous provider outcomes
Some failures were not true failures but unknown outcomes. Separate reconciliation logic verified final Stripe state and brought internal records back to a consistent terminal state.
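One way to picture the reconciliation pass, as a sketch under stated assumptions: internal records marked `unknown` are checked against the provider, and only provider states that are final move the record to a terminal state. The internal status names, the `TERMINAL` mapping, and the injected `fetch_provider_status` callable are hypothetical. `succeeded` and `canceled` are real Stripe PaymentIntent statuses, but the case study does not say which Stripe objects were checked.

```python
from typing import Callable, Dict, Optional

# Assumed mapping from a final provider status to an internal terminal state.
TERMINAL: Dict[str, str] = {"succeeded": "paid", "canceled": "canceled"}


def reconcile(
    records: Dict[str, str],
    fetch_provider_status: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Resolve records whose outcome is unknown; return the ones that changed."""
    resolved: Dict[str, str] = {}
    for payment_id, status in records.items():
        if status != "unknown":
            continue  # already terminal or still in normal flow
        provider_status = fetch_provider_status(payment_id)
        if provider_status in TERMINAL:
            resolved[payment_id] = TERMINAL[provider_status]
        # Any other provider status (or no record) stays unknown for a later pass.
    records.update(resolved)
    return resolved
```

Leaving non-final provider states untouched keeps the pass safe to rerun: a payment still processing at the provider is simply picked up again later.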
Trade-offs
Why this design was chosen
Temporal adds operational and conceptual weight compared with simple background jobs. That trade-off was worth it because the problem space was inherently stateful and failure-heavy. The team needed one deterministic place to define and reason about retries, timeouts, and compensation.
Kafka plus Outbox also adds moving parts, but it gives a clearer boundary between transactional state changes and asynchronous communication. For payment-related workflows, that boundary is more valuable than reducing the number of components.
Result
Outcome
- Supported billing flows handling 10K+ payment transactions per month.
- Reduced the risk of duplicate charges under retries and downstream instability.
- Made workflow execution inspectable and easier to debug in production.
- Created a more reliable path from payment state transition to downstream event publication.