
The Saga pattern in distributed systems


Alex is planning a dream vacation using SuperTrip, an online travel platform. With a few clicks, Alex books:

  1. A round-trip flight to Bali
  2. A beachfront hotel for a week
  3. A rental car for local exploration
  4. Several exciting excursions

SuperTrip’s system processes each booking through different microservices:

flowchart LR
    BookVacation(Book vacation)
    BookFlight(Book flight)
    ReserveHotel(Reserve hotel)
    ArrangeCar(Arrange car)
    ScheduleExcursions(Schedule excursions)
    User((Alex)) --> BookVacation
    BookVacation --> BookFlight
    BookVacation --> ReserveHotel
    BookVacation --> ArrangeCar
    BookVacation --> ScheduleExcursions

Alex’s credit card is charged for the flight, and a confirmation email arrives. The hotel reservation is confirmed next. But then disaster strikes – there are no rental cars available for the dates!

Now what? The flight and hotel are already booked and paid for, but without a car, Alex’s plans are ruined.

The customer service representative apologizes: “I’m sorry, but our flight and hotel bookings are handled by different systems. I can cancel them, but you’ll incur cancellation fees.”

Alex is frustrated: “Why did you confirm part of my vacation if you couldn’t confirm all of it?!”

Getting distributed transactions right #

Suppose we have a distributed system with multiple services. Each service has its own database and individually provides transactional consistency in data.

In usage flows where only one service is involved at a time, we don’t need anything special. The data can be accessed and modified with strong data consistency without much thought.

However, in usage flows that span multiple services, we need a way to keep the data consistent across all of them. That is challenging.

That’s why Alex’s vacation booking nightmare happened. Ideally, Alex would have either received a complete vacation package or a full refund with no partial bookings, making for a much happier customer experience.

This is exactly the kind of problem the Saga pattern solves.

So what is the Saga pattern? #

The pattern uses a sequence of transactions where each transaction updates data within a single service.

One transaction completing triggers the next transaction in the sequence. If any transaction fails, the saga executes compensating transactions to unwind the prior changes.

The concept of a “saga” was introduced in a 1987 paper by Hector Garcia-Molina and Kenneth Salem [1]:

A LLT [long lived transaction] is a saga if it can be written as a sequence of transactions that can be interleaved with other transactions. The database management system guarantees that either all the transactions in a saga are successfully completed or compensating transactions are run to amend a partial execution.

A saga is defined by two key types of transactions:

  1. Forward transactions: The regular operations that move the business process forward
  2. Compensating transactions: Operations that undo the effects of the forward transactions when something goes wrong

In our vacation booking example, if the car rental fails, compensating transactions would automatically refund the hotel and flight bookings.
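The two transaction types and the unwind order can be sketched in a few lines of Python. This is a minimal in-memory sketch rather than a production implementation: `SagaStep`, `run_saga`, and the booking functions are hypothetical names, and the car rental is hard-coded to fail so the run mirrors Alex's story.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    forward: Callable[[], None]     # moves the business process forward
    compensate: Callable[[], None]  # undoes the effect of `forward`


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.forward()
        except Exception:
            # Unwind only the steps that actually completed, newest first.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


# Hypothetical stand-ins for calls to the real booking services.
log: List[str] = []


def rent_car() -> None:
    raise RuntimeError("no rental cars available")


vacation = [
    SagaStep("flight", lambda: log.append("book flight"), lambda: log.append("cancel flight")),
    SagaStep("hotel", lambda: log.append("reserve hotel"), lambda: log.append("cancel hotel")),
    SagaStep("car", rent_car, lambda: log.append("cancel car")),
]

succeeded = run_saga(vacation)
print(succeeded, log)
# False ['book flight', 'reserve hotel', 'cancel hotel', 'cancel flight']
```

Note that the failed step itself is never compensated: its forward transaction did not commit, so there is nothing to undo.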

Ways to approach the Saga pattern #

Orchestration #

In this approach, a central orchestrator (or coordinator) directs and controls the execution of the saga’s transactions. The orchestrator maintains the saga’s state, invokes the participating services, and handles failures by triggering compensating transactions.

sequenceDiagram
    participant Orchestrator
    participant FlightService
    participant HotelService
    participant CarRentalService
    Orchestrator->>FlightService: Book flight
    FlightService-->>Orchestrator: Flight booked
    Orchestrator->>HotelService: Reserve hotel
    HotelService-->>Orchestrator: Hotel reserved
    Orchestrator->>CarRentalService: Rent car
    CarRentalService-->>Orchestrator: No cars available (Failure)
    Orchestrator->>HotelService: Cancel hotel reservation
    HotelService-->>Orchestrator: Hotel cancelled
    Orchestrator->>FlightService: Cancel flight booking
    FlightService-->>Orchestrator: Flight cancelled
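One way to picture the orchestrator is as a small state machine that turns each reply into the next command. The sketch below makes several assumptions for illustration: `VacationOrchestrator`, its `position`/`compensating`/`status` fields, and `fake_service` are hypothetical, replies are reduced to a success flag, and cancellations are assumed to always succeed. A real orchestrator would exchange messages over the network and persist this state so it could resume after a crash.

```python
from typing import List, Optional, Tuple

Command = Tuple[str, str]  # (service name, action)


class VacationOrchestrator:
    STEPS = ["FlightService", "HotelService", "CarRentalService"]

    def __init__(self) -> None:
        self.position = 0          # index of the step awaiting a reply
        self.compensating = False  # True once a forward step has failed
        self.status = "running"

    def start(self) -> Command:
        return (self.STEPS[0], "book")

    def on_reply(self, success: bool) -> Optional[Command]:
        """Turn the reply to the last command into the next command, or None when done."""
        if not self.compensating:
            if success:
                self.position += 1
                if self.position == len(self.STEPS):
                    self.status = "completed"
                    return None
                return (self.STEPS[self.position], "book")
            # The current step failed, so it has nothing to undo; start unwinding.
            self.compensating = True
        self.position -= 1
        if self.position < 0:
            self.status = "compensated"
            return None
        return (self.STEPS[self.position], "cancel")


def fake_service(command: Command) -> bool:
    """Hypothetical transport: every call succeeds except booking a car."""
    service, action = command
    return not (service == "CarRentalService" and action == "book")


orchestrator = VacationOrchestrator()
sent: List[Command] = []
command: Optional[Command] = orchestrator.start()
while command is not None:
    sent.append(command)
    command = orchestrator.on_reply(fake_service(command))

print(orchestrator.status)  # compensated
for service, action in sent:
    print(service, action)
```

The commands sent follow the diagram: book flight, book hotel, book car, then cancel hotel and cancel flight. The services stay simple because all sequencing decisions live in one place.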

Choreography #

In this approach, there’s no central coordinator. Instead, each service publishes domain events that trigger the next service in the chain. Services also listen for failure events to execute their compensating transactions when needed.

sequenceDiagram
    participant FlightService
    participant HotelService
    participant CarRentalService
    participant EventBus
    FlightService->>EventBus: Flight booked event
    EventBus->>HotelService: Flight booked event
    HotelService->>EventBus: Hotel reserved event
    EventBus->>CarRentalService: Hotel reserved event
    CarRentalService->>EventBus: Car rental failed event
    EventBus->>HotelService: Car rental failed event
    HotelService->>EventBus: Hotel reservation cancelled event
    EventBus->>FlightService: Hotel reservation cancelled event
    FlightService->>EventBus: Flight booking cancelled event
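The same flow can be sketched with a toy in-memory event bus. Everything here is an assumption made for illustration: the `EventBus` class, the event names, and the `VacationRequested` trigger event (which the diagram does not show) are hypothetical, and delivery is synchronous, whereas a real system would typically use a message broker.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[], None]


class EventBus:
    """Toy synchronous bus: publishing calls every subscribed handler in turn."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.published: List[str] = []

    def subscribe(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str) -> None:
        self.published.append(event)
        for handler in self.handlers[event]:
            handler()


bus = EventBus()
cars_available = False  # assumption: forces the failure path from the diagram


# Each handler stands in for one service reacting to one event.
def book_flight() -> None:
    bus.publish("FlightBooked")


def reserve_hotel() -> None:
    bus.publish("HotelReserved")


def rent_car() -> None:
    bus.publish("CarRented" if cars_available else "CarRentalFailed")


def cancel_hotel() -> None:
    bus.publish("HotelReservationCancelled")


def cancel_flight() -> None:
    bus.publish("FlightBookingCancelled")


# No service calls another directly; the wiring is only event subscriptions.
bus.subscribe("VacationRequested", book_flight)
bus.subscribe("FlightBooked", reserve_hotel)
bus.subscribe("HotelReserved", rent_car)
bus.subscribe("CarRentalFailed", cancel_hotel)
bus.subscribe("HotelReservationCancelled", cancel_flight)

bus.publish("VacationRequested")
for event in bus.published:
    print(event)
```

The printed sequence ends with `CarRentalFailed`, `HotelReservationCancelled`, `FlightBookingCancelled`. Nothing in the code holds the whole saga in one place, which is both the appeal of choreography and the reason its state is harder to trace.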

When to use which approach #

Orchestration is slightly easier to implement, makes it easy to trace the state of the saga, and allows the services themselves to have simpler logic.

Choreography provides superior decoupling, distributes load better, and is more flexible.

Generally speaking, the strengths of one approach are the weaknesses of the other. Other important considerations are what the services actually do and what tooling is available in the particular software ecosystem.

Applications in real systems #

E-commerce platforms #

E-commerce platforms like Amazon may use the Saga pattern to manage complex order processing workflows that involve inventory checks, payment processing, shipping arrangements, and more. If payment fails after inventory is reserved, compensating transactions release the reserved inventory.

Food Delivery Services #

Services like DoorDash or Uber Eats use sagas (or saga-like flows) to coordinate restaurant orders, payment processing, and driver assignments. If a driver can’t be found, the order and payment must be reversed.

Travel Industry #

As in our fictional example, travel booking platforms implement sagas to ensure all components of a trip (flights, hotels, cars, activities) are either all confirmed or all cancelled without penalties.

Challenges #

Implementing sagas adds complexity #

Implementing sagas adds complexity to the system. The more service steps there are, the more pathways there are to test, since the user flow might fail at any step.

Each service must be individually transactional #

A key prerequisite for the Saga pattern is that each service must be individually transactional. Without this, composing a saga of transactions is out of the question. Ensuring that all services behave this way may be difficult or even impossible.
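To make "individually transactional" concrete, here is a sketch of a single service's local transaction using Python's built-in `sqlite3` module. The schema, the `reserve_room` function, and the guest names are hypothetical. The point is that the two writes either both commit or both roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rooms (id INTEGER PRIMARY KEY, free INTEGER NOT NULL CHECK (free >= 0))"
)
conn.execute("CREATE TABLE reservations (id INTEGER PRIMARY KEY, guest TEXT NOT NULL)")
conn.execute("INSERT INTO rooms (id, free) VALUES (1, 1)")  # one room left
conn.commit()


def reserve_room(guest: str) -> bool:
    """Record the reservation and decrement availability as one local transaction."""
    try:
        # As a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute("INSERT INTO reservations (guest) VALUES (?)", (guest,))
            conn.execute("UPDATE rooms SET free = free - 1 WHERE id = 1")
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative room count.
        return False


first = reserve_room("Alex")
second = reserve_room("Sam")
reservation_count = conn.execute("SELECT COUNT(*) FROM reservations").fetchone()[0]
rooms_free = conn.execute("SELECT free FROM rooms WHERE id = 1").fetchone()[0]
print(first, second, reservation_count, rooms_free)  # True False 1 0
```

The second attempt leaves no orphaned reservation row behind. A saga relies on every participating service giving this guarantee for its own step.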

Idempotency #

Services must be designed to handle duplicate requests. Compensating or forward transactions might be retried during failure recovery.
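A common way to achieve this is to attach a unique request ID to every saga command and have the service remember which IDs it has already handled. The sketch below assumes a hypothetical `HotelService` with an in-memory set of handled IDs. A real service would store the ID in the same local transaction as the change itself.

```python
from typing import Set


class HotelService:
    """Hypothetical service that deduplicates requests by request ID."""

    def __init__(self, rooms_free: int) -> None:
        self.rooms_free = rooms_free
        self.handled: Set[str] = set()

    def cancel_reservation(self, request_id: str) -> None:
        if request_id in self.handled:
            return  # duplicate delivery: the room was already released
        self.rooms_free += 1
        self.handled.add(request_id)


hotel = HotelService(rooms_free=4)
hotel.cancel_reservation("saga-42:cancel-hotel")
hotel.cancel_reservation("saga-42:cancel-hotel")  # retry during failure recovery
print(hotel.rooms_free)  # 5
```

Without the check, the retried compensation would release two rooms for a single cancelled reservation.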


  1. Hector Garcia-Molina and Kenneth Salem. 1987. Sagas. In Proceedings of the 1987 ACM SIGMOD International Conference on Management of Data (SIGMOD ’87). Association for Computing Machinery, New York, NY, USA, 249–259. https://doi.org/10.1145/38713.38742