Saga Pattern #2: ACD Transactions

Bianca Cristina
7 min read · Jul 27, 2022

If you prefer, you can read this article in Portuguese here

In the first article of the series, we talked about the definition of the Saga pattern and how we can use it to provide a degree of data consistency between microservices. However, there’s a relevant difference between “ordinary” transactions and the Saga pattern’s transactions: the Saga pattern alone is not enough to guarantee the isolation property.

In case you need it, I’ll refresh your memory: the isolation property provides the illusion that each operation has exclusive access to the data, even if there are multiple transactions running at the same time.

So, if the Saga pattern lacks the isolation property, is it possible for multiple local transactions to modify the same information simultaneously? 😨

Unfortunately, yes! 😓 And precisely because of this lack of isolation, the execution of local transactions can cause data inconsistencies categorized as anomalies. In this second article, we’ll explore these anomalies and their possible solutions.

Don’t give up yet, give this article a try first!

Anomalies

For the next sections, consider the “Request Trip” saga presented in the first article:

“Request Trip” saga

Lost Update

Suppose the passenger requests a trip and the driver accepts it, but then the passenger decides to cancel the trip before the driver arrives.

“Cancel Trip” saga

In this case, the application triggers the saga responsible for cancelling the trip and voilà, trip canceled successfully, right? Well, there’s no easy way to say this but… 😬

In the world of events, we need to embrace the fact that guaranteeing the order of events is challenging and not always possible. Considering this, the following scenario can happen:

  1. The “Request Trip” saga is triggered and the transactions T1, T2, and T3 are finished
  2. Between transactions T3 and T4, the passenger decides to cancel the trip, the “Cancel Trip” saga is triggered and the trip is canceled before the completion of the transaction “Pay” (T4) in “Request Trip” saga
  3. Due to some delay in the event delivery, transaction T4 is executed successfully and the passenger is charged

This situation shows how transaction T4 of the “Request Trip” saga ignored the update made by the “Cancel Trip” saga and caused an improper charge. This scenario represents the anomaly known as lost update.
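The three steps above can be reproduced in a few lines. This is only a minimal sketch: the in-memory `trip` dictionary, the status names, and the functions `cancel_trip` and `pay` are hypothetical stand-ins for the real services, which would communicate through events.

```python
# Hypothetical in-memory trip record; field and status names are illustrative.
trip = {"status": "DRIVER_ACCEPTED", "charged": False}


def cancel_trip(trip):
    """Stand-in for the "Cancel Trip" saga: marks the trip as canceled."""
    trip["status"] = "CANCELED"


def pay(trip):
    """Stand-in for T4 of "Request Trip": charges without checking the status."""
    trip["charged"] = True
    trip["status"] = "PAID"  # overwrites whatever "Cancel Trip" wrote


# The cancellation is processed first; the delayed T4 event arrives afterwards.
cancel_trip(trip)
pay(trip)

print(trip)  # {'status': 'PAID', 'charged': True}
```

The cancellation is silently overwritten, which is exactly the lost update described above.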

Dirty Read

Lucky for you (or not), the passenger is an expert bug hunter and found one more possible error situation: imagine a scenario in which the trip can only be canceled within five minutes after the driver accepts it.

“Cancel Trip with Time Limit” saga

Now, suppose the user pays for trips using only the virtual wallet and, at this moment, has enough balance for a single trip and requests it. However, something happens and the passenger decides to cancel the trip after the five-minute limit. Therefore, the following scenario is possible:

  1. The “Cancel Trip with Time Limit” saga is triggered and the refund is executed, thus increasing the credit available in the passenger’s wallet
  2. The passenger is fickle and decides to request another trip using the refunded value. So, the “Request Trip” saga is triggered for a new trip, thus reducing the credit available in the passenger’s wallet
  3. The saga from the first step is canceled because transaction T3 found that the maximum cancellation time was exceeded. Due to this, compensation transactions are triggered, thus reducing the credit available in the passenger’s wallet

In this case, the “Request Trip” saga read temporary data written by the “Cancel Trip with Time Limit” saga and made decisions based on that information. However, the “Cancel Trip with Time Limit” saga later triggered compensation transactions, so the passenger was able to request a trip even without enough balance. This scenario describes the anomaly known as a dirty read.
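To make the wallet arithmetic concrete, here is a small sketch of the three steps. The `Wallet` class, its method names, and the price of 20 are assumptions made up for this illustration; the compensation is modeled as a debit that is not allowed to fail.

```python
class Wallet:
    """Hypothetical virtual wallet used only for this illustration."""

    def __init__(self, balance):
        self.balance = balance

    def credit(self, amount):
        self.balance += amount

    def debit(self, amount):
        # Regular payments are rejected when there is not enough credit.
        if amount > self.balance:
            raise ValueError("insufficient balance")
        self.balance -= amount

    def compensate_credit(self, amount):
        # Compensation transactions must not fail, so no balance check here.
        self.balance -= amount


PRICE = 20  # assumed trip price
wallet = Wallet(balance=0)  # the only credit was already spent on the first trip

wallet.credit(PRICE)             # step 1: refund, not yet "committed" by the saga
wallet.debit(PRICE)              # step 2: new trip paid with the refunded credit
wallet.compensate_credit(PRICE)  # step 3: refund is undone after the time check

print(wallet.balance)  # -20
```

The second trip was paid with credit that, in the end, never existed.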

Fuzzy Read

Not even the route choice was able to escape the bugs found by our faithful passenger. Suppose the application supports changing the route even after the completion of transaction T1 in the “Request Trip” saga. This possibility is represented by the “Update Route” saga.

“Update Route” saga

Also, consider that the transaction “Find Driver” (T2) from “Request Trip” saga performs two searches to get trip information: one at the start of the transaction and another at the end. In this context, the following scenario is possible:

  1. The “Request Trip” saga is triggered and is in transaction T2 performing the first search
  2. Before completion of transaction T2 and after the first search is executed, the passenger decides to update the route. Therefore, the “Update Route” saga is triggered and completed before the start of the second search in transaction T2 from “Request Trip” saga
  3. After the “Update Route” saga is finished, transaction T2 performs the second search and the trip information result is different from the first search

In this case, if T2 uses the information from both searches, the application might end up in an inconsistent state. This scenario describes the anomaly known as fuzzy read.
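The same scenario can be sketched as a single transaction that reads the trip twice. The `trip` dictionary, the route strings, and the `between_reads` hook (which simulates another saga running in the middle of T2) are assumptions for illustration only.

```python
# Hypothetical trip record shared by both sagas.
trip = {"route": "A -> B"}


def update_route():
    """Stand-in for the "Update Route" saga."""
    trip["route"] = "A -> C"


def find_driver(trip, between_reads):
    """Stand-in for T2: reads the trip at the start and again at the end."""
    first = trip["route"]   # first search
    between_reads()         # another saga completes here
    second = trip["route"]  # second search
    return first, second


first, second = find_driver(trip, update_route)
print(first, "|", second)  # A -> B | A -> C
```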

Possible Solutions

Finally, the good part! Now you’re gonna show me how to solve all anomalies, right? 🥳

Well… In short, it’s practically impossible to eliminate all anomalies. What we can do is minimize the damage they cause by using specific techniques.

Semantic Lock

The semantic lock consists of adding a flag to the data indicating that it has not yet been committed, so that other transactions that need this data treat it as untrusted. This minimizes the impact of all three anomalies: lost update, dirty read, and fuzzy read.

In the “Request Trip” saga, for example, we could add an extra piece of information to the trip indicating that its current state is “Awaiting Driver” while we are still in transaction T2, so if it’s not possible to find a suitable driver, the trip can be canceled.

But what to do when the data’s current state is untrusted? 🤔

That’s the million dollar question and only you are able to answer it, as each application has its own limits. For example, an application that supports reprocessing events could choose to mark the event as unprocessed when untrusted data is read in the middle of a transaction and force the reprocessing of that event in the future. For an application that doesn’t support reprocessing events, but tolerates slightly higher latencies, a viable option is to block the transaction until the necessary data is committed. The best way to deal with uncommitted data is the one that suits your business better.
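As a rough sketch of the first option (retry later), the trip state itself can act as the flag. The `TripState` values other than “Awaiting Driver”, the `RetryLater` exception, and the `cancel_trip` function are hypothetical names chosen for this example.

```python
from enum import Enum


class TripState(Enum):
    AWAITING_DRIVER = "AWAITING_DRIVER"  # pending: "Request Trip" is still in T2
    DRIVER_ACCEPTED = "DRIVER_ACCEPTED"
    CANCELED = "CANCELED"


# States that mean "another saga has not committed this data yet".
PENDING_STATES = {TripState.AWAITING_DRIVER}


class RetryLater(Exception):
    """Signals that the event should be reprocessed in the future."""


def cancel_trip(trip):
    """Hypothetical step of another saga that respects the semantic lock."""
    if trip["state"] in PENDING_STATES:
        raise RetryLater("trip is still being modified by another saga")
    trip["state"] = TripState.CANCELED


trip = {"state": TripState.AWAITING_DRIVER}

try:
    cancel_trip(trip)
    outcome = "canceled"
except RetryLater:
    outcome = "retry later"  # the trip was left untouched

# T2 finishes and releases the lock by moving the trip to a committed state.
trip["state"] = TripState.DRIVER_ACCEPTED
cancel_trip(trip)  # the retried event now succeeds
```

An application that prefers blocking would wait on the pending state instead of raising, at the cost of higher latency.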

Commutative Updates

By definition, commutative updates are operations that can be executed in any order and still produce the same result, since they don’t affect each other. Consider the transactions “Pay” (T4) and “Find Driver” (T2) from the “Request Trip” saga: if we assume that the “Cancel Trip” saga is able to undo all steps of the “Request Trip” saga in case of a problem, then we could swap the order of T2 and T4 without major effects on the “Request Trip” saga, so these transactions are commutative. Understanding which transactions are commutative is useful for reordering the transactions that can cause the lost update anomaly.
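A simpler illustration of commutativity than the T2/T4 case is a pair of wallet operations. The sketch below uses hypothetical `credit` and `debit` functions (not part of the sagas above) and checks every possible ordering; it deliberately ignores insufficient-balance checks, which would make the order matter again.

```python
from itertools import permutations


def credit(balance, amount):
    return balance + amount


def debit(balance, amount):
    return balance - amount


# Three updates that may arrive in any order.
operations = [(credit, 30), (debit, 20), (credit, 5)]


def apply_all(balance, ops):
    for op, amount in ops:
        balance = op(balance, amount)
    return balance


# Apply the updates in every possible order and collect the distinct results.
results = {apply_all(100, order) for order in permutations(operations)}
print(results)  # {115}
```

A single distinct result means no ordering of these updates can lose one of them.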

Pessimistic View

The pessimistic view is a way of organizing the steps of a saga in order to reduce the risks related to the dirty read anomaly. Consider the situation presented in the dirty read explanation, in which it was possible to request a trip exceeding the credit available in the passenger’s wallet. In that case, the error happened because the refund was carried out before checking whether the trip could still be canceled. Thus, if we reorganize the steps of the “Cancel Trip with Time Limit” saga by making the refund the last step, we minimize the risk of a dirty read of the credit available in the passenger’s wallet.

“Cancel Trip with Time Limit Reorganized” saga
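A minimal sketch of the reorganized saga, assuming hypothetical `trip` and `wallet` dictionaries and a made-up `CancellationRejected` exception. The point is only the ordering: the check that can fail runs first, and the refund runs last.

```python
from datetime import timedelta

CANCEL_LIMIT = timedelta(minutes=5)


class CancellationRejected(Exception):
    """Raised when the cancellation window has already passed."""


def cancel_trip_reorganized(trip, wallet, elapsed):
    # 1. Check the time limit before touching the wallet.
    if elapsed > CANCEL_LIMIT:
        raise CancellationRejected("cancellation window exceeded")
    # 2. Cancel the trip.
    trip["status"] = "CANCELED"
    # 3. Refund last, so nothing needs to undo it afterwards.
    wallet["balance"] += trip["price"]


late_trip = {"status": "DRIVER_ACCEPTED", "price": 20}
late_wallet = {"balance": 0}
try:
    cancel_trip_reorganized(late_trip, late_wallet, timedelta(minutes=7))
except CancellationRejected:
    pass  # the wallet never exposed a temporary refund

on_time_trip = {"status": "DRIVER_ACCEPTED", "price": 20}
on_time_wallet = {"balance": 0}
cancel_trip_reorganized(on_time_trip, on_time_wallet, timedelta(minutes=2))
```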

Reread Value

This technique is a simple way to minimize the damage caused by lost update and fuzzy read anomalies by rereading the value before taking any action involving that data. Therefore, if we detect some data change, then the current transaction can be aborted and possibly restarted in the future.
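One common way to detect the change is a version number stored with the data. The sketch below assumes a hypothetical in-memory `store`, a `version` field, and a `StaleRead` exception; in a real database the reread and the write would need to be a single conditional update, which this in-memory example does not show.

```python
class StaleRead(Exception):
    """Raised when the data changed since the saga last read it."""


def pay(store, trip_id, snapshot):
    """Hypothetical T4 that rereads the trip right before charging."""
    current = store[trip_id]
    if current["version"] != snapshot["version"]:
        raise StaleRead(f"trip {trip_id} changed since it was read")
    store[trip_id] = {**current, "status": "PAID", "version": current["version"] + 1}


store = {"t1": {"status": "DRIVER_ACCEPTED", "version": 3}}
snapshot = dict(store["t1"])  # what "Request Trip" saw before T4

# "Cancel Trip" runs in between and bumps the version.
store["t1"] = {"status": "CANCELED", "version": 4}

try:
    pay(store, "t1", snapshot)
    result = "charged"
except StaleRead:
    result = "aborted"  # the transaction can be restarted later

print(result, store["t1"]["status"])  # aborted CANCELED
```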

Operations History

The operations history consists of recording all the actions performed on a piece of data in order to make decisions based on the operations that have already been performed, thus reducing the impact of the lost update anomaly.

Consider the “Request Trip” and “Cancel Trip” sagas that perform, respectively, operations to accept the trip and change the trip status to canceled. If both sagas run simultaneously, it’s possible that the transaction that changes the trip status to canceled is completed before the one responsible for accepting the trip. To prevent the cancellation from being overwritten, we could record the trip’s operations history: as the trip was canceled before being accepted, it’s possible to ignore the transaction that accepts the trip.

If you got here, then you probably understood that the Saga pattern isn’t a silver bullet: although it provides certain levels of data consistency, there are drawbacks associated with it, and it’s the developer’s responsibility to reduce them.
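A small sketch of this idea, assuming the history is a list kept alongside the trip. The operation names, the `STATUS_AFTER` mapping, and the `apply_operation` function are hypothetical.

```python
# Status that each operation produces when it is applied.
STATUS_AFTER = {"ACCEPT": "ACCEPTED", "CANCEL": "CANCELED"}


def apply_operation(trip, operation):
    """Applies an operation unless the recorded history makes it obsolete."""
    if operation == "ACCEPT" and "CANCEL" in trip["history"]:
        # The trip was already canceled: record the attempt and ignore it.
        trip["history"].append("ACCEPT (ignored)")
        return False
    trip["history"].append(operation)
    trip["status"] = STATUS_AFTER[operation]
    return True


trip = {"status": "REQUESTED", "history": []}

apply_operation(trip, "CANCEL")             # "Cancel Trip" finishes first
accepted = apply_operation(trip, "ACCEPT")  # the late accept arrives afterwards

print(trip["status"], accepted)  # CANCELED False
```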

That’s all for today, I hope these first two articles were useful and for the next one (and finally the last! 🎉) I’ll show you how to implement the Saga pattern using AWS Lambda. See you later! 🙂
