Starbucks as Order Management
Cool analysis of the Starbucks order processing method:
By taking advantage of an asynchronous approach Starbucks also has to deal with the same challenges that asynchrony inherently brings. Take for example, correlation. Drink orders are not necessarily completed in the order they were placed. This can happen for two reasons. First, multiple baristas may be processing orders using different equipment. Blended drinks may take longer than a drip coffee. Second, baristas may make multiple drinks in one batch to optimize processing time. As a result, Starbucks has a correlation problem. Drinks are delivered out of sequence and need to be matched up to the correct customer. Starbucks solves the problem with the same "pattern" we use in messaging architectures -- they use a Correlation Identifier. In the US, most Starbucks use an explicit correlation identifier by writing your name on the cup and calling it out when the drink is complete. In other countries, you have to correlate by the type of drink.
Exception handling in asynchronous messaging scenarios can be difficult. If the real world writes the best stories maybe we can learn something by watching how Starbucks deals with exceptions. What do they do if you can't pay? They will toss the drink if it has already been made or otherwise pull your cup from the "queue". If they deliver you a drink that is incorrect or nonsatisfactory they will remake it. If the machine breaks down and they cannot make your drink they will refund your money. Each of these scenarios describes a different, but common error handling strategy:
* Write-off - This error handling strategy is the simplest of all: do nothing. Or discard what you have done. This might seem like a bad plan but in the reality of business this option might be acceptable. If the loss is small it might be more expensive to build an error correction solution than to just let things be. For example, I worked for a number of ISP providers who would chose this approach when there was an error in the billing / provisioning cycle. As a result, a customer might end up with active service but would not get billed. The revenue loss was small enough to allow the business to operate in this way. Periodically, they would run reconciliation reports to detect the "free" accounts and close them.
* Retry - When some operations of a larger group (i.e. "transaction") fail, we have essentially two choices: undo the ones that are already done or retry the ones that failed. Retry is a plausible option if there is a realistic chance that the retry will actually succeed. For example, if a business rule is violated it is unlikely a retry will succeed. However, if an external system is not available a retry might well be successful. A special case is a retry with Idempotent Receiver. In this case we can simply retry all operations since the successful receivers will ignore duplicate messages.
* Compensating Action - The last option is to undo operations that were already completed to put the system back into a consistent state. Such "compensating actions" work well for example if we deal with monetary systems where we can recredit money that has been debited.
All of these strategies are different than a two-phase commit that relies on separate prepare and execute steps. In the Starbucks example, a two-phase commit would equate to waiting at the cashier with the receipt and the money on the table until the drink is finished. Then, the drink would be added to the mix. Finally the money, receipt and drink would change hands in one swoop. Neither the cashier nor the customer would be able to leave until the "transaction" is completed. Using such a two-phase-commit approach would certainly kill Starbucks' business because the number of customers they can serve within a certain time interval would decrease dramatically. This is a good reminder that a two-phase-commit is can make life a lot simpler but it can also hurt the free flow of messages (and therefore the scalability) because it has to maintain stateful transaction resources across the flow of multiple, asynchronous actions.