"Eventual Consistency" and "BASE" are the modern architectural
approaches to achieve highly scalable web applications. Some of my
previous colleagues, Randy Shoup and Dan Pritchett has already written up some very good articles about this.
recently have some good conversation from the cloud computing Google
groups on the "eventual consistency" model. There is a general design
pattern that I want to capture here.
The definition of data integrity
"Data integrity" is defined by the business people. It needs to hold at specific stages of the business operation cycle. Here we use a book seller example to illustrate. The data integrity is defined as ...
Sufficient books must be available in stock before shipment is made to fulfil an order.
The application designer, who designs the application to support the business, may transform the "data integrity" as ...
Total_books_in_stock > sum_of_all_orders
The app designer is confident that if the above data integrity is observed, then the business integrity will automatically be observed. Note here that "data integrity" may be transformed by the application designer into a more restrictive form.
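As a minimal sketch (plain Python; the function and parameter names simply mirror the formula above), the constraint can be expressed as a predicate that must hold whenever the business checks its books:

    def integrity_holds(total_books_in_stock, sum_of_all_orders):
        # The designer's restrictive form of the business rule: stock
        # must strictly exceed the total quantity on order.
        return total_books_in_stock > sum_of_all_orders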
Now, the designer proceeds with the application implementation ... There are two approaches to choose from, with different implications for scalability.
The ACID approach
One way of ensuring the above integrity is observed is to use the
traditional ACID approach. Here, a counter "no_of_books_in_stock" is
used to keep track of available books in the most up-to-date fashion.
Whenever a new order enters, the transaction checks that the data integrity still holds before committing. In this approach, "no_of_books_in_stock" becomes a shared state that every concurrent transaction needs to access. Therefore, transactions take turns sequentially to lock the shared state in order to achieve serializability. A choke point is created and the system is not very scalable.
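A minimal sketch of that choke point (plain Python, with a threading.Lock standing in for the database's transactional lock; "no_of_books_in_stock" comes from the text, the rest is illustrative):

    import threading

    no_of_books_in_stock = 100      # shared, always up-to-date counter
    stock_lock = threading.Lock()   # stands in for the DB's transactional lock

    def place_order(quantity):
        # Every concurrent transaction queues on this one lock, so order
        # taking is fully serialized on the shared state.
        global no_of_books_in_stock
        with stock_lock:
            if no_of_books_in_stock - quantity <= 0:
                return False        # would break "stock > sum of orders"
            no_of_books_in_stock -= quantity
            return True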
Hence the second approach ...
The BASE approach
Instead of checking against a "shared state" at order-taking time, the order-taking transaction may just get a general idea of how many books are available in stock at the beginning of the day and only check against that. Note that this is not a shared state, and hence there is no choke point. Of course, since every transaction assumes no other transaction is proceeding, there is a possibility of over-booking.
At the end of the day, there is a reconciliation process that takes the sum of all orders taken during the day and checks it against the number of books in stock. Data integrity checking is done here, but in batch mode, which is much more efficient. In case the data integrity is not maintained, the reconciliation process fires up compensating actions, such as printing more books or refunding some orders, in order to reestablish the data integrity.
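Here is a minimal sketch of the whole BASE flow for the book seller (plain Python; everything beyond the scenario itself is an assumed illustration): order taking checks only a start-of-day snapshot, and the batch reconciliation runs the real integrity check and fires compensating actions:

    books_at_start_of_day = 100   # snapshot taken once; not a shared state
    orders_taken = []             # accumulates orders without lock contention

    def take_order(quantity):
        # Check only against the start-of-day snapshot; concurrent orders
        # never serialize on a live counter, so over-booking is possible.
        if quantity <= books_at_start_of_day:
            orders_taken.append(quantity)
            return True
        return False

    def reconcile(actual_books_in_stock):
        # End-of-day batch check of the real data integrity.
        shortfall = sum(orders_taken) - actual_books_in_stock
        if shortfall > 0:
            # Integrity violated: fire compensating actions, e.g. print
            # more books or refund the over-booked orders.
            print(f"over-booked by {shortfall}; print more books or refund")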
Generalizing the design pattern of BASE
consistency" is based on the notion that every action is revokable by
executing a "compensating action". However, all compensating actions
different costs involved, which is the sum of this individual
compensation plus all compensating actions that it triggers.
BASE, the design goal is to reduce the probability of doing high-cost
revokation. Note that BASE is a probablistic model while ACID is a
binary, all-or-none model.
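Roughly speaking (this formulation is illustrative, in the same spirit as the stock formula earlier), the quantity BASE tries to keep low is:
Expected_revocation_cost = P(integrity_violated) * (cost_of_own_compensation + cost_of_all_triggered_compensations)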
In BASE, data is organized into two kinds of "states": "Provisional state" and "Real state".
All online transactions should only read/write the provisional state, which is a local, non-shared view among these transactions. Since there is no shared state or choke point involved, online transactions can proceed efficiently. The real state reflects the actual state of the business (e.g. how many books are actually in stock), which always lags behind the provisional state. The "real state" and "provisional state" are brought together periodically via a batch-oriented reconciliation process. Data integrity checking is deferred until the reconciliation process executes. High efficiency is achieved because of its batch nature. When the data integrity does not hold, the reconciliation process will fire up compensating actions in order to reestablish the data integrity. At the end of the reconciliation, the provisional state and real state are brought in sync with each other.
To minimize the cost of compensating actions, we need to confine the deviation between the provisional state and the real state. This may mean that we need to do the reconciliation more frequently. We should also minimize the chaining effect of compensating actions.
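One illustrative way to confine that deviation (a sketch in plain Python; the threshold and state names are assumptions, not part of the pattern itself) is to trigger reconciliation as soon as the provisional state drifts beyond a bound, rather than only on a fixed schedule:

    MAX_DEVIATION = 50   # assumed bound on provisional-vs-real drift

    def apply_provisional_update(state, quantity):
        state["provisional_orders"] += quantity
        # Confine the deviation: reconcile early once the provisional
        # state has drifted too far from the last synced real state.
        if state["provisional_orders"] - state["real_orders"] > MAX_DEVIATION:
            reconcile(state)

    def reconcile(state):
        # The batch integrity check and any compensating actions run here;
        # at the end, the two states are brought back in sync.
        state["real_orders"] = state["provisional_orders"]

    state = {"provisional_orders": 0, "real_orders": 0}

The smaller the bound, the more often reconciliation runs, and the cheaper each round of compensating actions tends to be.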