Concepts of BASE - Basically Available, Soft-state, Eventually Consistent
Recall that the "C" in the CAP theorem stands for consistency. The CAP theorem states that a distributed system cannot guarantee "consistency", "availability" and "partition tolerance" all at the same time. Let's understand why.
The Curse of CAP
Let's understand what CAP says:
- Consistency: Consider a system that is partitioned internally into multiple sub-systems. When a user performs a transaction with this system, from the external user's standpoint each "transaction" is either fully completed or fully rolled back, and every part of the system shows the same result. For example, when you withdraw $50 from an ATM, your balance is reduced by $50. Whether you check your balance at the ATM, on the online banking website, or on your mobile, you see the same balance. When you make an Amazon purchase, the purchase confirmation, order status update, inventory reduction, etc. should all appear 'in sync' regardless of the internal partitioning into sub-systems. This is what is meant by consistency.
- Availability: The system is never down when a user wants to access it and 100% of requests are completed successfully.
- Partition Tolerance: Any given request can be completed even if a subset of nodes in the system is unavailable or cannot be reached over the network.
To achieve P, we need replicas. Lots of them! The more replicas we keep, the better the chances are that any piece of data we need will be available even if some nodes are offline. For absolute "P" we should replicate every single data item to every node in the system. (Obviously, in real life we compromise and keep 2, 3, or some other small number of copies.)
To achieve A, we need to have no single point of failure. That means the "primary/secondary" or "master/slave" replication configurations that are very common in many architectures will not work here, since the master/primary is a single point of failure. We need to go with a multi-master configuration. To achieve absolute "A", any single replica must be able to handle reads and writes independently of the other replicas. (In reality we compromise with asynchronous replication, queue-based updates, quorums, etc.)
To achieve C, we need a "single version of truth" in the system. That means that if I write to node A and then immediately read back from node B, node B should return the up-to-date value. Obviously this can't happen in a truly distributed multi-master system where each replica accepts writes without waiting for the others.
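The write-to-A, read-from-B problem can be sketched in a few lines. This is only a toy model under stated assumptions: the `Replica` class, its `outbox` queue and the `replicate_to` method are hypothetical names invented for illustration, and replication is modelled as an explicit step rather than real network traffic.

```python
from collections import deque


class Replica:
    """Hypothetical node that accepts reads and writes on its own (multi-master)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = deque()  # local writes not yet shipped to peers

    def write(self, key, value):
        # The write succeeds locally without waiting for any other replica.
        self.data[key] = value
        self.outbox.append((key, value))

    def read(self, key):
        return self.data.get(key)

    def replicate_to(self, peer):
        # Asynchronous replication, modelled here as an explicit later step.
        while self.outbox:
            key, value = self.outbox.popleft()
            peer.data[key] = value


node_a = Replica("A")
node_b = Replica("B")

node_a.write("balance", 50)
stale = node_b.read("balance")   # B has not heard about the write yet

node_a.replicate_to(node_b)
fresh = node_b.read("balance")   # B has caught up
```

Between the write and the replication step, node B answers from its own copy, so the read is available but not consistent.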
So, what is the solution? Probably to loosen up some of the constraints, and to compromise on the others. That is exactly what is done in BASE with its eventual consistency.
Do we require perfect consistency?
Sometimes, however, perfect consistency is not a requirement and eventual consistency will suffice. Consequently, many NoSQL databases use eventual consistency to provide availability and partition tolerance guarantees while keeping data as consistent as those guarantees allow. Immediate consistency guarantees that an update is visible to all clients as soon as the update operation returns a successful result to the user. Eventual consistency, in contrast, means that given a sufficiently long period of time in which no new changes are sent, all updates can be expected to propagate through the system and all the replicas will become consistent.
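A minimal sketch of that convergence, assuming each value carries a version number and the highest version wins. The `merge` and `anti_entropy_round` helpers are hypothetical, and the sketch further assumes that two different values for the same key never share a version; real databases use more elaborate schemes.

```python
import itertools


def merge(local, remote):
    """Copy into local every (version, value) pair that is newer in remote."""
    for key, (version, value) in remote.items():
        if key not in local or local[key][0] < version:
            local[key] = (version, value)


def anti_entropy_round(replicas):
    """Every pair of replicas exchanges state in both directions."""
    for left, right in itertools.combinations(replicas, 2):
        merge(left, right)
        merge(right, left)


# Three replicas that have seen different updates to the same key.
replicas = [
    {"cart": (1, ["book"])},
    {"cart": (2, ["book", "pen"])},
    {},
]

converged_before = all(r == replicas[0] for r in replicas)
anti_entropy_round(replicas)  # no new writes arrive during the exchange
converged_after = all(r == replicas[0] for r in replicas)
```

While no new writes arrive, one full exchange is enough here for all three replicas to hold the same state. Under continuous writes the replicas keep chasing each other, which is why the definition speaks of a quiet period.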
In database terminology, this is known as “Basically Available, Soft-state, Eventually consistent” (BASE), as opposed to the database concept of ACID. No doubt the juxtaposition of the terms ACID and BASE was more than a mere coincidence.
Apache CouchDB, for example, uses a versioning system similar to software version control systems such as Subversion (SVN). An update to a record does not overwrite the old value, but rather creates a new version of that record. If two clients are operating on the same record and client A updates the record before client B, then client B's update is rejected with a notice that the version being modified is out of date. Client B then has the option to re-query the revised record and make the change there, in a manner similar to an “update and merge” operation in SVN.
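The revision check described above can be sketched with an in-memory store. This is not CouchDB's API: `VersionedStore`, `ConflictError` and the integer revision numbers are hypothetical simplifications. In CouchDB itself the revision is a string carried in the document's `_rev` field, and a stale update is answered with an HTTP 409 Conflict.

```python
class ConflictError(Exception):
    """Raised when an update is based on an out-of-date revision."""


class VersionedStore:
    """Hypothetical store that keeps every version of every document."""

    def __init__(self):
        # doc_id -> list of values; a value's revision is its index + 1
        self._history = {}

    def get(self, doc_id):
        versions = self._history[doc_id]
        return len(versions), versions[-1]

    def put(self, doc_id, value, base_rev=0):
        versions = self._history.setdefault(doc_id, [])
        if base_rev != len(versions):
            raise ConflictError(
                f"{doc_id}: based on rev {base_rev}, current is rev {len(versions)}"
            )
        versions.append(value)  # old versions are kept, not overwritten
        return len(versions)


store = VersionedStore()
store.put("order-1", {"status": "new"})           # creates revision 1

rev_a, doc_a = store.get("order-1")               # client A reads revision 1
rev_b, doc_b = store.get("order-1")               # client B reads revision 1

store.put("order-1", {"status": "paid"}, rev_a)   # client A updates first

try:
    store.put("order-1", {"status": "cancelled"}, rev_b)
    conflict = False
except ConflictError:
    conflict = True
    # Client B re-reads the latest revision and reapplies its change on top.
    rev_b, doc_b = store.get("order-1")
    new_rev = store.put("order-1", {**doc_b, "note": "cancel requested"}, rev_b)
```

Rejecting the stale write pushes conflict resolution to the client, which knows best how to merge its change with the newer record.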
To use NoSQL databases at the present time, you need to understand each database's own API, and queries must be written against that API. This is, however, greatly facilitated by the fact that Java client libraries are widely available. Work has also been done to create a unified NoSQL query language called Unstructured Query Language (UnQL), intended as a semantic superset of SQL Data Manipulation Language (DML). There is also Apache Thrift, which began as an Apache Incubator project and provides an interface-definition language particularly well-suited to NoSQL use cases. Thrift is reminiscent of CORBA IDL and provides a means by which language-specific interfaces can be generated for most popular languages. Originally developed at Facebook, it has been shared as an open-source project since 2007.