Welcome!

SDN Journal Authors: Pat Romanski, Liz McMillan, Doug Bonderud, Elizabeth White, Jeremy Geelan

Related Topics: Java, SOA & WOA, .NET, Red Hat, Cloud Expo, SDN Journal

Java: Blog Feed Post

NoSQL and Partition Tolerance

While NoSQL implementations are aware that nodes have left, they are not aware that said nodes have formed a separate partition

I came across an article discussing NoSQL and partition tolerance.

The NoSQL Partition Tolerance Myth (link)

I may not entirely agree with the author.

But what most NoSQL systems offer is a peculiar behavior that is not partition tolerant, but partition oblivious instead.

 

No argument here. While NoSQL implementations are aware that nodes have left, they are not aware that said nodes have formed a separate partition.

In this case, we would want failure detection and carry out those transfers where the accounts are both on the same side of the partition, while denying or deferring transfers that cross the chasm.

The author is assuming that the account is only on one side of the partition. If that is that is case, is doesn’t matter whether the NoSQL implementation is eventually consistent or not. If the account is on both sides of the partition, the solution the author provides still results in an inconsistent state.

In such cases, it is almost always better to build services that degrade gracefully under partitions.

Bingo! The author implies that all NoSQL implementations sacrifice consistency in the event of a partition. That is not true. There are AP implementations (available & partition tolerant), and there are CP implementations (consistent & partition tolerant). However, an AP implementation can function as a CP implementation depending on the configuration and the application. For example, if accounts do not have multiple owners and the application can not withdraw funds from or deposit funds to an account if it can not access the account.

This post is not perfect and it is a bit outdated, but I think its helpful nonetheless.

Visual Guide to NoSQL Systems (link)

If I were to build a bank based on Dynamo, the granddaddy of all first-generation NoSQL data stores, it would silently split into two halves, like a lobotomized patient.

I would not say that Amazon Dynamo is the grandparent of all NoSQL implementations. I would say that there are two parents: Amazon Dynamo and Google BigTable. Then there are the grandparents…

A Brief History of NoSQL (link)

In this scenario, the hypothetical backend for Banko Dynamo would not only not provide any indication of failure, but allow a customer to create as many new accounts as there are partitions, one in each.

Why is the author now using account creation instead of withdrawals and deposits, and what is the relevance of creating multiple accounts? If my debit card does not work, I do not create a new account. That, and I maintain two checking accounts and one savings account with the same bank.

Let’s go back to withdrawals and deposits. If the accounts do not have multiple owners, it does not matter whether the NoSQL implementation is eventually consistent or not. If the accounts do have multiple owners, it depends on the NoSQL implementation. If it is inspired by Google BigTable (e.g. Apache HBase) or both Google BigTable and Amazon Dynamo (e.g. Apache Cassandra), it does not matter. These NoSQL implementations are CP, or can be configured to be CP. If it inspired only by Amazon Dynamo and it is eventually consistent, it may or may not matter…

Let’s assume that account withdrawals / deposits are separate from the accounts themselves and that the account is both consistent and available during a partition. The account has multiple owners but it is more or less read only.

My account has a balance of $100 (calculated from the withdrawals and deposits). Now, there are two partitions: A and B. I purchase $50 of St. Bernardus Abt 12 at Binny’s via partition A. Partition A now has withdrawal #1. I have dinner at Baume & Brix for $75 via partition B. Partition B now has withdrawal #2. My account has a balance of $50 in partition A. It has a balance of $25 in partition B. My account should have a balance of minus $25.

Does it matter? My account may not have a balance of minus $25, but it will. When the partition is repaired, the application will be able to access all of the withdrawals and deposits on my account. I may be charged an overlimit fee.

What if the NoSQL implementation sacrificed availability? My payment at Binny’s did not go through. That’s not a problem. No St. Bernardus Abt 12 for me. My payment at Baume & Brix did not go through. That’s a problem. I can’t pay for dinner. Baume & Brix can’t accept my payment nor that of any other customer paying with a debit card from the same bank as me via partition B.

What if I made a deposit of $25 at an ATM via partition A? My account will have a balance of $0 after the partition is repaired. I will not be charged an overlimit fee.

There are other scenarios. Perhaps I’m charged an insufficient funds fee and Baume & Brix does not receive payment. Perhaps Baume & Brix later resubmits the payment and receives payment.

Do you really want to sell tickets from both halves of your system? By definition, there is no way you can guarantee uniqueness of those tickets. There will be customers holding identical tickets with identical seat numbers.

Maybe, maybe not. If there is only a single owner per ticket, then yes. However, there may be availability issues. For example, partition A has tickets 1-150 and partition B has tickets 151-200. If all the tickets in partition B have been purchased, visitors may be unable to purchase tickets despite the fact that there may be tickets available in partition A. If there are multiple owners per ticket, I would prefer a NoSQL implementation that is CP. In this case, I would prefer to sacrifice availability rather than consistency.

Here is a better example. What if I report my debit card stolen? Sacrificing availability is not appropriate. What if customer service is accessing my account via the partition with no availability? My debit card must be reported stolen or the thief can continue to make purchases with it. Sacrificing consistency is not appropriate. The thief can continue to make purchases with my debit card via the partition where my account has not been reported stolen. Perhaps account information should not be stored in a distributed system.

And if they did, the first-generation NoSQL stores usually take the ultimate punt by presenting all versions of the divergent objects to the application, and let the application resolve the mess.

No argument here.

But if your data is that soft and inconsequential, why not just use memcached? It’s wicked fast, far faster than Mongo.

Perhaps because MongoDB is a document store and as such provides features that are not provided by key / value stores.

A lot of NoSQL developers pretend that being partition oblivious is a difficult thing to implement. This is false. It’s easy to make a program oblivious to a particular event; namely, you write no code to handle that event.

No argument here.

The thing that greatly helps first generation NoSQL data stores, the thing that enables them to package partition obliviousness as if it were equivalent to partition tolerance, is that they provide a very weak service guarantee in the first place. These systems cannot guarantee that, on a good day, your GET will return the latest PUT.

Sure they can.

In fact, eventual consistency means that a GET can return any previous value, including the Does Not Exist response from the very initial state of the system.

No argument here. Of course, not all NoSQL implementations are eventually consistent.


With all this being said, a NoSQL implementation may or may not be appropriate. To be more specific, a NoSQL implementation that is eventually consistent and sacrifices consistency in the event of a partition may or may not be appropriate. The behaviour is determined by the NoSQL implementation, its configuration, and the application that reads and writes to it. Whether that behaviour is appropriate or not depends on the business requirements.

Read the original blog entry...

More Stories By Daniel Thompson

I curate the content on this page, but the credit goes to my talented colleagues for the posts that you see here. Much of what you read on this page is the work of friends at How to JBoss, and I encourage you to drop by the site at http://www.howtojboss.com for some of the best JBoss technical and non-technical content for developers, architects and technology executives on the Web.

Cloud Expo Breaking News
SYS-CON Events announced today that nfina Technologies, a provider of highly reliable cloud server products, will exhibit at SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York. nfina Technologies develops, manufactures, and markets highly reliable cloud server products, designed to solve the most demanding data center requirements in mission-critical cloud applications. Nfina’s staff has decades of experience in co...
SYS-CON Events announced today that OpenStack will exhibit at SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York. OpenStack software controls large pools of compute, storage, and networking resources throughout a datacenter, all managed by a dashboard that gives administrators control while empowering their users to provision resources through a web interface. OpenStack powers some of the most widely-used SaaS app...
“Cloud has everything to do with what has happened with Big Data,” explained Jason Deck, Director of Strategic Alliances at Logicworks, in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “Big Data doesn’t exist in its easily accessible way without cloud. From reduced startup costs, to cheap storage, to fast processing, to adequate security, to the easy incorporation of third-party analytics tools, cloud made Big Data accessible to customers of all sizes, with all different bud...
“Social, mobile, analytics and cloud can’t be looked at as distinct technology trends; they are facets of the same movement and an everyday reality for consumers and businesses alike,” said Craig Sowell, IBM VP of SmartCloud Marketing, in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “This means that businesses need to start looking at trends as one: cloud is the delivery, analytics is the unique insight, social is a shareable service, and mobile is the ubiquitous access.” ...
With Cloud Expo New York | 12th Cloud Expo [June 10-13, 2013] hurtling towards us, let's take a look at the distinguished individuals in our incredible Speaker Faculty for the technical and strategy sessions at the conference coming up June 10-13 at the Jacob Javits Center in New York City. We have technical and strategy sessions for you all four days dealing with every nook and cranny of Cloud Computing and Big Data, but what of those who are presenting? Who are they, where do they work, wha...
The new open source cloud orchestration platform called OpenStack is the promise of flexible network virtualization, and network overlays are looking closer than ever. The vision of this platform is to enable the on-demand creation of many distinct networks on top of one underlying physical infrastructure in the cloud environment. The platform will support automated provisioning and management of large groups of virtual machines or compute resources, including extensive monitoring in the cloud.
In his session at the 12th International Cloud Expo, Dave Eichorn, Global Data Center Practice Head at Zensar, will share a case study describing how a utility services company handled the migration of its Microsoft platform to the cloud. Challenged with the time-consuming task of opening operations out of temporary offices, this company struggled with the need to simultaneously access data that was accumulated from a vast amount of data-intensive jobs. Zensar migrated the company’s application ...
At pennies per virtual machine-hour, the economics of cloud computing are both compelling and daunting to replicate. Whether you are building your own cloud infrastructure, building a public cloud or choosing a cloud service, there are key strategy and technology decisions that make the difference between success and failure. In his General Session at the 12th International Cloud Expo, Jason Waxman, VP in the Intel Architecture Group and general manager of the Cloud Platforms Group within Inte...
You're getting pitched every day from your legacy enterprise software and hardware vendors about "cloud." They're doing an amazing job of convincing your CIO and CTO about what cloud is and how you should use it. The reality is they're defending their shrinking market share and keeping you on the legacy treadmill for as long as they can by selling you solutions that aren't "cloud." In her session at the 12th International Cloud Expo, Niki Acosta, Cloud Evangelista for Rackspace, will talk thro...
The rise of cloud computing has exposed hard drive-based storage as the new data center bottleneck. Combating this, data center managers have deployed SSDs to gain the performance needed to provide real-time access to data. However, due to budget constraints, many have turned to consumer-grade SSDs without understanding that they wear out quickly when processing enterprise workloads. In this session, Esther Spanjer will discuss recent endurance advancements in SSD technology that enable usage of...