NoSQL and Partition Tolerance

While NoSQL implementations are aware that nodes have left, they are not aware that said nodes have formed a separate partition

I came across an article discussing NoSQL and partition tolerance.

The NoSQL Partition Tolerance Myth (link)

I may not entirely agree with the author.

But what most NoSQL systems offer is a peculiar behavior that is not partition tolerant, but partition oblivious instead.

 

No argument here. While NoSQL implementations are aware that nodes have left, they are not aware that said nodes have formed a separate partition.

In this case, we would want failure detection and carry out those transfers where the accounts are both on the same side of the partition, while denying or deferring transfers that cross the chasm.

The author is assuming that the account is only on one side of the partition. If that is the case, it doesn’t matter whether the NoSQL implementation is eventually consistent or not. If the account is on both sides of the partition, the solution the author provides still results in an inconsistent state.
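The point above can be made concrete with a minimal sketch. Everything in it is hypothetical: the account names, the node names, the replica placement, and the `can_transfer` rule, which is just one reading of "carry out transfers on the same side, deny those that cross."

```python
# Hypothetical placement: which nodes hold a replica of each account.
REPLICAS = {
    "alice": {"n1"},         # lives on side A only
    "bob": {"n2"},           # lives on side A only
    "erin": {"n3"},          # lives on side B only
    "carol": {"n1", "n3"},   # replicated on both sides
    "dave": {"n2", "n4"},    # replicated on both sides
}

def can_transfer(src, dst, reachable):
    """Allow a transfer only if this side can reach a replica of both accounts."""
    return bool(REPLICAS[src] & reachable) and bool(REPLICAS[dst] & reachable)

# The partition splits the (assumed) four nodes into two sides.
side_a = {"n1", "n2"}
side_b = {"n3", "n4"}

# Both accounts on one side: carried out there, denied on the other side.
same_side = can_transfer("alice", "bob", side_a)
# Accounts on opposite sides: denied (or deferred) on both sides.
crossing = can_transfer("alice", "erin", side_a) or can_transfer("alice", "erin", side_b)
# Accounts replicated on both sides: each side accepts independently,
# so the two copies can still diverge.
accepted_on_both = can_transfer("carol", "dave", side_a) and can_transfer("carol", "dave", side_b)
```

Here `same_side` and `accepted_on_both` are both true while `crossing` is false. Failure detection handles the first two cases cleanly, but it does nothing for accounts that have a replica on each side.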

In such cases, it is almost always better to build services that degrade gracefully under partitions.

Bingo! The author implies that all NoSQL implementations sacrifice consistency in the event of a partition. That is not true. There are AP implementations (available & partition tolerant), and there are CP implementations (consistent & partition tolerant). However, an AP implementation can function as a CP implementation depending on the configuration and the application. For example, this is the case if accounts do not have multiple owners and the application cannot withdraw funds from or deposit funds to an account that it cannot access.

This post is not perfect and it is a bit outdated, but I think it’s helpful nonetheless.

Visual Guide to NoSQL Systems (link)

If I were to build a bank based on Dynamo, the granddaddy of all first-generation NoSQL data stores, it would silently split into two halves, like a lobotomized patient.

I would not say that Amazon Dynamo is the grandparent of all NoSQL implementations. I would say that there are two parents: Amazon Dynamo and Google BigTable. Then there are the grandparents…

A Brief History of NoSQL (link)

In this scenario, the hypothetical backend for Banko Dynamo would not only not provide any indication of failure, but allow a customer to create as many new accounts as there are partitions, one in each.

Why is the author now using account creation instead of withdrawals and deposits, and what is the relevance of creating multiple accounts? If my debit card does not work, I do not create a new account. That, and I maintain two checking accounts and one savings account with the same bank.

Let’s go back to withdrawals and deposits. If the accounts do not have multiple owners, it does not matter whether the NoSQL implementation is eventually consistent or not. If the accounts do have multiple owners, it depends on the NoSQL implementation. If it is inspired by Google BigTable (e.g. Apache HBase) or both Google BigTable and Amazon Dynamo (e.g. Apache Cassandra), it does not matter. These NoSQL implementations are CP, or can be configured to be CP. If it is inspired only by Amazon Dynamo and it is eventually consistent, it may or may not matter…
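One common way a store with tunable consistency is "configured to be CP" is by choosing how many replicas a read (R) and a write (W) must touch out of N. The sketch below is not any product's API. The function name is hypothetical, and it only brute-forces the usual overlap rule, R + W > N, while ignoring details such as hinted handoff.

```python
from itertools import combinations

def quorums_always_overlap(n, r, w):
    """Check by brute force that every r-replica read set shares at least
    one replica with every w-replica write set."""
    nodes = range(n)
    return all(
        set(read) & set(write)
        for write in combinations(nodes, w)
        for read in combinations(nodes, r)
    )

# With N = 3 replicas:
strict = quorums_always_overlap(3, 2, 2)  # R + W > N: a read always meets the latest write
loose = quorums_always_overlap(3, 1, 1)   # R + W <= N: a read can miss the latest write
```

Under the strict setting, a side of the partition that cannot reach enough replicas has to reject the request, which is exactly the trade of availability for consistency.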

Let’s assume that account withdrawals / deposits are separate from the accounts themselves and that the account is both consistent and available during a partition. The account has multiple owners but it is more or less read only.

My account has a balance of $100 (calculated from the withdrawals and deposits). Now, there are two partitions: A and B. I purchase $50 of St. Bernardus Abt 12 at Binny’s via partition A. Partition A now has withdrawal #1. I have dinner at Baume & Brix for $75 via partition B. Partition B now has withdrawal #2. My account has a balance of $50 in partition A. It has a balance of $25 in partition B. My account should have a balance of minus $25.

Does it matter? My account may not show a balance of minus $25 yet, but it will. When the partition is repaired, the application will be able to access all of the withdrawals and deposits on my account. I may be charged an overlimit fee.

What if the NoSQL implementation sacrificed availability? My payment at Binny’s did not go through. That’s not a problem. No St. Bernardus Abt 12 for me. My payment at Baume & Brix did not go through. That’s a problem. I can’t pay for dinner. Baume & Brix can accept neither my payment nor that of any other customer paying with a debit card from the same bank as me via partition B.

What if I made a deposit of $25 at an ATM via partition A? My account will have a balance of $0 after the partition is repaired. I will not be charged an overlimit fee.
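The arithmetic in this account example can be replayed with a minimal sketch. It assumes, as above, that withdrawals and deposits are stored as separate entries and that the balance is calculated from them. The variable and function names are illustrative only.

```python
OPENING_BALANCE = 100

# Entries recorded on each side while the partition lasts.
# Negative amounts are withdrawals, positive amounts are deposits.
partition_a = [("Binny's", -50)]
partition_b = [("Baume & Brix", -75)]

def balance(opening, *logs):
    """Balance calculated from the opening amount plus every visible entry."""
    return opening + sum(amount for log in logs for _, amount in log)

seen_by_a = balance(OPENING_BALANCE, partition_a)                   # 50
seen_by_b = balance(OPENING_BALANCE, partition_b)                   # 25
after_repair = balance(OPENING_BALANCE, partition_a, partition_b)   # -25

# Same scenario with a $25 ATM deposit made via partition A.
partition_a_with_deposit = partition_a + [("ATM deposit", 25)]
after_repair_with_deposit = balance(
    OPENING_BALANCE, partition_a_with_deposit, partition_b
)                                                                   # 0
```

Because each entry is only ever appended, repairing the partition amounts to reading both logs. No entry has to be reconciled against another, which is why the balance converges on its own.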

There are other scenarios. Perhaps I’m charged an insufficient funds fee and Baume & Brix does not receive payment. Perhaps Baume & Brix later resubmits the payment and receives payment.

Do you really want to sell tickets from both halves of your system? By definition, there is no way you can guarantee uniqueness of those tickets. There will be customers holding identical tickets with identical seat numbers.

Maybe, maybe not. If there is only a single owner per ticket, then yes. However, there may be availability issues. For example, partition A has tickets 1-150 and partition B has tickets 151-200. If all the tickets in partition B have been purchased, visitors may be unable to purchase tickets despite the fact that there may be tickets available in partition A. If there are multiple owners per ticket, I would prefer a NoSQL implementation that is CP. In this case, I would prefer to sacrifice availability rather than consistency.
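The single-owner case can be sketched as follows, assuming each ticket lives on exactly one side of the partition. The `TicketPartition` class is hypothetical, and the ranges are the ones from the example above.

```python
class TicketPartition:
    """One side of the partition, owning a disjoint range of ticket numbers."""

    def __init__(self, first, last):
        self.available = list(range(first, last + 1))

    def sell(self):
        """Sell the next ticket owned by this side, or None if it has none left."""
        return self.available.pop(0) if self.available else None

a = TicketPartition(1, 150)
b = TicketPartition(151, 200)

# Fifty visitors buy via partition B.
sold_by_b = [b.sell() for _ in range(50)]

next_from_b = b.sell()         # None: side B is sold out
left_in_a = len(a.available)   # 150 tickets exist but are unreachable from side B
```

No ticket number can be sold twice because the ranges never overlap, yet a visitor routed to side B is turned away while side A still holds unsold tickets.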

Here is a better example. What if I report my debit card stolen? Sacrificing availability is not appropriate. What if customer service is accessing my account via the partition with no availability? My debit card must be reported stolen or the thief can continue to make purchases with it. Sacrificing consistency is not appropriate. The thief can continue to make purchases with my debit card via the partition where my account has not been reported stolen. Perhaps account information should not be stored in a distributed system.

And if they did, the first-generation NoSQL stores usually take the ultimate punt by presenting all versions of the divergent objects to the application, and let the application resolve the mess.

No argument here.
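The Dynamo paper describes using vector clocks to detect such divergent versions. The sketch below is a simplified illustration of that idea, not the code of any particular store. The function names, the node labels and the sample values are assumptions.

```python
def descends(a, b):
    """True if vector clock a has seen every update that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def siblings(versions):
    """Drop versions that another version supersedes; what remains is
    mutually concurrent and is handed to the application to resolve."""
    return [
        (value, clock)
        for value, clock in versions
        if not any(
            descends(other, clock) and other != clock
            for _, other in versions
        )
    ]

original = ("balance=100", {"a": 1})
v_a = ("balance=50", {"a": 2})            # updated via partition A
v_b = ("balance=25", {"a": 1, "b": 1})    # updated via partition B

versions = [original, v_a, v_b]
to_resolve = siblings(versions)
```

Here `to_resolve` holds both `v_a` and `v_b`: each supersedes the original, but neither has seen the other's update, so the store cannot pick a winner on its own.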

But if your data is that soft and inconsequential, why not just use memcached? It’s wicked fast, far faster than Mongo.

Perhaps because MongoDB is a document store and as such provides features that are not provided by key / value stores.

A lot of NoSQL developers pretend that being partition oblivious is a difficult thing to implement. This is false. It’s easy to make a program oblivious to a particular event; namely, you write no code to handle that event.

No argument here.

The thing that greatly helps first generation NoSQL data stores, the thing that enables them to package partition obliviousness as if it were equivalent to partition tolerance, is that they provide a very weak service guarantee in the first place. These systems cannot guarantee that, on a good day, your GET will return the latest PUT.

Sure they can.

In fact, eventual consistency means that a GET can return any previous value, including the Does Not Exist response from the very initial state of the system.

No argument here. Of course, not all NoSQL implementations are eventually consistent.


With all this being said, a NoSQL implementation may or may not be appropriate. To be more specific, a NoSQL implementation that is eventually consistent and sacrifices consistency in the event of a partition may or may not be appropriate. The behaviour is determined by the NoSQL implementation, its configuration, and the application that reads and writes to it. Whether that behaviour is appropriate or not depends on the business requirements.

More Stories By Daniel Thompson

I curate the content on this page, but the credit goes to my talented colleagues for the posts that you see here. Much of what you read on this page is the work of friends at How to JBoss, and I encourage you to drop by the site at http://www.howtojboss.com for some of the best JBoss technical and non-technical content for developers, architects and technology executives on the Web.
