Can We Finally Find the Database Holy Grail? | Part 2

A truly distributed DBMS with ACID transactional guarantees could address the key pain points in modern database management

In Part One of this three-part series I talked about distributed transactional databases being the Holy Grail of database systems. Among other things, the promise of such systems is to provide on-demand capacity, continuous availability and geographically distributed operation. But the historical approaches to building distributed transactional databases have involved unacceptable trade-offs, and as a result general-purpose database systems predicated on a single-server architecture have dominated the industry for decades.

In our quest for the Holy Grail of databases we acknowledge that there are folks who have given up on the highly desirable characteristic of transactional consistency in favor of distributed operation. That is a trade-off that may be attractive if you can't find a way to scale out transactions, but it is a drastic choice that moves a lot of complexity and cost up the application stack. If there is a way to scale out transactional databases then that is clearly a much better outcome. Our Holy Grail discussion is specifically about distributed transactional databases, because if such a thing can be built without a substantial downside then no one would want any other distributed data store.

Is there a way to build one of these things? I am excited by the potential of a new design that we call a Durable Distributed Cache, but I'd like to first lay out the three categories of designs that have been proposed historically.

Shared-Disk Databases
The idea of Shared-Disk designs is fairly self-explanatory. What you do is have several machines in a cluster all loading and storing pages of data from the same shared disk subsystem. A good example of a Shared-Disk system is Oracle Real Application Clusters. The database system tracks updated pages and writes them to disk, invalidates pages that have become "stale," and implements some kind of network lock manager that coordinates concurrency conflicts on a distributed basis. As with any distributed design, you need to be very aware of the nature of your workload.
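To make that coordination burden concrete, here is a minimal single-process sketch in Python (hypothetical names, not any product's actual API) of the page-locking and cache-invalidation pattern a Shared-Disk node has to follow. In a real cluster the acquire/release calls cross the network:

```python
import threading

class NetworkLockManager:
    """Toy stand-in for a cluster-wide lock manager (single process, for illustration).

    In a real Shared-Disk cluster these acquire/release calls cross the
    network, which is where the coordination latency comes from.
    """
    def __init__(self):
        self._locks = {}               # page_id -> threading.Lock
        self._guard = threading.Lock()

    def acquire(self, page_id):
        with self._guard:
            lock = self._locks.setdefault(page_id, threading.Lock())
        lock.acquire()                 # blocks until the page's current owner releases it

    def release(self, page_id):
        self._locks[page_id].release()


class Node:
    """One database machine caching pages from the shared disk."""
    def __init__(self, name, lock_mgr, shared_disk):
        self.name = name
        self.lock_mgr = lock_mgr
        self.disk = shared_disk        # dict shared by every node: page_id -> data
        self.cache = {}                # locally cached pages

    def write_page(self, page_id, data, peers):
        self.lock_mgr.acquire(page_id)      # synchronous, cluster-wide coordination
        try:
            self.disk[page_id] = data       # write through to the shared disk
            self.cache[page_id] = data
            for peer in peers:              # peers' cached copies are now stale
                peer.cache.pop(page_id, None)
        finally:
            self.lock_mgr.release(page_id)
```

Every write pays a round trip to the lock manager plus invalidation traffic to its peers, which is why write-heavy workloads with poor machine affinity struggle on this architecture.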

Workloads that require minimal coordination can scale fairly well on Shared-Disk systems. In the extreme case, consider an all-read workload. This is likely to be I/O-bound before it is limited by inter-machine coordination delays. The I/O bound itself can become a problem, because adding more machines adds compute capacity but no I/O capacity: the disk subsystem is shared. If your database performance is limited by disk I/O rather than by computational bandwidth, then there is no point adding more machines.
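A back-of-envelope calculation makes the ceiling visible. Assuming purely illustrative numbers (a shared disk subsystem sustaining 50,000 page reads per second, and nodes that can each drive 20,000):

```python
def cluster_read_throughput(num_nodes, per_node_reads, shared_disk_iops):
    """Effective read throughput of a Shared-Disk cluster.

    Compute capacity scales with the node count, but every node shares
    one disk subsystem, so its IOPS are a hard ceiling.
    """
    return min(num_nodes * per_node_reads, shared_disk_iops)

# Illustrative numbers only, not benchmarks.
for n in range(1, 6):
    print(n, cluster_read_throughput(n, per_node_reads=20_000, shared_disk_iops=50_000))
# 1 node: 20000, 2: 40000, 3: 50000, 4: 50000, 5: 50000 --
# once the shared disk saturates, extra machines add nothing.
```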

But coordination is often required between machines, and in these cases Shared-Disk systems are often limited by lock-based coordination protocols. The core database management system (DBMS) design is typically built around the idea of efficiently managing tightly coupled memory and disk pages. Consequently, adding synchronous distributed locking to guarantee page ownership creates both latency problems and thrashing problems that can easily bring these systems to their knees. Careful allocation of transactions to machines to maximize affinity can ameliorate these issues to some extent, but this increases the fragility of the systems and adds substantially to long-term DBA expenses.

Shared-Disk systems are at their best for workloads that are heavily read-dominated or in which the data can effectively be partitioned by machine.

Shared-Nothing Databases
Both of those characteristics are also arguments in favor of Shared-Nothing designs. Shared-Nothing is really a cute way of saying "partitioning." In other words, you essentially run multiple databases, carefully arranging which part of the dataset resides in each partition. To take a trivial example of a database containing all US citizens, you might put everyone with last names starting with A-L in one partition and everyone else in the other partition. Or you could put all male citizens in one and all female citizens in the other. For each incoming transaction you detect which data it needs to access and then send it to the relevant partition.
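As a sketch of what that routing looks like, here is a toy illustration using SQLite in-memory databases as stand-ins for the two partitions (the names and the last-name scheme follow the example above; none of this is any product's actual interface):

```python
import sqlite3

def partition_for(last_name: str) -> str:
    """Route a citizen record using the A-L / M-Z split described above."""
    return "partition_1" if last_name[:1].upper() <= "L" else "partition_2"

# Two independent databases standing in for two partitions.
connections = {
    "partition_1": sqlite3.connect(":memory:"),
    "partition_2": sqlite3.connect(":memory:"),
}
for conn in connections.values():
    conn.execute("CREATE TABLE citizens (last_name TEXT, first_name TEXT)")

def insert_citizen(last_name: str, first_name: str):
    """A single-partition transaction: detect the partition, send it there.

    Anything touching rows in *both* partitions would need cross-partition
    coordination -- exactly the hard part of Shared-Nothing designs.
    """
    conn = connections[partition_for(last_name)]
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO citizens VALUES (?, ?)", (last_name, first_name))

insert_citizen("Adams", "Ann")      # lands in partition_1
insert_citizen("Morris", "Barry")   # lands in partition_2
```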

You might ask: if you are essentially just running multiple databases with the data split between them, why is this a DBMS issue at all? Surely the application can just do that itself, using any traditional client/server database system? The answer is that of course you can do that, and it is called "Sharding." Big Internet applications, like Facebook, do this, and in some cases they run many thousands of shards. Shared-Nothing distributed databases try to do this transparently for you, in particular striving to help you with the really hard problems: how to optimally partition the data, how to perform transactions efficiently when they touch data in multiple partitions, how to deliver transactional semantics across partitions, how to rebalance the partitions as the database grows, and so on.
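To give a feel for just one of those hard problems, rebalancing: with a naive hash-mod-N placement scheme (a hypothetical illustration, not any particular product's algorithm), adding a single partition forces most of the data to move:

```python
import hashlib

def partition(key: str, num_partitions: int) -> int:
    """Naive hash partitioning: simple, but brittle when the cluster grows."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

keys = [f"citizen-{i}" for i in range(100_000)]
moved = sum(partition(k, 4) != partition(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of rows relocate when growing from 4 to 5 partitions")
# Roughly 80% of the data moves; consistent hashing and directory-based
# placement schemes exist precisely to shrink this to about 1/N of the data.
```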

Once again, partitioned stores can be very effective for certain kinds of workloads, but you have to arrange your queries and data layout very carefully to keep them optimized. For some cases there is no good way to partition the data. In all cases there is a limit to how many partitions you can profitably create, which means there is a limit to the degree that the system can scale out. If you are running a workload that has any appreciable proportion of cross-partition transactions, you will most likely need specialized low-latency interconnects between your servers. And the partitioning model is not very dynamic, so it does not address the desire for on-demand capacity management.

One last comment on partitioned stores: They are not all disk-based. In-memory systems typically also use partitioning in order to exploit multiple database servers in parallel.

Synchronous Commit Databases
There is a third traditional way of building distributed databases. You could call it the obvious way, the brute-force way or, more technically, Synchronous Commit (Google calls it "Synchronous Replication").

In a transactional DBMS an application makes changes to the data before committing or rolling back the transaction. If the transaction is successfully committed then the state of the database has been (atomically) updated with the full set of committed changes. If the commit fails then the canonical state of the database is unchanged. And if the application performs a rollback then the uncommitted changes are discarded. The Synchronous Commit approach has been adopted in various forms over the years, not least in the form of Two Phase Commit (2PC) protocols.
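On a single node those semantics are familiar. Here is a minimal illustration using SQLite; the whole challenge of the designs discussed below is preserving exactly this all-or-nothing behavior across machines:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'bob'")
    conn.commit()        # both updates become visible atomically
except sqlite3.Error:
    conn.rollback()      # on failure, neither update is applied

print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 0, 'bob': 100}
```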

The idea of Synchronous Commit is simply that when an application makes a request for the DBMS to commit a transaction, the DBMS performs the Commit in multiple locations before returning success. The best modern example of Synchronous Commit is Google F1. F1 is the database system that Google uses to support Google AdWords, and which will support many more Google applications in due course. Self-evidently it works for Google, and for a very demanding application. Google specifically notes that they could not address the AdWords challenge using NoSQL technologies or MySQL approaches.

Although Google has made it work, the disadvantage of Synchronous Commit is stated in their own white paper (The Google F1 White Paper): "Synchronous replication implies higher commit latency, but we mitigate that latency by using a hierarchical schema model with structured data types and through smart application design." F1 also relies on state-of-the-art Google networking to reduce inter-node latency, and on the existence of atomic clocks on every participating machine to support a common notion of true time. The obvious challenge with Synchronous Commit is latency (in other words, delays). There are some applications for which throughput is more important than transaction latency, but for most modern applications the reverse is true.
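To make the shape of the protocol concrete, here is a toy two-phase-commit-style coordinator in Python (hypothetical names; real systems such as F1 add logging, timeouts and recovery, which is where most of the difficulty lives):

```python
class Participant:
    """One site taking part in a synchronous commit."""
    def __init__(self, name):
        self.name = name
        self.staged = None

    def prepare(self, changes) -> bool:
        # Phase 1: durably stage the changes and vote yes/no. A real
        # participant writes a prepare record to its log before voting.
        self.staged = changes
        return True

    def commit(self):
        # Phase 2: make the staged changes visible.
        print(f"{self.name}: committed {self.staged}")

    def abort(self):
        self.staged = None


def synchronous_commit(participants, changes) -> bool:
    """Report success to the client only after every site has committed.

    The client-visible latency includes the slowest prepare round trip plus
    the slowest commit round trip -- the cost the F1 paper acknowledges.
    """
    if all(p.prepare(changes) for p in participants):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False

synchronous_commit([Participant("us-east"), Participant("eu-west")],
                   {"account:42": "+100"})
```

Note that the happy path is trivial; everything interesting happens when a participant fails between the two phases, which is exactly the point made next.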

A less obvious challenge relating to Synchronous Commit is managing error and failure conditions. Designing distributed synchronous commit protocols is relatively straightforward if you can assume 100% reliability. It is failure detection and recovery procedures that are the hard part of the problem, and the designer is generally left with a trade-off between maintaining transactional guarantees across a large number of failure scenarios, avoiding significant performance degradation, and limiting implementation complexity. As a consequence of the latency issues and failure-management challenges, the Synchronous Commit approach has had limited success historically.

Summary
It is clear that a truly distributed DBMS with ACID transactional guarantees could address the key pain points in modern database management, notably on-demand capacity, continuous availability and geo-distributed operation. Unfortunately, the three traditional designs deliver much less than that promise, and they involve costs, complexities and/or functional restrictions that limit their usefulness and general applicability.

In my next post I aim to introduce a new design approach, one that we call Durable Distributed Cache. By stepping back and rethinking database design from the ground up, Jim Starkey has come up with an innovative solution that makes very different trade-offs. The net effect is a system that scales out/in dynamically on commodity machines and virtual machines, has no single point of failure, and delivers full ACID transactional semantics.

More Stories By Barry Morris

Barry Morris is CEO & Co-Founder of NuoDB, Inc. An accomplished software CEO with over 25 years of industry experience in the USA and Europe, running private and public companies ranging in scale from early startup phase to 1,000+ employees, he loves to build companies around industry-changing paradigm shifts in technology. Morris was previously CEO of StreamBase and Iona Technologies.
