@CloudExpo: Blog Post

The Paradox of Ephemeral Cloud Storage | @CloudExpo [#Cloud]

The moral of the story here is simple: if you put anything beyond your base OS on ephemeral storage, you are at great risk

The very name is kind of ridiculous, don't you think? The word "ephemeral" means it can go away. It's temporary. Fleeting, even. So why would I want to depend on storing something in a medium that can disappear without warning? And why am I forced to buy more of it when all I want is more CPUs or RAM?

Welcome to the paradox of ephemeral storage from cloud computing providers.

Origins and Explanations
Ephemeral storage exists only because of how first-generation cloud providers chunk up servers. The business model is simple: they buy a physical server and try to sell as many virtual machines (VMs) as possible on top of that physical server. Since the VMs are trapped on physical machines in this approach, first-generation providers dictate cookie-cutter sizes that make that stacking game easier for themselves.

In the process, though, these providers can't do anything to improve the redundancy of the disk on the physical servers, and are thus unable to offer guarantees on its availability. Instead they tell you not to trust it. It can evaporate. "Code around it instead" is what we are told.

If I can't trust it, why am I forced to buy more of it whenever I want a VM that is bigger in other dimensions, seeing as I probably only need 10 GB for my operating system anyway? Consider the sizing chart below from PlanForCloud:

Take a look at that largest size. Who wants a 1.6 TB cloud storage liability?
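To put a number on that liability, here is a minimal sketch of the arithmetic. It assumes the 10 GB operating system footprint guessed at above and treats 1.6 TB as 1,600 GB (decimal units); the function names are my own and purely illustrative.

```python
def unused_ephemeral_gb(bundled_gb: float, os_footprint_gb: float = 10.0) -> float:
    """Ephemeral capacity left over once the base OS is installed."""
    return max(bundled_gb - os_footprint_gb, 0.0)


def unused_fraction(bundled_gb: float, os_footprint_gb: float = 10.0) -> float:
    """Share of the bundled ephemeral disk that the base OS does not need."""
    if bundled_gb <= 0:
        raise ValueError("bundled_gb must be positive")
    return unused_ephemeral_gb(bundled_gb, os_footprint_gb) / bundled_gb


if __name__ == "__main__":
    bundled = 1600.0  # assumption: 1.6 TB expressed as decimal GB
    print(f"{unused_ephemeral_gb(bundled):.0f} GB unused "
          f"({unused_fraction(bundled):.1%} of the bundled disk)")
    # prints "1590 GB unused (99.4% of the bundled disk)"
```

Under those assumptions, nearly all of the disk that comes with the largest size is capacity you pay for but shouldn't trust.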

Google Compute Engine and ProfitBricks Bring Sanity
One of the great features of Google Compute Engine is its approach to ephemeral storage. Google refers to this as Scratch Storage and in many cases limits each machine to 10 GB of it. That's just enough to build a base operating system upon, and that's obviously on purpose. Kudos to them.

ProfitBricks takes this a step further by not offering ephemeral storage at all. Instead, the physical servers housing the CPU cores and the RAM are in a separate pool of resources from the disk array that provides the block storage. Good IOPS are maintained by connecting the two with an 80 Gbps InfiniBand network. In the ProfitBricks model, all storage is akin to highly available, redundant block storage.

What You Really Want Is Block Storage
One of the things that public cloud noobs have a hard time getting their heads around at first is the difference between ephemeral storage and block storage. The latter, which every IaaS vendor offers, has some level of redundancy built into it and is where data should really be stored. Below are examples of how several vendors approach that redundancy, with better resulting availability:

Vendor       | Block Volume Redundancy                                   | Max Volume Size
-------------|-----------------------------------------------------------|----------------
AWS          | "multiple servers in an Availability Zone"                | 1 TB
Azure        | Offer both locally redundant and geographically redundant | 1 TB
GCE          | "replicated for additional redundancy"                    | 10 TB
ProfitBricks | Double redundant RAID 10 across two Availability Zones    | 16 TB
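The maximum volume sizes above matter once a data set outgrows a single volume. As a rough sketch, the snippet below counts how many block volumes each vendor would need for a given amount of data. The limits are copied from the table and reflect the figures quoted in this post, not necessarily what the vendors offer today; the function name is hypothetical.

```python
import math
from typing import Dict

# Maximum block volume sizes in TB, copied from the table in this post.
MAX_VOLUME_TB: Dict[str, int] = {
    "AWS": 1,
    "Azure": 1,
    "GCE": 10,
    "ProfitBricks": 16,
}


def volumes_needed(data_tb: float) -> Dict[str, int]:
    """Minimum number of block volumes per vendor to hold data_tb terabytes."""
    if data_tb <= 0:
        raise ValueError("data_tb must be positive")
    return {
        vendor: math.ceil(data_tb / max_tb)
        for vendor, max_tb in MAX_VOLUME_TB.items()
    }


if __name__ == "__main__":
    print(volumes_needed(5))
    # prints {'AWS': 5, 'Azure': 5, 'GCE': 1, 'ProfitBricks': 1}
```

How you stitch multiple volumes together is a separate engineering decision that this sketch does not cover.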

Lessons Learned
The moral of the story here is simple: if you put anything beyond your base OS on ephemeral storage, you are at great risk. That data could go away at any time. You can't depend on it, so don't use it unless you add another form of redundancy at your own engineering expense. Data you care about belongs on block storage, whose built-in redundancy and improved availability make it far more likely to be there when you need it.
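That rule is simple enough to check mechanically. The sketch below is one hypothetical way to audit a deployment plan: it flags anything on ephemeral storage that is neither part of the base OS nor protected by redundancy you built yourself. The `Placement` record and its field names are assumptions made for illustration, not any provider's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Placement:
    """Where one piece of data lives (hypothetical record for illustration)."""
    name: str
    storage: str                      # "ephemeral" or "block"
    is_base_os: bool = False          # part of the base operating system image
    has_own_redundancy: bool = False  # replicated elsewhere at your own expense


def at_risk(placements: List[Placement]) -> List[str]:
    """Names of items that could vanish along with the ephemeral disk."""
    return [
        p.name
        for p in placements
        if p.storage == "ephemeral"
        and not p.is_base_os
        and not p.has_own_redundancy
    ]


if __name__ == "__main__":
    plan = [
        Placement("root filesystem", "ephemeral", is_base_os=True),
        Placement("customer database", "ephemeral"),
        Placement("replicated cache", "ephemeral", has_own_redundancy=True),
        Placement("uploads", "block"),
    ]
    print(at_risk(plan))  # prints ['customer database']
```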

More Stories By Pete Johnson

Pete Johnson is senior director of product marketing at CliQr Technologies, where he focuses on the support of applications running on OpenStack-based clouds. He is interested in the long-term management of applications in public and private clouds, and in avoiding vendor lock-in. Prior to joining CliQr, Pete was senior director of platform evangelism at ProfitBricks after spending 19 years with HP as a heads-down developer, technical lead and chief architect.


Most Recent Comments
cabronito 07/17/13 04:21:00 PM EDT

If you used a cloud like backupthat, you wouldn't need to worry about it being ephemeral. All of your files would be backed up into your email.
