

@CloudExpo: Article

Cloud Hardware and New Memory Controller Designs

An exclusive interview with Barbara P. Aichinger, co-founder of FuturePlus Systems and VP of New Business Development

"The Data Center operators do understand that quality does matter," noted Barbara P. Aichinger, co-founder of FuturePlus Systems and VP of New Business Development, in this exclusive interview with Cloud Expo Conference Chair Jeremy Geelan. "When they experience failures they call the supplier and the Tier 2 and 3 vendors just blame somebody else, like the DIMM vendor or the software."

Cloud Computing Journal: You seem to have some concerns about the actual cloud hardware. Can you explain?

Barbara P. Aichinger: Sure. My company, FuturePlus Systems, makes memory design validation equipment used by the engineers who design cloud hardware. This server and network equipment is governed by technology standards. The advantage of using standards is that you can buy one part from vendor A and another from vendor B, and because both are designed to the same standard, they work together. The organizations that write these standards are international, and in most cases they pair each technology standard with a compliance standard. Vendors must not only obey the standard itself but also pass a test, specified by the compliance portion of the standard, that proves their design meets the specification. This is a stamp of quality and interoperability. The problem with cloud hardware today is that the JEDEC DDR memory standard sits at the very heart of all of it, yet this standard has no compliance specification per se. As a result, no third party checks this critical portion of the design for quality and compliance.
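To make the idea of a compliance check more concrete, here is a minimal sketch of the kind of rule checking a validation tool might apply to a captured DDR command trace. It is an illustration under stated assumptions, not how FuturePlus or JEDEC define compliance: the `Command` type, the `find_violations` helper, and the cycle counts in `MIN_SPACING` are all hypothetical placeholders, loosely modeled on DDR timing parameters such as tRCD (activate to read), tRAS (activate to precharge) and tRP (precharge to activate). Real limits would come from the JEDEC specification for the device and its speed grade.

```python
from dataclasses import dataclass

# Hypothetical minimum spacings, in memory-clock cycles, between two commands
# to the same bank. Placeholder values for illustration only; they are NOT
# taken from the JEDEC DDR specification.
MIN_SPACING = {
    ("ACT", "RD"): 11,   # stands in for tRCD
    ("ACT", "PRE"): 28,  # stands in for tRAS
    ("PRE", "ACT"): 11,  # stands in for tRP
}


@dataclass
class Command:
    cycle: int  # clock cycle at which the command was captured
    bank: int   # target bank
    name: str   # command mnemonic, e.g. "ACT", "RD", "PRE"


def find_violations(trace):
    """Return (earlier, later, required, actual) for every spacing rule broken in the trace."""
    violations = []
    last_seen = {}  # (bank, command name) -> most recent Command of that kind on that bank
    for cmd in sorted(trace, key=lambda c: c.cycle):
        for (first, second), required in MIN_SPACING.items():
            if cmd.name != second:
                continue
            prev = last_seen.get((cmd.bank, first))
            if prev is not None and cmd.cycle - prev.cycle < required:
                violations.append((prev, cmd, required, cmd.cycle - prev.cycle))
        last_seen[(cmd.bank, cmd.name)] = cmd
    return violations
```

For example, a trace containing `ACT` at cycle 0 and `RD` at cycle 8 on the same bank would be flagged, because the gap of 8 cycles is below the placeholder minimum of 11. Tracking the most recent command per bank keeps the check linear in trace length; a real analyzer would cover far more commands, rules, and cross-bank constraints.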

Cloud Computing Journal: Why is it that there is no compliance standard for DDR Memory?

Aichinger: Good question. Last May (2013), at a JEDEC conference (JEDEC is the international standards organization that governs the DDR memory specification), I asked that very question. The answer was a shrug of the shoulders and the response, 'Well, we all work so closely together that we did not need one.' That was probably OK six or seven years ago, when the server market was dominated by a few large vendors. In addition, the memory controllers themselves came from only two major silicon vendors, and proving compliance was very difficult: only the large players could afford the equipment to perform such an analysis. Now, however, many vendors supply cloud hardware, and new memory controller designs from smaller vendors are starting to proliferate in the market. As a result, we see memory error rates in the data center accelerating.

Cloud Computing Journal: How big is the problem?

Aichinger: Google, which operates some of the largest data centers in the world, has definitely noticed the problem. It has worked with several academic researchers to study it, and two main works have resulted: "DRAM Errors in the Wild: A Large-Scale Field Study" and "Cosmic Rays Don't Strike Twice: Understanding the Nature of DRAM Errors and the Implications for System Design."

At the Open Compute Project conference in January 2013, Facebook said that DDR memory failures were the #2 cause of failure in the data center. These rates are not trivial. Given the growth we see in data centers, memory failures are bringing down servers hourly. This costs not only downtime but also the labor to replace the system or the failing DIMM.

We have also heard the phrase 'ghost errors.' This is when a server goes down after experiencing a hard memory failure. The operators run all sorts of diagnostics and find no error; everything works fine. They boot the system back up, and it may run for several weeks before it experiences another error. Because the failures can never be found and seem to disappear, they call them 'ghost errors.'

Cloud Computing Journal: How are Data Centers responding?

Aichinger: They are doing a lot of head scratching. They have cost pressures and quality concerns. From what we have been told, there is a push to commoditize the server market, that is, to erase the distinction between Tier 1 vendors and the lower Tier 2 and Tier 3 vendors. The Data Center operators do understand that quality does matter. When they experience failures they call the supplier, and the Tier 2 and 3 vendors just blame somebody else, like the DIMM vendor or the software. We have seen all sorts of finger pointing. Even the DIMM connector vendors get blamed, though there is really no proof behind the claim. The Tier 1 vendors will often try to study the problem: they bring the machine back to their facility and try to recreate it. One of our Tier 1 customers told us that they can recreate the failure only 30% of the time.

Cloud Computing Journal: What is the answer here? Can we have low cost and high quality?

Aichinger: I think we can. The first step would be for customers to demand qualification of the memory subsystem. This is what we at FuturePlus are trying to do: alert end users to the problem. The suppliers of this hardware are more than likely going to take the easy way out and not validate their designs. Oftentimes system integrators have no idea where the motherboard or the memory came from and can't even tell you what speed the memory is operating at. The companies that run these data centers are going to have to come up to speed on basic computer architecture so they don't get the wool pulled over their eyes when buying this hardware.

More Stories By Liz McMillan

News Desk compiles and publishes breaking news stories, press releases and latest news articles as they happen.

