Explicitly-defined failure domains in the datacenter

While the bulk of the networking industry’s focus is on CapEx and automation, the two major trends driving changes in these areas will potentially have a greater impact on what matters most: availability. In fact, even though datacenter purchasing decisions skew towards CapEx, the number one requirement for most datacenters is uptime.

If availability is so important, why is it not the number one purchasing criterion already?

First, it’s not that availability doesn’t matter. It’s more that when everyone is building the same thing, it ceases to be a differentiating point. Most switch vendors have converged on Broadcom silicon and a basic set of features required to run datacenter architectures that have really been unchanged for a decade or more. But are those architectures going to continue unscathed?

SDN and network architecture

For those who believe in the transformative power of SDN, the answer is unequivocally no. If, after all of the SDN work is said and done, we emerge with the same architectures with an extra smidge of automation sprinkled on top, we will have grossly under-delivered on what should be the kind of change that happens once every couple of decades.

The rise of a central controller is more than just pushing provisioning to a single pane of glass. It is about central control over a network using a global perspective to make intelligent resourcing and pathing decisions. While this does not necessarily mean that legacy networking cannot (or should not) co-exist, the model is dramatically different from what exists today.

Switching is just a stop along the way for bare metal

Bare metal switching will also change IT infrastructure in a meaningful way. Again, the scope of public discourse is fairly narrow. The story goes something like this: if someone makes a commodity switch, pricing will come down. But there are two things that are really happening here.

First, what we are really seeing is a shift in monetization from hardware to software. This shift should not be surprising, as software investment on the vendor side has dwarfed hardware R&D for 15 years now. The real change here is that the companies that emerge have a tolerance for lower margins than the behemoths already entrenched in the space. Anyone can drop price; the question is at what margins a business is still attractive. What will play out over the next couple of years is a game of chicken with price.

Second, the objective of bare metal switching is less about the switching and more about the bare metal. Taken to its logical conclusion, the hope has to be that all infrastructure is eventually run on the same set of hardware. Whether something is a server, a storage device, an appliance, or a switch should ultimately be determined by the software that is being run on it. In this case, we see multi-purpose devices whose role depends on the context in which they are deployed. This would eventually allow for the fungibility of resources across what are currently very hard silos.

Domain, Domain, Domain

Both SDN and bare metal lead to architectures that are very different from those that exist today. But as architects consider how they will evolve their own instantiations of these technologies, they need to be clear about a couple of facts that get glossed over.

If availability really is the number one requirement for datacenters, then architectures need to explicitly consider how they impact overall resource availability. Consider that there are a number of sources for downtime:

  1. Human error – By far the leading source of downtime in most networks, human error is why there is such momentum around things like ITIL. Put differently, when is uptime the highest for most datacenters? Holidays, when everyone is away from the office.
  2. System issues – After human error, the next biggest cause of downtime is issues in the systems themselves. These are most likely software bugs, but they can include device and link failures as well.
  3. Maintenance – Another major contributor to downtime is overall infrastructure maintenance. When individual systems need to be upgraded or replaced, there is frequently some interruption to service. Of course, if maintenance is planned, then the impact on overall downtime should be low.
  4. Other – The Other category covers things like power outages and backhoes.

Of these, SDN promises to improve the first one. By expanding the management domain, it reduces the number of opportunities for pesky humans to make mistakes. Automated workflows that are executed from a single point of control and orchestrated across disparate elements of the infrastructure should help drive the number of provisioning mistakes in the datacenter down.

Additionally, a central point of control helps improve visibility (or at least it will over time). This helps operators diagnose issues more quickly, which will lower the Mean-Time-to-Repair (MTTR) for network-related issues.
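
To put a number on why MTTR matters, here is a minimal sketch. It assumes the standard steady-state model in which availability equals MTBF / (MTBF + MTTR). The function names and the sample figures are illustrative assumptions, not measurements from any real network.

```python
HOURS_PER_YEAR = 8760  # 365 days; leap years ignored for simplicity


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected hours of downtime per year under the same model."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical network: one failure every 1000 hours on average.
    # Compare a 4-hour repair time with a 2-hour repair time.
    for mttr in (4.0, 2.0):
        print(
            f"MTTR {mttr:.0f}h: availability {availability(1000.0, mttr):.5f}, "
            f"downtime {annual_downtime_hours(1000.0, mttr):.1f} h/year"
        )
```

With the failure rate held constant, halving the repair time roughly halves the expected annual downtime, which is why faster diagnosis shows up directly in availability.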

But the management domain is not the only one that matters. There are at least two others that impact downtime: failure domains and maintenance domains. The impact of SDN and bare metal on each needs to be explicitly understood.

Failure domains

While there are tremendous operational benefits of collapsing domains under a single umbrella, one thing that becomes more difficult is managing the impact of failures when they do occur.

For instance, if a network is under the control of a single SDN controller, what happens if that controller is not reachable? If the controller is an active part of the data path, there is one set of outcomes. If the controller is not an active part of the data path, there is a different set of outcomes.

The point here is not to advocate for one or the other, but rather to point out that architects need to be explicit in defining the failure domain so that they can adjust operations appropriately. For instance, it might be the case that you prefer to balance the control benefits with failure scenarios, opting to create several smaller management domains, each with a correspondingly smaller failure domain. This gives you some benefit over a completely distributed management environment (where management domains are defined by the devices themselves) without putting the entire network under the same failure domain.
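
The trade-off between one large management domain and several smaller ones can be sketched in a few lines. This is a toy model under stated assumptions: every switch belongs to exactly one controller's domain, and losing a controller affects every switch in that domain. The names `ManagementDomain`, `partition`, and `blast_radius` are hypothetical and do not come from any SDN product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagementDomain:
    """A set of switches governed by one (hypothetical) controller."""
    name: str
    switches: frozenset


def partition(switches, n_domains: int):
    """Split switches round-robin into n_domains management domains."""
    if n_domains < 1:
        raise ValueError("need at least one domain")
    ordered = sorted(switches)
    return [
        ManagementDomain(f"domain-{i}", frozenset(ordered[i::n_domains]))
        for i in range(n_domains)
    ]


def blast_radius(domains, failed_domain: str) -> float:
    """Fraction of all switches affected when one domain's controller is lost."""
    total = sum(len(d.switches) for d in domains)
    if total == 0:
        raise ValueError("no switches defined")
    affected = sum(len(d.switches) for d in domains if d.name == failed_domain)
    return affected / total


if __name__ == "__main__":
    switches = [f"sw{n:02d}" for n in range(12)]  # 12 illustrative switches
    single = partition(switches, 1)
    split = partition(switches, 4)
    print(blast_radius(single, "domain-0"))  # 1.0
    print(blast_radius(split, "domain-0"))   # 0.25
```

Neither result is the right answer on its own. The point of writing it down is that the size of the failure domain becomes a number someone chose rather than a side effect of the architecture.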

The same is true with bare metal. If bare metal leads to platform convergence, it allows architects to co-host compute, storage, and networking in the same device. Whether you actually group them depends an awful lot on how you view failure domains. Again, any balance is useful so long as it is explicitly chosen. Collapsing everything to a single device creates a different failure domain, which might make sense in some environments and less so in others.

Maintenance domains

The same discussion extends to maintenance domains. Collapsing everything might create a single maintenance domain (depending on architecture, of course). Keeping things separate might enable much smaller maintenance domains. There is no right size, but whatever architecture is chosen, the maintenance domain needs to be an explicit requirement.
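
The sizing question for maintenance domains can be framed the same way. The sketch below assumes, purely for illustration, that one maintenance domain is upgraded per window and that all of its devices are out of service for the duration. The function name `maintenance_plan` is hypothetical.

```python
def maintenance_plan(domain_sizes):
    """Return (windows_needed, worst_case_fraction_offline).

    Assumes one domain is taken down per maintenance window and that
    every device in that domain is unavailable during the window.
    """
    sizes = list(domain_sizes)
    if not sizes or any(size <= 0 for size in sizes):
        raise ValueError("every domain must contain at least one device")
    total = sum(sizes)
    return len(sizes), max(sizes) / total


if __name__ == "__main__":
    # 48 illustrative devices, first as one domain, then as four equal domains.
    print(maintenance_plan([48]))              # (1, 1.0)
    print(maintenance_plan([12, 12, 12, 12]))  # (4, 0.25)
```

Fewer, larger domains mean fewer windows to schedule but more of the infrastructure offline in each one. Smaller domains reverse that trade.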

The bottom line

Architectures are changing. There is little doubt that technology advances in IT generally and networking specifically are enabling us to do things we couldn’t really even consider just a few years ago. When deciding how to do those things, though, we need to be explicitly designing for availability. What has always been a requirement but has seen less and less talk as architectures matured should be dominating discussions again. Depending on your own specific requirements, this could lead to some unexpected architectural decisions.

[Today’s fun fact: Playing in a marching band is considered moderate exercise. But lest there be any confusion, this does not make it a sport.]

The post Explicitly-defined failure domains in the datacenter appeared first on Plexxi.

More Stories By Michael Bushong

The best marketing efforts leverage deep technology understanding with a highly-approachable means of communicating. Plexxi's Vice President of Marketing Michael Bushong has acquired these skills having spent 12 years at Juniper Networks where he led product management, product strategy and product marketing organizations for Juniper's flagship operating system, Junos. Michael spent the last several years at Juniper leading their SDN efforts across both service provider and enterprise markets. Prior to Juniper, Michael spent time at database supplier Sybase, and ASIC design tool companies Synopsys and Magma Design Automation. Michael's undergraduate work at the University of California Berkeley in advanced fluid mechanics and heat transfer lends new meaning to the marketing phrase "This isn't rocket science."
