SDN Journal: Article

Resiliency in Controller-Based Network Architectures

At the core of SDN solutions is the concept of a controller

Last week Ivan Pepelnjak wrote an article about the failure domains of controller-based network architectures. At the core of SDN solutions is the concept of a controller, which in most cases lives outside the network devices themselves. As a central entity controlling the network (hence its name), a controller brings significant value and capabilities to the network. We have talked about these in this blog many times.

Centralized Control

When introducing a centralized entity into any inherently distributed system, the design of that system must carefully account for failure domains and failure scenarios. Networks have always been distributed entities, with each device more or less independent and a huge suite of protocols defined to manage the distributed state between all of them. When you think about it, the extent of distribution we have created in networks is quite impressive. We have built an extremely large distributed system with local decision making and control. I am not sure there are many other examples of complex distributed systems that truly run without some form of central authority.

It is exactly that last point that we networking folks tend to forget or ignore. Many control systems in the world have central control and management, and the vast majority of them work pretty well. Any complex manufacturing facility has centralized control over the robots, belts and other machinery it uses. There is usually some distributed state, and there are health checks at the interfaces between machines and operations, but the entire end-to-end process is controlled by a centralized entity.

The reason for this is not much different from the reason we are starting to deploy controllers in networks. Having a true end-to-end view of all available resources provides better overall performance of, and control over, the network. A centralized entity can make decisions that are related to or dependent on previous decisions, based on information that may well be outside the reach of any individual device in a distributed system.
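To make the "dependent on previous decisions" point concrete, here is a minimal Python sketch. It is not Plexxi's algorithm. It models a hypothetical central planner that sees the load on every link and places each new demand on the least-loaded path, so each placement accounts for every placement made before it. The class name, path names and link names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CentralPlanner:
    # Hypothetical fabric: path id -> tuple of link names on that path.
    paths: dict[str, tuple[str, ...]]
    # Fabric-wide view of committed load per link (only a central entity has this).
    link_load: dict[str, float] = field(default_factory=dict)

    def _path_load(self, path_id: str) -> float:
        # A path is only as free as its busiest link.
        return max(self.link_load.get(link, 0.0) for link in self.paths[path_id])

    def place(self, demand: float) -> str:
        # Compare every candidate path using the load left behind by all
        # earlier placements, then commit the demand to the chosen path.
        best = min(self.paths, key=self._path_load)
        for link in self.paths[best]:
            self.link_load[link] = self.link_load.get(link, 0.0) + demand
        return best
```

A switch hashing flows on its own has no access to this fabric-wide load picture. That is exactly the kind of information the paragraph above says may be out of reach in a purely distributed system.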

Architectural Choices

But the introduction of such an entity needs to be carefully architected and designed. The exact role of a controller in the day-to-day (or microsecond-to-microsecond) operation of a network is a critical choice: it defines how dependent the network is on the controller and, as a result, the impact of a controller failure. At Plexxi we made a very deliberate architectural choice for our controller:

  • it cannot ever be in the data path of network traffic. Not for new flows, not for existing flows. Not for link failures. Not for switch failures.

The network has to run when the controller is not available. It has to run for existing attached devices, newly attached devices, existing flows and new flows. Of course we want the controller to be available all the time because it gives us the best visibility, but we very deliberately architected it so that the network keeps working if it isn’t.
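The rule that the network keeps working without the controller can be expressed as a structural property of the forwarding code: the packet path reads and writes only local state. The sketch below is a hypothetical, heavily simplified learning switch in Python. It is not Plexxi's switch software. It handles a newly attached host and a new flow while its `controller_reachable` flag is `False`, because nothing in `forward()` ever consults the controller.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeSwitch:
    ports: list[int]
    mac_table: dict[str, int] = field(default_factory=dict)
    # Informational only: forward() below never reads it.
    controller_reachable: bool = True

    def forward(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Return the output ports for one frame using only local state."""
        # Learn (or refresh) where the sender lives. This also covers hosts
        # that attached while the controller was down.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every other port, as an ordinary
            # learning switch does, instead of asking a controller.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination sits on the arrival port: filter the frame.
            return []
        return [out_port]
```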

To that end we split our controller into two separate components. The most visible (and perhaps the most traditional in this new world of controller architectures) is our central controller. It is software that runs on a VM or bare-metal server and acts as the central coordinator. It maintains the database with all relevant data and communicates with the switches. The operator communicates with it through a GUI, our APIs, or our Data Services Engine.

Then there is a distributed portion of the controller, which runs on every Plexxi Switch. It communicates with the central controller, takes higher-level configuration, policy and topology instructions, and passes them to the Switch software, which turns them into configuration for the hardware. In the other direction, statistics and state information from the Switch software are passed to the distributed portion of the controller and then on to the central controller.
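The split between the central controller and its per-switch counterpart might look roughly like the following Python sketch. The instruction format, the `SwitchAgent` name and the tuple standing in for a hardware entry are all assumptions made for illustration. The point is only the division of labor: high-level instructions come down, the agent translates them locally, and statistics go back up.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchAgent:
    switch_id: str
    # Stand-in for hardware state: (destination switch, path id, weight).
    hw_entries: list[tuple[str, str, int]] = field(default_factory=list)
    # Messages queued for the central controller.
    outbox: list[dict] = field(default_factory=list)

    def apply_instruction(self, instruction: dict) -> None:
        """Translate one hypothetical high-level instruction, e.g.
        {"dest_switch": "s2", "paths": {"p1": 3, "p2": 1}}, into entries."""
        dest = instruction["dest_switch"]
        # Replace whatever was previously programmed for this destination,
        # leaving entries for other destinations untouched.
        kept = [e for e in self.hw_entries if e[0] != dest]
        new = [(dest, pid, w) for pid, w in sorted(instruction["paths"].items())]
        self.hw_entries = kept + new

    def report_stats(self, counters: dict[str, int]) -> None:
        # Upward direction: switch software -> agent -> central controller.
        self.outbox.append({"switch": self.switch_id, "counters": dict(counters)})
```

Because the agent keeps its last translated entries, the switch continues forwarding with them if the central controller goes away. Only new instructions stop arriving.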

Network Independence

But most importantly, Plexxi switches are fully capable of making forwarding decisions by themselves. They learn MAC addresses. They resolve ARP. They have L2 forwarding tables and L3 forwarding tables. These tables are not managed by the central controller; they are managed by each switch. What the central controller provides is topology information on how to reach other switches in a Plexxi domain: out of the many paths through the fabric, which ones should be used, and for what percentage of traffic. It also provides hundreds of backup paths through that fabric in case a link or switch fails. Those failures are communicated between the switches themselves, without involving the controller (which is informed, but is not in the action path).
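The division of labor described above can be sketched as follows: the controller supplies weighted paths and backups, and the switches react to failures on their own. This is an illustrative Python model under assumed names, not Plexxi's implementation. Each flow is hashed to a primary path in proportion to the path's weight. A link-down notice from a neighboring switch removes the affected paths locally. If every primary path is gone, the switch falls back to the precomputed backups.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PathTable:
    # Hypothetical controller-supplied data:
    #   primary: path id -> (links on the path, relative traffic weight > 0)
    #   backups: ordered list of (path id, links) to try if all primaries fail
    primary: dict[str, tuple[tuple[str, ...], int]]
    backups: list[tuple[str, tuple[str, ...]]]
    failed_links: set[str] = field(default_factory=set)

    def link_down(self, link: str) -> None:
        # Failure notices come from peer switches. The controller may be told
        # too, but this step does not wait for it.
        self.failed_links.add(link)

    def _usable(self, links: tuple[str, ...]) -> bool:
        return not self.failed_links.intersection(links)

    def select(self, flow_key: str) -> str:
        live = [(pid, weight) for pid, (links, weight) in sorted(self.primary.items())
                if self._usable(links)]
        if not live:
            # Every primary path is affected: take the first intact backup.
            for pid, links in self.backups:
                if self._usable(links):
                    return pid
            raise RuntimeError("no usable path")
        # A stable hash keeps a flow on one path; weights set each path's share.
        digest = hashlib.sha256(flow_key.encode()).digest()
        slot = int.from_bytes(digest[:8], "big") % sum(w for _, w in live)
        for pid, weight in live:
            if slot < weight:
                return pid
            slot -= weight
        raise AssertionError("unreachable")
```

Hashing the flow key, rather than choosing a path per packet, keeps each flow on a single path and so avoids reordering. The weights only control the share of flows each path receives.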

Having this very clear line in the sand between what the switches are responsible for and what the controller is responsible for allows us to worry (just a little) less about 100% controller resiliency. Don’t get me wrong, we want the controller there, but your network will operate as you expect if it’s not. In his article, Ivan calls this a “controller enhanced network infrastructure”. That works.

[Today's Fun Fact: All polar bears are left-handed. Or left-clawed. I would assume that means they tend to be more creative than other bears too.]

The post Resiliency in Controller based Network Architectures appeared first on Plexxi.

More Stories By Marten Terpstra

Marten Terpstra is a Product Management Director at Plexxi Inc. Marten has extensive knowledge of the architecture, design, deployment and management of enterprise and carrier networks.
