Welcome!

SDN Journal Authors: Stefan Bernbo, Michel Courtoy, Amitabh Sinha, Mike Wood, Liz McMillan

Related Topics: SDN Journal, @CloudExpo, @BigDataExpo

SDN Journal: Blog Feed Post

I (don’t) Like Big Buffers By @PlexxiInc | @CloudExpo [#SDN #Cloud #BigData]

Recently Arista released a white paper surrounding the idea that having deeper buffers running within the network can help

Recently Arista released a white paper surrounding the idea that having deeper buffers running within the network can help to alleviate the incast congestion patterns that can present when a large number of many-to-one connections are happening within a network. Also known as the TCP incast problem. They pointedly targeted Hadoop clusters, as the incast problem can rear its ugly head when utilizing the Hadoop Cluster for  MapReduce functions. The study used an example of 20 servers hanging off of a single ToR switch that has 40Gbps of uplink capacity within a Leaf/Spine network, presenting a 5:1 oversubscription ratio. This type of oversubscription was just seen in the recent release of the Facebook network that is used within their data centers. So its safe to assume that these types of oversubscription ratios are seen in the wild. I know I’ve run my fair share of oversubscribed networks in the past.

Treating the Symptom

This particular study actually prods at what is the achilles heel of the traditional leaf/spine network design. All nodes being within 3 switch hops, (ToR <-> Spine <-> ToR), does provide a predictable pathing within the minds of the network operators of today, but I posit that this design is another case of treating the symptom instead of curing the disease. Large buffers allow the the network to mask the disease of oversubscription, congestion caused because of this oversubscription and lack of path diversity and will ultimately not cure the disease. And this is all dependent upon whether or not the flows within a given network are short and bursty. If there are larger, more sustained flows, then larger buffers can at best add more latency to the path rather than increasing performance.

When addressing sustained flows, Little’s Theorem takes over and the rate at which the ‘front’ of the buffer empties is equal to the rate at which the ‘rear’ of the buffer is being populated. When this type of traffic pattern happens, depending on the patterns themselves, the only thing we’re realizing into the system is added latency. The frame needs to be copied into memory, a pointer created and dropped into a queue, that pointer makes it way through the queue, ultimately making it to the front of the queue and being called, it then pointing to the frame that is located in memory, serializing that frame back onto the PHY interface and transmitting it over the wire. This whole process does have an effect overall and does add latency. And again, if its a sustained flow, the best that we’re doing is adding latency to the path.

Curing the Disease

The way to cure the disease in this situation is to remove the outbound bottleneck on the ToR switch in this specific scenario, and we do that today with Plexxi switches. Using our unequal cost multipathing combined with our absence of a spine layer with respect to a data center network, we’re not faced with most of the problems that are discussed within the referenced study. And I reference ‘most’ of the problems as there are problems that weren’t taken into account within the study that should have been taken into account through the whole system and that is solving the choke point of the host itself, with respect to incast problems.

Outbound pathing on the leaf switch, and the inbound / outbound pathing on a given spine switch are both points within the network that can exhibit the TCP incast problem, but there is also the link that connects a given host to the network, as well. Currently there are limited ways in which we can solve this particular problem within a leaf/spine network, and that is to provide more connectivity between a host in a rack and the ToR switch in the form of a LAG, or depending on the type of equipment you have deployed, you may get away with an MLAG between two specific leaf switches. With Plexxi’s deployment of MLAG we’re able to create an MLAG between any two switches within a Plexxi Ring and a host that is connected to the Plexxi network. We do not have the typical vendor specific limitations of MLAG only being configurable between two statically defined switches.

By creating an MLAG between an arbitrary amount of switches within a Plexxi ring and providing unequal cost multipathing within our rings we’re able to diversify connectivity and dynamically allocate bandwidth to help alleviate congestion, on the fly. Removing the need for larger buffers within the network. This helps follow the age old, push the complexity to the edge of the network as much as possible. Our UECMP and MLAG connectivity shifts the congestion to the end host rather than having it contained in a blind spot within a given point of interconnections in a leaf/spine network.

The added ability of programmability and realization of understanding the dynamic allocation of distributed applications in a clustered computing resource allows us to model the network, in terms of required resources, on the fly as well. Meaning, we could potentially allocate network resources specific to the nodes that are potentially impacted by a job that is submitted to the cluster, but this is a post for another point in time. My point is, overall, the cure the incast problem wholly and completely, we need dynamic path diversity along with data-driven workload placement to fully optimize the distributed compute platforms that we’ll be dealing with in the future.

The post I (don’t) like Big Buffers. appeared first on Plexxi.

Read the original blog entry...

More Stories By Michael Bushong

The best marketing efforts leverage deep technology understanding with a highly-approachable means of communicating. Plexxi's Vice President of Marketing Michael Bushong has acquired these skills having spent 12 years at Juniper Networks where he led product management, product strategy and product marketing organizations for Juniper's flagship operating system, Junos. Michael spent the last several years at Juniper leading their SDN efforts across both service provider and enterprise markets. Prior to Juniper, Michael spent time at database supplier Sybase, and ASIC design tool companies Synopsis and Magma Design Automation. Michael's undergraduate work at the University of California Berkeley in advanced fluid mechanics and heat transfer lend new meaning to the marketing phrase "This isn't rocket science."

@CloudExpo Stories
SYS-CON Events announced today that Datanami has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Datanami is a communication channel dedicated to providing insight, analysis and up-to-the-minute information about emerging trends and solutions in Big Data. The publication sheds light on all cutting-edge technologies including networking, storage and applications, and the...
SYS-CON Events announced today that EnterpriseTech has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. EnterpriseTech is a professional resource for news and intelligence covering the migration of high-end technologies into the enterprise and business-IT industry, with a special focus on high-tech solutions in new product development, workload management, increased effi...
You know you need the cloud, but you’re hesitant to simply dump everything at Amazon since you know that not all workloads are suitable for cloud. You know that you want the kind of ease of use and scalability that you get with public cloud, but your applications are architected in a way that makes the public cloud a non-starter. You’re looking at private cloud solutions based on hyperconverged infrastructure, but you’re concerned with the limits inherent in those technologies.
For organizations that have amassed large sums of software complexity, taking a microservices approach is the first step toward DevOps and continuous improvement / development. Integrating system-level analysis with microservices makes it easier to change and add functionality to applications at any time without the increase of risk. Before you start big transformation projects or a cloud migration, make sure these changes won’t take down your entire organization.
Cloud promises the agility required by today’s digital businesses. As organizations adopt cloud based infrastructures and services, their IT resources become increasingly dynamic and hybrid in nature. Managing these require modern IT operations and tools. In his session at 20th Cloud Expo, Raj Sundaram, Senior Principal Product Manager at CA Technologies, will discuss how to modernize your IT operations in order to proactively manage your hybrid cloud and IT environments. He will be sharing bes...
A look across the tech landscape at the disruptive technologies that are increasing in prominence and speculate as to which will be most impactful for communications – namely, AI and Cloud Computing. In his session at 20th Cloud Expo, Curtis Peterson, VP of Operations at RingCentral, highlighted the current challenges of these transformative technologies and shared strategies for preparing your organization for these changes. This “view from the top” outlined the latest trends and developments i...
Automation is enabling enterprises to design, deploy, and manage more complex, hybrid cloud environments. Yet the people who manage these environments must be trained in and understanding these environments better than ever before. A new era of analytics and cognitive computing is adding intelligence, but also more complexity, to these cloud environments. How smart is your cloud? How smart should it be? In this power panel at 20th Cloud Expo, moderated by Conference Chair Roger Strukhoff, paneli...
Hardware virtualization and cloud computing allowed us to increase resource utilization and increase our flexibility to respond to business demand. Docker Containers are the next quantum leap - Are they?! Databases always represented an additional set of challenges unique to running workloads requiring a maximum of I/O, network, CPU resources combined with data locality.
SYS-CON Events announced today that MobiDev, a client-oriented software development company, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. MobiDev is a software company that develops and delivers turn-key mobile apps, websites, web services, and complex software systems for startups and enterprises. Since 2009 it has grown from a small group of passionate engineers and business...
SYS-CON Events announced today that GrapeUp, the leading provider of rapid product development at the speed of business, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Grape Up is a software company, specialized in cloud native application development and professional services related to Cloud Foundry PaaS. With five expert teams that operate in various sectors of the market acr...
SYS-CON Events announced today that Ayehu will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on October 31 - November 2, 2017 at the Santa Clara Convention Center in Santa Clara California. Ayehu provides IT Process Automation & Orchestration solutions for IT and Security professionals to identify and resolve critical incidents and enable rapid containment, eradication, and recovery from cyber security breaches. Ayehu provides customers greater control over IT infras...
Artificial intelligence, machine learning, neural networks. We’re in the midst of a wave of excitement around AI such as hasn’t been seen for a few decades. But those previous periods of inflated expectations led to troughs of disappointment. Will this time be different? Most likely. Applications of AI such as predictive analytics are already decreasing costs and improving reliability of industrial machinery. Furthermore, the funding and research going into AI now comes from a wide range of com...
SYS-CON Events announced today that SourceForge has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. SourceForge is the largest, most trusted destination for Open Source Software development, collaboration, discovery and download on the web serving over 32 million viewers, 150 million downloads and over 460,000 active development projects each and every month.
In this presentation, Striim CTO and founder Steve Wilkes will discuss practical strategies for counteracting fraud and cyberattacks by leveraging real-time streaming analytics. In his session at @ThingsExpo, Steve Wilkes, Founder and Chief Technology Officer at Striim, will provide a detailed look into leveraging streaming data management to correlate events in real time, and identify potential breaches across IoT and non-IoT systems throughout the enterprise. Strategies for processing massive ...
"Our strategy is to focus on the hyperscale providers - AWS, Azure, and Google. Over the last year we saw that a lot of developers need to learn how to do their job in the cloud and we see this DevOps movement that we are catering to with our content," stated Alessandro Fasan, Head of Global Sales at Cloud Academy, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We focus on composable infrastructure. Composable infrastructure has been named by companies like Gartner as the evolution of the IT infrastructure where everything is now driven by software," explained Bruno Andrade, CEO and Founder of HTBase, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
SYS-CON Events announced today that Conference Guru has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. A valuable conference experience generates new contacts, sales leads, potential strategic partners and potential investors; helps gather competitive intelligence and even provides inspiration for new products and services. Conference Guru works with conference organi...
SYS-CON Events announced today that Cloud Academy named "Bronze Sponsor" of 21st International Cloud Expo which will take place October 31 - November 2, 2017 at the Santa Clara Convention Center in Santa Clara, CA. Cloud Academy is the industry’s most innovative, vendor-neutral cloud technology training platform. Cloud Academy provides continuous learning solutions for individuals and enterprise teams for Amazon Web Services, Microsoft Azure, Google Cloud Platform, and the most popular cloud com...
What's the role of an IT self-service portal when you get to continuous delivery and Infrastructure as Code? This general session showed how to create the continuous delivery culture and eight accelerators for leading the change. Don Demcsak is a DevOps and Cloud Native Modernization Principal for Dell EMC based out of New Jersey. He is a former, long time, Microsoft Most Valuable Professional, specializing in building and architecting Application Delivery Pipelines for hybrid legacy, and cloud ...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend 21st Cloud Expo October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, and learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.