Welcome!

SDN Journal Authors: Liz McMillan, Stefan Bernbo, Michel Courtoy, Amitabh Sinha, Mike Wood

Related Topics: SDN Journal, @CloudExpo, @BigDataExpo

SDN Journal: Blog Feed Post

I (don’t) Like Big Buffers By @PlexxiInc | @CloudExpo [#SDN #Cloud #BigData]

Recently Arista released a white paper surrounding the idea that having deeper buffers running within the network can help

Recently Arista released a white paper surrounding the idea that having deeper buffers running within the network can help to alleviate the incast congestion patterns that can present when a large number of many-to-one connections are happening within a network. Also known as the TCP incast problem. They pointedly targeted Hadoop clusters, as the incast problem can rear its ugly head when utilizing the Hadoop Cluster for  MapReduce functions. The study used an example of 20 servers hanging off of a single ToR switch that has 40Gbps of uplink capacity within a Leaf/Spine network, presenting a 5:1 oversubscription ratio. This type of oversubscription was just seen in the recent release of the Facebook network that is used within their data centers. So its safe to assume that these types of oversubscription ratios are seen in the wild. I know I’ve run my fair share of oversubscribed networks in the past.

Treating the Symptom

This particular study actually prods at what is the achilles heel of the traditional leaf/spine network design. All nodes being within 3 switch hops, (ToR <-> Spine <-> ToR), does provide a predictable pathing within the minds of the network operators of today, but I posit that this design is another case of treating the symptom instead of curing the disease. Large buffers allow the the network to mask the disease of oversubscription, congestion caused because of this oversubscription and lack of path diversity and will ultimately not cure the disease. And this is all dependent upon whether or not the flows within a given network are short and bursty. If there are larger, more sustained flows, then larger buffers can at best add more latency to the path rather than increasing performance.

When addressing sustained flows, Little’s Theorem takes over and the rate at which the ‘front’ of the buffer empties is equal to the rate at which the ‘rear’ of the buffer is being populated. When this type of traffic pattern happens, depending on the patterns themselves, the only thing we’re realizing into the system is added latency. The frame needs to be copied into memory, a pointer created and dropped into a queue, that pointer makes it way through the queue, ultimately making it to the front of the queue and being called, it then pointing to the frame that is located in memory, serializing that frame back onto the PHY interface and transmitting it over the wire. This whole process does have an effect overall and does add latency. And again, if its a sustained flow, the best that we’re doing is adding latency to the path.

Curing the Disease

The way to cure the disease in this situation is to remove the outbound bottleneck on the ToR switch in this specific scenario, and we do that today with Plexxi switches. Using our unequal cost multipathing combined with our absence of a spine layer with respect to a data center network, we’re not faced with most of the problems that are discussed within the referenced study. And I reference ‘most’ of the problems as there are problems that weren’t taken into account within the study that should have been taken into account through the whole system and that is solving the choke point of the host itself, with respect to incast problems.

Outbound pathing on the leaf switch, and the inbound / outbound pathing on a given spine switch are both points within the network that can exhibit the TCP incast problem, but there is also the link that connects a given host to the network, as well. Currently there are limited ways in which we can solve this particular problem within a leaf/spine network, and that is to provide more connectivity between a host in a rack and the ToR switch in the form of a LAG, or depending on the type of equipment you have deployed, you may get away with an MLAG between two specific leaf switches. With Plexxi’s deployment of MLAG we’re able to create an MLAG between any two switches within a Plexxi Ring and a host that is connected to the Plexxi network. We do not have the typical vendor specific limitations of MLAG only being configurable between two statically defined switches.

By creating an MLAG between an arbitrary amount of switches within a Plexxi ring and providing unequal cost multipathing within our rings we’re able to diversify connectivity and dynamically allocate bandwidth to help alleviate congestion, on the fly. Removing the need for larger buffers within the network. This helps follow the age old, push the complexity to the edge of the network as much as possible. Our UECMP and MLAG connectivity shifts the congestion to the end host rather than having it contained in a blind spot within a given point of interconnections in a leaf/spine network.

The added ability of programmability and realization of understanding the dynamic allocation of distributed applications in a clustered computing resource allows us to model the network, in terms of required resources, on the fly as well. Meaning, we could potentially allocate network resources specific to the nodes that are potentially impacted by a job that is submitted to the cluster, but this is a post for another point in time. My point is, overall, the cure the incast problem wholly and completely, we need dynamic path diversity along with data-driven workload placement to fully optimize the distributed compute platforms that we’ll be dealing with in the future.

The post I (don’t) like Big Buffers. appeared first on Plexxi.

Read the original blog entry...

More Stories By Michael Bushong

The best marketing efforts leverage deep technology understanding with a highly-approachable means of communicating. Plexxi's Vice President of Marketing Michael Bushong has acquired these skills having spent 12 years at Juniper Networks where he led product management, product strategy and product marketing organizations for Juniper's flagship operating system, Junos. Michael spent the last several years at Juniper leading their SDN efforts across both service provider and enterprise markets. Prior to Juniper, Michael spent time at database supplier Sybase, and ASIC design tool companies Synopsis and Magma Design Automation. Michael's undergraduate work at the University of California Berkeley in advanced fluid mechanics and heat transfer lend new meaning to the marketing phrase "This isn't rocket science."

@CloudExpo Stories
As more and more companies are making the shift from on-premises to public cloud, the standard approach to DevOps is evolving. From encryption, compliance and regulations like GDPR, security in the cloud has become a hot topic. Many DevOps-focused companies have hired dedicated staff to fulfill these requirements, often creating further siloes, complexity and cost. This session aims to highlight existing DevOps cultural approaches, tooling and how security can be wrapped in every facet of the bu...
For financial firms, the cloud is going to increasingly become a crucial part of dealing with customers over the next five years and beyond, particularly with the growing use and acceptance of virtual currencies. There are new data storage paradigms on the horizon that will deliver secure solutions for storing and moving sensitive financial data around the world without touching terrestrial networks. In his session at 20th Cloud Expo, Cliff Beek, President of Cloud Constellation Corporation, d...
IT organizations are moving to the cloud in hopes to approve efficiency, increase agility and save money. Migrating workloads might seem like a simple task, but what many businesses don’t realize is that application migration criteria differs across organizations, making it difficult for architects to arrive at an accurate TCO number. In his session at 21st Cloud Expo, Joe Kinsella, CTO of CloudHealth Technologies, will offer a systematic approach to understanding the TCO of a cloud application...
"With Digital Experience Monitoring what used to be a simple visit to a web page has exploded into app on phones, data from social media feeds, competitive benchmarking - these are all components that are only available because of some type of digital asset," explained Leo Vasiliou, Director of Web Performance Engineering at Catchpoint Systems, in this SYS-CON.tv interview at DevOps Summit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
SYS-CON Events announced today that Secure Channels, a cybersecurity firm, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Secure Channels, Inc. offers several products and solutions to its many clients, helping them protect critical data from being compromised and access to computer networks from the unauthorized. The company develops comprehensive data encryption security strategie...
SYS-CON Events announced today that WineSOFT will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Based in Seoul and Irvine, WineSOFT is an innovative software house focusing on internet infrastructure solutions. The venture started as a bootstrap start-up in 2010 by focusing on making the internet faster and more powerful. WineSOFT’s knowledge is based on the expertise of TCP/IP, VPN, S...
WebRTC is the future of browser-to-browser communications, and continues to make inroads into the traditional, difficult, plug-in web communications world. The 6th WebRTC Summit continues our tradition of delivering the latest and greatest presentations within the world of WebRTC. Topics include voice calling, video chat, P2P file sharing, and use cases that have already leveraged the power and convenience of WebRTC.
SYS-CON Events announced today that App2Cloud will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct. 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. App2Cloud is an online Platform, specializing in migrating legacy applications to any Cloud Providers (AWS, Azure, Google Cloud).
The goal of Continuous Testing is to shift testing left to find defects earlier and release software faster. This can be achieved by integrating a set of open source functional and performance testing tools in the early stages of your software delivery lifecycle. There is one process that binds all application delivery stages together into one well-orchestrated machine: Continuous Testing. Continuous Testing is the conveyer belt between the Software Factory and production stages. Artifacts are m...
When shopping for a new data processing platform for IoT solutions, many development teams want to be able to test-drive options before making a choice. Yet when evaluating an IoT solution, it’s simply not feasible to do so at scale with physical devices. Building a sensor simulator is the next best choice; however, generating a realistic simulation at very high TPS with ease of configurability is a formidable challenge. When dealing with multiple application or transport protocols, you would be...
Cloud resources, although available in abundance, are inherently volatile. For transactional computing, like ERP and most enterprise software, this is a challenge as transactional integrity and data fidelity is paramount – making it a challenge to create cloud native applications while relying on RDBMS. In his session at 21st Cloud Expo, Claus Jepsen, Chief Architect and Head of Innovation Labs at Unit4, will explore that in order to create distributed and scalable solutions ensuring high availa...
An increasing number of companies are creating products that combine data with analytical capabilities. Running interactive queries on Big Data requires complex architectures to store and query data effectively, typically involving data streams, an choosing efficient file format/database and multiple independent systems that are tied together through custom-engineered pipelines. In his session at @BigDataExpo at @ThingsExpo, Tomer Levi, a senior software engineer at Intel’s Advanced Analytics ...
"We're a cybersecurity firm that specializes in engineering security solutions both at the software and hardware level. Security cannot be an after-the-fact afterthought, which is what it's become," stated Richard Blech, Chief Executive Officer at Secure Channels, in this SYS-CON.tv interview at @ThingsExpo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
Enterprise architects are increasingly adopting multi-cloud strategies as they seek to utilize existing data center assets, leverage the advantages of cloud computing and avoid cloud vendor lock-in. This requires a globally aware traffic management strategy that can monitor infrastructure health across data centers and end-user experience globally, while responding to control changes and system specification at the speed of today’s DevOps teams. In his session at 20th Cloud Expo, Josh Gray, Chie...
To get the most out of their data, successful companies are not focusing on queries and data lakes, they are actively integrating analytics into their operations with a data-first application development approach. Real-time adjustments to improve revenues, reduce costs, or mitigate risk rely on applications that minimize latency on a variety of data sources. Jack Norris reviews best practices to show how companies develop, deploy, and dynamically update these applications and how this data-first...
Intelligent Automation is now one of the key business imperatives for CIOs and CISOs impacting all areas of business today. In his session at 21st Cloud Expo, Brian Boeggeman, VP Alliances & Partnerships at Ayehu, will talk about how business value is created and delivered through intelligent automation to today’s enterprises. The open ecosystem platform approach toward Intelligent Automation that Ayehu delivers to the market is core to enabling the creation of the self-driving enterprise.
"We're here to tell the world about our cloud-scale infrastructure that we have at Juniper combined with the world-class security that we put into the cloud," explained Lisa Guess, VP of Systems Engineering at Juniper Networks, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
Historically, some banking activities such as trading have been relying heavily on analytics and cutting edge algorithmic tools. The coming of age of powerful data analytics solutions combined with the development of intelligent algorithms have created new opportunities for financial institutions. In his session at 20th Cloud Expo, Sebastien Meunier, Head of Digital for North America at Chappuis Halder & Co., discussed how these tools can be leveraged to develop a lasting competitive advantage ...
As businesses adopt functionalities in cloud computing, it’s imperative that IT operations consistently ensure cloud systems work correctly – all of the time, and to their best capabilities. In his session at @BigDataExpo, Bernd Harzog, CEO and founder of OpsDataStore, presented an industry answer to the common question, “Are you running IT operations as efficiently and as cost effectively as you need to?” He then expounded on the industry issues he frequently came up against as an analyst, and ...
The question before companies today is not whether to become intelligent, it’s a question of how and how fast. The key is to adopt and deploy an intelligent application strategy while simultaneously preparing to scale that intelligence. In her session at 21st Cloud Expo, Sangeeta Chakraborty, Chief Customer Officer at Ayasdi, will provide a tactical framework to become a truly intelligent enterprise, including how to identify the right applications for AI, how to build a Center of Excellence to ...