Welcome!

SDN Journal Authors: Pat Romanski, Patrick Hubbard, Elizabeth White, Sven Olav Lund, Liz McMillan

Related Topics: SDN Journal, @CloudExpo, @BigDataExpo

SDN Journal: Blog Feed Post

I (don’t) Like Big Buffers By @PlexxiInc | @CloudExpo [#SDN #Cloud #BigData]

Recently Arista released a white paper surrounding the idea that having deeper buffers running within the network can help

Recently Arista released a white paper surrounding the idea that having deeper buffers running within the network can help to alleviate the incast congestion patterns that can present when a large number of many-to-one connections are happening within a network. Also known as the TCP incast problem. They pointedly targeted Hadoop clusters, as the incast problem can rear its ugly head when utilizing the Hadoop Cluster for  MapReduce functions. The study used an example of 20 servers hanging off of a single ToR switch that has 40Gbps of uplink capacity within a Leaf/Spine network, presenting a 5:1 oversubscription ratio. This type of oversubscription was just seen in the recent release of the Facebook network that is used within their data centers. So its safe to assume that these types of oversubscription ratios are seen in the wild. I know I’ve run my fair share of oversubscribed networks in the past.

Treating the Symptom

This particular study actually prods at what is the achilles heel of the traditional leaf/spine network design. All nodes being within 3 switch hops, (ToR <-> Spine <-> ToR), does provide a predictable pathing within the minds of the network operators of today, but I posit that this design is another case of treating the symptom instead of curing the disease. Large buffers allow the the network to mask the disease of oversubscription, congestion caused because of this oversubscription and lack of path diversity and will ultimately not cure the disease. And this is all dependent upon whether or not the flows within a given network are short and bursty. If there are larger, more sustained flows, then larger buffers can at best add more latency to the path rather than increasing performance.

When addressing sustained flows, Little’s Theorem takes over and the rate at which the ‘front’ of the buffer empties is equal to the rate at which the ‘rear’ of the buffer is being populated. When this type of traffic pattern happens, depending on the patterns themselves, the only thing we’re realizing into the system is added latency. The frame needs to be copied into memory, a pointer created and dropped into a queue, that pointer makes it way through the queue, ultimately making it to the front of the queue and being called, it then pointing to the frame that is located in memory, serializing that frame back onto the PHY interface and transmitting it over the wire. This whole process does have an effect overall and does add latency. And again, if its a sustained flow, the best that we’re doing is adding latency to the path.

Curing the Disease

The way to cure the disease in this situation is to remove the outbound bottleneck on the ToR switch in this specific scenario, and we do that today with Plexxi switches. Using our unequal cost multipathing combined with our absence of a spine layer with respect to a data center network, we’re not faced with most of the problems that are discussed within the referenced study. And I reference ‘most’ of the problems as there are problems that weren’t taken into account within the study that should have been taken into account through the whole system and that is solving the choke point of the host itself, with respect to incast problems.

Outbound pathing on the leaf switch, and the inbound / outbound pathing on a given spine switch are both points within the network that can exhibit the TCP incast problem, but there is also the link that connects a given host to the network, as well. Currently there are limited ways in which we can solve this particular problem within a leaf/spine network, and that is to provide more connectivity between a host in a rack and the ToR switch in the form of a LAG, or depending on the type of equipment you have deployed, you may get away with an MLAG between two specific leaf switches. With Plexxi’s deployment of MLAG we’re able to create an MLAG between any two switches within a Plexxi Ring and a host that is connected to the Plexxi network. We do not have the typical vendor specific limitations of MLAG only being configurable between two statically defined switches.

By creating an MLAG between an arbitrary amount of switches within a Plexxi ring and providing unequal cost multipathing within our rings we’re able to diversify connectivity and dynamically allocate bandwidth to help alleviate congestion, on the fly. Removing the need for larger buffers within the network. This helps follow the age old, push the complexity to the edge of the network as much as possible. Our UECMP and MLAG connectivity shifts the congestion to the end host rather than having it contained in a blind spot within a given point of interconnections in a leaf/spine network.

The added ability of programmability and realization of understanding the dynamic allocation of distributed applications in a clustered computing resource allows us to model the network, in terms of required resources, on the fly as well. Meaning, we could potentially allocate network resources specific to the nodes that are potentially impacted by a job that is submitted to the cluster, but this is a post for another point in time. My point is, overall, the cure the incast problem wholly and completely, we need dynamic path diversity along with data-driven workload placement to fully optimize the distributed compute platforms that we’ll be dealing with in the future.

The post I (don’t) like Big Buffers. appeared first on Plexxi.

Read the original blog entry...

More Stories By Michael Bushong

The best marketing efforts leverage deep technology understanding with a highly-approachable means of communicating. Plexxi's Vice President of Marketing Michael Bushong has acquired these skills having spent 12 years at Juniper Networks where he led product management, product strategy and product marketing organizations for Juniper's flagship operating system, Junos. Michael spent the last several years at Juniper leading their SDN efforts across both service provider and enterprise markets. Prior to Juniper, Michael spent time at database supplier Sybase, and ASIC design tool companies Synopsis and Magma Design Automation. Michael's undergraduate work at the University of California Berkeley in advanced fluid mechanics and heat transfer lend new meaning to the marketing phrase "This isn't rocket science."

@CloudExpo Stories
DevOps is under attack because developers don’t want to mess with infrastructure. They will happily own their code into production, but want to use platforms instead of raw automation. That’s changing the landscape that we understand as DevOps with both architecture concepts (CloudNative) and process redefinition (SRE). Rob Hirschfeld’s recent work in Kubernetes operations has led to the conclusion that containers and related platforms have changed the way we should be thinking about DevOps and...
The session is centered around the tracing of systems on cloud using technologies like ebpf. The goal is to talk about what this technology is all about and what purpose it serves. In his session at 21st Cloud Expo, Shashank Jain, Development Architect at SAP, will touch upon concepts of observability in the cloud and also some of the challenges we have. Generally most cloud-based monitoring tools capture details at a very granular level. To troubleshoot problems this might not be good enough.
When it comes to cloud computing, the ability to turn massive amounts of compute cores on and off on demand sounds attractive to IT staff, who need to manage peaks and valleys in user activity. With cloud bursting, the majority of the data can stay on premises while tapping into compute from public cloud providers, reducing risk and minimizing need to move large files. In his session at 18th Cloud Expo, Scott Jeschonek, Director of Product Management at Avere Systems, discussed the IT and busine...
SYS-CON Events announced today that Dasher Technologies will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Dasher Technologies, Inc. ® is a premier IT solution provider that delivers expert technical resources along with trusted account executives to architect and deliver complete IT solutions and services to help our clients execute their goals, plans and objectives. Since 1999, we'v...
Enterprises have taken advantage of IoT to achieve important revenue and cost advantages. What is less apparent is how incumbent enterprises operating at scale have, following success with IoT, built analytic, operations management and software development capabilities – ranging from autonomous vehicles to manageable robotics installations. They have embraced these capabilities as if they were Silicon Valley startups. As a result, many firms employ new business models that place enormous impor...
We all know that end users experience the Internet primarily with mobile devices. From an app development perspective, we know that successfully responding to the needs of mobile customers depends on rapid DevOps – failing fast, in short, until the right solution evolves in your customers' relationship to your business. Whether you’re decomposing an SOA monolith, or developing a new application cloud natively, it’s not a question of using microservices – not doing so will be a path to eventual b...
SYS-CON Events announced today that Massive Networks, that helps your business operate seamlessly with fast, reliable, and secure internet and network solutions, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. As a premier telecommunications provider, Massive Networks is headquartered out of Louisville, Colorado. With years of experience under their belt, their team of...
SYS-CON Events announced today that TidalScale, a leading provider of systems and services, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. TidalScale has been involved in shaping the computing landscape. They've designed, developed and deployed some of the most important and successful systems and services in the history of the computing industry - internet, Ethernet, operating s...
SYS-CON Events announced today that Taica will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Taica manufacturers Alpha-GEL brand silicone components and materials, which maintain outstanding performance over a wide temperature range -40C to +200C. For more information, visit http://www.taica.co.jp/english/.
SYS-CON Events announced today that MIRAI Inc. will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. MIRAI Inc. are IT consultants from the public sector whose mission is to solve social issues by technology and innovation and to create a meaningful future for people.
Transforming cloud-based data into a reportable format can be a very expensive, time-intensive and complex operation. As a SaaS platform with more than 30 million global users, Cornerstone OnDemand’s challenge was to create a scalable solution that would improve the time it took customers to access their user data. Our Real-Time Data Warehouse (RTDW) process vastly reduced data time-to-availability from 24 hours to just 10 minutes. In his session at 21st Cloud Expo, Mark Goldin, Chief Technolo...
SYS-CON Events announced today that IBM has been named “Diamond Sponsor” of SYS-CON's 21st Cloud Expo, which will take place on October 31 through November 2nd 2017 at the Santa Clara Convention Center in Santa Clara, California.
SYS-CON Events announced today that TidalScale will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. TidalScale is the leading provider of Software-Defined Servers that bring flexibility to modern data centers by right-sizing servers on the fly to fit any data set or workload. TidalScale’s award-winning inverse hypervisor technology combines multiple commodity servers (including their ass...
In the fast-paced advances and popularity in cloud technology, one of the most critical factors revolves around concerns for security of your critical data. How to assure both your company and your customers they can confidently trust and utilize your cloud environment is most often top on the list. There is a method to evaluating and providing security that exceeds conventional modes of protecting data both within the cloud as well externally on mobile and other devices. With the public failure...
Join IBM November 1 at 21st Cloud Expo at the Santa Clara Convention Center in Santa Clara, CA, and learn how IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Cognitive analysis impacts today’s systems with unparalleled ability that were previously available only to manned, back-end operations. Thanks to cloud processing, IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Imagine a robot vacuum that becomes your personal assistant tha...
Widespread fragmentation is stalling the growth of the IIoT and making it difficult for partners to work together. The number of software platforms, apps, hardware and connectivity standards is creating paralysis among businesses that are afraid of being locked into a solution. EdgeX Foundry is unifying the community around a common IoT edge framework and an ecosystem of interoperable components.
In his session at 21st Cloud Expo, Raju Shreewastava, founder of Big Data Trunk, will provide a fun and simple way to introduce Machine Leaning to anyone and everyone. Together we will solve a machine learning problem and find an easy way to be able to do machine learning without even coding. Raju Shreewastava is the founder of Big Data Trunk (www.BigDataTrunk.com), a Big Data Training and consulting firm with offices in the United States. He previously led the data warehouse/business intellige...
Gemini is Yahoo’s native and search advertising platform. To ensure the quality of a complex distributed system that spans multiple products and components and across various desktop websites and mobile app and web experiences – both Yahoo owned and operated and third-party syndication (supply), with complex interaction with more than a billion users and numerous advertisers globally (demand) – it becomes imperative to automate a set of end-to-end tests 24x7 to detect bugs and regression. In th...
As popularity of the smart home is growing and continues to go mainstream, technological factors play a greater role. The IoT protocol houses the interoperability battery consumption, security, and configuration of a smart home device, and it can be difficult for companies to choose the right kind for their product. For both DIY and professionally installed smart homes, developers need to consider each of these elements for their product to be successful in the market and current smart homes.
Infoblox delivers Actionable Network Intelligence to enterprise, government, and service provider customers around the world. They are the industry leader in DNS, DHCP, and IP address management, the category known as DDI. We empower thousands of organizations to control and secure their networks from the core-enabling them to increase efficiency and visibility, improve customer service, and meet compliance requirements.