SDN Journal: Blog Feed Post

I (don’t) Like Big Buffers By @PlexxiInc | @CloudExpo [#SDN #Cloud #BigData]

Recently Arista released a white paper built around the idea that deeper buffers in the network can help alleviate the incast congestion that can appear when a large number of many-to-one connections are active at once, also known as the TCP incast problem. They pointedly targeted Hadoop clusters, as the incast problem can rear its ugly head when a Hadoop cluster runs MapReduce jobs. The study used an example of 20 servers hanging off of a single ToR switch that has 40Gbps of uplink capacity within a leaf/spine network, presenting a 5:1 oversubscription ratio. This level of oversubscription was just seen in the recently released description of the network Facebook uses within its data centers, so it’s safe to assume that these types of oversubscription ratios are seen in the wild. I know I’ve run my fair share of oversubscribed networks in the past.
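
To make the 5:1 figure concrete, here is a minimal sketch of the arithmetic. It assumes each of the 20 servers attaches at 10Gbps, which the ratio implies but which is not stated above; the function name is made up for illustration.

```python
def oversubscription_ratio(num_servers: int, server_gbps: float, uplink_gbps: float) -> float:
    """Total host-facing bandwidth divided by total uplink bandwidth."""
    downlink_gbps = num_servers * server_gbps
    return downlink_gbps / uplink_gbps


# Assumption: 10Gbps per server (not stated in the study summary above).
ratio = oversubscription_ratio(num_servers=20, server_gbps=10.0, uplink_gbps=40.0)
print(f"{ratio:g}:1")  # prints "5:1"
```

In other words, if every server transmits at line rate toward the spine, the uplinks can carry only one fifth of the offered load.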

Treating the Symptom

This particular study actually prods at the Achilles’ heel of the traditional leaf/spine network design. Having all nodes within 3 switch hops (ToR <-> Spine <-> ToR) does give today’s network operators predictable pathing, but I posit that this design is another case of treating the symptom instead of curing the disease. Large buffers allow the network to mask the disease, which is oversubscription, the congestion it causes, and a lack of path diversity; they will ultimately not cure it. And even the masking depends on whether the flows within a given network are short and bursty. If there are larger, more sustained flows, then larger buffers can at best add more latency to the path rather than increase performance.
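
The difference between bursty and sustained traffic can be illustrated with a toy model. This is only a sketch under simplifying assumptions: time advances in discrete ticks, the buffer is a single FIFO counted in packets, the egress drains a fixed number of packets per tick, and all of the numbers are hypothetical.

```python
def simulate_queue(arrivals, drain_per_tick, buffer_limit):
    """Return (max_depth, dropped) for a FIFO buffer fed by `arrivals` (packets per tick)."""
    depth = 0
    dropped = 0
    max_depth = 0
    for arriving in arrivals:
        depth += arriving
        if depth > buffer_limit:
            # Tail drop: anything that does not fit in the buffer is lost.
            dropped += depth - buffer_limit
            depth = buffer_limit
        max_depth = max(max_depth, depth)
        # The egress link drains at a fixed rate each tick.
        depth = max(0, depth - drain_per_tick)
    return max_depth, dropped


burst = [50] + [0] * 9   # one short burst, then silence
sustained = [15] * 100   # offered load stays above the drain rate

for name, arrivals in (("burst", burst), ("sustained", sustained)):
    for limit in (20, 100):
        max_depth, dropped = simulate_queue(arrivals, drain_per_tick=10, buffer_limit=limit)
        print(f"{name:9s} buffer={limit:3d} max_depth={max_depth:3d} dropped={dropped}")
```

In this model the deeper buffer absorbs the burst with no loss, while the shallow one drops 30 packets. Under the sustained load both buffers fill and drop (410 versus 490 packets), and the deeper one simply holds a standing queue of 100 packets in front of every new arrival.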

When addressing sustained flows, Little’s Theorem takes over: in steady state the rate at which the ‘front’ of the buffer empties is equal to the rate at which the ‘rear’ of the buffer is being populated, so the average time a frame waits is simply the queue depth divided by that rate. When this type of traffic pattern happens, depending on the patterns themselves, the only thing we’re introducing into the system is added latency. The frame needs to be copied into memory and a pointer created and dropped into a queue; that pointer makes its way to the front of the queue and is called, pointing to the frame located in memory, which is then serialized back onto the PHY interface and transmitted over the wire. This whole process does have an effect overall and does add latency. And again, if it’s a sustained flow, the best that we’re doing is adding latency to the path.
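
Little’s Theorem is usually written as L = λW: average occupancy equals arrival rate times average time in the system. Rearranged, the time a frame spends in a full buffer is the occupancy divided by the drain rate. A rough sketch of that calculation follows; the 50MB standing queue is a hypothetical figure chosen for illustration, and only the 40Gbps uplink comes from the scenario above.

```python
def queueing_delay_seconds(queued_bytes: float, drain_gbps: float) -> float:
    """Little's Theorem rearranged: W = L / lambda, with L in bits and lambda in bits/s."""
    queued_bits = queued_bytes * 8
    drain_bits_per_second = drain_gbps * 1e9
    return queued_bits / drain_bits_per_second


# Hypothetical: 50MB of frames sitting ahead of a new arrival on a 40Gbps uplink.
delay = queueing_delay_seconds(queued_bytes=50e6, drain_gbps=40.0)
print(f"{delay * 1e3:.1f} ms of added latency")  # prints "10.0 ms of added latency"
```

The deeper the standing queue, the larger that number gets, while the throughput of a sustained flow stays pinned at the drain rate.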

Curing the Disease

The way to cure the disease in this specific scenario is to remove the outbound bottleneck on the ToR switch, and we do that today with Plexxi switches. By combining our unequal cost multipathing with the absence of a spine layer in the data center network, we’re not faced with most of the problems discussed in the referenced study. I say ‘most’ because there is a problem the study did not take into account and should have, looking at the system as a whole: the choke point of the host itself with respect to incast.

Outbound pathing on the leaf switch and the inbound/outbound pathing on a given spine switch are both points within the network that can exhibit the TCP incast problem, but so can the link that connects a given host to the network. Currently there are limited ways to solve this particular problem within a leaf/spine network: provide more connectivity between a host in a rack and the ToR switch in the form of a LAG or, depending on the type of equipment you have deployed, you may get away with an MLAG between two specific leaf switches. With Plexxi’s implementation of MLAG we’re able to create an MLAG between any two switches within a Plexxi ring and a host that is connected to the Plexxi network. We do not have the typical vendor-specific limitation of MLAG only being configurable between two statically defined switches.

By creating an MLAG between an arbitrary number of switches within a Plexxi ring and providing unequal cost multipathing within our rings, we’re able to diversify connectivity and dynamically allocate bandwidth to help alleviate congestion on the fly, removing the need for larger buffers within the network. This follows the age-old advice to push complexity to the edge of the network as much as possible. Our UECMP and MLAG connectivity shifts the congestion to the end host rather than leaving it contained in a blind spot at a given point of interconnection in a leaf/spine network.
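
For readers unfamiliar with the general idea of unequal cost multipathing, here is a generic sketch of weighted flow placement. It is not Plexxi’s algorithm, which is not described in this post; the path names, weights, and the `pick_path` function are all hypothetical. Each flow is hashed to a stable point, and paths receive flows in proportion to their weight.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate


def pick_path(flow_id: str, paths: dict[str, int]) -> str:
    """Map a flow to a path with probability proportional to the path's integer weight."""
    names = sorted(paths)
    cumulative = list(accumulate(paths[name] for name in names))
    # Hash the flow identifier so the same flow always lands on the same path.
    digest = hashlib.sha256(flow_id.encode()).digest()
    point = int.from_bytes(digest[:8], "big") % cumulative[-1]
    return names[bisect_right(cumulative, point)]


# Hypothetical paths weighted by available capacity in Gbps.
paths = {"direct": 40, "indirect": 10}
flows = [f"10.0.0.{i}:5000->10.0.1.1:80" for i in range(1000)]
on_direct = sum(pick_path(flow, paths) == "direct" for flow in flows)
print(f"{on_direct / len(flows):.0%} of flows placed on the direct path")
```

With these weights, roughly four out of five flows should land on the higher-capacity path. A real implementation would also adjust the weights as load shifts, which this sketch does not attempt.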

The added programmability, and the ability to understand how distributed applications are dynamically allocated across a clustered computing resource, allow us to model the network in terms of required resources on the fly as well. Meaning, we could potentially allocate network resources specific to the nodes that are impacted by a job submitted to the cluster, but that is a post for another point in time. My point is, overall, to cure the incast problem wholly and completely, we need dynamic path diversity along with data-driven workload placement to fully optimize the distributed compute platforms that we’ll be dealing with in the future.

The post I (don’t) like Big Buffers. appeared first on Plexxi.

More Stories By Michael Bushong

The best marketing efforts leverage deep technology understanding with a highly-approachable means of communicating. Plexxi's Vice President of Marketing Michael Bushong has acquired these skills having spent 12 years at Juniper Networks where he led product management, product strategy and product marketing organizations for Juniper's flagship operating system, Junos. Michael spent the last several years at Juniper leading their SDN efforts across both service provider and enterprise markets. Prior to Juniper, Michael spent time at database supplier Sybase, and ASIC design tool companies Synopsys and Magma Design Automation. Michael's undergraduate work at the University of California Berkeley in advanced fluid mechanics and heat transfer lends new meaning to the marketing phrase "This isn't rocket science."
