Pay Attention to Big Data By @MartenT1999 | @BigDataExpo [#BigData]

Network Engineers, Pay Attention to Big Data

You have probably realized we are having a Big Data kind of week here at the Plexxi blog, and for good reason. The amount of development and change in this big bucket of applications we conveniently label “Big Data” is astonishing.

Walking around at Hadoopworld in New York last week, I initially felt somewhat lost as a “networking guy”. But that feeling of “not belonging” is only superficial: the network has a tremendously important role in these applications. The challenge is that many “networking” folks don’t quite understand or realize that yet. Contrary to what I believed not too long ago, though, Big Data Application folks have a pretty good understanding of the role of the network in their overall application and its performance.

As an industry we have been talking about the increase in east-west traffic for quite a few years now. In a typical datacenter today, this traffic comes from loosely coupled applications and semi-distributed storage. A web-based application has many components that together make up the application we see as users. There are application load balancers, web server front ends, and application back ends that in turn have databases for their data storage. Those databases may have local or, more likely, centralized or semi-distributed physical storage, and these storage systems in turn have replication and backup components. We have traditionally labeled all of these interactions east-west: traffic inside the datacenter that is required to pass the appropriate data back to the application user, whether that is a person or another application.

The communication patterns in more traditional distributed applications like these are fairly straightforward to understand. Some basic measurements and profiling should give you a pretty decent view of how each of the components of the application behaves, how they interact and what the network requirements are between them. The application developers may not necessarily be able to provide you with exact needs and guidance before a deployment, but the applications, once they have gone through at least one scaling and performance adjustment cycle, will typically fall into a specific pattern that will be fairly consistent for the life of the application. And the job of the network engineer is to ensure that the network provides appropriate connectivity for these communication patterns.
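As a rough illustration of the kind of basic profiling described above, the sketch below sums flow records into a small traffic matrix between application tiers. The tier names, byte counts and record layout are all hypothetical; real input would come from whatever flow export or capture tooling you have available.

```python
from collections import Counter

# Hypothetical flow records: (source tier, destination tier, bytes transferred).
# The tier names and byte counts are made up for illustration only.
FLOWS = [
    ("load-balancer", "web", 1_200),
    ("web", "app", 4_000),
    ("app", "database", 9_500),
    ("web", "app", 3_000),
    ("database", "storage", 20_000),
    ("app", "database", 500),
]


def traffic_matrix(flows):
    """Sum bytes per (source tier, destination tier) pair."""
    matrix = Counter()
    for src, dst, nbytes in flows:
        matrix[(src, dst)] += nbytes
    return matrix


def top_talkers(matrix, n=3):
    """Return the n tier pairs carrying the most bytes, heaviest first."""
    return matrix.most_common(n)


matrix = traffic_matrix(FLOWS)
for (src, dst), nbytes in top_talkers(matrix):
    print(f"{src} -> {dst}: {nbytes} bytes")
```

For a traditional application, a matrix like this tends to stay stable once the application has settled, which is what makes the network requirements relatively easy to pin down.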

Big Data applications bring these east-west concepts to new levels. They are designed to run in a parallel or distributed system. They depend on moving extremely large amounts of data through the compute infrastructure. They are built with the assumption that data and computation are continually distributed and replicated across members of a big data cluster. Many of these applications are built to tackle a multitude of different data analysis jobs, each of them different in its data set and its data reduction behaviors, and therefore different in what it wants or needs from the network. For that, you need a much more dynamic network than the ones you have built in the past.

Many Big Data deployments today are built on top of 1GbE networks. It is easy to conclude from this that the network is not an issue, and that is probably the biggest mistake you can make. It is easy to think of Big Data projects as compute-intensive analysis and reduction of extremely large amounts of data. In reality, many big data applications work on semi-real-time streaming data. Each piece of data may require only a fairly small amount of computation, but the sheer amount of data requires levels of connectivity we may not be used to.

Last week at Hadoopworld I had an interesting chat with someone who worked for an Ad Tech company. Ad Tech is a fast-growing sub-industry of marketing and advertising, focused on digital advertising and marketing. This gentleman asked us about some of the performance characteristics of a Plexxi network. When we asked him for some more details of his deployment, he explained that he manages a big data cluster of about 200 servers and that with the switches he uses today (from one of our competitors and certainly at the top end of expected performance), he can only populate about half of each switch’s available ports before congestion becomes an issue. His cluster fairly consistently pushes 700 Gbit/s to 1 Tbit/s across the racks. There are very few network infrastructures out there that are specifically designed and built to support those types of applications.
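To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It uses only the figures from the conversation (about 200 servers, 700 Gbit/s to 1 Tbit/s across the racks) and assumes, purely for illustration, that the traffic is spread evenly over all servers, which a real cluster will not do.

```python
# Figures quoted in the anecdote above.
SERVERS = 200
CROSS_RACK_GBPS = (700.0, 1000.0)  # 700 Gbit/s to 1 Tbit/s


def per_server_gbps(total_gbps: float, servers: int) -> float:
    """Average cross-rack traffic per server, assuming an even spread."""
    return total_gbps / servers


low, high = (per_server_gbps(total, SERVERS) for total in CROSS_RACK_GBPS)
print(f"Average cross-rack traffic per server: {low:.1f} to {high:.1f} Gbit/s")
```

Even as an average, that is several times what a single 1GbE link can carry, and it counts only the traffic that leaves the rack.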

(I can already hear someone say: “Why wouldn’t he use a big chassis-based switch in the middle of the network for 200 devices?” If, as a network industry, we believe our answer is to create ever larger centralized switches, then we have not learned from all the industries around us in that same datacenter.)

The answer is also not the often heard “just throw more bandwidth at the problem”. This Ad Tech company is a prime example of how we need to evolve our thinking about how we support these new applications. We have to stop pretending we can support new application infrastructures, with their new demands, needs and requirements, on the same network we have been building for years. The applications are evolving. Servers, storage and how they are being used are evolving. Network engineers need to dive into this whole new world of Big Data Applications. It’s scary: there are many new acronyms and names you will not recognize. If you thought the network world was creative in naming things, the application folks have us beat, hands down.

Don’t be afraid of these new applications. They are coming whether you like it or not. Embrace them, understand them as best you can. Then sit back and think about what the network can do for them. You have an ability to significantly impact their ability to perform. But you may have to put traditional thinking aside and step out of the box that has provided so much comfort for so many years. It will be worth it.


[Today's fun fact: The first penny had the motto "Mind your business." Can we bring those pennies back please?]

The post Network Engineers, Pay Attention to Big Data appeared first on Plexxi.

More Stories By Marten Terpstra

Marten Terpstra is a Product Management Director at Plexxi Inc. Marten has extensive knowledge of the architecture, design, deployment and management of enterprise and carrier networks.
