Big Data Journal: Blog Feed Post

Scaling Big Data Fabrics

The size of the network might be the least interesting aspect of scaling Big Data fabrics

When people talk about Big Data, the emphasis is usually on the Big. Certainly, Big Data applications are distributed largely because the size of the data on which computations are executed warrants more than a typical application can handle. But scaling the network that provides connectivity between Big Data nodes is not just about creating massive interconnects.

In fact, the size of the network might be the least interesting aspect of scaling Big Data fabrics.

Just how big is Big Data?

Not that long ago, I asked the question: how large is a typical Big Data deployment? I was expecting, as I suspect many people are, that the Big in the title meant that the deployments would be, in a word, big. But the average Big Data deployment is actually far smaller than most people realize. I grabbed a list from a HadoopWizard article dating back to last year.

What is remarkable about this list is just how unremarkable the sizes of the deployments are. Sure, the list is dated, and deployments have certainly gotten larger. And yes, companies like Yahoo! are pushing scaling limits. But if you take Yahoo! out, the average deployment is a mere 113 nodes. Even if every node is multi-homed to two switches, this means the average deployment could be handled by 4 access switches.

Even if every deployment quadrupled, you would still only be talking about 16-access-switch deployments. When our industry talks about scaling, we usually think well beyond 16 switches.
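The switch counts above are easy to sanity-check. The sketch below is only a back-of-the-envelope calculation: the article does not say how many ports an access switch has, so the 64 server-facing ports per switch used here is an assumption, and the function name is made up for illustration. Uplink ports and spare capacity are ignored.

```python
import math


def access_switches_needed(nodes, links_per_node=2, ports_per_switch=64):
    """Estimate how many access switches are needed to terminate every node link.

    ports_per_switch is an assumed figure (server-facing ports only);
    the article does not state a port density.
    """
    total_links = nodes * links_per_node
    return math.ceil(total_links / ports_per_switch)


if __name__ == "__main__":
    average_nodes = 113  # average deployment from the list, excluding Yahoo!

    # 113 nodes * 2 links = 226 links, which fits on 4 switches of 64 ports.
    print(access_switches_needed(average_nodes))

    # Quadrupled: 452 nodes * 2 links = 904 links, which needs 15 switches,
    # within the 16 quoted in the text.
    print(access_switches_needed(average_nodes * 4))
```

With a smaller assumed density the numbers shift slightly (48-port switches would need 5 for the average deployment), but the conclusion holds: these are small networks by the standards of most scaling discussions.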

Is scaling an issue?

So if deployments are small, does that mean scaling is a solved issue? The answer is both yes and no. If the end game is building individual networks for each Big Data application, then yes. While the web-scale companies will always need more, the vast majority of customers will be well served by the scaling limits available today.

But the issue with Big Data is that it isn’t really just Big Data. When we talk about Big Data, we usually ought to be using a different moniker. For most people, Big Data is less about Hadoop and more about clustered applications (at least so far as the network is concerned). By expanding the definition to clustered applications, you move past Hadoop and into clustered compute and even clustered storage environments. Anything clustered has a dependency on some kind of interconnect.

The challenge in clustered environments

The challenge of all these types of clustered environments is that their requirements vary. For Hadoop, job completion times are dominated by the compute side of things, so the network is really about providing a congestion-free interconnect that is always available. For clustered compute, latency might be more important. And for multi-tenant environments, it might be most important to isolate traffic. Whatever the application, the point is that the requirements are highly contextual.

Which brings us back to scaling.

The real issue in scaling Big Data fabrics is less about making a small interconnect larger. Networks are not going to scale along the lines of single applications (or at least they shouldn’t). The actual scaling challenge is plotting a course from a single Big Data application to an environment that hosts multiple clustered applications, each with different requirements.

This might seem dead simple, but it isn’t. When people deploy Big Data applications today, the Big part leads people to purpose-build architecture with massive data workloads in mind. In many cases, this includes building out separate networks aimed at specific workloads.

But even in the best cases, Hadoop makes use of things like rack awareness, which helps provide application resilience while minimizing traffic across the network. Regardless of whether you view this as a benefit for the application or for the network, the result is that proximity and locality are built into the infrastructure. This creates interesting considerations (and potentially limitations) when expanding. If you want to grow a cluster, you can’t just use any available server in the datacenter; some servers are preferable to others based solely on their physical location.
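To make the locality point concrete, here is a minimal sketch of how physical location could rank candidate servers when a cluster grows. It is an illustration only, not Hadoop's placement logic: the location paths, server names, and both functions are hypothetical, and distance is simply the number of hops through a row/rack tree.

```python
def rack_distance(path_a, path_b):
    """Count the hops between two locations in a tree such as '/row1/rack3'."""
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Hops up from a to the shared ancestor, then down to b.
    return (len(a) - common) + (len(b) - common)


def rank_candidates(cluster_racks, candidates):
    """Order candidate servers from closest to farthest from the existing cluster."""
    def closeness(item):
        server, rack = item
        nearest = min(rack_distance(rack, used) for used in cluster_racks)
        return (nearest, server)  # server name breaks ties deterministically

    return [server for server, _ in sorted(candidates.items(), key=closeness)]


if __name__ == "__main__":
    cluster_racks = ["/row1/rack1", "/row1/rack2"]  # racks the cluster already uses
    candidates = {
        "srv-a": "/row4/rack9",  # free server in a different row
        "srv-b": "/row1/rack2",  # free server in a rack the cluster already uses
        "srv-c": "/row1/rack5",  # free server in the same row, different rack
    }
    print(rank_candidates(cluster_racks, candidates))  # ['srv-b', 'srv-c', 'srv-a']
```

All three servers are equally available, yet only one of them keeps the new node close to the rest of the cluster. That is the expansion constraint the interconnect has to account for.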

Scalability is more than scaling

Making a scalable interconnect for these types of clustered applications is more than just supporting a large (or as I mentioned previously, not so large) number of nodes. The objective for scalability is to provide a graceful path from start to finish. This means architectures need to consider not just what the ending state is but also how to get from here to there.

With Hadoop, this means that things like locality have to be an explicit consideration in architecting the interconnect. Is the right answer a bunch of cross-connects zigzagging across the datacenter? Maybe. Or it might be a different architectural approach to providing interconnect between clustered servers.

Additionally, it isn’t just about one application. Architecting for bandwidth because you have a Hadoop-y application is great, but what if the next clustered application is latency-sensitive? Or if it brings with it a set of auditing and compliance requirements more typical of HIPAA-style applications?

If the architecture doesn’t explicitly consider how to expand beyond a single application, even if it can grow to thousands of switches, it won’t really matter.

The bottom line

The punch line here is that scaling is not only about growing larger. It also means potentially growing more diverse. And if there is one thing that the Hadoop deployment numbers tell me, it’s that people are still experimenting. If you are still experimenting, how can you predict with certainty what the next 5 or 10 years will mean in terms of applications for your business? You can’t. Which means that the most important architectural objective might go well beyond the number of switches in a deployment. Scalability could be about building flexibility into your datacenter. How do you get a bunch of different purpose-built capabilities into a single, general-purpose network? Answering that might be the real key to determining how to scale Big Data fabrics.

[Today’s fun fact: It is against the law to use the Star-Spangled Banner as dance music in Massachusetts. There go my party plans!]

The post Scaling Big Data fabrics appeared first on Plexxi.

More Stories By Michael Bushong

The best marketing efforts combine deep technology understanding with a highly approachable means of communicating. Plexxi's Vice President of Marketing Michael Bushong acquired these skills over 12 years at Juniper Networks, where he led product management, product strategy and product marketing organizations for Juniper's flagship operating system, Junos. Michael spent the last several years at Juniper leading their SDN efforts across both service provider and enterprise markets. Prior to Juniper, Michael spent time at database supplier Sybase, and ASIC design tool companies Synopsys and Magma Design Automation. Michael's undergraduate work at the University of California Berkeley in advanced fluid mechanics and heat transfer lends new meaning to the marketing phrase "This isn't rocket science."
