|By Michael Bushong||
|July 24, 2014 08:45 AM EDT||
When people talk about Big Data, the emphasis is usually on the Big. Certainly, Big Data applications are distributed largely because the size of the data on which computations are executed warrants more than a typical application can handle. But scaling the network that provides connectivity between Big Data nodes is not just about creating massive interconnects.
In fact, the size of the network might be the least interesting aspect of scaling Big Data fabrics.
Just how big is Big Data?
Not that long ago, I asked the question: how large is a typical Big Data deployment? I was expecting, as I suspect many people are, that the Big in the title meant that the deployments would be, in a word, big. But the average Big Data deployment is actually far smaller than most people realize. I grabbed a list from HadoopWizard in an article dating back to last year.
What is remarkable about this list is just how unremarkable the sizes of the deployments are. Sure, the list is dated, and deployments have certainly gotten larger. And yes, companies like Yahoo! are pushing scaling limits. But the average deployment if you take Yahoo! out is a mere 113 nodes. Even if every node is multi-homed to two switches, this means the average deployment could be handled by 4 access switches.
Even if every deployment quadrupled, you would still only be talking about 16-access-switch deployments. When our industry talks about scaling, we usually think well beyond 16 switches.
Is scaling an issue?
So if deployments are small, does that mean scaling is a solved issue? The answer is both yes and no. If the end game is building individual networks for each Big Data application, then yes. While the web scale companies will always need more, the vast majority of customers will be well-served by the scaling limits that are around today.
But the issue with Big Data is that it isn’t really just Big Data. When we talk about Big Data, we usually ought to be using a different moniker. For most people, Big Data is less about Hadoop and more about clustered applications (at least so far as the network is concerned). By expanding the definition to clustered applications, you move past Hadoop and into clustered compute and even clustered storage environments. Anything clustered has a dependency on some kind of interconnect.
The challenge in clustered environments
The challenge of all these types of clustered environments is that their requirements vary. For Hadoop, job completion times are dominated by the compute side of things, so the network is really about providing a congestion-free interconnect that is always available. For clustered compute, latency might be more important. And for multi-tenant environments, it might be most important to isolate traffic. Whatever the application, the point is that the requirements are highly contextual.
Which brings us back to scaling.
The real issue in scaling Big Data fabrics is less about making a small interconnect larger. Networks are not going to scale along the lines of single applications (or at least they shouldn’t). The actual scaling challenge is plotting a course from a single Big Data application to an environment that hosts multiple clustered applications, each with different requirements.
This might seem dead simple, but it isn’t. When people deploy Big Data applications today, the Big part leads people to purpose-build architecture with massive data workloads in mind. In many cases, this includes building out separate networks aimed at specific workloads.
But even in the best cases, Hadoop makes use of things like rack awareness, which help provide application resilience while minimizing traffic across the network. Regardless of whether you view this as for the application or for the network, the result is that proximity and locality are built into the infrastructure. This creates interesting considerations (and potentially limitations) when expanding. If you want to grow a cluster, you can’t just use any available server in the datacenter; there are servers that are more preferable than others based solely on their physical location.
Scalability is more than scaling
Making a scalable interconnect for these types of clustered applications is more than just supporting a large (or as I mentioned previously, not so large) number of nodes. The objective for scalability is to provide a graceful path from start to finish. This means architectures need to consider not just what the ending state is but also how to get from here to there.
With Hadoop, this means that things like locality have to be an explicit consideration in architecting the interconnect. Is the right answer a bunch of cross-connects zigzagging across the datacenter? Maybe. Or it might be a different architectural approach to providing interconnect between clustered servers.
Additionally, it isn’t just about one application. Architecting for bandwidth because you have a Hadoop-y application is great, but what if the next clustered application is latency-sensitive? Or if it brings with it a set of auditing and compliance requirements more typical of HIPAA-style applications?
If the architecture doesn’t explicitly consider how to expand beyond a single application, even if it can grow to thousands of switches, it won’t really matter.
The bottom line
The punch line here is that scaling is not only about growing larger. It also means potentially growing more diverse. And if there is one thing that the Hadoop deployment numbers tell me, it’s that people are still experimenting. If you are still experimenting, how can you predict with certainty what the next 5 or 10 years will mean in terms of applications for your business? You can’t. Which means that the most important architectural objective might go well beyond the number of switches in a deployment. Scalability could be about building flexibility into you datacenter. How do you get a bunch of different purpose-built capabilities into a single, general-purpose network? Answering that might be the real key to determining how to scale Big Data fabrics.
[Today’s fun fact: It is against the law to use the Star Spangled Banner as dance music in Massachusetts. There go my party plans!]
SYS-CON Events announced today that BMC will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. BMC delivers software solutions that help IT transform digital enterprises for the ultimate competitive business advantage. BMC has worked with thousands of leading companies to create and deliver powerful IT management services. From mainframe to cloud to mobile, BMC pairs high-speed digital innovation with robust...
May. 27, 2015 11:30 AM EDT Reads: 346
As the Internet of Things unfolds, mobile and wearable devices are blurring the line between physical and digital, integrating ever more closely with our interests, our routines, our daily lives. Contextual computing and smart, sensor-equipped spaces bring the potential to walk through a world that recognizes us and responds accordingly. We become continuous transmitters and receivers of data. In his session at @ThingsExpo, Andrew Bolwell, Director of Innovation for HP's Printing and Personal S...
May. 27, 2015 11:30 AM EDT Reads: 4,249
All major researchers estimate there will be tens of billions devices - computers, smartphones, tablets, and sensors - connected to the Internet by 2020. This number will continue to grow at a rapid pace for the next several decades. With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo, June 9-11, 2015, at the Javits Center in New York City. Learn what is going on, contribute to the discussions, and ensure that your enter...
May. 27, 2015 11:15 AM EDT Reads: 2,767
Cloud computing started a technology revolution; now DevOps is driving that revolution forward. By enabling new approaches to service delivery, cloud and DevOps together are delivering even greater speed, agility, and efficiency. No wonder leading innovators are adopting DevOps and cloud together! In his session at DevOps Summit, Andi Mann, Vice President of Strategic Solutions at CA Technologies, explored the synergies in these two approaches, with practical tips, techniques, research data, wa...
May. 27, 2015 11:00 AM EDT Reads: 6,848
SYS-CON Events announced today that EnterpriseDB (EDB), the leading worldwide provider of enterprise-class Postgres products and database compatibility solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. EDB is the largest provider of Postgres software and services that provides enterprise-class performance and scalability and the open source freedom to divert budget from more costly traditiona...
May. 27, 2015 11:00 AM EDT Reads: 2,059
T-Mobile has been transforming the wireless industry with its “Uncarrier” initiatives. Today as T-Mobile’s IT organization works to transform itself in a like manner, technical foundations built over the last couple of years are now key to their drive for more Agile delivery practices. In his session at DevOps Summit, Martin Krienke, Sr Development Manager at T-Mobile, will discuss where they started their Continuous Delivery journey, where they are today, and where they are going in an effort ...
May. 27, 2015 11:00 AM EDT Reads: 2,108
“We are strong believers in the DevOps movement and our staff has been doing DevOps for large enterprise environments for a number of years. The solution that we build is intended to allow DevOps teams to do security at the speed of DevOps," explained Justin Lundy, Founder & CTO of Evident.io, in this SYS-CON.tv interview at DevOps Summit, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
May. 27, 2015 10:30 AM EDT Reads: 4,913
The Internet of Things is not only adding billions of sensors and billions of terabytes to the Internet. It is also forcing a fundamental change in the way we envision Information Technology. For the first time, more data is being created by devices at the edge of the Internet rather than from centralized systems. What does this mean for today's IT professional? In this Power Panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists will addresses this very serious issue o...
May. 27, 2015 10:30 AM EDT Reads: 1,279
There is no question that the cloud is where businesses want to host data. Until recently hypervisor virtualization was the most widely used method in cloud computing. Recently virtual containers have been gaining in popularity, and for good reason. In the debate between virtual machines and containers, the latter have been seen as the new kid on the block – and like other emerging technology have had some initial shortcomings. However, the container space has evolved drastically since coming on...
May. 27, 2015 10:15 AM EDT Reads: 2,074
Containers Expo Blog covers the world of containers, as this lightweight alternative to virtual machines enables developers to work with identical dev environments and stacks. Containers Expo Blog offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. Bookmark Containers Expo Blog ▸ Here Follow new article posts on Twitter at @ContainersExpo
May. 27, 2015 10:00 AM EDT Reads: 1,452
Buzzword alert: Microservices and IoT at a DevOps conference? What could possibly go wrong? In this Power Panel at DevOps Summit, moderated by Jason Bloomberg, the leading expert on architecting agility for the enterprise and president of Intellyx, panelists will peel away the buzz and discuss the important architectural principles behind implementing IoT solutions for the enterprise. As remote IoT devices and sensors become increasingly intelligent, they become part of our distributed cloud en...
May. 27, 2015 10:00 AM EDT Reads: 2,237
EMC Corporation on Tuesday announced it has entered into a definitive agreement to acquire privately held Virtustream. When the transaction closes, Virtustream will form EMC’s new managed cloud services business. The acquisition represents a transformational element of EMC’s strategy to help customers move all applications to cloud-based IT environments. With the addition of Virtustream, EMC completes the industry’s most comprehensive hybrid cloud portfolio to support all applications, all workl...
May. 27, 2015 09:30 AM EDT Reads: 1,149
"People are a lot more knowledgeable about APIs now. There are two types of people who work with APIs - IT people who want to use APIs for something internal and the product managers who want to do something outside APIs for people to connect to them," explained Roberto Medrano, Executive Vice President at SOA Software, in this SYS-CON.tv interview at Cloud Expo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
May. 27, 2015 09:30 AM EDT Reads: 4,603
The 5th International DevOps Summit, co-located with 17th International Cloud Expo – being held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the...
May. 27, 2015 09:30 AM EDT Reads: 4,624
If cloud computing benefits are so clear, why have so few enterprises migrated their mission-critical apps? The answer is often inertia and FUD. No one ever got fired for not moving to the cloud - not yet. In his session at 15th Cloud Expo, Michael Hoch, SVP, Cloud Advisory Service at Virtustream, discussed the six key steps to justify and execute your MCA cloud migration.
May. 27, 2015 09:30 AM EDT Reads: 3,195
You often hear the two titles of "DevOps" and "Immutable Infrastructure" used independently. In his session at DevOps Summit, John Willis, Technical Evangelist for Docker, will cover the union between the two topics and why this is important. He will cover an overview of Immutable Infrastructure then show how an Immutable Continuous Delivery pipeline can be applied as a best practice for "DevOps." He will end the session with some interesting case study examples.
May. 27, 2015 09:00 AM EDT Reads: 2,440
Docker is becoming very popular--we are seeing every major private and public cloud vendor racing to adopt it. It promises portability and interoperability, and is quickly becoming the currency of the Cloud. In his session at DevOps Summit, Bart Copeland, CEO of ActiveState, discussed why Docker is so important to the future of the cloud, but will also take a step back and show that Docker is actually only one piece of the puzzle. Copeland will outline the bigger picture of where Docker fits a...
May. 27, 2015 09:00 AM EDT Reads: 6,347
SYS-CON Media named Andi Mann editor of DevOps Journal. DevOps Journal is focused on this critical enterprise IT topic in the world of cloud computing. DevOps Journal brings valuable information to DevOps professionals who are transforming the way enterprise IT is done. Andi Mann, Vice President, Strategic Solutions, at CA Technologies, is an accomplished digital business executive with extensive global expertise as a strategist, technologist, innovator, marketer, communicator, and thought lea...
May. 27, 2015 09:00 AM EDT Reads: 2,254
Even though it’s now Microservices Journal, long-time fans of SOA World Magazine can take comfort in the fact that the URL – soa.sys-con.com – remains unchanged. And that’s no mistake, as microservices are really nothing more than a new and improved take on the Service-Oriented Architecture (SOA) best practices we struggled to hammer out over the last decade. Skeptics, however, might say that this change is nothing more than an exercise in buzzword-hopping. SOA is passé, and now that people are ...
May. 27, 2015 09:00 AM EDT Reads: 3,980
Enterprises are fast realizing the importance of integrating SaaS/Cloud applications, API and on-premises data and processes, to unleash hidden value. This webinar explores how managers can use a Microservice-centric approach to aggressively tackle the unexpected new integration challenges posed by proliferation of cloud, mobile, social and big data projects. Industry analyst and SOA expert Jason Bloomberg will strip away the hype from microservices, and clearly identify their advantages and d...
May. 27, 2015 09:00 AM EDT Reads: 2,459