Big Moves in Big Data: EMC's Hadoop Strategy

To date, Big Storage has been locked out of Big Data. It’s been all about direct attached storage for several reasons. First, Advanced SQL players have typically optimized their architectures through data structure (columnar storage), unique compression algorithms, and liberal use of caching to juice response times over hundreds of terabytes. For the NoSQL side, it’s been about cheap, cheap, cheap along the Internet data center model: have lots of commodity stuff and scale it out. Hadoop was engineered exactly for such an architecture; rather than speed, it was optimized for sheer linear scale.

Over the past year, most of the major platform players have planted their table stakes with Hadoop. Not surprisingly, IT household names are seeking to somehow tame Hadoop and make it safe for the enterprise.

Up ’til now, anybody with armies of the best software engineers that Internet firms could buy could brute force their way to scale out humongous clusters and, if necessary, invent their own technology, then share and harvest from the open source community at will. That is hardly a suitable scenario for the enterprise mainstream. Not surprisingly, the common thread behind the diverse strategies of IBM, EMC, Microsoft, and Oracle toward Hadoop has been to make Hadoop more approachable.

What’s been conspicuously absent so far is a play from Big Optimized Storage. The conventional wisdom is that SAN or NAS are premium, architected systems whose costs might be prohibitive when you talk petabytes of data.

Similarly, the first-generation implementations from the NoSQL world have so far followed a different operating philosophy: they assumed that parts would fail and that five nines service levels were overkill. And anyway, the design of Hadoop brute forced the solution: because hardware is cheap, replicate the data so that three copies are distributed around the cluster.
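The capacity cost of that brute-force replication is easy to put in numbers. The sketch below is a rough illustration only: it assumes a replication factor of three, as described above, and ignores everything else that consumes disk on a real cluster (temporary job output, operating system, headroom). The function names are hypothetical and are not part of Hadoop.

```python
def raw_capacity_tb(usable_tb: float, replication_factor: int = 3) -> float:
    """Raw disk needed to hold `usable_tb` of data when every block
    is stored `replication_factor` times across the cluster."""
    if usable_tb < 0 or replication_factor < 1:
        raise ValueError("usable_tb must be >= 0 and replication_factor >= 1")
    return usable_tb * replication_factor


def storage_efficiency(replication_factor: int = 3) -> float:
    """Fraction of raw disk that ends up holding unique data."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return 1 / replication_factor


if __name__ == "__main__":
    # One petabyte (1,000 TB) of unique data under three-way replication.
    print(raw_capacity_tb(1000))
    print(round(storage_efficiency(), 3))
```

With three copies, only about a third of the raw disk holds unique data. That is tolerable when disk is cheap commodity hardware, and it is the arithmetic that makes premium storage look expensive at petabyte scale.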

As Big Data gains traction in the enterprise, some of it will certainly fit this pattern of something being better than nothing, as the result is unique insights that would not otherwise be possible. For instance, if your running analysis of Facebook or Twitter goes down, it probably won’t take the business with it. But as enterprises adopt Hadoop – and as pioneers stretch Hadoop to new operational use cases such as what Facebook is doing with its messaging system – those concepts of mission-criticality are being revisited.

And so, ever since EMC announced last spring that its Greenplum unit would start supporting and bundling different versions of Hadoop, we’ve been waiting for the other shoe to drop: When would EMC infuse its Big Data play with its core DNA, storage?

Today, EMC announced that its Isilon networked storage system was adding native support for Apache Hadoop’s HDFS file system. There were some interesting nuances to the rollout.

Big vendors feeling their way

It’s interesting to see how IT household names are cautiously navigating their way into unfamiliar territory. EMC becomes the latest, after Oracle and Microsoft, to calibrate its Hadoop strategy in public.

Oracle announced its Big Data appliance last fall before it lined up its Hadoop distribution. Microsoft ditched its Dryad project built around its HPC Server. Now EMC has recalibrated its Hadoop strategy; when it first unveiled that strategy last spring, the spotlight was on MapR’s proprietary alternative to the HDFS file system of Apache Hadoop. It’s interesting that vendors’ initial announcements have either been vague or have been tweaked as they’ve waded into the market. More about EMC’s shift below.


For EMC, HDFS is the mainstream

MapR’s strategy (and IBM’s along with it, regarding GPFS) has prompted debate and concern in the Hadoop community about commercial vendors forking the technology. As we’ve ranted previously, Hadoop’s growth will be tied not only to the megaplatform vendors that support it, but also to the third party tools and solutions ecosystem that grows around it.

For such a thing to happen, ISVs and consulting firms need to have a common target to write against, and having forked versions of Hadoop won’t exactly grow large partner communities.

Regarding EMC, the original strategy was two Greenplum Hadoop editions: a Community Edition with a free Apache distro and an Enterprise Edition that bundled MapR, both under the Greenplum HD branding umbrella. At first blush, it looked like EMC was going to earn the bulk of its money from the proprietary side of the Hadoop business.

What’s significant is that the new announcement of Isilon support pertains only to the HDFS open source side. More to the point, EMC is rebranding and subtly repositioning its Greenplum Hadoop offerings: Greenplum HD is the Apache HDFS edition with the optional Isilon support, and Greenplum MR is the MapR version, a niche offering targeted at advanced Hadoop use cases that demand higher performance.

Coming atop recent announcements from Oracle and Microsoft, which have come out clearly on the side of OEM’ing Apache rather than anything limited or proprietary, this amounts to an unqualified endorsement of Apache Hadoop/HDFS as not only the formal but also the de facto standard.

This reflects emerging conventional wisdom that the enterprise mainstream is leery about lock-in to anything that smells proprietary for technology where they still are in the learning curve. Other forks may emerge, but they will not be at the base file system layer. This leaves IBM and MapR pigeonholed – admittedly, there will be API compatibility, but clearly both are swimming upstream.

Central Storage is newest battleground

As noted earlier, Hadoop’s heritage has been the classic Internet data center scale-out model. The advantage is that, leveraging Hadoop’s highly linear scalability, organizations could easily expand their clusters by adding more commodity servers and disks. Pioneers or purists would scoff at the notion of an appliance approach because it was always simply scaling out inexpensive, commodity hardware, rather than paying premiums for big vendor boxes.

In blunt terms, the choice is whether you pay now or pay later. As mentioned before, do-it-yourself compute clusters require sweat equity: you need engineers who know how to design, deploy, and operate them. The flipside is that many, arguably most, corporate IT organizations lack either the skills or the capital. There are various solutions to what might otherwise appear a Hobson’s Choice:

  • Go to a cloud service provider that has already created the infrastructure, such as what Microsoft is offering with its Hadoop-on-Azure services;
  • Look for a happy, simpler medium such as Amazon’s Elastic MapReduce on its DynamoDB service;
  • Subscribe to SaaS providers that offer Hadoop applications as a service (e.g., social network analysis, smart grid);

  • Get a platform and have a systems integrator put it together for you (key to IBM’s BigInsights offering, and applicable to any SI that has a Hadoop practice);
  • Go to an appliance or engineered systems approach that puts Hadoop and/or its subsystems in a box, such as with Oracle Big Data Appliance or EMC’s Greenplum DCA. The systems engineering is mostly done for you, but the increments for growing the system can be much larger than simply adding a few x86 servers here or there (Greenplum HD DCA can scale in groups of 4 server modules). Entry or expansion costs are not necessarily cheap, but then again, you have to balance capital cost against labor.
  • Surround Hadoop infrastructure with solutions. This strategy is not mutually exclusive with the others; unless you’re Cloudera or Hortonworks, which make their business bundling and supporting the core Apache Hadoop platform, most of the household names will bundle frameworks, algorithms, and eventually solutions that in effect place Hadoop under the hood. For EMC, the strategy is their recent announcement of a Unified Analytics Platform (UAP) that provides collaborative development capabilities for Big Data applications. EMC is (or will be) hardly alone here.
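To make the appliance point about growth increments concrete, here is a minimal sketch of the rounding involved. The increment of 4 server modules is the Greenplum HD DCA figure cited in the list above; the function name is hypothetical, and the sketch deliberately says nothing about pricing.

```python
import math


def modules_to_buy(modules_needed: int, increment: int = 4) -> int:
    """Round a capacity need up to the next purchasable increment.

    `increment` defaults to 4, the group size cited above for
    Greenplum HD DCA; a do-it-yourself cluster would use 1.
    """
    if modules_needed < 0 or increment < 1:
        raise ValueError("modules_needed must be >= 0 and increment >= 1")
    return math.ceil(modules_needed / increment) * increment


if __name__ == "__main__":
    # Needing 5 modules means buying 8 when they ship in groups of 4.
    print(modules_to_buy(5))
    # The same need on commodity servers added one at a time.
    print(modules_to_buy(5, increment=1))
```

The gap between what is needed and what must be bought is the capital side of the pay-now-or-pay-later trade; the labor saved on systems engineering is the other side.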

With EMC’s new offering, the scale-up option tackles the next variable: storage. This is the natural progression of a market that will address many constituencies, and where there will be no single silver bullet that applies to all.

This guest post comes courtesy of Tony Baer’s OnStrategies blog. Tony is a senior analyst at Ovum.

More Stories By Tony Baer

Tony Baer is Principal Analyst with Ovum, leading Ovum’s research on the software lifecycle. Working in concert with other members of Ovum’s software group, his research covers the full lifecycle from design and development to deployment and management. Areas of focus include application lifecycle management, software development methodologies (including agile), SOA, IT service management/ITIL, and IT management/governance.

Baer has been a noted authority on software development platforms and integration architecture for nearly 20 years. Prior to joining Ovum, he was an independent analyst whose company ‘onStrategies’ provided vendors of software development and integration tools with technology assessment and market positioning services. He also led Computerwire’s CIO Agenda and Computer Finance end-user best practices research services.

Follow him on Twitter @TonyBaer or read his blog site www.onstrategies.com/blog.
