Welcome!

SDN Journal Authors: Pat Romanski, Patrick Hubbard, Elizabeth White, Sven Olav Lund, Liz McMillan

Related Topics: @CloudExpo, @BigDataExpo, SDN Journal

@CloudExpo: Blog Post

How Data Analytics Drives the Business By @Dana_Gardner [#BigData]

HP Analytics blazes new trails in examining business trends from myriad data

The next BriefingsDirect deep-dive big data thought leadership interview examines how HP analyzes its own vast data warehouses to derive new insights for its global operations, extensive supply chain, sales organization, global marketing groups, and customers.

We'll explore how the Analytics Group at HP, based in India, sifts through myriad internal data sources, as well as joins with other public data sets, to deliver entirely new intelligence value that helps make business more responsive and efficient.

To learn how, BriefingsDirect sat down with Pramod Singh, Director of Digital and Big Data Analytics at HP Analytics in Bangalore, India, at the recent HP Big Data 2014 Conference in Boston. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Tell us a little bit about the Analytics Group at HP, what you do, and what’s the charter of your organization.

Singh: We have a big analytics organization in HP, it’s called Global Analytics and serves the analytics for most of HP. About 80 to 90 percent of the analytics happening inside of HP comes out of this eco-system. We do analytics across the entire food chain at HP, which includes the supply chains, marketing, and sales.

What I personally lead is an organization called Digital Analytics, and we are responsible for doing analytics across all digital properties for HP. That includes the eCommerce, social media, search, and campaign analytics. Additionally, we also have a Center of Excellence for Big Data Analytics, where we're using HP’s big-data technologies, which is that framework called HAVEn, to help develop big-data solutions for HP customers, as well as internal HP.

Gardner: Obviously, HP is a very large global company. What sort of datasets are we talking about here? What’s the volume that you're working with?

Data explosion

Singh: As you know, a data explosion is happening. On one end, HP has done a very good job over the last six to seven years of getting most of their enterprise data into something called an enterprise data warehouse. We're talking about close to two petabytes of data, which is structured data.

Singh

The great part of this journey is that we have taken data from 700-800 different data marts into one enterprise data warehouse over the last three to four years. A lot of data that is not part of the enterprise is also becoming an important part of making the business decisions.

A lot of that data I personally deal with in the digital space, is what we call the human-generated data, the social media data, which no enterprise owns. It’s open for anybody to go use that. What I've started to see is that, on one hand, we've done a really good job of getting data in the enterprise and getting value out of it.

We've also started to analyze and harvest the data that is out in the open space. It could be blogs, Twitter feeds, or Facebook data. Combining that is what’s bringing real business value.

The Global Analytics organization is more than 1,000 people spread through different parts of the world. A big chunk of that is in Bangalore, India, but we have folks in the US and the UK. We have a center in Guadalajara, Mexico and couple of other locations in India. My particular organization is close to 100 people.

We've also started to analyze and harvest the data that is out in the open space. It could be blogs, Twitter feeds, or Facebook data. Combining that is what’s bringing real business value.

I have a PhD in pure mathematics, and before that I had an MBA in marketing. It's a little bit of an awkward mix there, and got in into analytics space in mid '90s working for Walmart.

I built out Walmart’s Assortment Planning System in late '90s and then came to HP in 2000 leading an advance data-mining center in Austin, Texas. From there I evolved into doing e-business analytics for few years and then moved to customer knowledge management. I spent five years in IT developing analytics platform.

About year-and-a-half ago, I got an opportunity to lead the big-data practice for this organization called Global Analytics. In five years, they had gone from five people to more than 1,000 people, and that intrigued me a lot. I was able to take the opportunity and move to India to go lead that team.

More insights

Gardner: Pramod, when we look back into this data, do you gain more insights knowing what you're looking for, or not knowing what you're looking for? What kind of insights were the unexpected consequences of your putting together this type of data infrastructure and then applying big-data analytics to it?

Singh: We deal with that day-in and day-out. I’ll give you a couple of examples there. This is something that happened about three or four years ago with HP. We were looking at a problem that was a classic problem in marketing to the US small and medium-sized business organizations (SMBs). We had a fixed budget for marketing, and across the US, there are more than 20 million SMBs. The classic definition of an SMBs is any business with 100-500 employees.

HP had an install base of a small part of that. We realized that particular segment of a SMBs is squeezed between a classic consumer, where you can do mass marketing, such as TV advertising, and an enterprise, where you can actually put bodies, your people who have relationship. SMBs are squeezed in between those two extremes.

The question then became what do we do with that? Again, when you do data mining and analytics, you may not know where this will lead you.

On one hand, you can't reach out to every single one of them. It’s just way too expensive to do that. On the other hand, if you try to go do the marketing, you don’t get the best out of it.

We were starting to work on something like that. I was approached by a vice president in marketing who said revenues are declining and they had a limited marketing budget. They didn’t know what to do.

This is where one of those unexpected things came in. I said, "Let’s see in that install base whether there are different segments of customers that are behaving differently." That led us on kind of a journey where we said, well, "How do we start to do that right? Let’s figure out what are the different attributes of data that I can capture."

On one hand, if you look at SMBs, you can capture who they are, what industry segment they're in, how many employees they have, where are they based, who the CEO is. It's what we call firmographics.

On the other hand, you have classes of data involving their interaction with HP. It could be things like how many PCs or servers they bought, how long ago did they buy it, how much money they spent, the whole transactional aspect of it.

Then, there are some things that are derived attributes. You may be able to derive that in the last one year they came to us four times. What interaction did we have on the website,? For example, did they come to us through a web channel? If they did, how many email offers were sent to them? How many of those were clicked? How many of those converted? Those are the classes of data that we could capture.

The question then became what do we do with that? Again, when you do data mining and analytics, you may not know where this will lead you.

Mathematical modeling

We thought that maybe there are different classes of customers. We pulled our data together and started to do mathematical modeling. There are techniques called clustering, analytical techniques called K-Means, and things like that. We started to get some results and to analyze them. In this type of situation, we have to be careful, because there are some things that may look mathematically correct, but may not have a real business value behind it.

Once we started to look at those things, we went through multiple iterations. We realized that we were not getting segments or clusters that were very distinct. One day, I was driving home in Austin, and I said, "You know what? Who they are I don’t control, but as far as what they're doing with HP we have a reasonably good understanding."

So we started to do clustering based only on those attributes, and that’s where an "aha" moment came. We started to find these clusters, which we call segments, where we eventually found a cluster which was that 7 to 8 percent of the population that brought in 45 percent of revenue.

The marketers started to say that this was a gold mine. That’s what we never expected to happen. We put together a structure. Once we figured out these four or five clusters, we tried to figure out why they were clustered together. What’s common?

We built out a primary research thing, where we took a random sample out of each one of those clusters, interviewed those guys, and were able to build a very good profile of what these segments were.

There are 20 million SMBs in US, and we are able to build a model to predict which of these prospects are similar to the clusters we had. That’s where we were able to find customers that looked like our most profitable customers, which we ended up calling Vanguards. That resulted into a tremendous amount of  a dollar increment for HP. It's a good example of what you talked when you find unexpected things.

We just wanted to analyze data. It led us to a journey and ended up finding a customer group we weren't even aware of. Then, we could build marketing strategy to actually go target those and get some value out of it.

Gardner: At the Big Data Conference, I've spoken to other organizations who are creating an analytics capability and then exposing that to as many of their employees as possible, hoping for this very sort of unexpected positive benefit. Is there a way that you're taking your analytics either through visualization or tools and then allowing a larger population within HP to experiment with it?

Singh: We're trying to democratize the analytics as much as we can. One thing we're realizing is that to get the full value, you don't want data to stay in silos. So there are a couple of things you have to do. In terms of building out an ecosystem where you have good set of motivated people and where you can give them a career path, we have created this organization called Global Analytics. You get a critical mass of people who challenge each other, learn from each other, and do lot of analytics.

But also it’s very important that on the consumption side of it, you have people who are analysts and understand analytics and get the best value out of it. So they try to create that ecosystem. We have seen both ends of it.

Good career path

If you just give them to one data miner or analytics person in one team, sometimes the person does not find an ecosystem to challenge himself or herself. We're trying to do it on both sides of the fence, so that we can provide people with a good career path.

Hiring these folks is not easy. Once you've hired them, retaining them is not easy. You want to make sure to create an ecosystem where it’s challenging enough for these people to work. It also has to be an ecosystem where you continually challenge them and keep training them.

The analytical techniques are evolving. When I started doing it, things were stable for years. Now, the newer class of data is coming in, newer techniques are coming in, and newer classes of business problems are coming in. It’s very important that we keep the ecosystem going. So we try to do it on both sides.

Gardner: Very interesting. HP, of course, has its own line of products for big-data analysis. You're such a large global enterprise that you're doing lots of analysis, as any good business should, but you're also being asked to show how this works. Are there some specific use cases that demonstrate for other enterprises what you've learned yourselves.

You want to make sure to create an ecosystem where it’s challenging enough for these people to work. It also has to be an ecosystem where you continually challenge them and keep training them.

Singh: There are several that we can talk about. One is in a social media space. I briefly talked about that. My career evolved of doing analytics in what I call "data inside the enterprise." But, over the last couple of years, we started to go look at data outside the enterprise.

Recently we went and looked at a bank. We were able to harvest data from the Internet, publicly available data like Glassdoor, for example. Glassdoor is a website where employees of a company can put their feedback, talk about the company, and rate things.

We were presenting it to the executives of this particular bank and we were able to get all the data and tell them the overall employee morale. We figured out that the life-work balance for the employees wasn't very good.

The main component that the employees weren't happy about was their leave policy and their vacation policy. We drilled down and figured out that the bankers seemed to be fairly happy, but the IT guys and analysts weren't very happy. Again, this is one example where we didn't ask for a line of data from the customer. This data is publicly available. You and I, or anybody else, can go get it. I can do that same analysis for HP or any other company.

That’s where I believe the classes of analytics we're doing is changing. A lot of times, your competitive differentiator is the ability to do things with that data. Data is a corporate asset and it will be, but this class of what we call the user-generated data is changing analytics as a whole. The ability to go harvest it and, more importantly, get value out of it will be the competitive differentiator.

Gardner: Any other use cases that demonstrate the power of a particular type of platform, let’s say Vertica in HAVEn, where you've got the power of a columnar architecture and you've got the ability to bring in unstructured data from Autonomy? Maybe there are a couple of use cases that demonstrate the unique attributes of HAVEn when it comes to inclusivity and the comprehensive nature of information today?

Game changer

Singh: Let me talk about a couple of the things that happened in the HAVEn ecosystem. One of the main work forces in HAVEn is our massively parallel database called Vertica. In addition to being a database where we can ingest data very quickly, ingest large volumes of data, and run query performance, the game-changer for us as an analytics practitioner for me has been ability to do analytics in database.

If I look at my career over the last 20-22 years, most of the times what happens in the analytics space is that you have data residing in a database or an enterprise data warehouse. When you want to build a model, you take the data out and use an analytics platform like SAS, R, or SPSS. You do something there and you either bring the data back into the environment or you run the models and publish them out.

What Vertica has done that's unique is given us a framework, and through the UDEF framework, we could build a data mining model and run it directly on a database engine and take the output out.

An example we took to HP Discover a couple of months ago was trying to predict a failure of a machine before the actual failure happens. HP has these big machines and big printers, which are very expensive.

Like lot of high-end devices these days, they send out a lot of data. They send out data about when you're using a machine. The sensors send out a lot of information, maybe the pressure of the valves, the kind of the temperature they're in, the kind of throughput they're giving you, or the number of pages you've printed.

Looking at each components of failure, we could predict with a certain probability when the machine will fail and with a certain probability.

Also, they give you data on the events when the machine was not performing optimally or actually failed. We were able to go ingest all that data, put the data onto in the Vertica  platform, and build predictive models using open source R language. We built a model that can predict the failure of a machine.

Looking at each components of failure, we could predict with a certain probability when the machine will fail and with a certain probability, so our service reps can actually be proactive and not wait for the machine to fail. That's one example of doing an in-database data mining using Vertica.

Another example used more components around the social-media space. One of the problems in the social-media space, and I think you guys are probably familiar with this, is finding influencers.

I gave a talk yesterday around figuring out how you do that. There are classical ways if you go by the uni-dimensional thing around the number of followers or retweets you have. Barack Obama or Lady Gaga would be big influencers, but Barack Obama, for cloud computing for HP, may not be a very big influencer.

So you build those classes of algorithms. My team has actually built out three patented algorithms to figure out how to identify influencers in the space. We've actually built out a framework where we can source that data from the social-media space, drop it into a Hadoop kind of an environment.

We use Autonomy to enrich and put some sentiments to it and then drop the data into the Vertica environment. In that Vertica environment, you run the compressed algorithms and get an output. Then, you can score and predict who is the influencer for the topic you are looking for.

Influencers

I gave the example of Barack Obama, in general a big influencer, but he is not influencer for all topics. Maybe in politics or the US government he's a big influencer, but not for cloud computing. Influencer is also a function of time. Somebody like Diego Maradona probably was a big influencer in soccer in the ’90s, but in 2014, not that much.

You have to make sure that you can incorporate those as part of the logic of your algorithm. We've been able to use the multiple components of HAVEn and build out a complete framework where we can tell numerically who the main influencers are and how influential they are. For example, if you get a score of 93 and I get a score of 22, you are almost four times as influential as I am.

Gardner: For other organizations that are interested in learning more about how HP Analytics is operating and maybe learning from your example, are there any resources or websites we can go to, where you are providing more information about HP Analytics?

You have to make sure that you can incorporate those as part of the logic of your algorithm.

Singh: Definitely. We work through our partners in Enterprise Services. We have our own website as well. There are multiple ways that you can approach us. You can talk to the Vertica sales team and they can connect to us. As I said, we do analytics for all of HP and for select customers. We do not have a direct sales arm to us. We work through our partners in Enterprise Services, as well as with software team.

You may also be interested in:

More Stories By Dana Gardner

At Interarbor Solutions, we create the analysis and in-depth podcasts on enterprise software and cloud trends that help fuel the social media revolution. As a veteran IT analyst, Dana Gardner moderates discussions and interviews get to the meat of the hottest technology topics. We define and forecast the business productivity effects of enterprise infrastructure, SOA and cloud advances. Our social media vehicles become conversational platforms, powerfully distributed via the BriefingsDirect Network of online media partners like ZDNet and IT-Director.com. As founder and principal analyst at Interarbor Solutions, Dana Gardner created BriefingsDirect to give online readers and listeners in-depth and direct access to the brightest thought leaders on IT. Our twice-monthly BriefingsDirect Analyst Insights Edition podcasts examine the latest IT news with a panel of analysts and guests. Our sponsored discussions provide a unique, deep-dive focus on specific industry problems and the latest solutions. This podcast equivalent of an analyst briefing session -- made available as a podcast/transcript/blog to any interested viewer and search engine seeker -- breaks the mold on closed knowledge. These informational podcasts jump-start conversational evangelism, drive traffic to lead generation campaigns, and produce strong SEO returns. Interarbor Solutions provides fresh and creative thinking on IT, SOA, cloud and social media strategies based on the power of thoughtful content, made freely and easily available to proactive seekers of insights and information. As a result, marketers and branding professionals can communicate inexpensively with self-qualifiying readers/listeners in discreet market segments. BriefingsDirect podcasts hosted by Dana Gardner: Full turnkey planning, moderatiing, producing, hosting, and distribution via blogs and IT media partners of essential IT knowledge and understanding.

@CloudExpo Stories
The dynamic nature of the cloud means that change is a constant when it comes to modern cloud-based infrastructure. Delivering modern applications to end users, therefore, is a constantly shifting challenge. Delivery automation helps IT Ops teams ensure that apps are providing an optimal end user experience over hybrid-cloud and multi-cloud environments, no matter what the current state of the infrastructure is. To employ a delivery automation strategy that reflects your business rules, making r...
Most technology leaders, contemporary and from the hardware era, are reshaping their businesses to do software. They hope to capture value from emerging technologies such as IoT, SDN, and AI. Ultimately, irrespective of the vertical, it is about deriving value from independent software applications participating in an ecosystem as one comprehensive solution. In his session at @ThingsExpo, Kausik Sridhar, founder and CTO of Pulzze Systems, will discuss how given the magnitude of today's applicati...
Smart cities have the potential to change our lives at so many levels for citizens: less pollution, reduced parking obstacles, better health, education and more energy savings. Real-time data streaming and the Internet of Things (IoT) possess the power to turn this vision into a reality. However, most organizations today are building their data infrastructure to focus solely on addressing immediate business needs vs. a platform capable of quickly adapting emerging technologies to address future ...
Amazon is pursuing new markets and disrupting industries at an incredible pace. Almost every industry seems to be in its crosshairs. Companies and industries that once thought they were safe are now worried about being “Amazoned.”. The new watch word should be “Be afraid. Be very afraid.” In his session 21st Cloud Expo, Chris Kocher, a co-founder of Grey Heron, will address questions such as: What new areas is Amazon disrupting? How are they doing this? Where are they likely to go? What are th...
SYS-CON Events announced today that NetApp has been named “Bronze Sponsor” of SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. NetApp is the data authority for hybrid cloud. NetApp provides a full range of hybrid cloud data services that simplify management of applications and data across cloud and on-premises environments to accelerate digital transformation. Together with their partners, NetApp emp...
SYS-CON Events announced today that SkyScale will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. SkyScale is a world-class provider of cloud-based, ultra-fast multi-GPU hardware platforms for lease to customers desiring the fastest performance available as a service anywhere in the world. SkyScale builds, configures, and manages dedicated systems strategically located in maximum-security...
Join IBM November 1 at 21st Cloud Expo at the Santa Clara Convention Center in Santa Clara, CA, and learn how IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Cognitive analysis impacts today’s systems with unparalleled ability that were previously available only to manned, back-end operations. Thanks to cloud processing, IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Imagine a robot vacuum that becomes your personal assistant th...
In his general session at 21st Cloud Expo, Greg Dumas, Calligo’s Vice President and G.M. of US operations, will go over the new Global Data Protection Regulation and how Calligo can help business stay compliant in digitally globalized world. Greg Dumas is Calligo's Vice President and G.M. of US operations. Calligo is an established service provider that provides an innovative platform for trusted cloud solutions. Calligo’s customers are typically most concerned about GDPR compliance, applicatio...
Enterprises are adopting Kubernetes to accelerate the development and the delivery of cloud-native applications. However, sharing a Kubernetes cluster between members of the same team can be challenging. And, sharing clusters across multiple teams is even harder. Kubernetes offers several constructs to help implement segmentation and isolation. However, these primitives can be complex to understand and apply. As a result, it’s becoming common for enterprises to end up with several clusters. Thi...
Microsoft Azure Container Services can be used for container deployment in a variety of ways including support for Orchestrators like Kubernetes, Docker Swarm and Mesos. However, the abstraction for app development that support application self-healing, scaling and so on may not be at the right level. Helm and Draft makes this a lot easier. In this primarily demo-driven session at @DevOpsSummit at 21st Cloud Expo, Raghavan "Rags" Srinivas, a Cloud Solutions Architect/Evangelist at Microsoft, wi...
Containers are rapidly finding their way into enterprise data centers, but change is difficult. How do enterprises transform their architecture with technologies like containers without losing the reliable components of their current solutions? In his session at @DevOpsSummit at 21st Cloud Expo, Tony Campbell, Director, Educational Services at CoreOS, will explore the challenges organizations are facing today as they move to containers and go over how Kubernetes applications can deploy with lega...
SYS-CON Events announced today that Avere Systems, a leading provider of hybrid cloud enablement solutions, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Avere Systems was created by file systems experts determined to reinvent storage by changing the way enterprises thought about and bought storage resources. With decades of experience behind the company’s founders, Avere got its ...
SYS-CON Events announced today that Taica will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. ANSeeN are the measurement electronics maker for X-ray and Gamma-ray and Neutron measurement equipment such as spectrometers, pulse shape analyzer, and CdTe-FPD. For more information, visit http://anseen.com/.
Today most companies are adopting or evaluating container technology - Docker in particular - to speed up application deployment, drive down cost, ease management and make application delivery more flexible overall. As with most new architectures, this dream takes significant work to become a reality. Even when you do get your application componentized enough and packaged properly, there are still challenges for DevOps teams to making the shift to continuous delivery and achieving that reducti...
As you move to the cloud, your network should be efficient, secure, and easy to manage. An enterprise adopting a hybrid or public cloud needs systems and tools that provide: Agility: ability to deliver applications and services faster, even in complex hybrid environments Easier manageability: enable reliable connectivity with complete oversight as the data center network evolves Greater efficiency: eliminate wasted effort while reducing errors and optimize asset utilization Security: imple...
High-velocity engineering teams are applying not only continuous delivery processes, but also lessons in experimentation from established leaders like Amazon, Netflix, and Facebook. These companies have made experimentation a foundation for their release processes, allowing them to try out major feature releases and redesigns within smaller groups before making them broadly available. In his session at 21st Cloud Expo, Brian Lucas, Senior Staff Engineer at Optimizely, will discuss how by using...
In this strange new world where more and more power is drawn from business technology, companies are effectively straddling two paths on the road to innovation and transformation into digital enterprises. The first path is the heritage trail – with “legacy” technology forming the background. Here, extant technologies are transformed by core IT teams to provide more API-driven approaches. Legacy systems can restrict companies that are transitioning into digital enterprises. To truly become a lead...
SYS-CON Events announced today that Yuasa System will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Yuasa System is introducing a multi-purpose endurance testing system for flexible displays, OLED devices, flexible substrates, flat cables, and films in smartphones, wearables, automobiles, and healthcare.
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend 21st Cloud Expo October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, and learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
SYS-CON Events announced today that CAST Software will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CAST was founded more than 25 years ago to make the invisible visible. Built around the idea that even the best analytics on the market still leave blind spots for technical teams looking to deliver better software and prevent outages, CAST provides the software intelligence that matter ...