Harnessing the Power of Big Data for BI

HP Vertica architecture gives massive performance boost to toughest BI queries for Infinity Insurance

The next edition of the HP Discover Performance Podcast Series highlights how Infinity Insurance Companies in Birmingham, Alabama, has been deploying a new data architecture -- native column store databases -- to improve productivity for its analysis and business intelligence (BI) queries.

To learn more about how Infinity has improved their performance and their results for their business analytics, BriefingsDirect interviewed Barry Ralston, Assistant Vice President for Data Management at Infinity Insurance Companies. The discussion, which took place at the recent HP Discover 2013 Conference in Las Vegas, is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions. [Learn more about the upcoming Vertica conference in Boston Aug. 5.]

Among other findings, Ralston and his team have seen at least a 100-fold improvement in their 12 worst-performing, longest-running queries when moving from a row-store-based Oracle Exadata implementation to a column-store-based HP Vertica deployment. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:

Gardner: What was it that you've been doing with your BI and data warehousing that prompted you to seek an alternative?

Ralston: Like many companies, we have constructed an enterprise data warehouse deployed to a row-store technology. In our case, it was initially Oracle RAC and then, eventually, the Oracle Exadata engineered hardware/software appliance.

We were noticing that analysis that typically occurs in our space wasn’t really optimized for execution via that row store. Based on my experience with Vertica, we did a proof of concept with a couple of other alternative and analytic store-type databases. We specifically chose Vertica to achieve higher productivity and to allow us to focus on optimizing queries and extracting value out of the data.

Gardner: What does Infinity Insurance Companies do? How big are you, and how important is data and analysis to you?

Ralston: We are a billion-dollar property and casualty company, headquartered in Birmingham, Alabama. Like any insurance carrier, data is key to what we do. But one of the things that drew me to Infinity, after years of being in a consulting role, was their determination to use data as a strategic weapon: not just IT as a whole, but data specifically, within that larger IT, as a strategic or competitive advantage.

Vertica environment

Gardner: You have quite a bit of internal and structured data. Tell me a bit what happened when you moved into a Vertica environment, first in the proof of concept phase and then into production?

Ralston: For the proof of concept, we took the most difficult or worst-performing queries from our Exadata implementation and moved that entire enterprise data warehouse set into a Vertica deployment on three Dual Hex Core, DL380 type machines. We're running at the same scale, with the same data, with the same queries.

We took the top 12 worst-performing queries or longest-running queries from the Exadata implementation, and not one of the proof of concept queries ran less than 100 times faster. It was an easy decision to make in terms of the analytic workload, versus trying to use the Oracle row-store technology.

Gardner: Let’s dig into that a bit. I'm not a computer scientist and I don’t claim to fully understand the difference between row store, relational, and the column-based approach for Vertica. Give us the quick "Data Architecture 101" explanation of why this improvement is so impressive?

Ralston: The original family of relational databases -- the current big three are Oracle, SQL Server and DB2 -- are based on what we call row-storage technologies. They store information in blocks on disks, writing an entire row at a time.

If you had a record for an insured, you might have the insured's name, the date the policy went into effect, the date the policy next shows a payment, etc. All those attributes were written all at the same time in series to a row, which is combined into a block.

So storage has to be allocated in a particular fashion, to facilitate things like updates. It’s an optimal way of storing data for transaction processing. For now, it’s probably the state-of-the-art for that. If I am running an accounting system or a quote system, that’s the way to go.

Analytic queries are fundamentally different than transaction-processing queries. Think of the transaction processing as a cash register. You ring up a sale with a series of line items. Those get written to that row store database and that works well.

But when I want to know the top 10 products sold to my most profitable 20 percent of customers in a certain set of regions in the country, those set-based queries don’t perform well without major indexing. Often, that relates back to additional physical storage in a row-storage architecture.

Column store databases -- Vertica is a native column store database -- store data fundamentally differently than those row stores. We might break a record down into a set of columns, each stored distinctly. This allows me to do a couple of different things at an architectural level.
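To make the row-versus-column distinction concrete, here is a minimal Python sketch of the two layouts. It is a toy in-memory model only, not the on-disk format of Vertica or any row-store product, and the field names and sample values are hypothetical, loosely following the insured-record example above.

```python
# Illustrative toy model only: not the actual storage format of any database.
# The fields and values are hypothetical sample data.
records = [
    {"name": "A. Smith", "effective_date": "2013-01-15", "next_payment": "2013-07-15"},
    {"name": "B. Jones", "effective_date": "2013-02-01", "next_payment": "2013-08-01"},
    {"name": "C. Brown", "effective_date": "2013-02-01", "next_payment": "2013-08-01"},
]


def to_row_store(recs):
    """Row layout: all attributes of one record sit together, record after record."""
    return [tuple(r.values()) for r in recs]


def to_column_store(recs):
    """Column layout: all values of one attribute sit together, attribute after attribute."""
    return {field: [r[field] for r in recs] for field in recs[0]}


row_store = to_row_store(records)
column_store = to_column_store(records)

# A whole record is one lookup in the row layout ...
print(row_store[0])
# ... while a whole attribute is one lookup in the column layout.
print(column_store["effective_date"])
```

In the row layout, writing or updating one insured touches a single contiguous entry, which suits transaction processing. In the column layout, scanning one attribute across every insured touches a single contiguous list, which suits set-based analysis.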

Sort, compress, organize

First and foremost, I can sort, compress, and organize the data on disk much more efficiently. Compression has been recently added to row-storage architectures, but in a row-storage database, you largely have to compress at the entirety of a row.

I can’t choose an optimal compression algorithm for just a date, because in that row, I will have text, numbers, and dates. In a column store, I can apply a specific compression algorithm to the data that's in that column. So a date gets one algorithm; a monotonically increasing key, like a surrogate key you might have in a dimensional data warehouse, gets a different encoding algorithm; and so on.
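As a rough illustration of per-column encoding, the Python sketch below applies run-length encoding to a sorted date column and delta encoding to an increasing surrogate key. These two schemes are common textbook choices picked for the example; the sketch does not claim to reproduce the encodings Vertica actually applies, and the sample values are made up.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def delta_encode(values):
    """Keep the first value, then only the difference from each predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# Hypothetical sorted date column: long runs of repeats compress well with RLE.
effective_dates = ["2013-01-15", "2013-02-01", "2013-02-01", "2013-02-01"]
# Hypothetical surrogate key: steadily increasing, so the deltas are tiny.
policy_keys = [1000, 1001, 1002, 1003]

encoded_dates = run_length_encode(effective_dates)
encoded_keys = delta_encode(policy_keys)

print(encoded_dates)  # [('2013-01-15', 1), ('2013-02-01', 3)]
print(encoded_keys)   # [1000, 1, 1, 1]
```

Neither scheme would help much on a mixed row of text, numbers, and dates, which is the point being made: the choice of encoding only pays off when every value in the stream has the same type and shape.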

Then there is sorting and retrieval. How data gets retrieved is fundamentally different, and that is another big point against row-storage databases at query time. I could say, "Tell me all the customers that bought a product in California, but I only want to know their last name."

If I have 20 different attributes, a row-storage database actually has to read all the attributes off of disk. The query engine eliminates the ones I didn’t ask for in the eventual results, but I've already incurred the penalty of the input-output (I/O). This has a huge impact when you think of things like call detail records (CDRs) in telecom, which have 144-some-odd columns.

If I'm asking a column store database, "Give me the last names of all the people who bought a product in California," I'm essentially asking the database to read two columns off disk, and that’s all that’s happening. My I/O is improved by a factor of 10 or, in the case of the CDR, down to as little as 1 in 144.
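The arithmetic behind those figures can be sketched in a few lines of Python. The model is deliberately simplified and rests on assumptions not stated in the interview: every column is the same width, compression is ignored, and a row store reads every column of each row it touches. The function names are hypothetical.

```python
def columns_read(total_columns, columns_needed, layout):
    """Columns that must come off disk per row under the simplified model."""
    if layout == "row":
        return total_columns      # the whole row is read, then trimmed
    if layout == "column":
        return columns_needed     # only the referenced columns are read
    raise ValueError(f"unknown layout: {layout!r}")


def io_reduction(total_columns, columns_needed):
    """How many times less data the column layout reads than the row layout."""
    row_cost = columns_read(total_columns, columns_needed, "row")
    column_cost = columns_read(total_columns, columns_needed, "column")
    return row_cost / column_cost


print(io_reduction(20, 2))    # 10.0: two columns (last name, state) out of 20
print(io_reduction(144, 1))   # 144.0: a single column out of a 144-column CDR
print(io_reduction(144, 2))   # 72.0: two columns out of a 144-column CDR
```

Under this model the factor of 10 corresponds to reading 2 of 20 columns, and the 1-in-144 figure corresponds to a query that touches a single CDR column.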

The great question is what ends up being the business value.

Gardner: You can’t just go back and increase your I/O improvements in those relational environments by making it in-memory or cutting down on the distance between the data and the processing? That only gets you so far, and you can only throw hardware at it so much. So fundamentally, it’s all about the architecture.

Ralston: Absolutely correct. You've seen a lot of these -- I think one of the fun terms around this is "unnatural acts with data," as to how data gets either scattered or put into a cache or other things. Every time you introduce one of these mechanisms, you're putting another bottleneck between near real-time analytics and getting the data from a source system into a user’s hands for analytics. Think of a cache. If you’re going to cache, you’ve got to warm that cache up to get an effect.

If I'm streaming data in from a sensor, real-time location servers, or something like that, I don’t get a whole lot of value out of the cache to start until it gets warmed up. I totally agree with your point there, Dana, that it’s all about the architecture.

In short, in leveraging Vertica, the underlying architecture allows me to create a playfield, if you will, for business analysts. They don’t necessarily have to be data scientists to enjoy it and be able to relate things that have a business relationship between each other, but not necessarily one that’s reflected in the data model, for whatever reason.

Performance suffers

Obviously in a row storage architecture, and specifically within dimensional data warehouses, if there is no index between a pair of columns, your performance begins to suffer. Vertica creates no indexes and it’s self-indexing the data via sorting and encoding.
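One way to picture "self-indexing via sorting" is that a sorted column can be searched directly, with no separate index structure. The Python sketch below uses binary search from the standard `bisect` module over a hypothetical pair of columns sorted by state. It is an analogy under that assumption, not a description of Vertica's internals.

```python
from bisect import bisect_left, bisect_right


def find_range(sorted_column, value):
    """Return (start, end) positions of `value` in a sorted column via binary search."""
    return bisect_left(sorted_column, value), bisect_right(sorted_column, value)


# Hypothetical data: both columns are stored in the same order, sorted by state.
state = ["AL", "AL", "CA", "CA", "CA", "FL", "TX"]
last_name = ["Adams", "Young", "Baker", "Lopez", "Nguyen", "Diaz", "Evans"]

# No index is consulted: the sort order itself locates the California rows.
start, end = find_range(state, "CA")
ca_last_names = last_name[start:end]

print(start, end)      # 2 5
print(ca_last_names)   # ['Baker', 'Lopez', 'Nguyen']
```

Because the matching positions form one contiguous range, the same slice can be applied to any other column stored in that order, which is why an unanticipated question does not require a new index first.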

So if I have an end user who wants to analyze something that’s never been analyzed before, but has a semantic relationship between those items, I don’t have to re-architect the data storage for them to get information back at the speed of their decision.

Gardner: What about opening this up to some new types of data and/or giving your users, the folks in the insurance company, the opportunity to look to external types of queries and learn more about markets, where they can apply new insurance products and grow the top line?

Ralston: That's definitely part of our strategic plan. Right now, 100 percent of the data being leveraged at Infinity is structured. We're leveraging Vertica to manage all that structured data, but we have a plan to leverage Hadoop and the Vertica Hadoop connectors, based on what I'm seeing around HAVEn: the idea of being able to seamlessly work with structured and non-structured data from one point.

Insurance is an interesting business in that, as my product and pricing people look for the next great indicator of risk, we essentially get to ride a wave of that competitive advantage for as long a period of time as it takes us to report that new rate to a state. The state shares that with our competitors, and then our competitors have to see if they want to bake into their systems what we’ve just found.

So we can use Vertica as a competitive hammer, Vertica plus Hadoop, to do things that our competitors aren’t able to do. Then, I’ve delivered what my CIO is asking me in terms of data as a competitive advantage.

More Stories By Dana Gardner

At Interarbor Solutions, we create the analysis and in-depth podcasts on enterprise software and cloud trends that help fuel the social media revolution. As a veteran IT analyst, Dana Gardner moderates discussions and interviews that get to the meat of the hottest technology topics. We define and forecast the business productivity effects of enterprise infrastructure, SOA and cloud advances. Our social media vehicles become conversational platforms, powerfully distributed via the BriefingsDirect Network of online media partners like ZDNet and IT-Director.com. As founder and principal analyst at Interarbor Solutions, Dana Gardner created BriefingsDirect to give online readers and listeners in-depth and direct access to the brightest thought leaders on IT. Our twice-monthly BriefingsDirect Analyst Insights Edition podcasts examine the latest IT news with a panel of analysts and guests. Our sponsored discussions provide a unique, deep-dive focus on specific industry problems and the latest solutions. This podcast equivalent of an analyst briefing session -- made available as a podcast/transcript/blog to any interested viewer and search engine seeker -- breaks the mold on closed knowledge. These informational podcasts jump-start conversational evangelism, drive traffic to lead generation campaigns, and produce strong SEO returns. Interarbor Solutions provides fresh and creative thinking on IT, SOA, cloud and social media strategies based on the power of thoughtful content, made freely and easily available to proactive seekers of insights and information. As a result, marketers and branding professionals can communicate inexpensively with self-qualifying readers/listeners in discrete market segments. BriefingsDirect podcasts hosted by Dana Gardner: full turnkey planning, moderating, producing, hosting, and distribution via blogs and IT media partners of essential IT knowledge and understanding.
