Welcome!

SDN Journal Authors: Hovhannes Avoyan, Yeshim Deniz, Sharon Barkai, Michael Bushong, Pat Romanski

Related Topics: Big Data Journal, Java, SOA & WOA, .NET

Big Data Journal: Article

Detecting Anomalies that Matter!

Like needles in a haystack

As Netuitive's Chief Data Scientist, I am fortunate to work closely with some of the worlds' largest banks, telcos, and eCommerce companies. Increasingly the executives that I speak with at these companies are no longer focused on just detecting application performance anomalies - they want to understand the impact this has on the business.  For example - "is the current slowdown in the payment service impacting sales?"

You can think of it as detecting IT operations anomalies that really matter - but this is easier said than done.

Like Needles in a Haystack
When it comes to IT analytics, there is a general notion that the more monitoring data you are able to consume, analyze, and correlate, the more accurate your results will be. Just pile all that infrastructure, application performance, and business metric data together and good things are bound to happen, right?

Larger organizations typically have access to voluminous data being generated from dozens of monitoring tools that are tracking thousands of infrastructure and application components.  At the same time, these companies often track hundreds of business metrics using a totally different set of tools.

The problem is that, collectively, these monitoring tools do not communicate with each other.  Not only is it hard to get holistic visibility into the performance and health of a particular business service, it's even harder to discover complex anomalies that have business impact.

Anomalies are Like Snowflakes
Compounding the challenge is the fact that no two anomalies are alike.  Anomalies that matter have multiple facets.  They reflect a composite behavior of many layers of interacting and inter-dependent components.  Additionally, they can be cleverly disguised or hidden in a haze of visible but insignificant noise.  No matter how many graphs and charts you display on the largest LCD monitor you can find - the type of scalable real-time analysis required to find and expose what's important is humanly impossible.

Enter IT Operations Analytics
Analytics such as statistical machine learning allow us to understand the "normal" behavior of each resource we are tracking - be it a single IT component, web service, application, or business process. Additional algorithms help us find patterns and correlations between the thousands of IT and business metrics that matter in a critical service.

The Shift Towards IT Operations Analytics is Already Happening
This is not about the future.  It's about what companies are doing today.

Several years ago thought-leading enterprises (primarily large banks with critical revenue driving services) began experimenting with a new breed of IT analytics platform. These companies' electronic and web facing businesses had so much revenue (and reputation) at stake that they needed to find the anomalies that matter -- the ones that were truly indicative of current or impending problems.

Starting with an almost "blank slate", these forward-thinking companies began developing open IT analytics platforms that easily integrated any type of data source in real time to provide a comprehensive view of patterns and relationships between IT infrastructure and business service performance. This was only possible with technologies that leveraged sophisticated data integration, knowledge modeling, and analytics to discover and capture the unique behavior of complex business services.  Anything less would fail, because, like snowflakes, no two anomalies are alike.

The Continuous Need for Algorithm Research
The online banking system at one bank is different than the online system at the next bank.  And the transaction slowdown that occurred last week may have a totally different root cause than the one two months ago.  Even more interesting are external factors such as seasonality and its effects on demand.  For example, payment companies see increased workload around holidays such as Thanksgiving and Mother's Day whereas gaming/betting companies' demand is driven more by factors such as the NFL Playoffs or the World Series.

For this reason, analytics research is an ongoing endeavor at Netuitive - part driven by customer needs and in part by advances in technology.   Once Netuitive technology is installed in an enterprise and integrating data collected across multiple layers in the service stack, behavior learning begins immediately.  As time passes, the statistical algorithms have more observations to feed their results and this leads to increasing confidence in both anomalies detected and proactive forecasts.  Additionally, customer domain knowledge can be layered in to Netuitive's real-time analysis in the form of knowledge bases and supervised learning algorithms.  The Research Group at Netuitive works closely with our Professional Services Group as well as directly with customers to regularly review actual delivered alarm quality to tune the algorithms that we have as well as identify new algorithms that would deliver greater value in an actionable timeframe.

Since Netuitive's software architecture allows for "pluggable" algorithms, we can incrementally introduce new analytics capabilities easily, at first in an experimental or laboratory setting and ultimately, once verified, into production.

The IT operations management market has matured over the past two decades to the point that most critical components are well instrumented.  The data is there and mainstream IT organizations (not just visionary early adopters) realize that analytics deliver measurable and tangible value.   My vision and challenge is to get our platform to the point where customers can easily customize the algorithms on their own, as their needs and IT infrastructure evolve over time.  This is where platforms need to get to because of the endless variety of ways that enterprises must discover and remediate "anomalies that matter".

Stay tuned.  In an upcoming blog I will drill down on some specific industry examples of algorithms we developed as part of some large enterprise IT analytic platform solutions.

More Stories By Elizabeth A. Nichols, Ph.D

As Chief Data Scientist for Netuitive, Elizabeth A. Nichols, Ph.D. leads development of algorithms, models, and analytics. This includes both enriching the company’s current portfolio as well as developing new analytics to support current and emerging technologies and IT-dependent business services across multiple industry sectors.

Previously, Dr. Nichols co-founded PlexLogic, a provider of open analytics services for quantitative data analysis, risk modeling and data visualization. In her role as CTO and Chief Data Scientist, she developed a cloud platform for collecting, cleansing and correlating data from heterogeneous sources, computing metrics, applying algorithms and models, and visualizing results. Prior to Plexlogic, Dr. Nichols co-founded and served as CTO for ClearPoint Metrics, a security metrics software platform that was eventually sold to nCircle. Prior to ClearPoint Metrics, Dr. Nichols served in technical advisory and leadership positions at CA, Legent Corp, BladeLogic, and Digital Analysis Corp. At CA, she was VP of Research and Development and Lead Architect for agent instrumentation and analytics for CA Unicenter. After receiving a Ph.D. in Mathematics from Duke University, she began her career as an operations research analyst developing war gaming models for the US Army.

Cloud Expo Latest Stories
With the explosion of the cloud, more businesses are transitioning to a recurring revenue model to generate reliable sales, grow profits, and open new markets. This opportunity requires businesses to get to market quickly with the pricing and packaging options customers want. In addition, you will want to take advantage of the ensuing tidal wave of data to more effectively upsell, cross-sell and manage your customers. All of this is possible, but only with the right approach. At 15th Cloud Expo, Brendan O'Brien, Co-founder at Aria Systems and the inventor of cloud billing panelists, will lead a panel discussion on what it takes to launch and manage a successful recurring revenue business. The panelists will offer their insights about what each department will need to consider, from financial management to line of business and IT. The panelists will also offer examples from their success in recurring revenue with companies such as Audi, Constant Contact, Experian, Pitney-Bowes, Teleko...
Planning scalable environments isn't terribly difficult, but it does require a change of perspective. In his session at 15th Cloud Expo, Phil Jackson, Development Community Advocate for SoftLayer, will broaden your views to think on an Internet scale by dissecting a video publishing application built with The SoftLayer Platform, Message Queuing, Object Storage, and Drupal. By examining a scalable modular application build that can handle unpredictable traffic, attendees will able to grow your development arsenal and pick up a few strategies to apply to your own projects.
Come learn about what you need to consider when moving your data to the cloud. In her session at 15th Cloud Expo, Skyla Loomis, a Program Director of Cloudant Development at Cloudant, will discuss the security, performance, and operational implications of keeping your data on premise, moving it to the cloud, or taking a hybrid approach. She will use real customer examples to illustrate the tradeoffs, key decision points, and how to be successful with a cloud or hybrid cloud solution.
The cloud provides an easy onramp to building and deploying Big Data solutions. Transitioning from initial deployment to large-scale, highly performant operations may not be as easy. In his session at 15th Cloud Expo, Harold Hannon, Sr. Software Architect at SoftLayer, will discuss the benefits, weaknesses, and performance characteristics of public and bare metal cloud deployments that can help you make the right decisions.
Over the last few years the healthcare ecosystem has revolved around innovations in Electronic Health Record (HER) based systems. This evolution has helped us achieve much desired interoperability. Now the focus is shifting to other equally important aspects – scalability and performance. While applying cloud computing environments to the EHR systems, a special consideration needs to be given to the cloud enablement of Veterans Health Information Systems and Technology Architecture (VistA), i.e., the largest single medical system in the United States.
Cloud and Big Data present unique dilemmas: embracing the benefits of these new technologies while maintaining the security of your organization’s assets. When an outside party owns, controls and manages your infrastructure and computational resources, how can you be assured that sensitive data remains private and secure? How do you best protect data in mixed use cloud and big data infrastructure sets? Can you still satisfy the full range of reporting, compliance and regulatory requirements? In his session at 15th Cloud Expo, Derek Tumulak, Vice President of Product Management at Vormetric, will discuss how to address data security in cloud and Big Data environments so that your organization isn’t next week’s data breach headline.
Scott Jenson leads a project called The Physical Web within the Chrome team at Google. Project members are working to take the scalability and openness of the web and use it to talk to the exponentially exploding range of smart devices. Nearly every company today working on the IoT comes up with the same basic solution: use my server and you'll be fine. But if we really believe there will be trillions of these devices, that just can't scale. We need a system that is open a scalable and by using the URL as a basic building block, we open this up and get the same resilience that the web enjoys.
Is your organization struggling to deal with skyrocketing volumes of digital assets? The amount of data is growing exponentially and organizations are having a hard time managing this growth. In his session at 15th Cloud Expo, Amar Kapadia, Senior Director of Open Cloud Strategy at Seagate, will walk through the essential considerations when developing a cloud storage strategy. In this discussion, you will understand the challenges IT is facing, why companies need to move to cloud, and how the right cloud model can help your business economically overcome the data struggle.
If cloud computing benefits are so clear, why have so few enterprises migrated their mission-critical apps? The answer is often inertia and FUD. No one ever got fired for not moving to the cloud – not yet. In his session at 15th Cloud Expo, Michael Hoch, SVP, Cloud Advisory Service at Virtustream, will discuss the six key steps to justify and execute your MCA cloud migration.
The 16th International Cloud Expo announces that its Call for Papers is now open. 16th International Cloud Expo, to be held June 9–11, 2015, at the Javits Center in New York City brings together Cloud Computing, APM, APIs, Security, Big Data, Internet of Things, DevOps and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportunity. Submit your speaking proposal today!
Most of today’s hardware manufacturers are building servers with at least one SATA Port, but not every systems engineer utilizes them. This is considered a loss in the game of maximizing potential storage space in a fixed unit. The SATADOM Series was created by Innodisk as a high-performance, small form factor boot drive with low power consumption to be plugged into the unused SATA port on your server board as an alternative to hard drive or USB boot-up. Built for 1U systems, this powerful device is smaller than a one dollar coin, and frees up otherwise dead space on your motherboard. To meet the requirements of tomorrow’s cloud hardware, Innodisk invested internal R&D resources to develop our SATA III series of products. The SATA III SATADOM boasts 500/180MBs R/W Speeds respectively, or double R/W Speed of SATA II products.
In today's application economy, enterprise organizations realize that it's their applications that are the heart and soul of their business. If their application users have a bad experience, their revenue and reputation are at stake. In his session at 15th Cloud Expo, Anand Akela, Senior Director of Product Marketing for Application Performance Management at CA Technologies, will discuss how a user-centric Application Performance Management solution can help inspire your users with every application transaction.
SYS-CON Events announced today that Gridstore™, the leader in software-defined storage (SDS) purpose-built for Windows Servers and Hyper-V, will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Gridstore™ is the leader in software-defined storage purpose built for virtualization that is designed to accelerate applications in virtualized environments. Using its patented Server-Side Virtual Controller™ Technology (SVCT) to eliminate the I/O blender effect and accelerate applications Gridstore delivers vmOptimized™ Storage that self-optimizes to each application or VM across both virtual and physical environments. Leveraging a grid architecture, Gridstore delivers the first end-to-end storage QoS to ensure the most important App or VM performance is never compromised. The storage grid, that uses Gridstore’s performance optimized nodes or capacity optimized nodes, starts with as few a...
SYS-CON Events announced today that Cloudian, Inc., the leading provider of hybrid cloud storage solutions, has been named “Bronze Sponsor” of SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Cloudian is a Foster City, Calif.-based software company specializing in cloud storage. Cloudian HyperStore® is an S3-compatible cloud object storage platform that enables service providers and enterprises to build reliable, affordable and scalable hybrid cloud storage solutions. Cloudian actively partners with leading cloud computing environments including Amazon Web Services, Citrix Cloud Platform, Apache CloudStack, OpenStack and the vast ecosystem of S3 compatible tools and applications. Cloudian's customers include Vodafone, Nextel, NTT, Nifty, and LunaCloud. The company has additional offices in China and Japan.
SYS-CON Events announced today that TechXtend (formerly Programmer’s Paradise), a leading value-added provider of server and storage virtualization, and r-evolution will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. TechXtend (formerly Programmer’s Paradise) is a leading value-added provider of software, systems and solutions for corporations, government organizations, and academic institutions across the United States and Canada. TechXtend is the Exclusive Reseller in the United States for r-evolution