SDN Journal Authors: Pat Romanski, Elizabeth White, Yeshim Deniz, Destiny Bertucci, Liz McMillan

Related Topics: @DevOpsSummit, Microservices Expo, Containers Expo Blog, @CloudExpo, SDN Journal

@DevOpsSummit: Blog Feed Post

Measuring and Monitoring: Apps and Stacks By @LMacVittie | @DevOpsSummit #DevOps #Microservices

One of the charter responsibilities of DevOps is measuring and monitoring applications once they're in production

One of the charter responsibilities of DevOps (because it's a charter responsibility of ops) is measuring and monitoring applications once they're in production. That means both performance and availability. Which means a lot more than folks might initially think because generally speaking what you measure and monitor is a bit different depending on whether you're looking at performance or availability*.

There are four primary variables you want to monitor and measure in order to have the operational data necessary to make any adjustments necessary to maintain performance and availability:

  • Connectivity

    • This determines whether or not upstream devices (ultimately, the client) can reach the app (IP). This is the most basic of tests and tells you absolutely nothing about the application except that the underlying network is reachable. While that is important, of course, connectivity is implied by the successful execution of monitors up the stack and thus the information available from a simple connectivity test is not generally useful for performance or availability monitoring. ICMP pings can also be detrimental in that they generate traffic and activity on systems that, in hyper-scale environments, can actually negatively impact performance.
  • Capacity

    • This measure is critical to both performance and availability, and measures how close to "full" the connection capacity (TCP) of a given instance is. These variables are measured against known values usually obtained during pre-release stress / load tests that determine how many connections an app instance can maintain before becoming overwhelmed and performance degrades.
  • App Status

    • This simple but important measure determines whether the application (the HTTP stack) is actually working. This is generally accomplished by sending an HTTP request and verifying that the response includes an HTTP 200 response. Any other response is generally considered an error. Systems can be instructed to retry this test multiple times and after a designated number of failures, the app instance is flagged as out of service.
  • Availability

    • This is often ignored but is key to determining if the application is responding correctly or not. This type of monitoring requires that the monitor be able to make a request and compare the actual results against a known "good" result. These are often synthetic transactions that test the app and its database connectivity to ensure that the entire stack is working properly.


App Status and Availability can be measured either actively or passively (in band). When measured actively, a monitor initiates a request to the application and verifies its response. This is a "synthetic" transaction; a "fake" transaction used to measure performance and availability. When measured passively, a monitor spies on real transactions and verifies responses without interference. It is more difficult to measure availability based on application content verification with a passive monitor than an active one as a passive monitor is unlikely to be able to verify responses against known ones because it doesn't control what requests are being made. The benefit of a passive monitor is that it isn't consuming resources on the app instance in order to execute a test and it is measuring real performance for real users.

You'll notice that there's a clear escalation "up the stack" from IP -> TCP -> HTTP -> Application. That's not coincidental. Each layer of the stack is a critical component in the communication that occurs between a client and the application. Each one provides key information that is important to measuring both performance and availability.

The thing is that while the application may be responsible for responding to queries about its status in terms of resource utilization (CPU, memory, I/O), everything else is generally collected external to the application, from an upstream service. Most often that upstream service is going to be a proxy or load balancer, as in addition to monitoring status and performance it needs those measurements to enable decisions regarding scale and availability. It has to know how many connections an app has right now because at some point (a predetermined threshold) it is going to have to start distributing load differently. Usually to a new instance.

In a DevOps world where automation and orchestration are in play, this process can be automated or at least triggered by the recognition that a threshold has been reached. But only if the proxy is actually monitoring and measuring the variables that might trigger that process.

But to do that, you've got to monitor and measure the right things. Simply sending out a ping every five seconds tells you the core network is up, available and working but says nothing about the capacity of the app platform (the web or application server) or whether or not the application is actually responding to requests. HTTP 500, anyone?

It's not the case that you must monitor everything. As you move up the stack some things are redundant. After all, if you can open a TCP connection you can assume that the core network is available. If you can send an HTTP request and get a response, well, you get the picture.

What's important is to figure out what you need to know - connectivity, capacity, status and availability - and monitor it so you can measure it and take decisive action based on that data.

Monitoring and measuring of performance and availability should be application specific; that is, capacity of an app isn't just about the platform and what max connections are set to in the web server configuration. The combination of users, content, and processing within the application make capacity a very app-specific measurement. That means the systems that need that data must be aligned better with each application to ensure not only optimal performance and availability but efficiency of resources.

That's one of the reason traditionally "network" services like load balancing and proxies are becoming the responsibility of DevOps rather than NetOps.

* Many variables associated with availability - like system load - also directly impact performance and can thus be used as part of the performance equation.

Read the original blog entry...

More Stories By Lori MacVittie

Lori MacVittie is responsible for education and evangelism of application services available across F5’s entire product suite. Her role includes authorship of technical materials and participation in a number of community-based forums and industry standards organizations, among other efforts. MacVittie has extensive programming experience as an application architect, as well as network and systems development and administration expertise. Prior to joining F5, MacVittie was an award-winning Senior Technology Editor at Network Computing Magazine, where she conducted product research and evaluation focused on integration with application and network architectures, and authored articles on a variety of topics aimed at IT professionals. Her most recent area of focus included SOA-related products and architectures. She holds a B.S. in Information and Computing Science from the University of Wisconsin at Green Bay, and an M.S. in Computer Science from Nova Southeastern University.

@CloudExpo Stories
Digital Transformation and Disruption, Amazon Style - What You Can Learn. Chris Kocher is a co-founder of Grey Heron, a management and strategic marketing consulting firm. He has 25+ years in both strategic and hands-on operating experience helping executives and investors build revenues and shareholder value. He has consulted with over 130 companies on innovating with new business models, product strategies and monetization. Chris has held management positions at HP and Symantec in addition to ...
DXWorldEXPO LLC announced today that Kevin Jackson joined the faculty of CloudEXPO's "10-Year Anniversary Event" which will take place on November 11-13, 2018 in New York City. Kevin L. Jackson is a globally recognized cloud computing expert and Founder/Author of the award winning "Cloud Musings" blog. Mr. Jackson has also been recognized as a "Top 100 Cybersecurity Influencer and Brand" by Onalytica (2015), a Huffington Post "Top 100 Cloud Computing Experts on Twitter" (2013) and a "Top 50 C...
Cloud-enabled transformation has evolved from cost saving measure to business innovation strategy -- one that combines the cloud with cognitive capabilities to drive market disruption. Learn how you can achieve the insight and agility you need to gain a competitive advantage. Industry-acclaimed CTO and cloud expert, Shankar Kalyana presents. Only the most exceptional IBMers are appointed with the rare distinction of IBM Fellow, the highest technical honor in the company. Shankar has also receive...
Enterprises have taken advantage of IoT to achieve important revenue and cost advantages. What is less apparent is how incumbent enterprises operating at scale have, following success with IoT, built analytic, operations management and software development capabilities - ranging from autonomous vehicles to manageable robotics installations. They have embraced these capabilities as if they were Silicon Valley startups.
Digital transformation is about embracing digital technologies into a company's culture to better connect with its customers, automate processes, create better tools, enter new markets, etc. Such a transformation requires continuous orchestration across teams and an environment based on open collaboration and daily experiments. In his session at 21st Cloud Expo, Alex Casalboni, Technical (Cloud) Evangelist at Cloud Academy, explored and discussed the most urgent unsolved challenges to achieve fu...
Poor data quality and analytics drive down business value. In fact, Gartner estimated that the average financial impact of poor data quality on organizations is $9.7 million per year. But bad data is much more than a cost center. By eroding trust in information, analytics and the business decisions based on these, it is a serious impediment to digital transformation.
Daniel Jones is CTO of EngineerBetter, helping enterprises deliver value faster. Previously he was an IT consultant, indie video games developer, head of web development in the finance sector, and an award-winning martial artist. Continuous Delivery makes it possible to exploit findings of cognitive psychology and neuroscience to increase the productivity and happiness of our teams.
The standardization of container runtimes and images has sparked the creation of an almost overwhelming number of new open source projects that build on and otherwise work with these specifications. Of course, there's Kubernetes, which orchestrates and manages collections of containers. It was one of the first and best-known examples of projects that make containers truly useful for production use. However, more recently, the container ecosystem has truly exploded. A service mesh like Istio addr...
Predicting the future has never been more challenging - not because of the lack of data but because of the flood of ungoverned and risk laden information. Microsoft states that 2.5 exabytes of data are created every day. Expectations and reliance on data are being pushed to the limits, as demands around hybrid options continue to grow.
Business professionals no longer wonder if they'll migrate to the cloud; it's now a matter of when. The cloud environment has proved to be a major force in transitioning to an agile business model that enables quick decisions and fast implementation that solidify customer relationships. And when the cloud is combined with the power of cognitive computing, it drives innovation and transformation that achieves astounding competitive advantage.
Digital Transformation: Preparing Cloud & IoT Security for the Age of Artificial Intelligence. As automation and artificial intelligence (AI) power solution development and delivery, many businesses need to build backend cloud capabilities. Well-poised organizations, marketing smart devices with AI and BlockChain capabilities prepare to refine compliance and regulatory capabilities in 2018. Volumes of health, financial, technical and privacy data, along with tightening compliance requirements by...
Evan Kirstel is an internationally recognized thought leader and social media influencer in IoT (#1 in 2017), Cloud, Data Security (2016), Health Tech (#9 in 2017), Digital Health (#6 in 2016), B2B Marketing (#5 in 2015), AI, Smart Home, Digital (2017), IIoT (#1 in 2017) and Telecom/Wireless/5G. His connections are a "Who's Who" in these technologies, He is in the top 10 most mentioned/re-tweeted by CMOs and CIOs (2016) and have been recently named 5th most influential B2B marketeer in the US. H...
DevOpsSummit New York 2018, colocated with CloudEXPO | DXWorldEXPO New York 2018 will be held November 11-13, 2018, in New York City. Digital Transformation (DX) is a major focus with the introduction of DXWorldEXPO within the program. Successful transformation requires a laser focus on being data-driven and on using all the tools available that enable transformation if they plan to survive over the long term. A total of 88% of Fortune 500 companies from a generation ago are now out of bus...
With 10 simultaneous tracks, keynotes, general sessions and targeted breakout classes, @CloudEXPO and DXWorldEXPO are two of the most important technology events of the year. Since its launch over eight years ago, @CloudEXPO and DXWorldEXPO have presented a rock star faculty as well as showcased hundreds of sponsors and exhibitors! In this blog post, we provide 7 tips on how, as part of our world-class faculty, you can deliver one of the most popular sessions at our events. But before reading...
DXWordEXPO New York 2018, colocated with CloudEXPO New York 2018 will be held November 11-13, 2018, in New York City and will bring together Cloud Computing, FinTech and Blockchain, Digital Transformation, Big Data, Internet of Things, DevOps, AI, Machine Learning and WebRTC to one location.
DXWorldEXPO | CloudEXPO are the world's most influential, independent events where Cloud Computing was coined and where technology buyers and vendors meet to experience and discuss the big picture of Digital Transformation and all of the strategies, tactics, and tools they need to realize their goals. Sponsors of DXWorldEXPO | CloudEXPO benefit from unmatched branding, profile building and lead generation opportunities.
@DevOpsSummit New York 2018, colocated with CloudEXPO | DXWorldEXPO New York 2018 will be held November 11-13, 2018, in New York City. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's largest enterprises - and delivering real results.
Dion Hinchcliffe is an internationally recognized digital expert, bestselling book author, frequent keynote speaker, analyst, futurist, and transformation expert based in Washington, DC. He is currently Chief Strategy Officer at the industry-leading digital strategy and online community solutions firm, 7Summits.
DXWorldEXPO LLC announced today that Dez Blanchfield joined the faculty of CloudEXPO's "10-Year Anniversary Event" which will take place on November 11-13, 2018 in New York City. Dez is a strategic leader in business and digital transformation with 25 years of experience in the IT and telecommunications industries developing strategies and implementing business initiatives. He has a breadth of expertise spanning technologies such as cloud computing, big data and analytics, cognitive computing, m...
"We started a Master of Science in business analytics - that's the hot topic. We serve the business community around San Francisco so we educate the working professionals and this is where they all want to be," explained Judy Lee, Associate Professor and Department Chair at Golden Gate University, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.