Measuring and Monitoring: Apps and Stacks By @LMacVittie | @DevOpsSummit #DevOps #Microservices

One of the charter responsibilities of DevOps (because it's a charter responsibility of ops) is measuring and monitoring applications once they're in production. That means both performance and availability, which involves a lot more than folks might initially think, because what you measure and monitor differs depending on whether you're looking at performance or availability*.

There are four primary variables you want to monitor and measure in order to have the operational data you need to make adjustments that maintain performance and availability:

  • Connectivity

    • This determines whether or not upstream devices (ultimately, the client) can reach the app (IP). This is the most basic of tests and tells you absolutely nothing about the application except that the underlying network is reachable. While that is important, of course, connectivity is implied by the successful execution of monitors up the stack and thus the information available from a simple connectivity test is not generally useful for performance or availability monitoring. ICMP pings can also be detrimental in that they generate traffic and activity on systems that, in hyper-scale environments, can actually negatively impact performance.
  • Capacity

    • This measure is critical to both performance and availability. It tracks how close to "full" the connection capacity (TCP) of a given instance is. The current connection count is compared against known values, usually obtained during pre-release stress / load tests that determine how many connections an app instance can maintain before it becomes overwhelmed and performance degrades.
  • App Status

    • This simple but important measure determines whether the application (the HTTP stack) is actually working. This is generally accomplished by sending an HTTP request and verifying that the response carries an HTTP 200 status code. Any other status is generally considered an error. Systems can be instructed to retry this test multiple times, and after a designated number of failures the app instance is flagged as out of service.
  • Availability

    • This is often ignored but is key to determining if the application is responding correctly or not. This type of monitoring requires that the monitor be able to make a request and compare the actual results against a known "good" result. These are often synthetic transactions that test the app and its database connectivity to ensure that the entire stack is working properly.
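To make the capacity, app status and availability checks concrete, here is a minimal Python sketch. Everything named in it is an assumption made for illustration: the function names, the retry count, the timeout, and the idea of checking for an expected substring in the response body. The monitors take a `fetch` function as a parameter so they can be exercised without a live application.

```python
import urllib.error
import urllib.request
from typing import Callable, Tuple

# A fetch function takes a URL and returns (status_code, body_text).
Fetch = Callable[[str], Tuple[int, str]]


def http_fetch(url: str, timeout: float = 2.0) -> Tuple[int, str]:
    """Issue a real GET request; connection failures raise OSError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        return err.code, ""


def capacity_used(active_connections: int, max_connections: int) -> float:
    """Fraction of the load-tested connection capacity currently in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return active_connections / max_connections


def app_status_ok(fetch: Fetch, url: str, retries: int = 3) -> bool:
    """App status: True as soon as one attempt returns HTTP 200.

    After `retries` failed attempts the instance would be flagged
    as out of service (here: the function returns False).
    """
    for _ in range(retries):
        try:
            status, _body = fetch(url)
        except OSError:
            continue  # no connection at all counts as a failed attempt
        if status == 200:
            return True
    return False


def availability_ok(fetch: Fetch, url: str, expected: str) -> bool:
    """Availability: HTTP 200 and the body contains a known "good" result."""
    try:
        status, body = fetch(url)
    except OSError:
        return False
    return status == 200 and expected in body
```

Against a real app you would pass `http_fetch` as the first argument. The `expected` string stands in for whatever known "good" result your synthetic transaction should produce, such as a value that can only come back if the database query behind the page succeeded.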

App Status and Availability can be measured either actively or passively (in band). When measured actively, a monitor initiates a request to the application and verifies its response. This is a "synthetic" transaction: a "fake" transaction used to measure performance and availability. When measured passively, a monitor spies on real transactions and verifies responses without interference. Measuring availability through content verification is harder with a passive monitor than with an active one, because a passive monitor doesn't control what requests are being made and so is unlikely to be able to verify responses against known ones. The benefit of a passive monitor is that it isn't consuming resources on the app instance in order to execute a test, and it is measuring real performance for real users.
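A passive monitor can be sketched as something that simply records what it sees go by. The class below is a hypothetical illustration, not any particular product's behavior: the sliding window size, the choice to count only 5xx statuses as errors, and the error-rate threshold are all assumptions.

```python
from collections import deque


class PassiveMonitor:
    """Tracks status codes and latencies of real transactions it observes."""

    def __init__(self, window: int = 100, max_error_rate: float = 0.5):
        # Only the most recent `window` observations are kept.
        self.samples = deque(maxlen=window)
        self.max_error_rate = max_error_rate

    def observe(self, status: int, latency_ms: float) -> None:
        """Record one real response as it passes through the proxy."""
        self.samples.append((status, latency_ms))

    def error_rate(self) -> float:
        """Fraction of observed responses with a 5xx status."""
        if not self.samples:
            return 0.0
        errors = sum(1 for status, _ in self.samples if status >= 500)
        return errors / len(self.samples)

    def mean_latency_ms(self) -> float:
        """Average latency experienced by real users in the window."""
        if not self.samples:
            return 0.0
        return sum(latency for _, latency in self.samples) / len(self.samples)

    def healthy(self) -> bool:
        return self.error_rate() <= self.max_error_rate
```

Note what this sketch cannot do: it has no way to compare a response body against a known "good" result, because it never chose the request. That is the trade-off between the two approaches.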

You'll notice that there's a clear escalation "up the stack" from IP -> TCP -> HTTP -> Application. That's not coincidental. Each layer of the stack is a critical component in the communication that occurs between a client and the application. Each one provides key information that is important to measuring both performance and availability.

While the application may be responsible for responding to queries about its status in terms of resource utilization (CPU, memory, I/O), everything else is generally collected external to the application, from an upstream service. Most often that upstream service is going to be a proxy or load balancer because, in addition to monitoring status and performance, it needs those measurements to make decisions regarding scale and availability. It has to know how many connections an app has right now because at some point (a predetermined threshold) it is going to have to start distributing load differently, usually to a new instance.
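Here is a rough sketch of that decision, assuming a hypothetical `Instance` record and an 80% threshold chosen purely for illustration. Real load balancers expose this through their own configuration, not through code like this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    name: str
    active: int            # connections open right now
    max_connections: int   # limit found during load testing (must be > 0)

    @property
    def utilization(self) -> float:
        return self.active / self.max_connections


def pick_instance(pool: List[Instance], threshold: float = 0.8) -> Optional[Instance]:
    """Return the least-utilized instance still under the threshold.

    None means every instance has reached the threshold, which is the
    signal that load has to go somewhere new.
    """
    candidates = [inst for inst in pool if inst.utilization < threshold]
    if not candidates:
        return None
    return min(candidates, key=lambda inst: inst.utilization)
```

The point of the sketch is the dependency: `pick_instance` is only as good as the `active` and `max_connections` numbers it is fed, and those come from monitoring and from pre-release load tests.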

In a DevOps world where automation and orchestration are in play, this process can be automated or at least triggered by the recognition that a threshold has been reached. But only if the proxy is actually monitoring and measuring the variables that might trigger that process.
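One way to picture the trigger is a small object that is fed utilization readings and fires a callback when the threshold is reached. The `ScaleTrigger` name, the callback, and the requirement that the breach be sustained over several consecutive readings are assumptions added for this sketch; the last of these is a common guard against reacting to a single spike.

```python
from typing import Callable


class ScaleTrigger:
    """Fires a callback once when utilization stays at or above a threshold."""

    def __init__(self, threshold: float, sustained: int,
                 on_trigger: Callable[[float], None]):
        self.threshold = threshold
        self.sustained = sustained    # consecutive breaches required
        self.on_trigger = on_trigger  # e.g. a call into an orchestration tool
        self._breaches = 0

    def record(self, utilization: float) -> bool:
        """Feed one measurement; return True if this reading fired the trigger."""
        if utilization >= self.threshold:
            self._breaches += 1
        else:
            self._breaches = 0  # dropping below the threshold re-arms the trigger
        if self._breaches == self.sustained:
            self.on_trigger(utilization)
            return True
        return False
```

If nothing ever calls `record`, nothing ever fires, which is the argument of the paragraph above in miniature.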

But to do that, you've got to monitor and measure the right things. Simply sending out a ping every five seconds tells you the core network is up, available and working but says nothing about the capacity of the app platform (the web or application server) or whether or not the application is actually responding to requests. HTTP 500, anyone?

It's not the case that you must monitor everything. As you move up the stack some things are redundant. After all, if you can open a TCP connection you can assume that the core network is available. If you can send an HTTP request and get a response, well, you get the picture.
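The redundancy can be expressed as a short-circuit: run the check at the top of the stack first, and only fall back to lower layers when it fails and you need to know where the break is. The function name and the idea of passing the checks in as an ordered list are assumptions made for this sketch.

```python
from typing import Callable, List, Optional, Tuple

# Each check is a (layer_name, probe) pair; a probe returns True on success.
Check = Tuple[str, Callable[[], bool]]


def highest_working_layer(checks: List[Check]) -> Optional[str]:
    """Run checks from the top of the stack down and stop at the first success.

    `checks` must be ordered top to bottom, for example
    application, HTTP, TCP, IP. A success implies every layer
    beneath it works, so the lower probes are never executed.
    Returns None if even the lowest layer fails.
    """
    for layer, probe in checks:
        if probe():
            return layer
    return None
```

When the application-level probe passes, this runs exactly one check. When it returns "tcp", you know connections open but HTTP is not answering, which narrows the problem without a separate ping.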

What's important is to figure out what you need to know - connectivity, capacity, status and availability - and monitor it so you can measure it and take decisive action based on that data.

Monitoring and measuring of performance and availability should be application specific; that is, capacity of an app isn't just about the platform and what max connections are set to in the web server configuration. The combination of users, content, and processing within the application makes capacity a very app-specific measurement. That means the systems that need that data must be aligned better with each application to ensure not only optimal performance and availability but also efficient use of resources.

That's one of the reasons traditionally "network" services like load balancing and proxies are becoming the responsibility of DevOps rather than NetOps.

* Many variables associated with availability - like system load - also directly impact performance and can thus be used as part of the performance equation.

More Stories By Lori MacVittie

Lori MacVittie is responsible for education and evangelism of application services available across F5’s entire product suite. Her role includes authorship of technical materials and participation in a number of community-based forums and industry standards organizations, among other efforts. MacVittie has extensive programming experience as an application architect, as well as network and systems development and administration expertise. Prior to joining F5, MacVittie was an award-winning Senior Technology Editor at Network Computing Magazine, where she conducted product research and evaluation focused on integration with application and network architectures, and authored articles on a variety of topics aimed at IT professionals. Her most recent area of focus included SOA-related products and architectures. She holds a B.S. in Information and Computing Science from the University of Wisconsin at Green Bay, and an M.S. in Computer Science from Nova Southeastern University.
