Monitoring application performance at the surface and in the currents below is a great way to build a performance baseline and develop application fluency. Ironically, even the deep-dive toolsets in place today may not provide all the insight you need to quickly resolve anomalous behaviour.
Standing back on the shore waiting for an event to go by may not be the best approach for proactive monitoring. Synthetic monitoring (active monitoring) is needed to help reduce the blind spots for critical business applications.
For example, we recently experienced a production issue on a fully instrumented, business-critical application that at first appeared nebulous. During peak volume, the Service Desk was taking calls from users at random locations who said they couldn't log in; users already on the system, however, were fine.
Even when those users logged out, they could still log in again and continue working. Other facts that came in made the issue more perplexing:
- RUM showed transaction volume and performance was normal.
- Deep dive Java monitoring agents showed the same.
- There were no glaring HTTP 500 errors, and the backend database was fine.
- Infrastructure monitoring was green in all tiers and resource consumption was within baseline.
What did we use to find the issue, then? It was our synthetic monitoring tool, which popped an alert on two externally facing applications.
Root cause? Our Internet provider's DNS resolution was not working properly. So any machine whose name lookups weren't already cached for the day couldn't get a login page.
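A synthetic check that catches this class of failure can be very small: probe name resolution for the application's hostname on a schedule, independently of whether real user traffic is flowing. A minimal sketch in Python is below; `login.example.com` is a hypothetical hostname standing in for the externally facing application, not anything from the incident described above.

```python
import socket
import time

def check_dns(hostname, timeout=5.0):
    """Synthetic DNS probe: resolve hostname, return (ok, elapsed_seconds)."""
    socket.setdefaulttimeout(timeout)
    start = time.monotonic()
    try:
        # Resolving is enough to detect the failure mode described above;
        # a fuller synthetic transaction would go on to fetch the login page.
        socket.getaddrinfo(hostname, 443)
        return True, time.monotonic() - start
    except socket.gaierror:
        return False, time.monotonic() - start

if __name__ == "__main__":
    # Probe the hosts a synthetic login transaction would touch.
    for host in ("localhost", "login.example.com"):
        ok, elapsed = check_dns(host)
        print(f"{host}: {'OK' if ok else 'DNS FAILURE'} ({elapsed:.3f}s)")
```

Because the probe does its own fresh lookup, it fails even while machines with cached records keep working, which is exactly the blind spot passive monitoring had in this incident.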
Conclusion – If an event occurs and no one sees it, believes it, or takes action on it, APM's value can be severely diminished and you run the risk of owning "shelfware." Our experience has shown that active monitoring (synthetics) is a good complement to passive monitoring, helping provide visibility into application availability, especially when monitoring outside the data center.
Larry Dragich (LinkedIn) is Director of Customer Experience Management at a large insurance company. He is actively involved with industry leaders in sharing knowledge of Application Performance Management (APM) best practices, resource allocation, and approaches for implementation. He has been working in the APM space since 2006, when he built the Enterprise Systems Management team, which is now the focal point for IT performance monitoring and capacity planning activities. Larry is also a regular blogger on APMdigest and a contributing editor on Wikipedia, focused on defining the APM space and how it ties into the critical ITIL processes many companies are now using.
This article was first published at LinkedIn and has been re-published at Practical Performance Analyst with prior permission.