Planning Your Next Move During an IT Postmortem Review

Can a postmortem review foster curiosity about innovative ways to improve application performance? Blue-sky thinkers may not want to deal with the myriad details of managing operationally generated events, but they could learn something from this exercise.

Consider the major system failures in your organization over the last 12 to 18 months. What if you had a system or process in place to capture those failures and mitigate them proactively, preventing them from recurring? How much better off would you be if you could avoid the proverbial “Groundhog Day” of repeat system outages?

The argument that system monitoring is just a nice-to-have, and not a core requirement for operational readiness, dissipates quickly when a critical application goes down with no warning.

Starting with the Event and Incident Management processes may seem like a reactive approach when implementing an Application Performance Management (APM) solution, but is it really? If “Rome is burning,” wouldn’t the most prudent action be to extinguish the fire first and then develop a proactive approach to prevention? Managing the operational noise can calm the environment, allowing you to focus on the more strategic aspects of your business.

Reactive – Alerts that Occur at Failure

Multiple events can occur before a system failure, but eventually an alert will arrive notifying you that an application is down. It will come either from users calling the Service Desk to report an issue or from a system-generated alert triggered by the application failure itself.
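To make the reactive case concrete, here is a minimal Python sketch of a system-generated outage alert. The function name `reactive_alert`, the list of health-probe results, and the rule of requiring three consecutive failed probes before alerting are all hypothetical choices for illustration, not features of any particular monitoring product.

```python
from typing import List, Optional


def reactive_alert(app_name: str, probe_results: List[bool], threshold: int = 3) -> Optional[str]:
    """Return an outage alert if the last `threshold` health probes all failed.

    probe_results is ordered oldest to newest; True means the probe succeeded.
    Hypothetical rule: several consecutive failures are required so that a
    single dropped probe does not page anyone.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    recent = probe_results[-threshold:]
    # Not enough history yet, or at least one recent probe succeeded: no alert.
    if len(recent) < threshold or any(recent):
        return None
    return f"CRITICAL: {app_name} is down ({threshold} consecutive failed probes)"


# Example: the last three probes failed, so the application is treated as down.
print(reactive_alert("claims-portal", [True, False, False, False]))
```

Requiring a few consecutive failures trades a slightly later alert for fewer false alarms caused by a single dropped probe.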

Proactive – Alerts that Occur Before Failure

These alerts will most likely come from proactive monitoring, telling you that component failures need attention but have not yet affected overall application availability (e.g., one failed unit in a server’s redundant dual power supply).
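A proactive check for the power-supply example might look like the following sketch. It assumes a hypothetical `psu_status` mapping (for instance, populated from a server’s hardware management interface) of each power-supply unit to a healthy/failed flag; the function name and message format are illustrative only.

```python
from typing import Dict, List


def proactive_alerts(host: str, psu_status: Dict[str, bool]) -> List[str]:
    """Warn about failed units in a redundant power-supply group.

    psu_status maps a unit name to True if healthy. Warnings are raised only
    while at least one unit still works; if every unit has failed, the host is
    down and the reactive path applies instead.
    """
    failed = sorted(name for name, healthy in psu_status.items() if not healthy)
    remaining = len(psu_status) - len(failed)
    if not failed or remaining == 0:
        return []
    return [
        f"WARNING: {host} {name} failed ({remaining} of {len(psu_status)} units healthy)"
        for name in failed
    ]


# Example: one of two supplies has failed, but the server is still running.
for message in proactive_alerts("app-server-01", {"PSU1": True, "PSU2": False}):
    print(message)
```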

Predictive – Alerts that Trend on a Possible Failure

These alerts are usually set up in parallel with trending reports that help predict subtle changes in the environment (e.g., trending memory usage or disk utilization before resources run out).
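The predictive case can be sketched with a simple least-squares trend line over disk-utilization samples. This is one common way to trend a resource, not necessarily the method any given APM tool uses; the function names, the 90% threshold, and the 48-hour warning horizon are hypothetical parameters chosen for illustration.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    hours: Sequence[float], used_pct: Sequence[float], threshold_pct: float = 90.0
) -> Optional[float]:
    """Fit a least-squares line to (time, utilization) samples and estimate
    how many hours remain after the latest sample until the threshold is hit.

    Returns None when utilization is flat or falling (no projected crossing).
    """
    n = len(hours)
    if n < 2 or n != len(used_pct):
        raise ValueError("need at least two paired samples")
    mean_x = sum(hours) / n
    mean_y = sum(used_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in hours)
    if sxx == 0:
        raise ValueError("sample times must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, used_pct))
    slope = sxy / sxx  # percentage points per hour
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_hour = (threshold_pct - intercept) / slope
    # Already past the threshold on the fitted line: report zero hours left.
    return max(0.0, crossing_hour - max(hours))


def predictive_alert(
    resource: str,
    hours: Sequence[float],
    used_pct: Sequence[float],
    threshold_pct: float = 90.0,
    horizon_hours: float = 48.0,
) -> Optional[str]:
    """Raise a predictive alert if the trend crosses the threshold within the horizon."""
    remaining = hours_until_threshold(hours, used_pct, threshold_pct)
    if remaining is None or remaining > horizon_hours:
        return None
    return (
        f"PREDICTIVE: {resource} projected to reach {threshold_pct:.0f}% "
        f"in about {remaining:.1f} hours"
    )


# Example: disk usage growing about 2 points per hour from 60%.
print(predictive_alert("/data volume", [0, 1, 2, 3], [60, 62, 64, 66]))
```

A straight-line fit is deliberately crude: it will miss cyclical patterns such as nightly batch jobs, so real trending reports often use longer sample windows or more robust models.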

Conclusion – Once you build awareness in the organization that you have a bird’s-eye view of the technical landscape and can monitor the ecosystem of each application (much as an ecologist would), people become more meticulous when introducing new elements into the environment. They know that you are watching, taking samples, and trending its overall health and stability, which in turn helps improve the customer experience.


Larry Dragich (LinkedIn) is Director of Customer Experience Management at a large insurance company. He is actively involved with industry leaders, sharing knowledge of Application Performance Management (APM) best practices, resource allocation, and approaches to implementation. He has been working in the APM space since 2006, when he built the Enterprise Systems Management team, which is now the focal point for IT performance monitoring and capacity planning activities. Larry is also a regular blogger on APMdigest and a contributing editor on Wikipedia, focused on defining the APM space and how it ties into the critical ITIL processes many companies now use.

This post first appeared at LinkedIn and has been republished at Practical Performance Analyst with prior permission from the author. You can read the original post here.
