The Seven Cardinal Sins Of Performance Management

A few days ago, on my way to work, I happened to be listening to the TED Radio Hour podcast, with an episode on air titled “Seven Cardinal Sins”. The TED Radio Hour (full disclosure: I am a total fan of TED talks and more so of the TED Radio Hour podcast) is definitely one of my favorites and possibly one of the best. It’s curated and authored by some really great people and features a heap of interesting topics that I would otherwise never have had a chance to stumble across in real life. This episode in particular dealt with the Seven Cardinal Sins, or vices, as they have been known from early Christian times, and how these vices affect our lives in the context of today’s fast paced world. For those interested in the history of the “Seven Cardinal Sins” we would strongly suggest checking out the following Wikipedia Link.

While the TED Radio show was fascinating, it got me thinking about some of the classic Cardinal Sins we’ve been committing our whole lives in the SPE (Systems Performance Engineering) world. Performance Management, PE or SPE, whatever you choose to call it, is no rocket science. It is a lot of common sense practices which unfortunately involve tasks that are laborious, time consuming and expensive, which is possibly why most projects, programs and organizations give it a skip unless there is a tangible impact on the business outcomes they are meant to deliver. More importantly, experience tells me that performance gets ignored simply because:

  • Senior management can’t seem to see value
  • It’s always so expensive to do anything that’s worthwhile

So why not just cop the flak and fix it in production (unless, that is, your business depends on it)? Unfortunately all of us in the performance community have only ourselves to blame for the situation we’ve got ourselves into. In future posts I’ll talk about how we might take the two points above head on.

So based on the last decade and a half of work we’ve been doing in the SPE (Systems Performance Engineering) world I’ve put together a view of some of the Seven Cardinal Sins. This is by no means a comprehensive list of all the mistakes we’ve seen made but some of the most common ones. Feel free to add to this list and let us know what you think about it.

It’s 3 weeks to go live, let’s get started : I’ve put this one right at the top since it’s the one I’ve heard most often over the last decade and a half. Projects always seem to wake up at the last moment, especially where performance is concerned. Here’s one of the most common situations: SIT (System Integration Testing) has completed, UAT (User Acceptance Testing) has completed and it’s time to go live…but hold on, we’ve got to get Performance Testing done over the next few weeks. Without Performance Testing we do not get the required tick in the box so that the Change Control Board can approve the transition of the system into BAU (production environments).

Even if you didn’t have a technical bone in your body, I reckon you would have realized that this is a perfect recipe for disaster. Unless your application has ZERO business impact, there is a business case that your application is being designed to deliver. And you’ve left out the most important bit, i.e. accounting for the resources, tools, environments and time required to engineer, test, optimize and deliver a system that will scale and help deliver the expected business outcomes.

Getting started with performance, even if it’s just performance testing a few weeks before go live isn’t going to deliver the best outcomes for your project, program and your organization.

We don’t have NFRs, but we still need you to test and validate the scalability of the system : “Oh gawd!!! What have I got myself into?” is what is probably going through your head as you sit in front of the customer, trying to work out how you will get him the performance test report he wants in the next 3 weeks when he doesn’t even know what his Non Functional Requirements are.

No reason to fear. Like a large number of Performance Engineers, you forge ahead without an understanding of, or agreement on, the fundamental requirements, i.e. the Non Functional Requirements, and you get the performance testing done. And when you are done you take those results, hand them over to the test manager and walk out the door. This is a very common scenario and something we’ve seen happen time and time again at clients around the world. It is a shocking reality, with a lot of $ spent on very expensive resources delivering very little gain.

There are many reasons this happens, but one of the most common is that the client lacks the capability to truly articulate his/her Non Functional Requirements. The project then signs off on what’s required to be done, the performance guys are called in 3 weeks before go-live (when all hell has broken loose) and the performance engineer/tester is then expected to deliver a completely tested, optimized and tuned system without knowing what it is they are supposed to be optimizing for. With the short time frame and mammoth task at hand, you surely can’t expect the resource to come in and start gathering requirements…can you?
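To make the point concrete, here is a minimal sketch of what "knowing what you are optimizing for" can look like once an NFR is written down as a number. Everything in it is an assumption for illustration: the `checkout` transaction, the 95th percentile target of 2.0 seconds and the sample response times are all made up, and the nearest-rank percentile is just one of several reasonable definitions.

```python
# Hypothetical NFR: 95% of checkout requests complete within 2.0 seconds.
NFR = {"transaction": "checkout", "percentile": 95, "max_seconds": 2.0}


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer 1..100)."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def meets_nfr(samples, nfr):
    """True when the measured percentile is within the agreed limit."""
    return percentile(samples, nfr["percentile"]) <= nfr["max_seconds"]


if __name__ == "__main__":
    # Made-up response times (seconds) from a test run.
    measured = [0.5] * 19 + [3.0]
    print("NFR met:", meets_nfr(measured, NFR))
```

Without an agreed target like the one assumed above, the same test results can only be reported, never judged.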

We don’t have our workload models. Can’t you make something up based on the Functional Testing requirements instead : This is another classic one and it comes up fairly often. After all, who needs Workload models to run a performance test, eh? Sad, but true. Workload models, as you know, are the combination of business processes and transactions (in the OLTP, or Online Transaction Processing, sense) that need to be included in the Performance Test. Workload models are the essential building blocks not just for Performance Testing but also for Capacity Planning, APM implementations, Diagnostics & Tuning, etc.

Workload models unfortunately require time to put together, just like anything else in the performance world. You might have to leverage logs from production, use your enterprise web analytics solution or look at the application logs to work out the current workload across your business systems (based on what actions your users are performing) and so identify the actual workload models. The situation gets a lot more complex when you have to design workload models for a greenfield system, where there is no existing system in production to look at and learn from when collecting user behavior metrics.
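As a rough illustration of the log-based approach, the sketch below derives a very simple workload model (the peak hour, its volume and the transaction mix within it) from a handful of log records. The record format, the transaction names and the sample data are all hypothetical; real logs need parsing and cleaning first, and a real model would usually cover more than one peak period.

```python
from collections import Counter

# Hypothetical log records: (hour_of_day, business_transaction).
# In practice these would be parsed from production or web analytics logs.
SAMPLE_LOG = [
    (9, "login"), (9, "search"), (9, "search"), (9, "checkout"),
    (10, "login"), (10, "search"), (10, "search"), (10, "search"),
    (10, "view_item"), (10, "checkout"),
]


def build_workload_model(records):
    """Return the busiest hour, its volume and the transaction mix in that hour."""
    per_hour = Counter(hour for hour, _ in records)
    peak_hour, peak_volume = per_hour.most_common(1)[0]
    mix_counts = Counter(txn for hour, txn in records if hour == peak_hour)
    # Express each transaction as a fraction of the peak hour volume.
    mix = {txn: count / peak_volume for txn, count in mix_counts.items()}
    return {"peak_hour": peak_hour, "peak_volume": peak_volume, "mix": mix}


if __name__ == "__main__":
    print(build_workload_model(SAMPLE_LOG))
```

The resulting mix is what tells you how many of each business transaction a performance test should drive, which is exactly the information the functional test cases do not carry.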

We don’t have budget for commercial tools, let’s just stick to Open Source tools instead : Who needs commercial tools when we can all just get by using open source tools, eh? Don’t get me wrong…I use a lot of Open Source and am myself a great Open Source proponent. In fact the complete infrastructure and application stack used to deliver this content, right from the operating system to the web servers, the database servers, the caching engines, the Content Management System, the optimization modules, etc., is completely open source.

However, when it comes to delivering performance you have to go in realizing that there is a high possibility you will need to invest in tooling, and in infrastructure to host that tooling, and that it is going to cost a fair bit. Unless you’ve planned and budgeted for tools across the delivery lifecycle, you are going to find it impossible to bring in the right performance tools to help get the job done efficiently. Open source tools are good when it comes to very elementary performance testing, but for everything else you will need commercial tools, and unfortunately our market today is totally skewed towards some of the very large vendors whose tools are prohibitively expensive. Some of the SaaS vendors are trying to change that and it’s a space that definitely needs to be watched.

You are the performance team, why do we need anyone else. Need you to test, tune, optimize and fix. : The “A” team huh!!! No wonder performance issues get out of hand and then the “A+” teams need to be called in to get the fires under control. It’s not that the “A” team wasn’t good enough to get the job done. It was more that the “A” team wasn’t given the required human resources, tools, infrastructure and time to get the job done. The solution always seems to be to throw more resources (hardware and humans) at the problem and hope it goes away, which unfortunately it doesn’t.

Most project and program managers think of Performance purely as Performance Testing, and that too as an activity that gets done a few weeks before go-live. Unfortunately we (the Performance community) have only ourselves to blame. More on that in a future post. Delivering performance requires the project or program to engineer performance into the system and, more importantly, to validate and optimize the performance of the system from design to build to test to go live. This isn’t a one person gig and it definitely isn’t a 3 week gig.

Look at performance as a siloed operation and you’ll end up with integrated performance issues that will surely cause you nightmares for a long time to come and, in some cases, even cost you your job.

Our integrated test environments are 50% of production capacity. Can you please extrapolate results and tell me what we could expect in production : What a classic!!! This for some reason is a favorite question that many in senior management have. In fact this is something that we get asked so often that I’ve made it part of my interview questionnaire (chuckle, chuckle!!!).

We understand the situation our customers live in. Business does not want to pay for expensive test environments and wants quick turnaround times for change requests, including the associated testing cycles. When your Performance Test infrastructure is a scaled down replica of your production environments you have to tread very carefully. There is no perfect science out there that allows you to perfectly forecast what the system (application, infrastructure, networks, etc.) performance will look like in production using test results from an environment that is a scaled down version of production.

While there is no rocket science involved in the analysis, most miss out on the “analysis” involved and make arbitrary statements which are akin to shooting themselves in the foot. So watch out folks: while you can definitely extrapolate using a combination of statistical and analytical modelling techniques, there is absolutely no way you can say with certainty what bottlenecks could impact the ability of the system to deliver those numbers at 100% of the capacity in production. Simply assuming linear/exponential/logarithmic scalability for production based on what you’ve seen in performance test is a truly scary way to go!!!
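To show why the linear assumption is scary, the sketch below compares a naive "double the hardware, double the throughput" estimate against one well-known analytical model, Gunther's Universal Scalability Law, C(N) = λN / (1 + α(N − 1) + βN(N − 1)). This is an illustration only: the node counts and the coefficients λ, α and β are made-up values, whereas in practice they would have to be fitted from measurements at several load levels, and even a well fitted model says nothing about bottlenecks that never showed up in the scaled down environment.

```python
def usl_throughput(n, lam, alpha, beta):
    """Universal Scalability Law: throughput at n units of capacity.

    lam   -- throughput of a single unit (assumed value)
    alpha -- contention (serialization) coefficient (assumed value)
    beta  -- coherency (crosstalk) coefficient (assumed value)
    """
    return lam * n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def linear_throughput(n, measured_n, measured_tps):
    """Naive extrapolation: throughput grows in proportion to capacity."""
    return measured_tps * n / measured_n


if __name__ == "__main__":
    # Hypothetical setup: test environment has 8 nodes, production has 16.
    lam, alpha, beta = 100.0, 0.05, 0.001
    test_tps = usl_throughput(8, lam, alpha, beta)
    print("measured in test (8 nodes):  %.0f tps" % test_tps)
    print("linear guess (16 nodes):     %.0f tps" % linear_throughput(16, 8, test_tps))
    print("USL estimate (16 nodes):     %.0f tps" % usl_throughput(16, lam, alpha, beta))
```

With these assumed coefficients the linear guess comes out well above the model's estimate, and the gap widens as contention and coherency costs grow.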

We don’t need no Proof Of Concepts. Let’s go build our greenfield solution : Organizations spend large amounts of money building large and complex systems. DevOps, Agile, etc. are changing the way systems today are built and delivered. Agile development practices can really help when it comes to addressing incremental functional and non functional challenges across the system, learning from one’s mistakes and then improvising accordingly.

But unfortunately, by then the decisions on the stacks, components and infrastructure have already been made. Especially on a large system integration project that involves a few different vendors, decisions get made for reasons that have nothing to do with the expected performance of the system. This is inevitably the case on most large programs and it will unfortunately be the case for a long time to come.

However, one of the ways you might be able to mitigate some of that risk is to play your vendors against each other and run a competition of sorts during the tendering process. Smart clients get their vendors to deliver a PoC (Proof Of Concept), especially where a greenfield implementation is concerned, so that design or architectural concerns can be flushed out at an early stage. There’s no perfect recipe here, but rather options that you might want to consider before locking yourself into a stack that is either inherently unscalable or is going to require a massive re-architecting effort to get you over the line.

Conclusion : It’s sad that a lot of us as Performance Architects, Performance Engineers, Capacity Planners, APM Engineers, or Performance Testers spend most of our lives helping put out fires. We surely make a living out of it and it’s what puts the bread on the table. But it might just be time that we took notice of what’s causing these situations in the first place and laid the foundation to prevent these issues from happening again.

Trevor Warren (LinkedIn) focuses on building innovative solutions that have the potential to impact people’s lives in a positive manner. Trevor is inquisitive by nature, loves asking questions and sometimes does get into trouble for doing so. He’s passionate about certain things in life and building solutions that have the ability to impact people’s lives in a positive manner is one of them. He believes that he can change the world and is doing the little he can to change it in his own little ways. When not hacking, building new products, writing content for Practical Performance Analyst, dreaming up new concepts or building castles in the air, you can catch him bird spotting (watching planes fly above his house). You can reach Trevor at trevor at practical performance analyst dot com. The views expressed on this web site are his own.

  • Paul Offord

    Great post Trevor. Sadly so true. It’s the lack of NFRs that I always find so baffling.

    • Agree. Unfortunately, Paul, we (in the SPE community) have to shoulder some of that blame ourselves (and that includes me), because we don’t talk about the issues/challenges as much as we should at forums where enough decision makers are able to listen to us.

      Practical Performance Analyst was born out of the desire to stop blaming the rest of the world for the systemic issues I continue to face wherever I go i.e. lack of maturity in terms of managing performance across the delivery lifecycle.