In this series of articles Nikita Salnikov-Tarnovski (LinkedIn), co-founder of Plumbr, discusses the performance and stability issues that memory leaks cause in Enterprise Java applications. Nikita goes on to talk about various approaches to identifying memory leaks through the use of tools and home-grown techniques. Nikita is one of the founders of Plumbr, the memory leak detection product. Besides his daily technical tasks as a lead developer, Nikita is an active blogger, JavaOne RockStar and frequent speaker at international conferences.
Just in case you have dropped in and missed the first part in this series – you can read it by Clicking Here.
So You Are an Architect: So you thought your job as a lead architect was writing and reviewing code, not fixing performance issues in production, eh? Unfortunately, your cries for more mature unit, system and integrated performance testing practices have fallen on deaf ears. You’ve tried hard to get the organization to invest in proactive Performance Management by bringing in the right capability, tools and people, but haven’t made much progress. So here you are again, trying to figure out what has gone wrong in production, working your way through the various logs, application terminals, etc. to understand why the application has fallen over for the nth time.
So let’s take a step back and figure out what could be done differently to avoid getting into this situation again, and how you could potentially identify performance issues proactively using the right tools and techniques. After all, isn’t your operations team full of competent people who know what they are doing?
Facing performance issues within your Java application? Give Plumbr a try – the application-aware JVM monitoring tool will pinpoint the problems with memory leaks and sub-optimal garbage collection configuration.
Taking it all in: Identifying performance issues requires one to take a step back and understand the following:
- Symptoms of the performance issue
- The environment or system configuration the application is hosted on
- Relevant application performance metrics, with at least a few hours’ – if not a few days’ or weeks’ – worth of data
- Relevant system performance metrics, with at least a few hours’ – if not a few days’ or weeks’ – worth of data
- An understanding of the architecture of the solution
- Better still, an understanding of the system configuration and the underlying code
Before you dive in to investigate a performance issue, it pays to have the above information at hand, to the extent practically feasible.
In the past you might have used the following techniques to identify and address the performance issues:
- Randomly modifying various memory parameters, such as -Xmx, -XX:MaxPermSize and -XX:NewRatio;
- Increasing logging of all relevant metrics and application components, generating gigabytes of logs via -verbose:gc, -XX:+PrintGCTimeStamps, -XX:+PrintGCDetails and so forth;
- Throwing gigabytes’ worth of extra physical memory at the servers;
…including various other weird things that go beyond imagination.
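To make the flags above concrete, here is a sketch of what such a “turn everything up” startup line typically looks like. The values and the application name (myapp.jar) are illustrative assumptions, not recommendations – copying them blindly is exactly the anti-pattern being described. Note also that -XX:MaxPermSize only applies to Java 7 and earlier, and the PrintGC* flags were superseded by -Xlog:gc* in Java 9.

```shell
# Illustrative JVM startup line combining the flags mentioned above.
# Values are arbitrary examples, not tuning advice.
java -Xms2g -Xmx2g \
     -XX:MaxPermSize=256m \
     -XX:NewRatio=2 \
     -verbose:gc \
     -XX:+PrintGCTimeStamps \
     -XX:+PrintGCDetails \
     -Xloggc:gc.log \
     -jar myapp.jar
```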
Throwing Hardware Isn’t The Solution – Here’s something absolutely strange that’s happened to me on more than one occasion. More than once I’ve met seasoned professionals who believe that throwing more hardware at a performance issue is the best way of addressing the challenge at hand. These are individuals I know first hand to be experienced operations folks (don’t get us wrong, we love operations guys at Plumbr more than anyone else!) who decided to throw more RAM at their machines whenever they were faced with OutOfMemoryErrors in Java programs across their production environment. Unfortunately, none of these smart guys had heard or understood much about how Java virtual machines actually handle memory: if the error stems from a memory leak, a bigger heap only delays the next crash rather than preventing it.
These operations folks, like a lot of customers we’ve met throughout our professional careers, run Java applications in production with the default out-of-the-box configuration, in most cases setting the heap size to the maximum amount of RAM available on the physical or virtual machine. It’s mighty tough to get a lot out of your applications in production if you have little understanding of how the JVM runs.
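Before touching any of these knobs, it is worth checking what the JVM actually defaults to on a given machine. One way to do this on HotSpot JVMs is to dump the final flag values (the grep pattern below just narrows the output to the heap sizing flags):

```shell
# Print the heap sizes the JVM would use with no explicit -Xms/-Xmx.
# On HotSpot, MaxHeapSize defaults to a fraction of physical RAM
# (commonly one quarter), so "heap == all RAM" is never the
# out-of-the-box behaviour.
java -XX:+PrintFlagsFinal -version | grep -i -E 'MaxHeapSize|InitialHeapSize'
```

Comparing these defaults against what the application actually needs is a far safer starting point than copying the machine’s RAM size into -Xmx.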
Driving Blind With Configuration Changes: Another type of operations resource we have met a few times in the past is the tech head who just loves playing around with the Java Virtual Machine configuration. One specific tech head we knew was great at pulling together Java startup configurations for the JVMs in production, but unfortunately had very little understanding of what the parameters did or how they affected actual system performance. As a result, after one such configuration frenzy the application lasted six hours before it needed a JVM restart. Soon, his performance tweaks had brought the life expectancy of the application down to just 50 minutes.
Just to make things clear, this isn’t a mud-slinging tournament. We are not in the business of accusing operations folks or developers or anyone out there who might match the above profile. After all, operations usually has a lot on its plate, is generally overworked and is tasked with more work than it can ideally manage. To make things worse, no two applications are the same. Bespoke applications today are created using all kinds of unimaginable languages, libraries, third-party cloud-based components, etc.
Performance Is Tough – A lot of the operations resources we’ve met lack specific knowledge of Java, including the relevant tools and processes to identify and address performance issues. So let’s be honest – without a strong understanding of what’s under the hood in terms of code, of the various JVM parameters, and of the relevant tools and techniques to identify and address performance issues, there is little chance that any of these guys will succeed. But they still give it their best shot, because it’s their job and it’s completely up to them to get the issue resolved so that business can go on as usual.
Unfortunately, addressing a performance issue can take days or even months. By then the issue might also have been escalated past operations to senior leadership in IT and, worse, even up to the business. During this time, operations has to manage the expectations of all the angry business users who have suffered due to the poorly performing application.
What’s Next: In the next few posts in this series we will go into the different tools and techniques you have at your disposal to identify performance issues and address them before they turn into major showstoppers bringing down your application in production.
Nikita Salnikov-Tarnovski (LinkedIn) is the co-founder of Plumbr, the memory leak detection product, where he now contributes as a core developer. Besides his daily technical tasks, Nikita is an active blogger, JavaOne RockStar and frequent speaker at conferences around the world, including Devoxx, JavaOne Russia, 33rd Degree, TopConf, JavaDay, GeekOut, Joker and Jazoon. Prior to founding Plumbr, Nikita was a Java EE developer and performance consultant in the Baltics, and has worked on building numerous Java EE Enterprise applications over the years. For the last four years he has specialized in troubleshooting and performance optimization of Enterprise Java applications.