Disclaimer: The notes below from the CMG summit are based on the author's interpretation and understanding; please excuse any errors that may have crept in. I have also taken the liberty of referencing some content from the official CMG India blog, where a review of the conference has been posted here. Pics from the conference can also be found here
I had an opportunity to attend the 1st Annual CMG Summit, conducted in Pune, India a couple of weeks ago. Going by the enthusiastic participation in huge numbers and the spectrum of content covering the wide gamut of topics in Performance Engineering, the event can be termed a phenomenal success.
The event was jointly hosted by Infosys and Persistent Systems and was conducted over 2 days, covering Architecture, Vendor and Tutorial tracks. The schedule can be found here.
The conference was kicked off with a traditional lighting of the lamp. Dr. Rajesh Mansharmani spoke about the highlights of CMG India since its inception in Sep 2013:
- CMG India's membership is free, and it has registered phenomenal growth, with 1600 registered members
- There have been 16 events in the span of 1 year, 4 newsletters and several technical articles and blogs
- CMG India follows a zero-budget, zero-incentive approach, which means only genuinely interested and motivated performance engineers participate in its activities (and it has worked well so far)
- The papers selected for the conference went through a rigorous blind review process.
– The keynote session on Day 1 was by Adam Grummit, Distinguished Engineer, ex-CMG President and a well-known name in Performance & Capacity Planning circles. His talk centered on a Green, Lean and Mean approach to Capacity Management. The full presentation can be downloaded here.
Key takeaways were:
- A good analogy for capacity planning is satellite navigation, which requires that we know where we are and identify where we want to get to.
- Do more for less using the above approach; Capacity Management covers the whole gamut, from past performance to current system behavior, through effective diagnostic monitoring, bottleneck analysis, device patterns, usage process pathology, anomaly detection, etc.
- 6 case studies depicting different scales on the Capacity Management maturity model
- Clear description of the objectives and deliverables for Capacity Management
- Attitude – Behavior – Culture is far harder to achieve than performance tuning, capacity planning or technology implementation
– Keynote 2, in the afternoon session, had everyone's rapt attention as N Muralidharan, Chief of Special Projects and Director of the National Stock Exchange, described how the NSE moved from a floor-based trading system to a superfast, low-latency electronic trading system capable of handling algorithmic trading, and how the NSE can be counted among the best bourses in the world. Murali is a natural storyteller and gave a thrilling presentation.
NSE Statistics –
- 9 out of 10 transactions in India happen on the National Stock Exchange
- Volume goes up as high as 1 Billion messages per day which includes > 3000 symbols for trading
- The market cap is almost $989 Billion (with a ‘B’)
- 6000 billion rupees (roughly $100 billion) worth of transactions every day
- 220,000 trading terminals operating at 30 micro second latency
- 400 high-volume, high-frequency trading firms, with 100,000 algo trading orders
- 50,000 orders/second
Problem Statement and Transformation Approach:
Transformation from a primitive form of trading to a modern, high-performance and reliable exchange required answers to several challenges:
- Should the system be built on top of existing systems or rebuilt from ground up?
- Technology decisions for operating systems, security, messaging layers with high enqueue and dequeue rates, database choice, and the programming language to use
- Any malfunction can lead to a catastrophe as large as the collapse of a nation's economy.
The approach taken to achieve this high throughput and reliability included:
- Building the system from the ground up while re-using some key components
- Adopting a divide-and-conquer approach to designing the system, optimizing its sub-systems
- Planning well in advance during the design phase – performance budgets for each component tier
- Investing time in implementing code instrumentation and monitoring, supporting up to 500,000 quick calculations
- Going ahead with ANSI C as the programming language to achieve high performance and low latency
- Implementing safeguards for efficient rollback of transactions without overheads
Mr. Muralidharan also shared stories about process management and quick leadership decisions that were taken. He summarized this journey in key learnings:
- Think as far ahead as possible
- Know the constraints and be clear about what the expectations are
- Spend time on design
- Deliver bad news as soon as possible
- Testing strategies are as important as coding
All in all, it was a thrilling session, and Muralidharan wove a nice story about the transformation journey.
– The keynote on the morning of Dec 13 was given by Dr. Anand Deshpande.
The key points from his talk:
- Emphasized the six facets of Enterprise Transformation
- Stressed how born-digital companies internalize technologies, which enables them to come up with disruptive business models
- Highlighted the API economy and stressed the need for enterprises to come up with APIs to exchange data
- Presented the view that data will soon become the money-making instrument, and that the apps business won't be as lucrative going forward
- Highlighted the need for enterprises to adopt full-stack implementations so that all hygiene factors are appropriately dealt with in the SDLC
- Stated that the advent of the Cloud has significantly reduced the time and effort needed for app development, and hence the TCO of a project
- Cited interesting examples regarding Analytics and appealed for insight-extracting ways of performing Analytics
– The last keynote of the day was presented by Prof. Jayant Haritsa.
In his talk, he discussed query optimization plans and how to bound the worst case:
- Why do query optimizers choose highly sub-optimal plans for query execution?
- Breaking plan cost into 2 components – operation cost and selectivity cost – he elaborated on why selectivity cost is variable and how it can lead to order-of-magnitude errors, whereas operation cost is typically off by only about 30%
- Explained a technique for bounding the worst-case plan selectivity – first in 2 dimensions and then in 3 dimensions
There were too many good sessions happening in parallel, and I was not able to attend all of them. Below are highlights from the sessions I did attend.
– Optimal Design Principles for Better Performance of Next Generation Systems
- Design is the key in software engineering methodology, and the results can be disastrous if the design is not well thought out.
The key areas where design and architecture focus is needed are:
- Programming language
  - Language constructs, the ecosystem and community, and the problem statement define the choice of programming language
- Selection of database
  - NoSQL databases will not solve all the problems
  - RDBMS for systems that have relationships among data and involve frequent reads
- Replication strategies
  - Choice of going with synchronous or asynchronous updates
- Caching
  - Store only static data in the cache
  - Implement proper cache eviction policies
- Garbage collection
  - Case study where performance dropped because of a bad GC configuration
- Session management
  - Proper closing of sessions
  - Case study where optimization led to a huge performance improvement
- Client-side vs. server-side computing
  - Retry mechanisms used to enhance user experience
- Choice of design patterns – usage of a Design Pattern Wizard tool
- Parallelism leads to high throughput and better performance
- Choice of data structures
- Async transactions – help reuse expensive resources effectively
- Avoid too many hops
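The cache-eviction point above is worth a concrete illustration. Below is a minimal LRU (least-recently-used) eviction sketch – an illustrative example of one common eviction policy, not code from the talk – using Python's OrderedDict:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least-recently-used entry when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least-recently-used entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now most recently used
cache.put("c", 3)      # capacity exceeded: evicts "b"
print(cache.get("b"))  # -> None
print(cache.get("a"))  # -> 1
```

In line with the talk's advice, such a cache should hold only static data, and the eviction policy should match the access pattern (LRU is just one choice; LFU or TTL-based eviction may fit better for other workloads).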
– Architecture & Design for Performance for a Large European Bank
The problem statement was to ensure the performance of a system with a complex ecosystem and huge data volumes – two million transactions/day – with end-of-day completion in under 2 hours.
Key design considerations included tuning:
- Web page response
- Gracefully stopping long-running SQLs by configuring the CPU_PER_CALL threshold. Configure this for a separate Oracle user and exercise control over when it is invoked
- Network latency, by avoiding HTTP 304 calls from the browser. This is a quick change that yields big gains. The web server can also be configured to reload static content during software upgrades
- Search performance, by identifying the most-used search combinations and tuning them with indexes. Set filter criteria for blank searches and restrict the size of the result set using ROWNUM
- Straight-Through Processing, achieved through a scalable, distributed design in which a file processor sifts and categorizes incoming non-homogeneous files into chunks of homogeneous files based on size
- Batch performance, achieved by allocating transactions to threads such that distinct accounts are processed by separate threads to avoid contention
- Also, grouping the jobs logically into clusters and running them in parallel to squeeze the elapsed time
- Appropriate collection objects, like a concurrent hash map, reduce the possibility of transactions entering deadlock situations
- Applying sequence caching, which leads to improved performance
- Performance considerations have to be taken into account during design itself.
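The batch-threading idea above – routing transactions so that distinct accounts are processed by separate threads to avoid contention – can be sketched as follows. The function and field names here are hypothetical, not from the presentation:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition_by_account(transactions, n_workers):
    """Route each transaction to a bucket by hashing its account id, so all
    transactions for one account always land in the same bucket and no two
    workers ever touch the same account concurrently."""
    buckets = defaultdict(list)
    for tx in transactions:
        buckets[hash(tx["account"]) % n_workers].append(tx)
    return buckets

def process(txs):
    # placeholder for per-account batch processing
    return sum(t["amount"] for t in txs)

transactions = [
    {"account": "A1", "amount": 10},
    {"account": "B2", "amount": 20},
    {"account": "A1", "amount": 5},
]
buckets = partition_by_account(transactions, n_workers=4)
with ThreadPoolExecutor(max_workers=4) as pool:
    totals = list(pool.map(process, buckets.values()))
print(sum(totals))  # -> 35
```

Because each account's transactions are serialized within one bucket, the workers need no locking on account state, which is the contention-avoidance property the session described.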
– Designing for Performance Management in Mission Critical Software Systems
- Poor performance management has a very negative impact: loss of brand value, lost productivity and customer complaints.
- Performance management does not end with putting a performance monitoring tool in place; it encompasses several practices:
- Instrumentation – include it in all tiers for quick isolation of performance problems. It should capture response times, thread, a correlating identifier, user identifier, component name and status
- System performance archive – important for getting historical trends and recognizing patterns. Also helps in matching a new anomaly against past patterns
- Controlled monitoring – needed for simulating the end-user experience and confirming an anomaly
- Simulation environment – the ability to reproduce an anomaly and come up with a fix after causal analysis
- Integrated operations console – a unified control view of all the components, with a mechanism to resolve anomalies for which resolution processes are known
- The above practices should become an integral part of the design and construction of a mission-critical software system
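As a rough illustration of the instrumentation practice described above, here is a hedged Python sketch that emits a structured record with response time, thread, correlation id, component name and status for each call. The decorator, field names and record layout are my own assumptions, not from the talk:

```python
import json
import threading
import time
import uuid

def instrument(component):
    """Decorator that logs one structured record per call, capturing the
    fields the session recommends: response time, thread, correlation id,
    component name and status."""
    def wrap(fn):
        def inner(*args, correlation_id=None, **kwargs):
            cid = correlation_id or str(uuid.uuid4())
            start = time.perf_counter()
            status = "OK"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "ERROR"
                raise
            finally:
                record = {
                    "component": component,
                    "correlation_id": cid,
                    "thread": threading.current_thread().name,
                    "response_time_ms": round((time.perf_counter() - start) * 1000, 3),
                    "status": status,
                }
                print(json.dumps(record))  # in practice: ship to the archive
        return inner
    return wrap

@instrument("pricing-service")  # hypothetical component name
def compute_price(qty, unit):
    return qty * unit

print(compute_price(3, 7))  # logs a record, then prints 21
```

Emitting the same record shape from every tier is what makes correlation-id-based problem isolation possible; the records also feed the performance archive the talk describes.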
– Incremental Risk calculation: A Case study of performance optimization on multi core
The incremental risk charge (IRC) is the additional capital banks must set aside for trading (additional capital maintained in the trading book along with Value at Risk). It was introduced after the 2008 financial meltdown to cover credit migration and default risk.
Estimating credit loss depends on several factors, e.g., loss events like market crashes or terror attacks, and their impact. The IRC computational flow calculates the aggregated effect of all events along with their severity values.
This calculation is done through Monte Carlo simulations, and the distribution function is obtained by curve fitting on the simulation results. The aggregation is done by Panjer recursion or FFT.
For these calculations, processing is offloaded to a grid of 50 workstations, and the compute takes more than 45 minutes.
This is an HPC problem with FFT computations for 133 scenarios, each consisting of 160,000 arrays of doubles (37,268 elements each) – about 37 GB of data per scenario.
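To make the FFT-based aggregation mentioned above concrete, here is a small illustrative sketch (not the production code, and at toy scale): the aggregate of independent discrete loss distributions is their convolution, which an FFT turns into pointwise multiplication in the frequency domain. It assumes NumPy is available:

```python
import numpy as np

def aggregate_losses_fft(pmfs, n):
    """Aggregate independent discrete loss distributions via FFT:
    convolution in loss space = pointwise product in frequency space.
    n must be >= the length of the full convolution to avoid wrap-around."""
    spectrum = np.ones(n, dtype=complex)
    for pmf in pmfs:
        padded = np.zeros(n)
        padded[:len(pmf)] = pmf
        spectrum *= np.fft.fft(padded)
    agg = np.fft.ifft(spectrum).real
    agg[agg < 0] = 0.0          # clip tiny FFT round-off noise
    return agg / agg.sum()

# two hypothetical events: A loses 0 or 1 unit, B loses 0 or 2 units
pmf_a = [0.7, 0.3]          # P(loss=0)=0.7, P(loss=1)=0.3
pmf_b = [0.6, 0.0, 0.4]     # P(loss=0)=0.6, P(loss=2)=0.4
total = aggregate_losses_fft([pmf_a, pmf_b], n=8)
print(round(total[0], 4))   # P(total loss = 0) = 0.7 * 0.6 -> 0.42
print(round(total[3], 4))   # P(total loss = 3) = 0.3 * 0.4 -> 0.12
```

At the scale described in the talk (133 scenarios, 160,000 arrays each), the transforms dominate the runtime, which is why the remainder of the session focuses on tuning the FFT itself on each platform.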
The experimental setup spanned 3 different multi-core architectures: an Nvidia platform, an Intel Xeon platform and Intel MIC.
IRC on Nvidia platform
- FFT plan was created using the cuFFT library
- A naïve implementation took 67 min, of which data transfer took 61 min.
- The system was tuned on the following parameters:
- The memory type was changed from pageable to pinned. This improved performance 2x and reduced the time to 30 minutes, because pinned host memory avoids a 2-stage copy.
- The computing mode was changed from default to multiple streams. Multiple streams enable the GPU to hide latencies by overlapping communication and computation. 2.7x improvement in performance.
- Hybrid computing gave a 30% performance boost, as it removes host idling while the GPU computes. Thus, splitting the computation between CPUs and GPUs resulted in a performance improvement.
IRC on Intel Xeon and MIC
- Intel MKL provides APIs for FFTs
- Baseline implementation took 120 min
- System was tuned on following parameters:
- Thread binding – limits the migration of threads from one core to another. Led to a 5% improvement in performance
- Memory alignment – the memory addresses for input and output data were aligned to 64 bytes. Resulted in a 7% improvement in performance
- Reusing DFTI descriptors resulted in a 3.6x improvement in performance
- Hybrid computing – splitting computations across the host and coprocessors. 35% improvement in performance
- Extrapolating to 2 MICs reduced the time to 8.64 min
The key message: using libraries is not enough – analyze and optimize.
– Automatically determining Load test duration using confidence intervals
A very common problem is how long a load test should be run until the average response time converges.
- There have been several guesstimates about this, but no scientific method for determining how long a regular performance test should run
- Surprisingly, none of the performance test tools offer any recommendation; the tester has to specify the test duration manually
- As a result, the test duration ends up too short or too long, yielding either incorrect performance metrics or schedule impacts
- It would be best if the test could decide for itself when it has converged, i.e., determine its own run duration
- As the run duration increases, you expect the measured mean response time to converge to a given value
- The solution is to keep increasing n until R(n) → E[R]
- We need a level of confidence that our estimate of the mean response time, R(n), is in the neighborhood of the true mean, E[R]
- Start the test for the maximum duration; when steady state is reached, reset all measurement counters; if the test converges (to the desired level of confidence) and the minimum duration has elapsed since steady state, stop the test before the maximum duration and output the test results
The Central Limit Theorem in statistics states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables will be approximately normally distributed.
But successive response-time samples from a test run are not necessarily independent. Using batch means reduces this correlation.
Thus it is possible to say with 99% confidence that Yavg lies in the interval (μ ± 2.576σ). Another roadblock with this approach is that the mean and standard deviation are not known to start with. To overcome this, we can use the Student t-distribution, which works with estimated (computed) values of the mean and standard deviation.
Tables of the Student t-distribution are available for computed estimates of the true mean and standard deviation. Given this, we can go back to the approach suggested above.
If xk is the throughput after k minutes of run duration, a heuristic for detecting steady state is that xk is within 90% of xk-1.
Empirically, a 99% confidence interval within 15% of the estimated average response time works well.
With this, a running mean and standard deviation can be calculated, and the test stopped when the confidence-interval half-width is within 15% of the mean at 99% confidence. This was validated on a few real-world applications.
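The stopping rule described above can be sketched as follows. This is a minimal illustration, not the authors' tool: it uses batch means and, as a simplification, the normal 99% critical value 2.576 in place of the Student t value (they converge for large batch counts); the workload model and all parameter values are illustrative assumptions:

```python
import math
import random
import statistics

def run_until_converged(sample_response_time, batch_size=50,
                        min_batches=10, max_batches=2000,
                        z=2.576, rel_precision=0.15):
    """Batch-means stopping rule: collect batches of response times and stop
    once the 99% confidence half-width (z * s / sqrt(n)) of the batch means
    falls within 15% of the running mean."""
    batch_means = []
    while len(batch_means) < max_batches:
        batch = [sample_response_time() for _ in range(batch_size)]
        batch_means.append(statistics.fmean(batch))
        n = len(batch_means)
        if n >= min_batches:
            mean = statistics.fmean(batch_means)
            half_width = z * statistics.stdev(batch_means) / math.sqrt(n)
            if half_width <= rel_precision * mean:
                return mean, half_width, n  # converged: stop the test
    return mean, half_width, n  # hit the maximum duration

random.seed(42)
# hypothetical workload: exponentially distributed response times, mean 200 ms
mean, hw, n = run_until_converged(lambda: random.expovariate(1 / 200))
print(n)  # number of batches needed to converge
```

In a real load-test tool, `sample_response_time` would be replaced by the stream of measured response times collected after the steady-state reset, with the counters cleared once the throughput heuristic above signals steady state.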
Summary – There were a lot of other interesting presentations, which can be accessed here. The CMG India conference provided a good forum for networking and learning about interesting happenings in the field of performance engineering. The second annual conference was announced to be held in Bangalore in 2015.
Jayanth Ganapathiraju (LinkedIn) is a Performance Engineer at a large healthcare organization with over 10 years of experience as a senior software developer and performance lead. In his current role, he works with several teams in the organization on performance test strategies and mentors teams on the need for continuous performance testing. He has an appetite for exploring new technologies and processes and strongly believes in lifelong learning. When not in front of his laptop, you can find him reading a non-technical book in a quiet corner or in deep soliloquy.