Tuesday, May 28, 2013

Hmm... Users Experience the Average SQL Elapsed Time

You're joking. Right?


Oracle Database performance tuning and analysis has come a long way in the last 20 years. First there was the “just add more resources” approach and tuning the blatantly poor SQL. Then there was ratio analysis, followed by wait event analysis, time-based analysis, and unit-of-work time-based analysis. In addition to the performance diagnosis and analysis evolution, the Oracle Database as a product has changed, and architectures are more diverse. Yet with all this change, some things are in many ways timeless. They relate to complexity, basic mathematical statistics, efficiency, and doctrinal purity. Over the next few weeks, I'll post four different "myths." Last week I posted about increased complexity. This posting centers on a user's experience.

Myth #2

The second myth is Users experience the average SQL elapsed time. Concurrency, multiple execution plans, different bind variables, and the current cache situation make the average elapsed time less relevant—and perhaps even misleading. Let me explain.

If I tell someone the average elapsed time for their key SQL statement is 10 seconds, 99.9% of the time they will picture in their minds a bell curve. They will think to themselves, “OK. This means that usually the statement runs for about 10 seconds, sometimes less and sometimes more.” But my research clearly shows that, unless verified, there is likely to be a significant difference between the average and the typical elapsed time(s). This means we are setting false expectations and flat-out misleading users. Not the place we want to be!

It is very simple to calculate the average SQL statement elapsed time. Even a Statspack report will show you the total elapsed time and the total number of executions over the report interval. The average elapsed time is simply the total elapsed time divided by the total number of executions. So it’s very enticing to make a quick statement about the elapsed time using the average.
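The arithmetic really is that simple. As a quick sketch (the interval totals below are made-up numbers standing in for what a Statspack report would show):

```python
# Hypothetical figures for one SQL statement over a Statspack report interval
total_elapsed_s = 5000.0   # total elapsed time across all executions (seconds)
total_execs = 500          # total number of executions in the interval

# The "average elapsed time" is just the ratio of the two totals
avg_elapsed_s = total_elapsed_s / total_execs
print(f"average elapsed time: {avg_elapsed_s:.1f} s per execution")  # 10.0 s
```

Easy to compute, and therefore easy to quote without thinking about what it hides.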

Now suppose this SQL statement has two execution plans: one typically completes in 2 seconds and the other in 18 seconds. Also assume that they both get executed the same number of times (don’t count on this in your environment!). The average elapsed time would then be 10 seconds. Now I picture myself telling my client that I had discovered the key SQL statement and its average elapsed time is 10 seconds. The “2 seconds” group would think I’m making the situation look worse than it is so I can claim an amazing performance improvement. The “18 seconds” group would think I’m simply an idiot. Either way, I lose. So I needed a way to find out the truth.

What I learned through my research is that if the statement is important to the users and I care about not misleading them, I need to collect some data, calculate the statistics, and create a histogram. I even blogged about the problem here and provided solutions here! My research clearly showed three reliable ways to collect elapsed times: Oracle tracing by SQL_ID, instrumenting application code, and sampling running SQL. I have created a low-overhead tool called “SQL Sampler” to sample SQL statement elapsed times. You can download it for free from my website here. The moral of this story: do not expect users to experience average elapsed times, and if appropriate, provide details about the typical elapsed times.
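Once elapsed-time samples have been collected (by whichever of the three methods), a quick text histogram makes the shape of the distribution obvious. A minimal sketch, using made-up bimodal sample data in place of real collected times:

```python
import random
import statistics

random.seed(1)
# Hypothetical sampled elapsed times (seconds): a bimodal mix of two plans
samples = ([random.gauss(2.0, 0.3) for _ in range(100)] +
           [random.gauss(18.0, 1.5) for _ in range(100)])

# Bucket the samples into 2-second bins and print a simple text histogram
bins = {}
for s in samples:
    b = int(s // 2) * 2
    bins[b] = bins.get(b, 0) + 1
for b in sorted(bins):
    print(f"{b:>3}-{b + 2:<3}s {'#' * (bins[b] // 5)}")

print(f"mean = {statistics.mean(samples):.1f} s")
```

Two distinct clusters appear in the histogram, while the mean sits in the empty gap between them, which is exactly why the average alone misleads.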

Thanks for reading!

Craig.

If you enjoy my blog, I suspect you'll get a lot out of my courses: Oracle Performance Firefighting, Advanced Oracle Performance Analysis, and my One-Day Oracle Performance Research Seminar. I teach these classes around the world multiple times each year. For the latest schedule, go to www.orapub.com. I also offer on-site training and consulting services.

P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub.com. Another option is to send an email to OraPub's general email address, which is currently orapub.general@comcast.net.



1 comment:

  1. I was searching on the phrase "tuning one sql statement versus the user experience" and happened upon this article. What I am dealing with is a customer moving from IMS to Oracle, actually having an SLA for a query running in 29ms. On Oracle it runs in about 90ms, which is the best they will get without trying to totally mimic the denormalized IMS construct. It's a bank transaction customer lookup application BTW. I believe they are way too hung up on this 60ms difference, since the user experience will not detect it. Plus, they are testing this thing in isolation, and not taking into consideration how it will perform in production on their full rack Exadata machine. Can you offer any comments on this? How do I get them to step back from the micro view to the macro view?
