Thursday, May 29, 2014

It's All About CPU But There Is NO CPU Bottleneck! What?


Diagnosing Oracle Database performance is like solving a puzzle. But what I really enjoy is coming up with performance solutions that are anti-intuitive for most people. You know, the ones where you can see people stop talking and actually think, "Why would Craig say that!?" In this posting, I delve into one of these situations.

The Situation


Just over a month ago I received an email from a former Australian Oracle Performance Firefighting student about their "CPU bottleneck." Based on a one-hour AWR report's Time Model Statistics section (which is based on the v$sys_time_model view), DB CPU was 95% of DB Time, leaving 5% for Oracle wait time. The top CPU-consuming SQL statements were easily identified and work had begun on tuning them. By "tuning" I mean a small group of DBAs were working to reduce their CPU consumption. But something didn't feel right to him, so he emailed me about it.

My first question was, "Is there an OS CPU bottleneck?" That is, is the host (or virtual machine if you like) out of CPU resources? I was wondering if the average CPU utilization was 80% or higher. Why was I asking this? Read on...

DB Time Math


Over an interval of time, DB Time is all the Oracle server process CPU time (i.e., DB CPU) plus all the non-idle wait time.

Note: Oracle does not include background process CPU in DB Time. Why they don't include background process CPU time is an entirely different subject. And of course, we are trusting Oracle that what they consider "non-idle" wait time should actually be "non-idle" wait time. Again, that's an entirely different subject.

My former student knew the DB Time math and therefore understood that since DB CPU was pretty close to DB Time, there was little non-idle wait time. It was "all about CPU." I still wanted to know the wait event situation, though, because it provides clues about what is hindering performance and forces the DBA to review their work and think a little more.
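The DB Time math above can be sketched in a few lines of Python. This is only an illustration: the function name and the one-hour totals are hypothetical, picked so that DB CPU comes out at 95% of DB Time as in the student's AWR report.

```python
def non_idle_wait_share(db_time_s: float, db_cpu_s: float) -> float:
    """Return the fraction of DB Time that is non-idle wait time.

    Relies on the relationship: DB Time = DB CPU + non-idle wait time.
    """
    if db_time_s <= 0:
        raise ValueError("DB Time must be positive")
    return (db_time_s - db_cpu_s) / db_time_s


# Hypothetical interval totals in seconds (not from the actual report).
db_time_s = 10_000.0
db_cpu_s = 9_500.0

wait_share = non_idle_wait_share(db_time_s, db_cpu_s)
print(f"DB CPU share:        {db_cpu_s / db_time_s:.0%}")
print(f"Non-idle wait share: {wait_share:.0%}")
```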

OS CPU Utilization Math


In my Utilization On Steroids online video seminar, I demonstrate utilization math by shrinking myself down to the size of a water glass and also the size of a printer cartridge. (Like the motto says, "What Online Training Should Be!") In a nutshell, utilization can be viewed/understood as requirements divided by capacity. If a 500ml glass contains 250ml of water, the glass is 50% utilized, that is, 50% full.

As I present in my online seminar, there are two simple methods to figure OS CPU utilization using only the v$osstat view. One of these methods is based on the BUSY_TIME and the IDLE_TIME statistics from the Operating System Statistics section of any AWR or Statspack report. The BUSY_TIME is the instance CPU consumption, that is, the CPU requirements. As I carefully explain in my seminar and in my classes, the capacity is actually equal to the BUSY_TIME plus the IDLE_TIME. That is:

Utilization = Requirements / Capacity

Utilization =  BUSY_TIME / ( BUSY_TIME + IDLE_TIME )

This is true unless the OS is updating OS structures incorrectly (e.g., /proc) or Oracle is not gathering the data from the OS correctly or frequently enough. And yes, I talk about how to check this in my seminar.
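Here is a minimal Python sketch of the utilization formula above. The function name and the sample numbers are my own assumptions for illustration: I picked BUSY_TIME and IDLE_TIME deltas (in hundredths of a second) that a 4-core box could show over one hour at 65% utilization. If you read v$osstat directly rather than an AWR report, remember to take the difference between two samples first.

```python
def cpu_utilization(busy_time: float, idle_time: float) -> float:
    """Utilization = Requirements / Capacity = BUSY / (BUSY + IDLE)."""
    capacity = busy_time + idle_time
    if capacity <= 0:
        raise ValueError("BUSY_TIME + IDLE_TIME must be positive")
    return busy_time / capacity


# Hypothetical interval deltas in centiseconds:
# 4 cores * 3600 s * 100 = 1,440,000 cs of capacity.
busy_delta = 936_000
idle_delta = 504_000

utilization = cpu_utilization(busy_delta, idle_delta)
print(f"OS CPU utilization: {utilization:.0%}")
```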

My former student did the utilization math. The average utilization was around 65%. Since this is the only Oracle instance doing any real work on the box, it was not a surprise to him that vmstat also indicated the OS CPU utilization was 65%.

Why I Asked About A CPU Bottleneck?


Suppose there are plenty of available CPU resources AND the SQL that users are complaining about is doing nothing but buffer gets (i.e., logical reads). In other words, from a resource consumption perspective, it's all about CPU consumption. Is this possible? The answer is, YES!

If a single server process is running this SQL statement, no parallel query operations are occurring and there are four CPU cores, what would the CPU utilization be? Probably around 25% (plus overhead of between 5% and 15%). That is because on average only one CPU core is processing the SQL statement!
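The four-core arithmetic can be written out as a quick Python sketch. The function name is hypothetical, and I am assuming one serial server process keeps exactly one core busy, with the 5% to 15% overhead added on top.

```python
def expected_utilization(busy_cores: float, total_cores: int) -> float:
    """Average utilization when busy_cores out of total_cores are working."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return busy_cores / total_cores


# One serial server process on a four-core box keeps about one core busy.
base = expected_utilization(busy_cores=1, total_cores=4)

# Overhead range taken from the text: between 5% and 15%.
low, high = base + 0.05, base + 0.15
print(f"SQL alone: {base:.0%}, with overhead: {low:.0%} to {high:.0%}")
```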

If this SQL statement took "too long to run," adding more CPU power (specifically more cores) would not help.

As strange as it may seem, one performance improving solution is to use the available CPU! Read on.

Use More CPU! Now!


Suppose a 10-core box (or VM if you prefer) is running at an average CPU utilization of 50%. This means that on average 5 of the 10 cores are busy. Now suppose a SQL statement takes 4 minutes to process without parallel query or a very cool (but perhaps costly) application design. How long would the SQL statement take to run if it ran in four parallel streams?

It depends. We need to ensure there is available CPU power. Assume each parallel stream consumes one CPU core. The box has 10 cores and on average 5 are busy (remember the average utilization is 50%), which leaves us with 5 "extra" cores. The serial SQL statement already uses one core, so to run it in 4 parallel streams we need three more CPU cores... and we have them!

If there is no parallelism overhead (ha! don't count on that!), the SQL statement would run in around 1 minute, 8 CPU cores would be active (5+3), and the average CPU utilization would be around 80% (8 cores required / 10 cores of capacity).

So by creatively using available CPU resources, we were able to consume 4 minutes of CPU in a single minute! ...a beautiful example of parallelism. And of course, the users will be delighted and I may get a free pizza.
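For those who like to see the numbers worked, here is a small Python sketch of the example above. It is a back-of-the-envelope model, not a prediction: the function and field names are hypothetical, each stream is assumed to use exactly one core, the serial statement is assumed to already occupy one of the busy cores, and parallelism overhead is assumed to be zero.

```python
def parallel_scenario(total_cores: int, utilization: float,
                      serial_minutes: float, streams: int) -> dict:
    """Estimate run time and utilization after parallelizing one statement."""
    busy_before = total_cores * utilization   # cores busy on average
    idle_before = total_cores - busy_before   # "extra" cores
    extra_needed = streams - 1                # serial run already uses one core
    if extra_needed > idle_before:
        raise ValueError("not enough idle cores for that many streams")
    busy_after = busy_before + extra_needed
    return {
        "elapsed_minutes": serial_minutes / streams,
        "busy_cores": busy_after,
        "utilization": busy_after / total_cores,
        # Total CPU consumed does not change, only how fast it is consumed.
        "cpu_minutes": serial_minutes,
    }


result = parallel_scenario(total_cores=10, utilization=0.50,
                           serial_minutes=4, streams=4)
print(result)
```

Try it with 8 streams on the same box and the function raises an error, because 7 extra cores are needed and only 5 are idle.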

Be Anti-Intuitive


When a SQL statement, batch process or an Oracle process is constrained by CPU, yet there is plenty of CPU power available, the problem is probably a serialization issue. (Another possibility is some type of locking or blocking problem.) The solution can be found by parallelizing, thereby using the available CPU resources which will reduce the wall/clock/run time of the process.
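The reasoning in the paragraph above can be expressed as a rough triage function in Python. Treat this as a thinking aid and nothing more: the function name, the messages, and the 90% "CPU-bound" threshold are my own assumptions, while the 80% OS utilization threshold is the figure I mentioned earlier when asking about an OS CPU bottleneck.

```python
def triage(db_cpu_share: float, os_cpu_utilization: float,
           cpu_bound_share: float = 0.90,
           os_bottleneck: float = 0.80) -> str:
    """Suggest a first direction from two ratios (both between 0 and 1).

    db_cpu_share       -- DB CPU divided by DB Time
    os_cpu_utilization -- BUSY_TIME / (BUSY_TIME + IDLE_TIME)
    """
    if db_cpu_share < cpu_bound_share:
        return "not CPU-bound: look at the wait events first"
    if os_cpu_utilization >= os_bottleneck:
        return "OS CPU bottleneck: reduce CPU consumption or add capacity"
    return "CPU-bound with spare CPU: suspect serialization, consider parallelizing"


# The student's situation: DB CPU 95% of DB Time, OS CPU about 65% utilized.
print(triage(db_cpu_share=0.95, os_cpu_utilization=0.65))
```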

If the above paragraph makes sense, then you understand what I'm talking about. If it does not make sense, stop, breathe, and re-read the previous paragraph or paragraphs. It will be worth your time.

This next paragraph is really important to understand:

It is important to understand we have NOT reduced the total SQL statement's CPU requirements. We have NOT increased the total SQL statement's CPU requirements. We have NOT tuned the SQL statement in the traditional sense by reducing the buffer gets (logical reads, logical IOs). What we have done is simply altered the situation by consuming the necessary CPU requirements within a shorter duration. (In our example, the four minutes of CPU time was consumed within a single minute.) We did this by taking a serial process and parallelizing it to use more of the available CPU resources.

Expand Our View Into The OS


Without understanding the OS situation we can easily misdiagnose the problem. And we can miss creative and powerful solutions.

Most DBAs immediately jump to tuning the SQL and tuning Oracle. While those are both valid technical solutions, there are others... like parallelizing to reduce run time while consuming more resources (in a shorter period of time).

Some DBAs, more likely OS administrators, and for sure hardware vendors may insist on getting more CPU power. While faster CPUs will help somewhat, adding more of the same CPU cores will likely do absolutely nothing... unless we can increase parallelism to use them.

Being able to determine the OS bottleneck is so important, I devote an entire chapter of my Oracle Performance Firefighting book to the subject. When I was learning to optimize Oracle systems, learning how to find the OS bottleneck significantly increased my value in the Oracle DBA market. Later when I ruminated on serialization and parallelization, I turned situations upside down by placing a larger (yet shorter-term) load on the system as a valid performance improving solution.

Summary


If performance is not what it needs to be AND there are available OS resources, consider using them to improve performance. It's a creative and rarely used solution.

Thanks for reading!

Craig.