Thursday, August 30, 2012

Variable Load Impact on Hard Parsing

Over 1100 Times Slower is... Bad!

In my previous posting I experimentally demonstrated hard parsing was not just bad, but really bad. In fact, my non-load test showed the initial parse (hard parse) was 180 times longer than the second parse (soft parse)! Also, once session cache cursors kicked in after the third parse, parse time dropped over 1100 times when compared to the initial hard parse! While that's useful (and interesting), rarely is parsing an issue in a no-load situation, that is, single session situation. That brings us to this posting...

This posting is focused on parse time when there is an intense load that completely consumes all the available CPU combined with massive library cache mutex related contention.

I should also note that the load placed on the system is not from the SQL statement under observation but from other SQL statements (detailed below).

Review Parsing at Various Levels of Intensity

From a performance perspective, I categorize parsing as follows:

No parse. There is actually a "no parse." I have not actually tested this, so I'll say nothing more about it. (I feel another blog posting...)

Softer parse. Question: Does the statement exist in my session cache cursor memory? If so, where is the cursor in the library cache? (i.e., What's the address?) This does not require a LC latch or mutex because the server process's private memory is being accessed, not the shared LC searching (i.e., hashing) structure.  If not, then continue to Soft Parse.

Soft parse. Question: Does the statement already exist in the library cache? If so, where is it? This requires LC latch and/or mutex access. If not, continue to Hard parse.

Hard parse. No questions, just get to it! But there are more steps involved. Allocate shared pool memory (requires shared pool sub-pool latch(es), create the cursor parent/child (requires mutex and/or LC latch), bring needed information into the shared pool (think: recursive SQL) (requires mutex and/or LC latches), and create all the associated memory links (requires mutex and/or LC latches). Wow, that's a lot of stuff and massive serialization control.

As one would expect, my previous posting documented that my experiments showed hard parsing elapsed time took 180 times longer than soft parsing and over 1100 times longer than softer parsing. But again, that was in a no-load situation in a pretty serial-like situation. This experiment and posting interjects competition for serialization control structures and for CPU resources.

Experimental Setup

In summary, I gathered the elapsed parse time when a statement is executed; once, twice, ... seven times. However, I also injected an increasing load so we could see the impact of a load on the initial hard parse (execution one) and the soft-type parses.

The load I placed on the system was very simple. A number of sessions ran the below script:
def sleepT=&1
declare
  i number;
  cursor cur_one is
    select count(*) from dba_objects where 1=0;
begin
  loop
    for i in 0..9999
    loop
      open cur_one;
      close cur_one;
    end loop;
    sys.dbms_lock.sleep(exp(&sleepT*dbms_random.normal));
  end loop;
end;
/
The strange looking sleep time determination above creates a much more realistic sleep/think time. I detail this in my May 2012 posting entitled, Creating Realistic Think Times for Research Projects.

Opening the cursor forces a parse to occur requiring serialization control via mutexes (cursor: pin S). Just to double check, I SQL traced a very simple open and close, clearly revealing there was indeed parse time related to the open and close.

As I increased the number of sessions concurrently running the above script, CPU consumption increased, CPU utilization increased, and finally mutex related wait time began to appear. Once the experimental load reached a load level of 10, the timing situation looked like this:


The above script rtpctx.sql comes from my free OSM toolkit. The "% RT" column is the percentage of time related to the total CPU and non-idle wait time over the interval (38 seconds in this situation). The "% WT" is the percentage of time related to all the non-idle wait time over the interval. You can see the wait event "cursor: pin S" was 99.36% of all the non-idle wait time and 63% of all the CPU plus wait time. And as you might expect, the 6 CPU cores where averaging 100% busy!

The experiment was run at 10 different load levels, signified as level 3 to level 13. (At load level 1 and 2, nothing much was happening.) The load level is a relative term. Each load unit is related to 7 concurrent sessions each with a different sleep time (see code above).

About the SQL under observations: At each load level a similar yet unique SQL statement (different objects but otherwise exactly the same SQL) was run seven times; the first is the hard parse, the second is the soft parse... At each load level and for each of the seven execution times, I took 100 samples. Not 30, but 100! I gathered 30 the first time, but when I created a few histograms they looked really interesting, so I decided to gather more samples. Because the SQL statements execute so quickly, gathering all the data only took around an hour.

The experimental hardware and software is as follows: Oracle 11.2.0.1.0 Enterprise Edition running on (uname -a): Linux sixcore 2.6.32-300.3.1.el6uek.x86_64 #1 SMP Fri Dec 9 18:57:35 EST 2011 x86_64 x86_64 x86_64 GNU/Linux.

Experimental Results

This experiment generated interesting plots, histograms, and numerics. Everything has been placed on-line for you spend countless hours analyzing. However, I will highlight just a few things below.

If you want to look at the experimental details; including all the scripts, output including the data, plots, Mathematica notepad, etc. click here (3.6 MB).

I'll report my findings through a series of questions.

Does hard parse time increase as the load increases?

Yes, but it's even more interesting than you might think! First, remember the load on the system was competing for CPU resources and hard parsing requires CPU resources. If there was an IO bottleneck, the hard parse time might not increase as the load increases. Here is a plot which clearly shows as the load increases, so does the hard parse time.
Each dot (and many are stacked) represents one of the 100 samples taken at each load... for the initial execution, that is, the hard parse. For example, at load 3 while it may look like there are only perhaps five samples, there are actually 100... not a relatively large variance or data spread.

The above plot clearly shows two things:

1. Hard parse elapsed time increased as the load increased. Once the load level hit 10, CPU utilization averaged around 100% and 99% of the wait time was cursor: pin S. Here are the median elapsed parse times (ms) starting at load level 3: 24.5 , 25.5 ,  31.3 , 55.3 , 81.0 , 76.4 , 132.1 , 140.0 , 152.0 , 134.2 , 190.5 .

2. Hard parse elapsed time variance increased as the load increased. The plot above clearly shows the spread of sample time increases as the load increases. It's actually much more than I expected. Keep in mind, that because the samples can overlap, looking at the above plot it's not possible to get a good understanding of the distribution. For that, we need a histogram as I show below.
The two tall above smoothed histograms are the hard parse at Load 3 and Load 4. They are so tall because their spread is relatively small so most of the samples end up being stacked on each other. Load 5 peaks far below Load 3 and Load 4 but does have a peak (i.e., mode). Load 5 and Load 6 are tremendously spread out and are barely visible. The spread was massive for Loads 7 to 13, I did not show them in the histogram. Large data spreads produce inconsistent performance... something that users do not enjoy.

I also wanted to know if the relationship between hard parse elapsed time to load is linear. All the needed data is in the Mathematica notepad, but looking at the plot below we can see the relationship appears to be linear. 

The relationship between hard parse elapsed time and system load appears to be linear.

Certainly there is not an exponential-type bend to the curve. I incorrectly assumed that as the load increased the hard parse time would increasingly increase resulting in a non-linear relationship between hard parse times and load. Here's the plot.

Is there is significant difference between hard, soft, and softer parse times at the various loads?

Yes. Below are the plots clearly demonstrating this! Going left to right and top to bottom, the upper left plot are the elapsed parse times for the initial parse (hard) at various load levels. And yes, this is the same chart at the one above. The bottom right plot is the seventh parse (very soft) at various load levels.
Just by looking at the above plots we can easily see:

1. The hard parse is always much longer than a soft parse, regardless of the load level. In fact, here are the median hard parse times (ms) at load level six: 48.7 (hard), 0.142 (soft), 0.026 (soft), 0.020 (softer), 0.020 (softer), 0.020 (softer), 0.020 (softer). At load level six, the difference between the hard (first execution) and soft parse (second execution) is 348 times! (348=486.8ms/1.4ms) In my previous posting related to a no-load system, the difference hard parsing was 180 times longer than soft parsing (second execution), so a load definitely accelerates the situation... about double in this situation.

2. At load level six, the difference between the hard parse and the softer parse (execution 4, lower left plot) is 2435 times! (2435=48.7/0.020)  In my previous posting related to a no-load system, the hard parsing was 1129 times longer than softer parsing (fourth execution), so a load definitely accelerates the situation... about double in this situation.

When does the elapsed parse time stabilize?

The fourth execution is about the best Oracle can do in terms of elapsed parse time. While there are significant drops from the first (hard) to the second (soft) parse and also from the second (soft) and third (soft) parses, once the fourth execution (softer) occurs the parse time stabilizes. This is exactly what we saw during the no-load experiment and demonstrates the the session cached cursor performance is the best Oracle can do. This is why we don't continue to see a parse time drop when comparing the fourth to the fifth, sixth, and seventh executions.
In the above plot, the third execution (soft parse) elapsed times are the top round blue circles. The fourth to seventh executions are clumped together fairly consistently below the third execution soft parses.

Is soft(er) parsing elapsed time immune to the system load?

Yes, in my specific experiment. If you look at the above plot and one above it (with the seven plots), you'll notice that the elapsed parse time is very consistent  and did not increase much (it does a little) as the load increased. The obvious exception is the hard parse. This is another example of why we want soft parsing to occur and just how performance harming hard parsing can be.

Summary

While there are lots of insights we can glean from this experiment, to me these are the most important:

1. Hard parsing elapsed time increases as the (CPU) system load increases. That is, it gets worse and worse! While the relationship is linear, in my experiment hard parse times increased from around 25 ms to 190 ms. That is around a 7.6X increase or if you prefer, a 760% increase!

2. By sharing SQL, Oracle is able to soft and softer parse dramatically decreasing parse time, reducing CPU consumption, and reducing memory serialization control issues (e.g., mutex wait events). At an intense CPU workload (load 6), the parse time drop from a hard parse (first execution) to a softer parse (fourth execution) was around 2435 times! Note: There are rare and very specific situations where we want a similar SQL statement to be slightly textually different reducing the likelihood of many sessions executing the same exact SQL statement. I will focus on this in a future blog post.

3. Once a statement is executed four times, parse time is unlikely to decrease any further and is not materially affected by the CPU load.

In my next parsing related posting, I am going to delve into the situation where hundreds and hundreds of child cursors can be created causing massive serialization and parse time issues. If you're not familiar with this situation it can seem obscure and unusual. But it occurs more often than people think and can be a very serious performance issue.

Thanks for reading!

Craig.

Tuesday, August 7, 2012

What I do in my spare time...

This is very unusual indeed. This post is not about Oracle, but about one of the things that I do besides Oracle performance analysis.

Many people ask me what I do besides Oracle and I reply with a long list of things like coffee roasting, beekeeping, spiritually focused youth work, and the list goes on (just ask my wife).

One of the things I love is riding my motorcycle and this is what this post is about. I just returned from a 14 day trip from where I live (Portland, OR USA area) riding down south about 1000 miles to a big motorcycle race (MotoGP at Laguna Seca) and then back home. I took some video and pictures and made comments along the way. I put it all together into a three part series and posted it on youtube for anyone to see.

If you want to take a look and see "the real Craig" (not even one mention of "oracle" or "orapub"), here are the direct links: