Tuesday, February 28, 2012

Wanna Bet? Cool facts about the central limit theorem.

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Wanna Bet?

Suppose there is a room of 75 Oracle DBAs. I randomly pick 25 of them and ask them to randomly divide up into 5 groups of 5 each. Then I ask each group to compute the average age for their group and give me the results. The averages are 38.2, 34.7, 41.2, 43.9, and 42.1. I do a little math but don't say anything. I then look to the remaining 50 DBAs, ask them to break up into 10 groups of 5 each, compute their group's average age, write their result on a piece of paper, and without showing me, place the paper face down on the table before me.

Start the drum roll please... I take a deep breath, reach into my pocket, take out a crisp new $100 bill, and place it on the table next to ten pieces of paper. I then proclaim that if anyone is willing to match my $100 bet, I will turn over the ten pieces of paper in front of me and bet them that written on nine out of the ten are values between 32.8 and 47.2. The range doesn't spook the group, but the 9 out of 10 visibly does.

No one says a word... I start breathing again. But just for fun, I slowly peal back each piece of paper one by one exposing their values: 42.4, 35.9, 49.7 (oh oh...), 46.1, 45.3, 40.0, 43.0, 39.3, 35.5, and 42.1. So 9 out of the 10 were between 32.8 and 47.2. Was I lucky? No. Could I have lost my $100? Yes, but it was unlikely and why the saying, "Two out of three?" came from I suppose.

So how did I know the odds were in my favor even with less than a 14 year spread? Believe it or not, the principle demonstrated is this: If you divide up a population into sample sets and calculate the mean of each sample set, the distribution of the means will be normal. This normality characteristic allows us to make bold "plus or minus" statements... perhaps with a $100 bill involved. Can't wait until Collaborate/IOUG 2012 in Las Vegas!

While this trick is pretty cool, in my view, the really cool part is it will work using years of Oracle experience, number of kids the DBA has, and size of the DBA's IT department. Or more boring figures like wait event times, SQL elapsed times, and SQL arrival rates.

The fine print.

The distribution being normal tells us that 90% of the samples are between plus or minus 1.645 times the standard deviation from the average. In my example, the average of the initial five sample sets is 40.0, the standard deviation is 3.6, so the "minus" 90% bound is 34.1, and the "plus" 90% bound is 46.0. But those aren't the bounds I used in the challenge!

You may have noticed my "wanna bet" range was not 34.1 and 46.0 (90% range) but rather between 32.8 and 47.2. I needed some insurance because I'm using less than 30 sample values so the math to determine the upper and lower bounds is slightly different. So to hedge my bet, I used the 95% level instead of 90%, which is 2 times the standard deviation, not 1.645. This is why and how I used the bounds of 43.8 and 47.2 (95% range).

If you really dig into this, at its core this is about the central limit theorem. This also indirectly relates to what is known as the standard error, which I'll save for another posting.

What does this have to do with our performance analysis work? As I explain in more detail at the end of the Demonstration On A Larger Scale section, this posting touches on the very core of making "plus and minus" statements. As I have blogged previously, using only the average leaves a lot to the imagination and can cause miscommunication. The "plus and minus" presented in this posting is central to understanding and making "plus and minus" predictive analysis statements, but with an interesting twist.

If you want to follow exactly how I did this, you'll want to download the Mathematica notepad in either PDF or native format. It's pretty much top down and you'll see all statistics including the histograms.

Demonstration On A Larger Scale

I was never one to grasp mathematical proofs or believe all that I read. And especially with a flimsy "test" using 15 sample sets of 5 values each. For me, I need to visually see repeated examples before I began to trust claims like I've stated above. I could have selected one very convincing example above to demonstrate my points... and I did! Why? Because I wanted to keep the number of values very small to reduce confusion. But to see if the above claims are really true, we need a much more robust test. And that's what I've done and will describe below.

LOTS of notes and all the information (data, visuals, statistics) used in this example is available for download in the Mathematica notepad in PDF or the native format.

Step 1. Create a sample set.

I created a 900 value sample set based on the exponential distribution with an average of 5.0. The actual mean of the 900 values is 5.343. (The means don't match exactly because I'm randomlly pulling my samples from a mathematical formula or distribution.) I choose the exponential distribution because it is clearly (both visually and numerically) not normally distributed and will confirm some of the wild claims I'll make.
The above histogram is composed of the raw 900 values. Clearly the values are not normally distributed but look very exponentially distributed (because they are). In fact, the normality test p-value is 0.00000 and we need a 0.05 for the data to likely be normal.

Step 2. From the parent sample set, create a number of child sample sets.

I wrote a program in Mathematica to sequentially pull from the parent sample set, 30 sample sets each containing 30 sample values. So every one of the 900 population values reside in one of the 30 child sample sets. I'm not going to post all the samples here, but they are included in the Mathematica file (both pdf and native).

Detail: To ensure I didn't use a sample value twice, I started pulling from the population set sequentially. A variable sets the starting sample place, which in this example is the sixth. To ensure I didn't try to pull beyond the end of the population sample set, I simply joined the initial 900 sample set to itself.

Step 3. Calculate the mean of each child sample set.

Here are means for the 30 child sample sets that were created in Step 2 above. There is one mean for each of the 30 child sample sets.
{6.35511, 4.91329, 3.7737, 5.27418, 6.52658, 7.68304, 4.4319,6.09553, 4.64664, 6.10894, 4.422, 5.21073, 4.33155, 6.78821, 5.38437, 5.38456, 5.80179, 4.60562, 4.87516, 6.77315, 5.38634, 5.55096, 5.43312, 4.48588, 4.75642, 5.06397, 4.43674, 4.72971, 5.00164, 6.06055}
The mean of the parent sample set and the mean of the sub sample sets are the same!

According to the central limit theorem, the mean of the parent sample set and the mean of the child sample sets are the same. The parent sample set mean is 5.343 (see Step 1) and mean of the 30 child sample sets is 5.343. Not bad! Actually, if you try this yourself once the number of sample sets is greater than 20 and all the population values are included in the child sample sets, their means will be pretty darn close.

Is the distribution of child sample set means is normal!

According to the central limit theorem, a distribution of means (e.g., the means from our 30 child sample sets) is normal. Below is a histogram based on our 30 means.
Based on the same 30 means, below is a smoothed histogram. Sometimes it is much easier to see the normality when using a smoothed histogram.
These histograms are called the sampling distribution of the mean. The standard histogram doesn't look perfectly normal, but statistically it is! I performed a normality check and the p-value is 0.185 and we only needed a value greater than 0.05 for the values to likely be normally distributed. This is awesome that a distribution of sample means is normal... because now we can make some very bold statements about these sample values (i.e., the means from our 30 child sample sets).

Using the normal distribution claim.

Because the distribution of the child sample set means is normal, 95% of the child sample set means will be within two standard deviations of the mean (the mean of our 30 child sample sets). The mean of the child sample sets (of mean values) is 5.343 and the standard deviation is 0.892. Therefore, 95% of the 30 child sample sets means will be between 5.343 +/- 2*0.892 which is also 5.34 +/-1.78 which is from 3.56 to 7.12, 95% of the time. Statistically this is true, but is it really true?

If you look back to Step 3, where the 30 means are listed, you will notice that 29 of the means are between 3.56 and 7.12, which is 97%. How cool is that!

Now let's scale it up!

Just for fun I increased the sample set size from 900 to 8100, which resulted in 90 child sample sets, each with 90 sample values. I then reran the numbers. The population mean and the mean of the 90 child sample set means is also 5.028.(!!) Not only does the histogram look normal, but the normality test resulted in a p-value of 0.62, which is above our 0.050 threshold indicating the sample set is likely to be normal. How many samples where within two times the standard deviation? Hopefully at least 95% and in this situation, 98% or 88 out of 90 where within the boundaries!

Making This Practical

All this math is actually very practical when you want to describe a set of numbers with more than simply the average. As I've written about previously, the word average invokes the possibility of an unlimited number of possible perceptions of the situation. Simply providing the "plus and minus" and perhaps a histogram can significantly improve communication.

But the challenge is, many times our data is not normal. And as I've demonstrated, when the data is normal we can make some very bold (well... not that bold actually) claims.

However, demonstrating the data is not normal is valuable because it breaks the typical picture people have in the minds when we talk about, for example, the average SQL statement run time or the average wait event time. Knowing a set of data is not normal is valuable because it improves our understanding of the data.

But what I really like about this is regarding making predictive analysis statements. When making a prediction, it's not enough to only supply the average. Part of developing a robust predictive model is understanding its precision and usefulness. One of the ways to do this is understanding its "plus and minus." But if the underlying data is not normal, the "plus and minus" doesn't really help much.  However, when undemanding a predictive model's precision, we are describing the average error (and the key word is average). If we have a nice set of data (there is sooo much I can say here, but won't) then the distribution of the error will be normal allowing us to make bold "plus and minus" statements about the precision of our forecasting model. Really powerful stuff.

Sales pitch: If you're interested forecasting, you'll be into my Forecasting Oracle Performance book and my Oracle Forecasting and Predictive Analysis class.


This posting took me many, many hours to create. I started with the central limit theorem and then tried to make it work using real numbers. That was the Demonstrating On A Larger Scale part of this posting. But to make this a little more fun and interesting, I wanted to turn it into a story. The "wanna bet" introduction was the result. I'm hoping you learned something of value and if you're a statistical master you have seen the theory play out using real numbers, perhaps, for the first time.

Summarizing the key take-aways, I'd distill them into these five points:

1. There is value in knowing if data is or is not normally distributed. If we understand the distribution, we can make make bold statements about the data. Plus it helps everyone better understand what the "average" value implies.

2. If a data set is normally distributed we can make bold "plus and minus" statements. For example, 95% of the samples are within plus or minus two standard deviations of the mean.

3. The mean of the parent sample set and the mean of the child sample set means are the same. (And I thought saying, "This is the LRU end of the LRU." confused people!)

4. A distribution of means is normal REGARDLESS of the parent distribution. I hope this is crystal clear in your mind. This enables point 5.

5. We can use bold normality statements when describing forecasting model precision (i.e., error).

Thanks for reading!


If you enjoyed this blog entry, I suspect you'll get a lot out of my courses; Oracle Performance Firefighting and Advanced Oracle Performance Analysis. I teach these classes around the world multiple times each year. For the latest schedule, click here. I also offer on-site training and consulting services.

P.S. If you want me to respond with a comment or you have a question, please feel free to email me directly at orapub.general@gmail .com.

Tuesday, February 21, 2012

Speed Sometimes Means Changing The Rules

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Do like Kirk did

How did James Tiberius Kirk beat the Kobayashi Maru? He changed the rules. As performance analysts, to beat the performance game Oracle sometimes allows us to change the rules as well. While relational heresy to some, the commit write facility is an in your face way to improve redo constained performance.

In a nutshell, the commit write facility is this: When a commit is issued and the client process quickly receives a "commit complete", it's not really completed... not yet anyway. So there is a possibility of loosing committed data.

Recorded in my book, Oracle Performance Firefighting, are the results of some experiments using Oracle's commit write facility based on Oracle 10g. They experimental results showed massive performance gains. However, in Oracle 11 some syntax changes were made. It was getting a little confusing, so I thought it was time for a new round of experiments. I thought I would post the results for all to enjoy.

It's OK to loose committed data...sometimes

There exists applications where the loss of committed transitions is acceptable. There are also applications where redo related logging is the key performance constraint. If you can put these two together, Oracle does provide a novel solution. Starting in Oracle 10g, there are ways for a client process to receive a "commit complete" when the commit may have not physically occurred... which means committed data could be lost.

While most DBAs will find this unsavory and near heresy, there are others who have discovered a delayed commit, with the unlikely possibility of lost committed transaction, just what they need to meet performance objectives. I know multiple DBAs who have proudly stated they have a system using the commit write facility. But to understand why, you need to understand their application.

Consider a social networking application. While very unlikely, suppose there was an outage during the commit related to adding others to your network. Are their legal issues? I doubt it. Has money been lost? Let's hope not. Has someone been hurt? Perhaps you lost some time, but hurt? No. All you end up doing is re-adding the people to your network. It may be irritating, but the performance gains combined with an unlikely interruption compared to constantly slower performance or more hardware, it is a compelling business argument to allow Oracle to say, "Yep, it's committed." when it's really not quite on disk just yet.

How to use the commit write facility

The commit write facility can be influenced within the commit statement, by issuing an alter session or alter system, and also by setting the instance parameters commit_wait and commit_logging. In llg the instance parameter commit_write exists, but the alert log states it has been deprecated. After I define what the various options mean, I'll demonstrate the various ways to enable commit write.

The first set of options relates to how the log writer background process writes. The options are either wait or nowait. As you can guess the default is wait, which is a synchronous write, that is, the log writer will effectively wait for the IO call to complete before issuing another IO call. As with accessing Oracle memory structure serially, synchronous writes can become a performance issue.

The other option relates to batching redo entries before the log writer physically issues the write. The options are immediate and batch. As you can guess, the default is immediate, which means as soon as the commit is issued, the log writer issues the write (either in wait or nowait mode).

As you would expect, the default mode is wait and immediate. And we would expect the fastest way to complete a commit would be a nowait (i.e., asynchronous) batch write. This is what I checked in my experiments far below.

In both Oracle 10g and 11g, the command line commit options both work. If you run the below command in either Oracle 10g or 11g, the results will look the same. In my experiments, I used the command line options.
SQL> update op_test set object_name='OraPub training rocks!' where object_id=100;
1 row updated.
SQL> commit write immediate nowait;
Commit complete.
SQL> update op_test set object_name='OraPub training rocks!' where object_id=100;
1 row updated.
SQL> commit write batch nowait;
Commit complete.
If this sounds convoluted, it is a little. Especially in 10g where both parameters were set using a single entry, like "wait,immediate". So in Oracle 11g, the commit_write was replaced with both the commit_logging and the commit_wait parameters. The good news is, your 10g based applications will still functionally work after you upgrade to 11g, except they commit write facility will not technically be working! While issuing these statements in 11g still results in a statement processed, if you look at the alert log file, you'll see something like this:
COMMIT_WRITE is deprecated,use COMMIT_LOGGING and COMMIT_WAIT instead.
What makes this really confusing (to me anyways) is the commit_write instance parameter exists, you can set it, you can see it, but according to the alert.log file it is depricated! Bottom line: Stop using commit_write, which leads us to the changes in Oracle 11g.

So in 11g, there are two new parameters commit_logging and commit_wait instead of a single commit_write parameter. And you can still alter session, alter system, and change these instance parameters. For example:

SQL> show parameter commit_wait

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
commit_wait                          string      WAIT

SQL> show parameter commit_logging

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
commit_logging                       string      IMMEDIATE
SQL> alter system set commit_wait=nowait;

System altered.

SQL> alter system set commit_logging=batch;

System altered.

SQL> show parameter commit_wait

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
commit_wait                          string      NOWAIT
SQL> show parameter commit_logging

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
commit_logging                       string      BATCH
While I issued the above commands, this is what occurred in the alert.log file.

[oracle@fourcore trace]$ pwd
[oracle@fourcore trace]$ tail -f alert_prod18.log

Mon Feb 20 09:58:43 2012
Mon Feb 20 09:58:54 2012
If you have been using the instance parameter commit_write="batch,nowait" in 10g, that should be replaced with the commit_logging=immediate and the commit_wait=nowait parameters.

Interesting, on Linux running Oracle 11.2 and tracing the log writer using the Linux strace command, regardless of the command line commit write settings, the log writer always issued the pwrite64 system call. The async io calls (e.g., io_setup, io_submit, aio) where not used. This was cool: I also consistently observered the with only the batch/nowait option a second or more delay before the write was issued. It was pretty cool and tested this a few times and it always occurred. 

Why use the commit write facility

The log file sync wait event is posted by a server process (the only time I've seen it) after it has issued a commit, is not consuming CPU, and is waiting for the OK to continue processing the commit.

It's important to understand that when a server process commits, both the server process and the log writer engage in a very serial and very coordinated/controlled steps to ensure the committed data properly resides on an on-line redo log. Therefore, for every commit, CPU must be consumed, IO writing must occur, and serialization must be controlled. This allows for either a CPU, IO, or process CPU serialization issue (think: single core busy while other cores are not). This is one of the reasons why DBAs have trouble diagnosing and solving log file sync issues.

I'd say that 99% of the log file sync issues are the result of a commit occurring after each DML statement. For an extreme example, think of an insert and commit together in a plsql loop. As I researched and wrote about in my blog entries (4 in total) about log file sync and commit times, simply batching or grouping the DML before a commit (even a batch of two!) makes a massive performance difference.

This is why when you see a log file sync issue think application code first. Look for a situation where there is a commit (implicit or explicit) following a DML statement.

There are usually a number of options to resolve this, but one way is to cheat! Let Oracle do the batching of commits instead of the application. Or let Oracle ease up on the serialization control a bit. This is what the commit write facility allows us to do! The batch (vs immediate) allows Oracle to batch redo activity so it doesn't have to issue a write immediately after a commit. (Though it appears, based on the CPU bottleneck experiment below, the application change strategy has a greater affect.) And the nowait (vs wait) options allows for an async write, which immediate returns an "OK, your committed data is safely on an on-line redo log" when it may not quite be on disk yet and if there was a failure you may loose that data.

But what about performance impact? Read on...

Experiment Introduction

This was a rather involved experiment because I created two loads (CPU and IO write) for each of the four commit write options. The top wait event, log file sync was easily created by enclosing a DML statement and a commit within a simple plsql loop. And it's this commit statement that contained one of the four commit write options.

Each sample was 60 or 90 seconds and 30 or 32 were collected for each load and commit write options. This means over 240 samples where collected.

You can download the 8.5MB experimental pack HERE, which includes all the scripts, notes, and results in both PDF and Mathematica notebook format. There is much more interesting data, charts, and comments available than I have posted.

Remember, while I mention and link to specific scripts below, they are all included in the experimental pack zip file.

Here's a common situation: According to the users, the entire application is unacceptably slow. The top wait event is log file sync. During daytime processing, there is a clear OS CPU bottleneck and during the evening and nighttime processing there is a clear IO bottleneck.

Interesting, when the top wait event is log file sync there can be either a CPU or an IO bottleneck. As the experiments below will clearly demonstrate, we need to focus on minimizing time; both CPU consumption time and wait time. Focusing on one can lead to un-optimal results.

Because with log file sync the system can be constrained by either a CPU or IO bottleneck, I ran the experiment for both loads.

Experimental Setup

Simulating five concurrent sessions, there is a load script (dml1.sql (CPU) or dml2.sql (IO) which is run in the background five times. These two short plsql scripts are very different and you may want to quickly review them. Here is the actual dml2.sql script:

-- This is the dml2.sql SQL
def startv=&1
def endv=&2
def wait=&3
def batch=&4
def opt='write &wait &batch';
    update op_test
    set object_name='kldfkjldfkjdfkjdfakl;dakldkldkldsfakjldfkjldfadkldakldfksljdfkjldfskjldfsk;l'
    where object_id=trunc(dbms_random.value(&startv,&endv));
    commit &opt;
  end loop;
Here is how I used the dml2.sql script to place a log file sync /  IO write load on the system using the nowait/batch option.

sqlplus tester/jester @./dml2.sql     1  5000 $wait $batch &
sqlplus tester/jester @./dml2.sql  5500 10000 $wait $batch &
sqlplus tester/jester @./dml2.sql 10500 15000 $wait $batch &
sqlplus tester/jester @./dml2.sql 15500 20000 $wait $batch &
sqlplus tester/jester @./dml2.sql 20500 25000 $wait $batch &
sqlplus tester/jester @./dml2.sql 25500 30000 $wait $batch &
In another session I collect the performance data. It's pretty straightforward: I gather the initial performance data, sleep x seconds, gather the final data, compute the differences, store the differences (op_results table), and repeat 30 or more times. The script is a little long to paste, but CLICK HERE and you can view it.

The experiment was run using Oracle 11.2 on a Dell single CPU 4 core server running Linux version 2.6.18-164.el5PAE (mockbuild@ca-build10.us.oracle.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)).

Now let's take a look at the actual results!

Experimental Results: CPU Bottleneck

First I need to mention that ALL the results and lots of notes reside in the Mathematica Notebook. You can view the 3.2MB PDF file here. For the CPU bottleneck experiment the load was placed on the system using the dml1.sql script.

From a broad perspective,  here are the results:

There were 32, ninety second samples gathered for each of the four settings. Here is a short description of the table columns:
Setting is the commit write setting.
Commit/sec is the average workload expressed in commits.
CPU % is the average OS CPU utilization percentage.
log file sync Wait Time % is the average log file sync wait time when compared to all the other non-idle wait time.
log file par write Wait Time % is the average log file parallel write wait time when compared to all the other non-idle wait time.

From a raw "did performance get better" perspective, the results are disastrous. Specifically, the workload rate only slightly increased and our users would probably not notice (or not tell us) a difference. But from a performance analysis perspective, there are deep Oracle performance analysis insights to be gleaned!

The above histogram shows the difference between our four sample sets. It's a histogram so it shows the number of occurrences (vertical axis) vs CPU time plus non-idle wait time per commit. (Because it's smoothed the vertical axis occurrences is not the actual number of occurrences.) The main separation between the histograms is when using the nowait option, that is, allowing the log write to write asynchronously. But this can be misleading, because while the time looks much better and is statistically significant, it's only around 2%!!

The good news is we essentially removed the log file sync wait time by using the "nowait" option. While we did very effectively alter the wait time situation (below: Avg Wt), wait time makes up a very small portion of the total time (below: Avg Rt) and so nearly eliminating the wait time did not improve the performance situation much. To better understand this, we need to look at the performance situation from a different perspective.
The statistics shown in the table above are in a unit of work time based analysis format. This allows us to unite Oracle performance analysis with Operations Research queuing theory. There are many references about this; my Advanced Analysis class, the last chapter in my Oracle Performance Firefighting book, conference presentations, white paper, and in my blog postings. Let me explain the strange looking columns from the above table.

Avg L is the workload, which in queuing theory is represented by the greek letter lambda, hence L.
Avg CPUt is the CPU time per each commit. This is an approximation and is derived as the total instance CPU time divided by the total commits that occurred over the sample period, which in this case is 90 seconds. The more singular the bottleneck, the less approximate the number becomes. This is also called the service time.
Avg Wt is the non-idle wait time per commit. This is an approximation and is derived as the total instance non-idle wait time divided by the total commits that occurred over the sample period, which in this case is 90 seconds. The more singular the bottleneck, the less approximate the number becomes. This is also called the queue time.
Avg Rt is the response time, which is simply the sum of CPUt plus the Wt time or queuing theory terms, service time plus queue time. Keep in mind, response time is the time to process a single unit of work, which in this case is a commit. The elapsed time is the time it takes to run a SQL statement, process a group of SQL statements, run a process, etc... many pieces of work.

If you look at the above table closely, you'll notice the CPU time per commit is a massive part of the response time (CPU + wait time) AND the commit write options did not change the CPU time component much. Regardless of the workload intensity, async IO, or batching the commits, it still took pretty much the same amount of CPU. Since CPU consumption is what drives this response time, our small CPU consumption drop didn't do much for performance. Yes, statistically the CPU decreased but not enough to make a real business difference.

We did see wait time percentage improvements (reduction) when comparing immediate to batch. If you look closely at the table above there is about a 2% wait time drop when changing from the default of wait immediate to wait batch, but the massive percentage drop occurred when going from nowait immediate to nowait batch. Because the wait time is so low, the 85% wait time reduction (1.3 to 0.2) doesn't amount to much. However, if wait time wait was the majority of the response time, it would have indeed made a difference!

When using the batching option, I expected to see a greater reduction in the CPU consumption per commit. My guess is the CPU load was so intense, the next Oracle issue was also CPU related, and since a commit was still occurring after each DML statement, Oracle just couldn't gain significant CPU efficiencies.

Focusing on the wrong thing...

This is a wonderful example of how focusing only on wait time can get you into trouble! We need to focus on reducing total time and the time per unit of work. So while our very specific strategy of reducing log file sync wait time worked beautifully, because the response time was more dependent on CPU consumption rather than wait time, the net performance gain was minimal.

Going deeper...

While not occuring in this particular experiment, it's important to understand that even a 5% drop in the service time (i.e., CPU consumption per unit of work) can make a significant performance improvement. This is especially true when the system is operating deep in the "elbow" of the response time curve. A drop in the service time (CPU time per unit of work) shifts the response time curve down and to the right. As a result of the response time curve shift, a 5% drop in service time can result in a much larger drop in the response time. And since elapsed time (i.e., the time to run a SQL/process/etc) is equal to the amount of work to be processed multiplied by the response time (time/work), a small drop in the service time can results in a much larger proportional drop in the elapsed time! I blogged about this in my Anticipating SQL Elapsed Times posting.

While this is out of scope for this posting, if you want to watch the response time curve change (before you eyes) when the service time decreases, download RT Explore interactive Mathematica tool. It's pretty cool.

What does this mean in real production system?
  1. Batch DML to reduce the likelihood of accumulating harmful log file sync time in the first lpace. (see blog post here).
  2. Using the commit write facility can dramatically reduce log file sync wait time, which can yield similar dramatic performance improvements.
  3. If the log file sync time is a small part of response time, don't expect a large performance improvement by reducing or even eliminating it. This is what occurred in this experiment.
This experiment was a CPU bottleneck with log file sync as the top wait event. It's also common to see an IO write bottleneck when log file sync is the top wait event. Hence the next experiment...

Experimental Results: IO Bottleneck

First I need to mention that ALL the results and lots of notes reside in the Mathematica Notebook. You can view the 3.2MB PDF file here.

From a broad perspective,  here are the results:
From a raw "did performance get better" perspective, the results are amazing! Over a 25X workload improvement is pretty good, eh? And if you read below, you'll notice when using the "nowait" options, the CPU and non-idle wait time to process a commit dropped by 96%. Wow! Let's take a closer look to glean some deeper performance insights.
The smoothed histogram above, like with the CPU bottleneck situation, is based on the CPU time plus the non-idle wait time per commit for each of our samples. While we can't really tell if the difference will impact performance, we can obviously see there is a very significant difference between the wait and nowait options.

By using the "nowait" options, we removed the log file sync wait time and shifted the processing focus (percentage perspective) from IO to CPU. We can see this in the table below. Click on the table so its humanly readable.
The statistics are in a unit of work time based analysis format, which I detailed in the CPU bottleneck section above.

First, look at the split between the CPU consumed per commit (Avg CPUt) and the non-idle wait time per commit (Avg Wt). In the CPU bottleneck situation most of the time was related to CPU consumption and so eliminating the wait time had little impact on performance. But in this IO bottleneck situation, the situation is reversed with the majority of commit time being associated with wait time... and specifically log file sync! In fact, using the wait/immediate option (the Oracle default), 84% of the commit time was related to log file sync

Since there is plenty of CPU resources available, all CPU cores are being used (I checked), commit time is heavily wait time dependent, and the wait event is log file sync, we can reduce the wait time and increase performance using the commit wait facility! Of course not committing after each DML would also help, but the commit write facility provides a non-application centric solution, that is, an Oracle-centric solution.

Why did the commit write facility work and in particular the nowait option? Because the log file sync time occurs when the server process is not consuming CPU but waiting on something to complete. What to complete? Predominately the log writer writing to disk! By releasing the log writer from writing synchronously, the server process receives the "commit complete" much quicker, the log file parallel write is "complete", and the server process can continue processing. As with the CPU bottleneck, log file sync wait time was virtually eliminated. The big difference in this case was the commit time was highly log file sync time dependent, so when this time was eliminated there was only CPU time left to complete the commit.

Oracle was also able to gain some efficiencies and that decreased the CPU consumption per commit around 48%. The drop in CPU requirements and drop in wait time allowed the system to absorb more work. In the experiment, the workload will increase until the system stabilizes and in this case it stabilized at a staggering 2400% increase.

It's important to understand that when the commit intensity increases, the time to process each commit may increase. This is another reason why a 2400% workload increase is so amazing, because the time to process a commit did not remain the same or even increase... it decreased by around 96%. So we witnessed a commit time decrease while the number of commits per second increased!

Going deeper...

Understanding the relationship between work and time is fascinating and it what unit of work time based analysis is all about. If you look closely at the very bottom of the Mathematica notebook, you'll notice I developed a formal mathematical model relating time and work. I was then able to set the workload for all four wait/immediate combinations to the default of wait/immediate and then fairly compare their response times, that is, the time involved to complete a single commit. Even better would have been to create their respective response time graphs, but I'll save that for another posting!

What does this mean in real production systems?
  1. Batch DML to reduce the likelihood of accumulating harmful log file sync time in the first place. (see blog post here).
  2. Using the commit write facility can dramatically reduce log file sync wait time, which can yield similar dramatic performance improvements. Which clearly occurred in this experiment.
  3. If the log file sync time is a large part of response time, expect a large response time decrease, a large workload increase, or a mix of the two. In this experiment we saw both a large response time decrease and a large workload increase.

While there is lots of detail and analysis in this posting, it can be distilled down to a few important points.

1. Batch DML to reduce the likelihood of accumulating harmful log file sync time.

2. If the top wait event is log file sync and its wait time is a significant part of the total time (CPU + wait time), then using the commit write facility can yield impressive performance gains measured by both a decrease in response time and an increase in workload (i.e., arrival rate).

3. The 10g commit_write instance parameter has been replaced in 11g with both the commit_logging and the commit_wait parameters. The parameters can be altered in the instance parameter file or by issuing an alter session or alter system. And finally, the commit command line options remain the same.

4. Never use the commit write facility were loosing committed data is not acceptable. And always get this in writing! (or make sure your resume is updated)

Thanks for reading!