Monday, August 26, 2013

Is Oracle 12c Faster Than 11g?

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

I Know That's Not A Fair Question

Is 12c faster than 11g? Yeah, I know that's not a fair question. But I do want to know if 12c can process 1000 buffer gets faster than 11g... don't you?

I want to know how much CPU time is consumed when Oracle processes a single buffer get without virtually any resource competition. Why? Because when a buffer is processed without any wait time and there is no CPU subsystem queuing, the performance situation turns into Oracle kernel code versus Oracle kernel code!

Just to avoid any confusion, I mean the same thing when writing a buffer get, a logical IO, or simply LIO.

Aren't You A Little Curious?

Suppose my experiment shows running the same SQL statement (and execution plan) running on 11g consumes 0.05 ms of CPU time to process a single LIO yet on 12c it only takes 0.03 ms. Then I know Oracle has made some low level kernel code optimization improvements. And if a SQL statement is LIO dependent, without any additional Oracle performance improvements I would expect the SQL statement's elapsed time to be reduced. (related posting HERE)

However, if the situation is reversed and 12c consumes more CPU to process a single buffer in a "no load" experiment, then Oracle had better have some new features to make up for the initial performance "loss." The new features could be a better caching algorithm, new SQL optimization paths, improved memory serialization control, new process architecture options, slick instance parameter settings that I can change, ... or something. But there had better be something.

Why A Buffer Get?

The CPU time to process a single buffer is an important performance metric. On an active system with little CPU subsystem queuing and virtually no Oracle wait time and the same SQL (and execution plan) running, the performance situation boils down to pure Oracle kernel code. Every line of Oracle source code consumes a little CPU. If a new Oracle kernel release requires 100 more lines of kernel code to be executed to process a buffer get, that will be reflected in the CPU required to process a single buffer get.

It's important to understand this is NOT about the SQL, the SQL execution plan, concurrency, the IO subsystem, the operating system, the operating system version, or even the CPU speed. It's about raw Oracle kernel code processing. It's pure and a great way to compare Oracle versions.

Experimental Setup

The experimental setup was very simple. For example, I did not alter the load on the system or force CPU queuing and Oracle wait time. (That's coming though...) I created a buffer cache large enough to buffer all the blocks my SQL would touch and limited the number of concurrent sessions to ensure the CPU utilization was between 30% and 40%. A low CPU utilization reduces the likelihood of CPU queuing and Oracle wait time.

There was no DML, only queries.

I also compared the SQL execution plans for both 11g and 12c to ensure they were identical. They were... once I disabled parallel query in 12c.

I used my OPLOAD TOOL KIT to place a logical IO (i.e., buffer get) load on the system. Before I began collecting data, I recycled the Oracle instance, started the LIO load, let is settle for a few minutes, and using vmstat ensured there was no IO activity and that CPU utilization was always below 40%.

My data collection was very straightforward. I made an initial statistic collection, slept for 3 minutes, made another statistics collection, calculated the deltas, stored the delta data, slept for 5 seconds, and then repeated 90 times.

The three minute sample time reduces the impact of collection overhead, the 5 second chill time reduces the likelihood that sample X collection will impact sample X+1 collection, and finally, the 90 samples gives me a good statistical picture of reality.

I also turned off AWR collection, the result cache, and the resource manager. For 12c I checked to ensure threaded execution was set to false. It turns out that on my system, I was unable to get the instance parameter threaded_execution to TRUE. Even if it was set to TRUE in the instance parameter file, after the instance started it was always FALSE. Perhaps I don't have thread capability on my system?

The data collection script is well commented and can be seen HERE. You can see all 11g data HERE and all the 12c data HERE. Included is each sample is the sample's duration, total buffer gets processed, total non-idle wait time, and total CPU consumption. While in this posting I only discuss the CPU consumption and buffer gets processed, those of you who have taken my Advanced Oracle Performance Analysis will know what to do with the other stats!

The data source for CPU consumption was from the v$sys_time_model view's DB CPU and background cpu time columns. For buffer gets the view was v$sysstat and column session logical reads. It is then a simple matter to get the average CPU per buffer get by dividing the total CPU consumption by total buffer gets. The file containing just the CPU per LIO samples for 11g is HERE and for 12c HERE. These two CPU/LIO files are what I contrast in this experiment.

The six core Linux box details:

[oracle@sixcore ~]$ uname -a
Linux sixcore 2.6.32-300.3.1.el6uek.x86_64 #1 SMP Fri Dec 9 18:57:35 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

The 12c details:

[oracle@sixcore ~]$ sqlplus system/manager

SQL*Plus: Release 12.1.0.1.0 Production on Fri Aug 16 09:53:46 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Last Successful login time: Tue Aug 13 2013 18:08:24 -07:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

The 11g details:

[oracle@sixcore ~]$ sqlplus system/manager

SQL*Plus: Release 11.2.0.1.0 Production on Fri Aug 16 09:54:25 2013

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Now on to the results...

The Experimental Results

To summarize, the 12c average CPU consumed per buffer get was around 11% less compared to 11g. Honestly, I didn't know what to expect. Why? Because with each new release Oracle puts more and more features into their product, so it must be extremely difficult to reduce the CPU consumption to process a little tiny buffer get! Even keeping the CPU consumption per buffer get the same is an amazing accomplishment!

The free statistical package R was used to perform the statistical analysis. The raw R output is HERE and the R script is HERE.

Here are the results:


version     average      median     p-value   samples
         (ms CPU/LIO) (ms CPU/LIO)

11g        0.004725     0.004725     0.7029     90
12c        0.004420     0.004408     0.0000     90

Numerically we can see there is a difference between the 11g and 12c average CPU per LIO. But this does not mean anything! The key to understanding the results is to check if there is a statistically significant difference between 11g and 12c, not if there is a numeric difference. The statistical analysis is in the next section.

Also, this experiment is not about how much CPU time is involved... it's about the comparison. For example, suppose 11c consumes 0.05 ms of CPU time per LIO. This experiment is not about the 0.05 because that depends on a number of other factors I don't care about in this experiment, such as the CPU speed, the operation system, SQL execution plan, etc.

This experiment is about the difference in CPU consumption per LIO between two Oracle versions... period.

Is 12c Really Faster Than 11g?

Just because, in my experiment, the average and median CPU per LIO for 12c is less than in 11g, it does not mean 12c is noticeably faster than 11g. First we need to see if there is a statistical difference and then second consider the real-life impact.

Because the 12c samples are not normally distributed (p-value < 0.05) a sample t-test is not appropriate when comparing the two sample sets (11g and 12c CPU/LIO samples). We need to perform a location test. R contains an appropriate test, called the "wilcox" test.

Note: My appologies (not really) for the simplicity of this: For this experiment, a p-value of 1.0 means the sample sets are the same, a p-value greater than 0.050 means the sample sets are statistically the same, a p-value less than 0.050 means the sample sets are statistically different, and a p-value of 0.0000 means the sample sets are very different.

The result of the wilcox test is a p-value of 0.0000. Since the wilcox p-value is 0.0000, we can say the sample sets are statistically different... which means, in my test, the CPU per LIO for 12c is statistically speaking significantly less than 11g.

Let's take a look at the two samples using a "smoothed" histogram.

The blue histogram on the left is 12c and the red histogram on the right is 11g. Clearly, the two sample sets look very different. And statistical significance tests say they are different. So for my experiment:

Statistically speaking 12c is faster at processing buffer gets than 11g.

But will we and our users notice the difference?

Will Anyone Notice The Difference?

While in my test 12c consumed a statistically significant 11% less CPU per buffer get compared to 11g, will we notice or feel a difference? Let's look at this from two perspectives. First an elapsed time perspective and second a throughput perspective.

Elapsed Time Perspective

Suppose a query touched 100,000 buffers. Based on my no-load test 11g would complete the query in around 472.5 ms, that is 100,000 lio * 0.004725 ms/lio. In 12c the query would complete in around 442.0 ms, that is 10000 lio * 0.004420 ms/lio. If you could feel the difference, I recommend reducing your caffeine intake!

However, as the CPU workload increases the queuing impact would begin sooner in 11g than in 12c. (The next section explains why.) When this occurs, the time to process a single buffer contains both CPU time and queue time (which we would see as non-idle wait time) AND if a lot of buffers are touched in a SQL statement users may begin to feel the impact. But on a no-load system, I can't imagine anyone could notice the difference.

Throughput Perspective

Let's change our perspective from a single SQL elapsed time perspective to a system-wide throughput perspective. Consuming less CPU to process a LIO means more LIOs can be processed before queuing sets in and slows down performance. To give you a visual picture of this, look at the below response time chart.

Note: You can download the spreadsheet for the below chart HERE and the actual raw tool at the OraPub website HERE. I teach how to do this kind of analysis in my Advanced Oracle Performance Analysis class (always instructor led by me).

Above, the blue line represents the 11g system and the orange line represents the 12c system. The orange line crosses the Y-axis below the blue line because it takes less CPU time to process a single piece of work AND when there is no load (a near zero Arrival Rate) there is no queueing... only CPU consumption!

Here's an example shown by the circles on the above chart. If the 11c workload was at 1150 LIO/ms, the 12c workload could increase to around 1350 LIO/ms before the 12c response time equaled the 11g response time. So the 11% decrease in the 12c CPU per LIO translated into a 19% increase in workload before the 12c response time matched 11c response time (response time is the CPU time plus wait time per LIO). What does this mean? In my experiment: