Saturday, November 1, 2014

Does Increasing An Oracle Background Process OS Priority Improve Performance?

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Does Increasing An Oracle Background Process OS Priority Improve Performance?


Does increasing an Oracle Database background process operating system priority improve performance? As you might expect, the answer is, "It depends."

In this posting I will explain the results of an experiment where I increase the Oracle Database 12c log writer background processes operating system priority.

In my experiment I created a clear CPU bottleneck and the top wait event was log file parallel write. I gathered some data. Next I increased all the log writer background process priorities. Again, I gathered some data and then I analyzed the two data sets. The results were disappointing, not surprising, but a fundamental rule in performance tuning was demonstrated and reinforced.

You can download the "analysis pack" which contains the raw experimental data, histogram, statistical R results and the R statical package commands HERE. You can download total time delta reporting script (ttpctx.sql) I show below, which is contained within my OraPub System Monitor (OSM) Toolkit HERE.

Before I could gather some data, I needed to create the appropriate system load, the data gather scripts and the data analysis scripts. Here's a quick overview of each.

Increasing The LGWR Processes Priority

If you are not familiar with changing Oracle Database background OS process priority, I blogged about how to do this HERE.

My experiment consisted of creating two situations and statistically comparing them to see if increasing the LGWR background process OS priority increased performance. The only difference in the "two situations" was the instance parameter, _high_priority_processes. For the "normal priority" part of the experiment, the default "LMS*|VKTM" was used. For the "high priority" part of the experiment the parameter was changed to "LMS*|VKTM|LG*". The "LG*" caused the increase in the Linux OS priority of all log writer processes from the default 19 to 41.

Ready for some version specifics? Oracle continues to make OS priority related instance parameter changes... even within 12c releases. Since this experiment was done with Oracle Database version 12.1.0.1.0 the parameter default was "LMS*|VKTM" not "LMS*" as with version 12.1.0.2.0. Also, in 12.1.0.2.0 VKTM is placed into a new parameter, _highest_priority_processes.

Generating The DML With CPU Bottleneck Load


To generate the DML workload, I used my OP Load Generator. You can download it HERE. It's quick, easy and I'm familiar with it. I kept increasing the number of DML processes until the CPU utilization was pegged at 100%. While the database server was clearly out of CPU power, the log writer background processes were rarely seen as the top CPU consuming process. I used the "top" program to monitor the process activity. The top CPU consuming processes were almost always the Oracle server/foreground/shadow processes. As I'll explain in the Results section below, this has significant ramifications on the results.

Oracle Time Based Analysis Summary


The data collection script was similar to the one I used back in 2012 when collecting data regarding Oracle's commit write facility. Essentially, I collected multiple three minute samples of the delta elapsed time, user commits, total non-idle wait time and CPU consumption. In both the normal and high priority runs, around 99% of the database time was CPU consumption and 1% of the time Oracle non-idle wait time. Also the top wait event (~70%) was log file parallel write (display name: log file redo write).

If you want master an Oracle Time Based Analysis, check out my online seminar, Tuning Oracle Using An AWR Report. It will teach you how to using an AWR report to optimize Oracle performance so users will feel the difference.

The OraPub System Monitor Toolkit script ttpctx.sql report below was taken during the "normal priority" log writer load.

SQL> @ttpctx

Database: prod35                                                 16-MAY-14 06:07pm
Report:   ttpctx.sql                OSM by OraPub, Inc.             Page         1
                           Total Time Activity (39 sec interval)

                                                         Avg Time        Time     Wait
Time Component                           % TT    % WT Waited (ms)       (sec) Count(k)
------------------------------------- ------- ------- ----------- ----------- --------
CPU consumption: Oracle SP + BG procs   98.91    0.00       0.000     238.716        0
log file redo write                      0.73   67.56       8.082       1.770        0
control file parallel write              0.18   16.41      33.077       0.430        0
target log write size                    0.05    4.20       6.111       0.110        0
oracle thread bootstrap                  0.03    3.05      40.000       0.080        0
os thread creation                       0.02    1.53      20.000       0.040        0
commit: log file sync                    0.01    0.76      10.000       0.020        0
enq: CR - block range reuse ckpt         0.01    0.76      20.000       0.020        0
Disk file operations I/O                 0.00    0.00       0.000       0.000        0

The OSM report below was taken during the "high priority" log writer load.

SQL> @ttpctx

Database: prod35                                                 16-MAY-14 09:25pm
Report:   ttpctx.sql                OSM by OraPub, Inc.             Page         1
                           Total Time Activity (41 sec interval)

                                                         Avg Time        Time     Wait
Time Component                           % TT    % WT Waited (ms)       (sec) Count(k)
------------------------------------- ------- ------- ----------- ----------- --------
CPU consumption: Oracle SP + BG procs   98.92    0.00       0.000     238.733        0
log file redo write                      0.83   77.01       8.272       2.010        0
control file parallel write              0.08    7.28      14.615       0.190        0
target log write size                    0.05    4.98       5.909       0.130        0
oracle thread bootstrap                  0.03    3.07      40.000       0.080        0
os thread creation                       0.02    1.92      25.000       0.050        0
commit: log file sync                    0.01    0.77      10.000       0.020        0
enq: CR - block range reuse ckpt         0.01    0.77      20.000       0.020        0
enq: RO - fast object reuse              0.00    0.38      10.000       0.010        0


Data Collection


For the normal priority load 27 three minute samples where collected. For the high priority situation there were 30 three minute samples collected. (I forgot why there was only 27 samples collected for the normal priority.)  I collected the elapsed time, total non-idle wait time, total CPU consumption (v$sys_time_model: db_cpu + background cpu time) and total user commits.

In this experiment more user commits processed per second means better performance.


Experimental Results


I used the free statistics package "R" (www.r-project.org) to analyze the data. I demonstrate how to get, install and use "R" in my online video seminar, Using Skewed Data To Your Advantage.

With the normal log writer process priority, an average of 984.5 commits/sec and a median of 983.0 commits/sec occurred. With the LG* high process priority, an average of 993.6 commits/sec and a median of 991.0 commits/sec occurred. While the "high priority" situation was able to process more commits per second, is this statistically significant?

The red "smoothed" histogram is the normal priority situation and the blue smoothed histogram is when the log writers were set to the higher priority. The more separated the two histograms the more "different" the sample sets, the more likely there is a statistically significant difference and the more likely a user would feel the difference. Looking at the above histograms plot, there does not appear to be a real difference. But let's do a proper significance test!

Because both sample sets are normally distributed (details are in the Analysis Pack), I could use a simple t-test. R produced a p-value of 0.04451. To be statistically "different" I want the p-value to be less than 0.05 and it is. What does this mean?

While statistically and numerically the commit rates are different, I wouldn't expect any special performance tuning award! In fact, the hassles with cycling a production instance and setting underscore/hidden parameters would make it very unlikely I would increase the OS priority of the log writer background processes. I want to see a big performance difference.

To Summarize... What I Learned... Again


This situation is a perfect example of focusing on the wrong thing! While there is a clear operating system CPU bottleneck and the top wait event is about redo, the log writers are not suffering from a want/need of CPU resources. I suspect the server processes want more CPU resources, but they are NOT the processes we increased their OS priority.

If the log writers were suffering from a lack of CPU resources and fighting for CPU resources, I would expect to see them consuming CPU resources along with the Oracle server processes. And I would definitely expect to see them near the top of the "top" process monitor... especially when their priority has been increased!

Because of this "misguided" tuning effort, this experiment does not build a case for or against changing the log writer priority. What it reinforces is in our quest to optimize performance, make sure we focus on the right thing.

As a side note, this is a great statistical analysis example for two reasons. First, our samples sets look similar, but statistically they are not. Second, while they are statistically different, the performance impact will not be very different. And my guess is the users won't feel a thing... except frustration and anger.

To Super Summarize


When considering increasing a background process's operating system priority, make sure the process is in need of CPU and is not able to get it. In this situation, the DBA could have been mislead by the CPU bottleneck. But upon closer inspection of the log writers from an operating system perspective and knowing the wait event "log file parallel write" is probably more about IO than CPU (Oracle perspective) it would be unlikely that increasing the log writer processes OS priority would help increase the commits per second.

Thanks for reading!

Craig.












2 comments: