Monday, December 29, 2014

Can A Background Process Impact A Foreground Process And Its Database Time?

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Can A Background Process Impact A Foreground Process And Its Database Time?


Have you ever heard someone say, "Background processes do not impact foreground processes because they run in the background and in parallel with foreground processes." I've heard this hundreds of times!

While doing some performance research I came across a great example of how an Oracle Database background process can directly and significantly impact a foreground process.

The above quote represents a masterfully constructed lie; it contains both a lie and a truth. The mix of a truth and a lie make understanding the reality of the situation difficult. In this post, I'll explain the truth, delve into the lie and relate it all to foreground process database time.

By the way, I am in no way saying there is something wrong with or incorrect about DB Time. I want to ensure this is clear from the very beginning of this post.

Just so there is no confusion, an Oracle foreground process is sometimes also called a server process or a shadow process. These can terms can be used interchangeably in this post.

The Truth


Clearly background and foreground processes operate in parallel. I don't think any DBA would deny this. As I frequently say, "serialization is death and parallelism is life!" A simple "ps" command will visually show both Oracle background and foreground processes at work. But this in no way implies they do not impact each other's activity and performance.

In fact, we hope they do impact each other! Can you imagine what performance would be with the background processes NOT running in parallel?! What a performance nightmare that would be. But this where the "no impact" lie lives.

The Lie


Most senior DBAs can point to a specific situation where Oracle cache buffer chain latch contention affected multiple foreground sessions. In this situation, foreground sessions were franticly trying to acquire a popular cache buffer chain latch. But this is a foreground session versus foreground session situation. While this is example is important, this post is about when a background process impacts a foreground process.

Have you every committed a transaction and it hangs while the foreground process is waiting on "log file switch (checkpoint incomplete)" or even worse "log file switch (archiving needed)" event? All the foreground process knows is that its statement can't finish because a required log switch has not occurred because a checkpoint is incomplete. What the server process does not know is the checkpoint (CKPT), the database writer (DBWR) and the log writer (LGWR) background processes are involved. There is a good chance the database writer is frantically writing dirty buffers to the database (dbf) files so the LGWR can safely overwrite the associated redo in the next online redo log.

For example, if a server process issued a commit during the checkpoint, it will wait until the checkpoint is complete and the log writer has switched and can write into the next redo log. So, while the log writer background processes is probably waiting on "log file parallel write" and the database writer is burning CPU and waiting on "db file parallel write", the foreground processes are effectively hung.

This is a classic example of how a background process can impact the performance of a foreground process.

A Demonstration Of The Lie


Here's a quick demonstration of the above situation. On an existing database in my lab, I created two 4MB redo logs and dropped all the other redo logs. I started a DML intensive workload. According to the alert.log file, the redo logs where switching every couple of seconds! Take a look at this:
$ tail -f /home/oracle/base/diag/rdbms/prod30/prod30/trace/alert*log
Thread 1 cannot allocate new log, sequence 2365
Checkpoint not complete
  Current log# 4 seq# 2364 mem# 0: /home/oradata/prod30/redoA1.log
Mon Dec 29 11:02:09 2014
Thread 1 advanced to log sequence 2365 (LGWR switch)
  Current log# 5 seq# 2365 mem# 0: /home/oradata/prod30/redoA2.log
Thread 1 cannot allocate new log, sequence 2366
Checkpoint not complete
  Current log# 5 seq# 2365 mem# 0: /home/oradata/prod30/redoA2.log
Thread 1 advanced to log sequence 2366 (LGWR switch)
  Current log# 4 seq# 2366 mem# 0: /home/oradata/prod30/redoA1.log
Thread 1 cannot allocate new log, sequence 2367
Checkpoint not complete
  Current log# 4 seq# 2366 mem# 0: /home/oradata/prod30/redoA1.log
Thread 1 advanced to log sequence 2367 (LGWR switch)
  Current log# 5 seq# 2367 mem# 0: /home/oradata/prod30/redoA2.log
Thread 1 cannot allocate new log, sequence 2368
Checkpoint not complete
  Current log# 5 seq# 2367 mem# 0: /home/oradata/prod30/redoA2.log
Mon Dec 29 11:02:20 2014

Obviously not what you want to see on a production Oracle system! (But my guess many of you have.)

Using my OSM realtime session sampler tool (rss.sql - related blog posting HERE) I sampled the log writer every half a second. (There is only one log writer background process because this is an Oracle 11g database, not an Oracle Database 12c system.) If the log writer session showed up in v$session as an active session, it would be picked up by rss.sql.  Both "ON CPU" and "WAIT" states are collected. Here is a sample of the output.


It's very obvious the log writer is doing some writing. But we can't tell from the above output if the process is impacting other sessions. It would have also been very interesting to sample the database writer also, but I didn't do that. To determine if the background processes are impacting other sessions, I needed to find a foreground session that was doing some commits. I noticed that session 133, a foreground process was busy doing some DML and committing as it processed its work. Just as with the log writer background process, I sampled this foreground process once every 0.5 second. Here's a sample of the output.


Wow. The foreground process is waiting a lot for the current checkpoint to be completed! So... this means the foreground process is being effectively halted until the background processes involved with the checkpoint have finished their work.

This is a great example of how Oracle background processes can impact the performance of an Oracle foreground process.

But let's be clear. Without the background processes, performance would be even worse. Why? Because all that work done in parallel and in the background would have to be done by each foreground process AND all that work would have to be closely controlled and coordinated. And that, would be a performance nightmare!

DB Time Impact On The Foreground Process


Just for the fun of it, I wrote a script to investigate DB Time, CPU consumption, non-idle wait time and the wait time for the "log file switch wait (checkpoint incomplete)" wait event for the foreground process mentioned above (session 133). The script simply gathers some session details, sleeps for 120 seconds, again gathers some session details, calculates the differences and displays the results. You can download the script HERE. Below is the output for the foreground process, session 133.
SQL> @ckpttest.sql 133

Table dropped.

Table created.

PL/SQL procedure successfully completed.

CPU_S_DELTA NIW_S_DELTA DB_TIME_S_DELTA CHECK_IMPL_WAIT_S
----------- ----------- --------------- -----------------
      2.362      117.71      119.973692            112.42

1 row selected.

Here is a quick description of the output columns.

  • CPU_S_DELTA is the CPU seconds consumed by session 133, which is the time model statistic DB CPU.
  • NIW_S_DELTA is the non-idle wait time for session 133, in seconds.
  • DB_TIME_S_DELTA is the DB Time statistic for session 133, which is the time model statistic DB Time.
  • CHECK_IMPL_WAIT_S is the wait time only for event "log file switch (checkpoint incomplete)" for session 133, in seconds.

Does the time fit together as we expect? The "log file switch..." wait time is part of the non-idle wait time. The DB Time total is very close to the CPU time plus the non-idle wait time. Everything seems to add up nicely.

To summarize: Oracle background processes directly impacted the database time for a foreground process.

In Conclusion...


First, for sure Oracle foreground and background processes impact each other...by design for increased performance. Sometimes on real production Oracle Database systems things get messy and work that we hoped would be done in parallel must become momentarily serialized. The log file switch example above, is an example of this.

Second, the next time someone tells you that an Oracle background process does not impact the performance of a foreground process, ask them if they have experienced a "log file switch checkpoint incomplete" situation. Pause until they say, "Yes." Then just look at them and don't say a word. After a few seconds you may see a "oh... I get it." look on their face. But if not, simply point them to this post.

Thanks for reading and enjoy your work!

Craig.




Monday, December 22, 2014

Is Oracle Database Time Correct? Something Is Not Quite Right.

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Is Oracle Database Time Correct? Something Is Not Quite Right.


Oracle Database tuning and performance analysis is usually based on time. As I blogged HERE, the Oracle "database time" statistic is more interesting than simply "time spent in the database." It is the sum of CPU consumption and non-idle wait time. And Elapsed Time is the sum of all the database time related to perhaps a session or a SQL statement execution. However...

If you do the fundamental math, you'll notice the numbers don't always add up. In fact, they rarely match. In this posting, I want to demonstrate this mismatch and I want you to see this on your systems!

I'll include experimental data from a number of different systems, the statistical analysis (including pictures) and provide a tool you can download for free from OraPub.com to check out the situation on your systems.

Checking DB Time Math


DB Time is defined as "time spent in the database." This is the sum of Oracle process CPU consumption plus non-idle wait time. Usually we don't derive DB Time. The Time Model Statistics view v$sys_time_mode contains the DB Time statistic. But this view also contains the DB CPU statistic. Since there is no sum of non-idle wait time, most people just assume everything is fine.

However, if you run the simple query below on your system, it could look something this:
SQL> l
  1  select db_time_s, db_cpu_s, tot_ni_wt_s
  2  from (select value/1000000 db_time_s from v$sys_time_model where stat_name = 'DB time' ),
  3   (select value/1000000 db_cpu_s from v$sys_time_model where stat_name = 'DB CPU' ),
  4*  (select sum(TIME_WAITED_MICRO_FG)/1000000 tot_ni_wt_s from v$system_event where wait_class != 'Idle' )
SQL> /

 DB_TIME_S   DB_CPU_S TOT_NI_WT_S
---------- ---------- -----------
330165.527 231403.925  119942.952

1 row selected.
If you add up the DB CPU and the total non-idle wait time, the value is 351,346.877. Woops! 351K does not equal 330K. What happened on my Oracle Database 12c (12.1.0.2.0)? As I have demonstrated in this POSTING (which contains videos of this) and in my online seminar training HERE, many times DB Time does nearly equal DB CPU plus the non-idle wait time. But clearly in the above situation something does not seem quite right.

Checking DB Time On Your Systems


To demonstrate the possibility of a DB Time mismatch, I created a simple plsql tool. You can download this free tool or do an OraPub.com search for "db time tool". The tool, which is easily configurable, takes a number of samples over a period of time and displays the output.


Here is an example of the output.

OraPub DB Time Test v1a 26-Sep-2014. Enjoy but use at your own risk.
.
Starting to collect 11 180 second samples now...
All displayed times are in seconds.
.
anonymer Block abgeschlossen
..........................................................................
... RAW OUTPUT (keep the output for your records and analysis)
..........................................................................
.
sample#,  db_time_delta_v ,  db_cpu_delta_v,  tot_ni_wait_delta_v, derived_db_time_delta_v, diff_v, diff_pct_v
.
1, 128,4, 128,254, ,103, 128,357266, ,043, 0
2, 22,014, 3,883, 17,731, 21,614215, ,399, 1,8
3, 1,625, 1,251, ,003, 1,253703, ,371, 22,8
4, 13,967, 12,719, 1,476, 14,194999, -,228, -1,6
5, 41,086, 41,259, ,228, 41,486482, -,4, -1
6, 36,872, 36,466, ,127, 36,593884, ,278, ,8
7, 38,545, 38,71, ,137, 38,847459, -,303, -,8
8, 37,264, 37,341, ,122, 37,463525, -,199, -,5
9, 22,818, 22,866, ,102, 22,967141, -,149, -,7
10, 30,985, 30,614, ,109, 30,723831, ,261, ,8
11, 5,795, 5,445, ,513, 5,958586, -,164, -2,8
.
The test is complete.
.
All displayed times are in seconds.

The output is formatted to make it easy to statistically analyze. The far right column is percent difference between the reported DB Time and the calculated DB Time. In the above example, they are pretty close. Get the tool and try it out on your systems.

Some Actual Examples


I want to quickly show you four examples from a variety of systems. You can download all the data in the "analysis pack" HERE. The data, for each of the four systems, contains the raw DB Time Test output (like in the section above), the statistical numeric analysis output from the statistical package "R", the actual "R" script and the visual analysis using "smooth histograms" also created using "R."

Below is the statistical numeric summary:


About the columns: Only the "craig" system is mine and other are real production or DEV/QA systems. The statistical columns all reference the far right column of the DB Time Test Tool's output, which is the percent difference between the reported DB Time and the calculated DB Time. Each sample set consists of eleven 180 second samples. The P-Value greater than 0.05 means the reported and calculated DB Time differences are normally distributed. This is not important in this analysis, but gives me clues if there is a problem with the data collection.

As you can easily see, two of the system's "DB Times" difference is greater than 10% and one of them was over 20%. The data collected shows that something is not quite right... but that's about it.

What Does This Mean In Our Work?


Clearly something is not quite right. There are a number of possible reasons and this will be focus of my next few articles.

However, I want to say that even though the numbers don't match perfectly and sometimes they are way off, this does not negate the value of a time based analysis. Remember, we not trying to land a man on the moon. We try diagnosing performance to derive solutions that (usually) aim to reduce the database time. I suspect that in all four cases I show, we would not be misled.

But this does highlight the requirement to also analysis performance from a non-Oracle database centric perspective. I always look at the performance situation from an operating system perspective, an Oracle centric perspective and an application (think: SQL, processes, user experience, etc.) perspective. This "3 Circle" analysis will reduce the likelihood of making a tuning diagnosis mistake. So in case DB Time is completely messed up, by diagnosing performance from the other two "circles" you will know something is not right.

If you want to learn more about my "3-Circle" analysis, here are two resources:
  1. Paper. Total Performance Management. Do an OraPub search for "3 circle" and you'll find it.
  2. Online Seminar: Tuning Oracle Using An AWR Report. I go DEEP into an Oracle Time Based Analysis but keeping it day-to-day production system practical.
In my next few articles I will drill down into why there can be a "DB Time mismatch," what to do about it and how to use this knowledge to our advantage.

Enjoy your work! There is nothing quite like analyzing performance and tuning Oracle database systems!!

Craig.





Monday, December 8, 2014

Changing The Number Of Oracle Database 12c Log Writers

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Changing The Number Of Oracle Database 12c Log Writers


In an Oracle Database 12c instance you will likely see multiple log writer (LGWR) background processes. When you first start the Oracle instance you will likely see a parent and two redo workers. This is a very big deal and something many of us have been waiting for - for many years!

While I'm excited about the change, if I can't control the number of LGWRs I could easily find myself once again constrained by the lack of LGWRs!

So, my question is how do I manipulate the number of LGWRs from the default. And what is the default based on? It's these types of questions that led me on this quest. I hope you enjoy the read!


Serialization Is Death


Multiple LGWRs is great news because serialization is death to computing performance. Think of it like this. A computer program is essentially lines of code and each line of code takes a little bit of time to execute. A CPU can only process N lines of code per second. This means every serial executing program has a maximum through capability. With a single log writer (LGWR) background process the amount of redo that can be processed is similarly constrained.

An Example Of Serialization Throughput Limitation


Suppose a CPU can process 1000 instructions per millisecond. Also, assume through some research a DBA determined it takes the LGWR 10 instructions to process 10 KB of redo. (I know DBAs who have taken the time to figure this stuff out.) Given these two pieces of data, how many KB of redo can the CPU theoretically process per second?

? KB of redo/sec = (1000 inst / 1 ms)*(10 KB redo / 10 instr)*(1000 ms / 1 sec)* (1 MB / 1000 KB) = 1000 KB redo/sec

This is a best case scenario. As you can see, any sequential process can become a bottleneck. One solution to this problem is to parallelize.

Note: Back in April of 2010 I posted a series of articles about parallelism. If you are interested in this topic, I highly recommend you READ THE POSTS.

Very Cool! Multiple 12c LGWRs... But Still A Limit?


Since serialization is death... and parallelism is life, I was really excited when I saw on my 12c Oracle instance by default it had two redo workers in addition to the "parent" log writer. On my Oracle version 12.0.1.0.2.0 Linux machine this is what I see:
$ ps -eaf|grep prod40 | grep ora_lg
oracle   54964     1  0 14:37 ?        00:00:00 ora_lgwr_prod40
oracle   54968     1  0 14:37 ?        00:00:00 ora_lg00_prod40
oracle   54972     1  0 14:37 ?        00:00:00 ora_lg01_prod40

This is important. While this is good news, unless Oracle or I have the ability to change and increase the number of LGWR redo workers, at some point the two redo workers, will become saturated bringing us back to the same serial LGWR process situation. So, I want and need some control.

Going Back To Only One LGWR


Interestingly, starting in Oracle Database version 12.0.1.0.2.0 there is an instance parameter _use_single_log_writer. I was able to REDUCE the number LGWRs to only one by setting the instance parameter _use_single_log_writer=TRUE. But that's the wrong direction I want to go!

More Redo Workers: "CPU" Instance Parameters


I tried a variety of CPU related instance parameters with no success. Always two redo workers.

More Redo Workers: Set Event...


Using my OSM script listeventcodes.sql I scanned the Oracle events (not wait events) but was unable to find any related Oracle events. Bummer...

More Redo Workers: More Physical CPUs Needed?


While talking to some DBAs about this, one of them mentioned they heard Oracle sets the number of 12c log writers is based on the number of physical CPUs. Not the number CPU cores but the number of physical CPUs. On a Solaris box with 2 physical CPUs (verified using the command, psrinfo -pv) upon startup there was still on two redo workers.

$ psrinfo -p
2
$ psrinfo -pv
The physical processor has 1 virtual processor (0)
  UltraSPARC-III (portid 0 impl 0x14 ver 0x3e clock 900 MHz)
The physical processor has 1 virtual processor (1)
  UltraSPARC-III (portid 1 impl 0x14 ver 0x3e clock 900 MHz)

More Redo Workers: Adaptive Behavior?


Looking closely at the Solaris LGWR trace file I repeatedly saw this:

Created 2 redo writer workers (2 groups of 1 each)
kcrfw_slave_adaptive_updatemode: scalable->single group0=375 all=384 delay=144 r
w=7940

*** 2014-12-08 11:33:39.201
Adaptive scalable LGWR disabling workers
kcrfw_slave_adaptive_updatemode: single->scalable redorate=562 switch=23

*** 2014-12-08 15:54:10.972
Adaptive scalable LGWR enabling workers
kcrfw_slave_adaptive_updatemode: scalable->single group0=1377 all=1408 delay=113
 rw=6251

*** 2014-12-08 22:01:42.176
Adaptive scalable LGWR disabling workers

It looks to me like Oracle has programed in some sweeeeet logic to adapt the numbers of redo workers based the redo load.

So I created six Oracle sessions that simply inserted rows into a table and ran all six at the same time. But it made no difference in the number of redo workers. No increase or decrease or anything! I let this dml load run for around five minutes. Perhaps that wasn't long enough, the load was not what Oracle was looking for or something else. But the number of redo workers always remained at two.

Summary & Conclusions


It appears at instance startup the default number of Oracle Database 12c redo workers is two. It also appears that Oracle has either already built or is building the ability for Oracle to adapt to changing redo activity by enabling and disabling redo workers. Perhaps the number of physical CPUs (not CPU cores but physical CPUs) plays a part in this algorithm.

While this was not my research objective, I did discover a way to set the number of redo workers back to the traditional single LGWR background process.

While I enjoyed doing the research for this article, it was disappointing that I was unable to influence Oracle to increase the number of redo workers. I sure hope Oracle either gives me control or the adaptive behavior actually works. If not, two redo workers won't be enough for many Oracle systems.

All the best in your Oracle performance endeavors!

Craig.


Monday, December 1, 2014

The Perfect Gift For The Oracle DBA: Top 5 DBA T-Shirts

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

The Perfect Gift For The Oracle DBA: Top 5 DBA T-Shirts


It's that time of year again and I can already hear it, "Dad, what do you want for Christmas?" This year I'm taking action. Like forecasting Oracle performance, I'm taking proactive action.

Like most of you reading this, you have a, let's say, unique sense of humor. I stumbled across the ultimate geek website that has an astonishing variety of t-shirts aimed at those rare individuals like us that get a rush in understanding the meaning of an otherwise cryptic message on a t-shirt.

I picked my Top 5 DBA Geek T-Shirts based on the challenges, conflicts and joys of being an Oracle DBA. With each t-shirt I saw, a story came to mind almost immediately. I suspect you will have a similar experience that rings strangely true.

So here they are—the Top 5 T-Shirts For The Oracle DBA:


Number 5: Change Your Password


According to Slash Data the top password is now "Password".  I guess the upper-case "P" makes people feel secure, especially since last years top password was "123456" and EVERYBODY knows thats a stupid password. Thanks to new and improved password requirements, the next most popular password is "12345678". Scary but not surprising.

As Oracle Database Administrators and those who listened to Troy Ligon's presentation last years IOUG conference presentation, passwords are clearly not safe. ANY passwords. Hopefully in the coming years, passwords will be a thing of the past.


Number 4: Show Your Work


Part of my job as a teacher and consultant is to stop behavior like this: I ask a DBA, "I want to understand why you want to make this change to improve performance." And the reply is something like one of these:

  1. Because it has worked on our other systems.
  2. I did a Google search and an expert recommended this.
  3. Because the box is out of CPU power, there is latching issues, so increasing spin_count will help.
  4. Because we have got to do something and quick!

I teach Oracle DBAs to think from the user experience to the CPU cycles developing a chain of cause and effect. If we can understand the cause and effect relationships, perhaps we can disrupt poor performance and turn it to our favor. "Showing your work" and actually writing it down can be really helpful.

Number 3: You Read My T-Shirt


Why do managers and users think their presence in close proximity to mine will improve performance or perhaps increase my productivity? Is that what they learn in Hawaii during "end user training"?

What's worse is when a user or manager wants to talk about it...while I'm obviously in concentrating on a serious problem.

Perhaps if I wear this t-shirt, stand up, turn around and remain silent they will stop talking and get the point. We can only hope.

Number 2: I'm Here Because You Broke Something


Obnoxious but true. Why do users wonder why performance is "slow" when they do a blind query returning ten-million rows and then scroll down looking for the one row they are interested in.... Wow. The problem isn't always the technology... but you know that already.

Hint to Developers: Don't let users do a drop down or a lookup that returns millions or even thousands or even hundreds of rows... Please for the love of performance optimization!


Number 1 (drum roll): Stand Back! I'm Going To Try SCIENCE


One of my goals in optimizing Oracle Database performance is to be quantitative. And whenever possible, repeatable. Add some basic statistics and you've got science. But stand back because, as my family tells me, it does get a little strange sometimes.

But seriously, being a "Quantitative Oracle Performance Analyst" is always my goal because my work is quantifiable, reference-able and sets me up for advanced analysis.


So there you go! Five t-shirts for the serious and sometimes strange Oracle DBA. Not only will these t-shirts prove and reinforce your geeky reputation, but you'll get a small yet satisfying feeling your job is special...though a little strange at times.

All the best in your Oracle performance endeavors!

Craig.

Monday, November 17, 2014

Off May Not Be Totally Off: Is Oracle In-Memory Database 12c (12.1.0.2.0) Faster?

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Off May Not Be Totally Off: Is Oracle In-Memory Database 12c (12.1.0.2.0) Faster?


Most Oracle 12c installations will NOT be using the awesome Oracle Database in-memory features available starting in version 12.1.0.2.0. This experiment is about the performance impact of upgrading to 12c but disabling the in-memory features.

Every experiment I have performed comparing buffer processing rates, clearly shows any version of 12c performs better than 11g. However, in my previous post, my experiment clearly showed a performance decrease after upgrading from 12.1.0.1.0 to 12.1.0.2.0.

This posting is about why this occurred and what to do about it. The bottom line is this: make sure "off" is "totally off."

Turn it totally off, not partially off


What I discovered is by default the in-memory column store feature is not "totally disabled." My experiment clearly indicates that unless the DBA takes action, not only could they be a license agreement violation but a partially disabled in-memory column store slightly slows logical IO processing compared to the 12c non in-memory column store option. Still, any 12c version processes buffer faster than 11g.

My experiment: specific and targeted


This is important: The results I published are based on a very specific and targeted test and not on a real production load. Do not use my results in making a "should I upgrade decision." That would be stupid and an inappropriate use of the my experimental results. But because I publish every aspect of my experiment and it is easily reproducible it is a valid data point with which to have a discussion and also highlight various situations that DBAs need to know about.

You can download all my experimental results HERE. This includes the raw sqlplus output, the data values, the free R statistics package commands, spreadsheet with data nicely formatted and lots of histograms.

The instance parameter settings and results


Let me explain this by first showing the instance parameters and then the experimental results. There are some good lessons to learn!

Pay close attention to the inmemory_force and inmemory_size instance parameters.

SQL> show parameter inmemory

NAME         TYPE  VALUE
------------------------------------ ----------- ------------------------------
inmemory_clause_default       string
inmemory_force        string  DEFAULT
inmemory_max_populate_servers      integer  0
inmemory_query        string  ENABLE
inmemory_size        big integer 0
inmemory_trickle_repopulate_servers_ integer  1
percent
optimizer_inmemory_aware      boolean  TRUE

SQL> show sga

Total System Global Area 7600078848 bytes
Fixed Size      3728544 bytes
Variable Size   1409289056 bytes
Database Buffers  6174015488 bytes
Redo Buffers     13045760 bytes

In my experiment using the above settings the median buffers processing rate was 549.4 LIO/ms. Looking at the inmemory_size and the SGA contents, I assumed the in-memory column store was disabled. If you look at the actual experimental result file "Full ds2-v12-1-0-2-ON.txt", which contain the explain plan of the SQL used in the experiment, there is no mention of the in-memory column store being used. My assumption, which I think is a fair one, was that the in-memory column store had been disabled.

As you'll see I was correct, but only partially correct.

The parameter settings below are when the in-memory column store was totally disabled. They key is changing the default inmemory_force parameter value from DEFAULT to OFF.

SQL> show parameter inmemory

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
inmemory_clause_default              string
inmemory_force                       string      OFF
inmemory_max_populate_servers        integer     0
inmemory_query                       string      ENABLE
inmemory_size                        big integer 0
inmemory_trickle_repopulate_servers_ integer     1
percent
optimizer_inmemory_aware             boolean     TRUE
SQL> show sga

Total System Global Area 7600078848 bytes
Fixed Size                  3728544 bytes
Variable Size            1291848544 bytes
Database Buffers         6291456000 bytes
Redo Buffers               13045760 bytes

Again, the SGA does not show any in-memory memory space. In my experiment with the above "totally off" settings, the median buffers processing rate was 573.5 LIO/ms compared to "partially off" 549.4 LIO/ms. Lesson: Make sure off is truly off.

It is an unfair comparison!


It is not fair to compare the "partially off" with the "totally off" test results. Now that I know the default inmemory_force must be changed to OFF, the real comparison should be made with the non in-memory column store version 12.1.0.1.0 and the "totally disabled" in-memory column store version 12.1.0.2.0. This is what I will summarize below. And don't forget all 12c versions showed a significant buffer processing increase compared to 11g.

The key question: Should I upgrade?


You may be thinking, if I'm NOT going to license and use the in-memory column store, should I upgrade to version 12.1.0.2.0? Below is a summary of my experimental results followed by the key points.


1. The non column store version 12.1.0.1.0 was able to process 1.1% more buffers/ms (median: 581.7 vs 573.5) compared to to "totally disabled" in-memory column store version 12.1.0.2.0. While this is statistically significant, a 1.1% buffer processing difference is probably not going to make-or-break your upgrade.

2. Oracle Corporation, I'm told, knows about this situation and is working on a fix. But even if they don't fix it, in my opinion my experimental "data point" would not warrant not upgrading to the in-memory column store version 12.1.0.2.0 even if you are NOT going to use the in-memory features.

3. Visually (see below) the non in-memory version 12.1.0.1.0 and the "totally off" in-memory version 12.1.0.2.0 samples sets look different. But they are pretty close. And as I mentioned above, statistically they are "different."

Note for the statistically curious: The red color 12.1.0.1.0 non in-memory version data set is highly variable. I don't like to see this in my experiments. Usually this occurs when a mixed workload sometimes impacts performance, I don't take enough samples or my sample time duration is too short. To counteract this, in this experiment I captured 31 samples. I also performed the experiment multiple times and the results where similar. What I could have done was used more application data to increase the sample duration time. Perhaps that would have made the data clearer. I could have also used another SQL statement and method to create the logical IO load.

What I learned from this experiment


To summarize this experiment, four things come to mind:

1. If you are not using an Oracle Database feature, completely disable it. My mistake was thinking the in-memory column store was disabled when I set it's memory size to zero and "confirmed" it was off by looking at the SGA contents.

2. All versions of 12c I have tested are clearly faster at processing buffers than any version of 11g.

3. There is a very slight performance decrease when upgrading from Oracle Database version 12.1.0.1.0 to 12.1.0.2.0.

4. It is amazing to me that with all the new features poured into each new Oracle Database version the developers have been able to keep the core buffer processing rate nearly at or below the previous version. That is an incredible accomplishment. While some people may view this posting as a negative hit against the Oracle Database, it is actually a confirmation about how awesome the product is.

All the best in your Oracle performance tuning work!

Craig.




Monday, November 10, 2014

Is Oracle Database 12c (12.1.0.2.0) Faster Than Previous Releases?

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Is Oracle Database 12c (12.1.0.2.0) Faster Than Previous Releases?


I was wondering if the new Oracle Database 12c version 12.1.0.2.0 in-memory column store feature will SLOW performance when it is NOT being used. I think this is a fair question because most Oracle Database systems will NOT be using this feature.

While the new in-memory column store feature is awesome and significant, with each new Oracle feature there is additional kernel code. And if Oracle is not extremely careful, these new lines of Oracle kernel code can slow down the core of Oracle processing, that is, buffer processing in Oracle's buffer cache.

Look at it this way, if a new Oracle release requires 100 more lines of kernel code to be executed to process a single buffer, that will be reflected in how many buffers Oracle can process per second.

To put bluntly, this article is the result of my research comparing core buffer processing rates between Oracle Database versions 11.2.0.2.0, 12.1.0.1.0 and 12.1.0.2.0.

With postings like this, it is very important for everyone to understand the results I publish are based on a very specific and targeted test and not on a real production load. Do not use my results in making a "should I upgrade decision." That would be stupid and an inappropriate use of the my experimental results. But because I publish every aspect of my experiment and it is easily reproducable it is valid data point with which to have a discussion and also highlight various situations that DBAs need to know about.

There are two interesting results from this research project. This article is about the first discovery and my next article will focus on the second. The second is by far the most interesting!

FYI. Back in August of 2013 performed a similar experiment where I compared Oracle database versions 11.2.0.2.0 with 12.1.0.1.0. I posted the article HERE.

Why "Faster" Means More Buffer Gets Processed Per Second


For this experiment when I say "faster" I am referring to raw buffered block processing. When a buffer is touched in the buffer cache it is sometimes called a buffer get or a logical IO. But regardless of the name, every buffer get increases the instance statistic, session logical reads.

I like raw logical IO processing experiments because they are central to all Oracle Database processing. Plus with each new Oracle release, as additional functionality is inserted it is likely more lines of Oracle kernel code will exist. To maintain performance with added functionality is an incredible feat. It's more likely the core buffer processing will be slower because of the new features. Is this case with Oracle's in-memory column store?

How I Setup The Experiment


I have included all the detailed output, scripts, R commands and output, data plots and more in the Analysis Pack that can be downloaded HERE.

There are a lot of ways I could have run this experiment. But two key items must exist for a fare comparison. First, all the processing must be in cache. There can be no physical read activity. Second, the same SQL must be run during the experiment and have the same execution plan. This implies the Oracle 12c column store will NOT be used. A different execution plan is considered "cheating" as a bad plan will clearly loose. Again, this is a very targeted and specific experiment.

The experiment compares the buffer get rates for a given SQL statement. For each Oracle version, I gathered 33 samples and excluded the first two, just to ensure caching was not an issue. The SQL statement runs for around 10 seconds, processes around 10.2M rows and touches around 5M buffers. I checked to ensure the execution plans are the same for each Oracle version. (Again, all the details are in the Analysis Pack for your reading pleasure.)

I ran the experiment on a Dell server. Here are the details:
$ uname -a
Linux sixcore 2.6.39-400.17.1.el6uek.x86_64 #1 SMP Fri Feb 22 18:16:18 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
To make this easier for myself, to perform the test I used my CPU Speed Test tool (version 1i). I blogged about this last month HERE. The latest version of this tool can be downloaded HERE.

The Results, Statistically


Shown below are the experimental results. Remember, the statistic I'm measuring is buffer gets per millisecond.


Details about the above table: The "Normal" column is about if the statistical distribution of the 31 samples is normal. If the p-value (far right column) is greater than 0.05 then I'll say they are normal. In all three cases, the p-value is less than 0.05. If fact, if you look at the histograms contained in the Analysis Pack every histogram is visually clearly not normal. As you would expect the "Average" and the "Median" are the statistical mean and median. The "Max" is the largest value in the sample set. The "Std Dev" is the standard deviation, which is doesn't mean much since our sample sets are not normally distributed.

As I blogged about before the Oracle Database 12c buffer processing is faster than Oracle Database 11g. However, the interesting part is Oracle version with in-memory column store 12.1.0.2.0 is slower then the previous version of 12c, 12.1.0.1.0. In fact, in my experiment the in-memory column store version is around 5.5% slower! This means version 12.1.0.1.0 "out of the box" can process logical buffers around 5.5% faster! Interesting.

In case you're wondering, I used the default out-of-the-box in-memory column store settings for version 12.1.0.2.0. I checked the in-memory size parameter, inmemory_size and it was indeed set to zero. Also, when I startup the Oracle instance there is no mention of the in-memory column store.

Statistically Comparing Each Version


As an important side bar, I did statistically compare the Oracle Database versions. Why? Because while a 5.5% decrease in buffer throughput may seem important, it may not be statistically significant, meaning this difference can not be explained with our sample sets.

So going around saying version 12.1.0.2.0 is "slower" by 5.5% would be misleading. But in my experiment, it would NOT be misleading because the differences in buffer processing are statistically significant. The relevant experimental details are shown below.

Version A   Version B   Statistical  p-value
                         Difference
----------  ----------  -----------  -------
11.2.0.1.0  12.1.0.1.0      YES       0.0000
11.2.0.1.0  12.1.0.2.0      YES       0.0000
12.1.0.1.0  12.1.0.2.0      YES       0.0000

In all three cases the p-value was less than 0.05 signifying the two sample sets are statistically
different. Again, all the details are in the Analysis Pack.

The chart above shows the histograms of both Oracle Database 12c version sample sets together. Visually they look very separated and different with no data crossover. So from both a numeric and visual perspective there is a real difference between 12.1.0.1.0 and 12.1.0.2.0.


What Does This Mean To Me


To me this is surprising. First, there is a clear buffer processing gain upgrading from Oracle 11g to 12c. That is awesome news! But I was not expecting a statistically significant 5.5% buffer processing decrease upgrading to the more recent 12.1.0.2.0 version. Second, this has caused me to do a little digging to perhaps understand the performance decrease. The results of my experimental journey are really interesting...I think more interesting than this posting! But I'll save the details for my next article.

Remember, if you have any questions or concerns about my experiment you can run the experiment yourself. Plus all the details of my experiment are included in the Analysis Pack.

All the best in your Oracle performance tuning work!

Craig.





Saturday, November 1, 2014

Does Increasing An Oracle Background Process OS Priority Improve Performance?

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Does Increasing An Oracle Background Process OS Priority Improve Performance?


Does increasing an Oracle Database background process operating system priority improve performance? As you might expect, the answer is, "It depends."

In this posting I will explain the results of an experiment where I increase the Oracle Database 12c log writer background processes operating system priority.

In my experiment I created a clear CPU bottleneck and the top wait event was log file parallel write. I gathered some data. Next I increased all the log writer background process priorities. Again, I gathered some data and then I analyzed the two data sets. The results were disappointing, not surprising, but a fundamental rule in performance tuning was demonstrated and reinforced.

You can download the "analysis pack" which contains the raw experimental data, histogram, statistical R results and the R statical package commands HERE. You can download total time delta reporting script (ttpctx.sql) I show below, which is contained within my OraPub System Monitor (OSM) Toolkit HERE.

Before I could gather some data, I needed to create the appropriate system load, the data gather scripts and the data analysis scripts. Here's a quick overview of each.

Increasing The LGWR Processes Priority

If you are not familiar with changing Oracle Database background OS process priority, I blogged about how to do this HERE.

My experiment consisted of creating two situations and statistically comparing them to see if increasing the LGWR background process OS priority increased performance. The only difference in the "two situations" was the instance parameter, _high_priority_processes. For the "normal priority" part of the experiment, the default "LMS*|VKTM" was used. For the "high priority" part of the experiment the parameter was changed to "LMS*|VKTM|LG*". The "LG*" caused the increase in the Linux OS priority of all log writer processes from the default 19 to 41.

Ready for some version specifics? Oracle continues to make OS priority related instance parameter changes... even within 12c releases. Since this experiment was done with Oracle Database version 12.1.0.1.0 the parameter default was "LMS*|VKTM" not "LMS*" as with version 12.1.0.2.0. Also, in 12.1.0.2.0 VKTM is placed into a new parameter, _highest_priority_processes.

Generating The DML With CPU Bottleneck Load


To generate the DML workload, I used my OP Load Generator. You can download it HERE. It's quick, easy and I'm familiar with it. I kept increasing the number of DML processes until the CPU utilization was pegged at 100%. While the database server was clearly out of CPU power, the log writer background processes were rarely seen as the top CPU consuming process. I used the "top" program to monitor the process activity. The top CPU consuming processes were almost always the Oracle server/foreground/shadow processes. As I'll explain in the Results section below, this has significant ramifications on the results.

Oracle Time Based Analysis Summary


The data collection script was similar to the one I used back in 2012 when collecting data regarding Oracle's commit write facility. Essentially, I collected multiple three minute samples of the delta elapsed time, user commits, total non-idle wait time and CPU consumption. In both the normal and high priority runs, around 99% of the database time was CPU consumption and 1% of the time Oracle non-idle wait time. Also the top wait event (~70%) was log file parallel write (display name: log file redo write).

If you want master an Oracle Time Based Analysis, check out my online seminar, Tuning Oracle Using An AWR Report. It will teach you how to using an AWR report to optimize Oracle performance so users will feel the difference.

The OraPub System Monitor Toolkit script ttpctx.sql report below was taken during the "normal priority" log writer load.

SQL> @ttpctx

Database: prod35                                                 16-MAY-14 06:07pm
Report:   ttpctx.sql                OSM by OraPub, Inc.             Page         1
                           Total Time Activity (39 sec interval)

                                                         Avg Time        Time     Wait
Time Component                           % TT    % WT Waited (ms)       (sec) Count(k)
------------------------------------- ------- ------- ----------- ----------- --------
CPU consumption: Oracle SP + BG procs   98.91    0.00       0.000     238.716        0
log file redo write                      0.73   67.56       8.082       1.770        0
control file parallel write              0.18   16.41      33.077       0.430        0
target log write size                    0.05    4.20       6.111       0.110        0
oracle thread bootstrap                  0.03    3.05      40.000       0.080        0
os thread creation                       0.02    1.53      20.000       0.040        0
commit: log file sync                    0.01    0.76      10.000       0.020        0
enq: CR - block range reuse ckpt         0.01    0.76      20.000       0.020        0
Disk file operations I/O                 0.00    0.00       0.000       0.000        0

The OSM report below was taken during the "high priority" log writer load.

SQL> @ttpctx

Database: prod35                                                 16-MAY-14 09:25pm
Report:   ttpctx.sql                OSM by OraPub, Inc.             Page         1
                           Total Time Activity (41 sec interval)

                                                         Avg Time        Time     Wait
Time Component                           % TT    % WT Waited (ms)       (sec) Count(k)
------------------------------------- ------- ------- ----------- ----------- --------
CPU consumption: Oracle SP + BG procs   98.92    0.00       0.000     238.733        0
log file redo write                      0.83   77.01       8.272       2.010        0
control file parallel write              0.08    7.28      14.615       0.190        0
target log write size                    0.05    4.98       5.909       0.130        0
oracle thread bootstrap                  0.03    3.07      40.000       0.080        0
os thread creation                       0.02    1.92      25.000       0.050        0
commit: log file sync                    0.01    0.77      10.000       0.020        0
enq: CR - block range reuse ckpt         0.01    0.77      20.000       0.020        0
enq: RO - fast object reuse              0.00    0.38      10.000       0.010        0


Data Collection


For the normal priority load 27 three minute samples where collected. For the high priority situation there were 30 three minute samples collected. (I forgot why there was only 27 samples collected for the normal priority.)  I collected the elapsed time, total non-idle wait time, total CPU consumption (v$sys_time_model: db_cpu + background cpu time) and total user commits.

In this experiment more user commits processed per second means better performance.


Experimental Results


I used the free statistics package "R" (www.r-project.org) to analyze the data. I demonstrate how to get, install and use "R" in my online video seminar, Using Skewed Data To Your Advantage.

With the normal log writer process priority, an average of 984.5 commits/sec and a median of 983.0 commits/sec occurred. With the LG* high process priority, an average of 993.6 commits/sec and a median of 991.0 commits/sec occurred. While the "high priority" situation was able to process more commits per second, is this statistically significant?

The red "smoothed" histogram is the normal priority situation and the blue smoothed histogram is when the log writers were set to the higher priority. The more separated the two histograms the more "different" the sample sets, the more likely there is a statistically significant difference and the more likely a user would feel the difference. Looking at the above histograms plot, there does not appear to be a real difference. But let's do a proper significance test!

Because both sample sets are normally distributed (details are in the Analysis Pack), I could use a simple t-test. R produced a p-value of 0.04451. To be statistically "different" I want the p-value to be less than 0.05 and it is. What does this mean?

While statistically and numerically the commit rates are different, I wouldn't expect any special performance tuning award! In fact, the hassles with cycling a production instance and setting underscore/hidden parameters would make it very unlikely I would increase the OS priority of the log writer background processes. I want to see a big performance difference.

To Summarize... What I Learned... Again


This situation is a perfect example of focusing on the wrong thing! While there is a clear operating system CPU bottleneck and the top wait event is about redo, the log writers are not suffering from a want/need of CPU resources. I suspect the server processes want more CPU resources, but they are NOT the processes we increased their OS priority.

If the log writers were suffering from a lack of CPU resources and fighting for CPU resources, I would expect to see them consuming CPU resources along with the Oracle server processes. And I would definitely expect to see them near the top of the "top" process monitor... especially when their priority has been increased!

Because of this "misguided" tuning effort, this experiment does not build a case for or against changing the log writer priority. What it reinforces is in our quest to optimize performance, make sure we focus on the right thing.

As a side note, this is a great statistical analysis example for two reasons. First, our samples sets look similar, but statistically they are not. Second, while they are statistically different, the performance impact will not be very different. And my guess is the users won't feel a thing... except frustration and anger.

To Super Summarize


When considering increasing a background process's operating system priority, make sure the process is in need of CPU and is not able to get it. In this situation, the DBA could have been mislead by the CPU bottleneck. But upon closer inspection of the log writers from an operating system perspective and knowing the wait event "log file parallel write" is probably more about IO than CPU (Oracle perspective) it would be unlikely that increasing the log writer processes OS priority would help increase the commits per second.

Thanks for reading!

Craig.












Monday, October 27, 2014

How To Change The Priority Of Oracle Background Processes

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

How To Change The Priority Of Oracle Background Processes


Before you get in a huf, it can be done! You can change an Oracle Database background process

priority through an instance parameter! I'm not saying it's a good idea, but it can be done.

In this post I explore how to make the change, just how far you can take it and when you may want to consider changing an Oracle background process priority.

To get your adrenaline going, check out the instance parameter _high_priority_processes from one of your production Oracle system with a version of 11 or greater. Here is an example using my OSM tool, ipx.sql on my Oracle Database version 12.1.0.2.0.
SQL> @ipx _highest_priority_processes
Database: prod40                                               27-OCT-14 02:22pm
Report:   ipx.sql              OSM by OraPub, Inc.                Page         1
                         Display ALL Instance Parameters

Instance Parameter and Value                       Description          Dflt?
-------------------------------------------------- -------------------- -----
_highest_priority_processes         = VKTM         Highest Priority     TRUE
                                                   Process Name Mask
Then at the Linux prompt, I did:
$ ps -eo pid,class,pri,nice,time,args | grep prod40
 2879 TS   19   0 00:00:00 ora_pmon_prod40
 2881 TS   19   0 00:00:01 ora_psp0_prod40
 2883 RR   41   - 00:02:31 ora_vktm_prod40
 2889 TS   19   0 00:00:01 ora_mman_prod40
 2903 TS   19   0 00:00:00 ora_lgwr_prod40
 2905 TS   19   0 00:00:01 ora_ckpt_prod40
 2907 TS   19   0 00:00:00 ora_lg00_prod40
 2911 TS   19   0 00:00:00 ora_lg01_prod40
...
Notice the "pri" for priority of the ora_vktm_prod40 process? It is set to 41 while all the rest of the Oracle background processes are set to the default of 19. Very cool, eh?

Surprised By What I Found


Surprised? Yes, surprised because changing Oracle process priority is a pandoras box. Just imagine if an Oracle server (i.e., foreground) process has its priority lowered just a little and then attempts to acquire a latch or a mutex? If it doesn't get the latch quickly, I might never ever get it!

From a user experience perspective, sometimes performance really quick and other times the application just hangs.

This actually happened to a customer of mine years ago when the OS started reducing a process's priority after it consumed a certain amount of CPU. I learned that when it comes to Oracle processes, they are programed to expect an even process priority playing field. If you try to "game" the situation, do so at your own risk... not Oracle's.

Then why did Oracle Corporation allow background process priority to be changed. And why did Oracle Corporation actually change a background processes priority?!

Doing A Little Exploration


It turns out there are a number of "priority" related underscore instance parameters! On my 11.2.0.1.0 system there 6 "priority" parameters. On my 12.1.0.1.0 system there are 8 "priority" parameters. On my 12.1.0.2.0 system there are 13 "priority" parameters! So clearly Oracle is making changes! In all cases, the parameter I'm focusing on, "_high_priority_processes" exists.

In this posting, I'm going to focus on my Oracle Database 12c version 12.1.0.2.0 system. While you may see something different in your environment, the theme will be the same.

While I'll be blogging about all four of the below parameters, in this posting my focus will be on the _high_priority_processes parameter. Below are the defaults on my system:
_high_priority_processes        LMS*
_highest_priority_processes     VKTM
_os_sched_high_priority         1
_os_sched_highest_priority      1

Messing With The LGWR Background Processes


I'm not testing this on a RAC system, so I don't have an LMS background process. When I saw the "LMS*" I immediately thought, "regular expression." Hmmm... I wonder if I can change the LGWR background process. So I made the instance parameter change and recycled the instance. Below shows the instance parameter change:
SQL> @ipx _high_priority_processes
Database: prod40                                               27-OCT-14 02:36pm
Report:   ipx.sql              OSM by OraPub, Inc.                Page         1
                         Display ALL Instance Parameters

Instance Parameter and Value                       Description          Dflt?
-------------------------------------------------- -------------------- -----
_high_priority_processes            = LMS*|LGWR    High Priority        FALSE
                                                   Process Name Mask

Below is an operating system perspective using the ps command:

ps -eo pid,class,pri,nice,time,args | grep prod40
...
 5521 RR   41   - 00:00:00 ora_vktm_prod40
 5539 TS   19   0 00:00:00 ora_dbw0_prod40
 5541 RR   41   - 00:00:00 ora_lgwr_prod40
 5545 TS   19   0 00:00:00 ora_ckpt_prod40
 5547 TS   19   0 00:00:00 ora_lg00_prod40
 5551 TS   19   0 00:00:00 ora_lg01_prod40
...

How Far Can I Take This?


At this point in my journey, my mind was a blaze! The log file sync wait event can be really difficult to deal with and especially so when there is a CPU bottleneck. Hmmm... Perhaps I can increase the priority of all the log writer background processes?

So I made the instance parameter change and recycled the instance. Below shows the instance parameter change:
SQL> @ipx _high_priority_processes
Database: prod40                                               27-OCT-14 02:44pm
Report:   ipx.sql              OSM by OraPub, Inc.                Page         1
                         Display ALL Instance Parameters

Instance Parameter and Value                       Description          Dflt?
-------------------------------------------------- -------------------- -----
_high_priority_processes            = LMS*|LG*     High Priority        FALSE
                                                   Process Name Mask

Below is an operating system perspective using the ps command:

ps -eo pid,class,pri,nice,time,args | grep prod40
...
 5974 TS   19   0 00:00:00 ora_psp0_prod40
 5976 RR   41   - 00:00:00 ora_vktm_prod40
 5994 TS   19   0 00:00:00 ora_dbw0_prod40
 5996 RR   41   - 00:00:00 ora_lgwr_prod40
 6000 TS   19   0 00:00:00 ora_ckpt_prod40
 6002 RR   41   - 00:00:00 ora_lg00_prod40
 6008 RR   41   - 00:00:00 ora_lg01_prod40
 6014 TS   19   0 00:00:00 ora_lreg_prod40
...

So now all the log writer background processes have a high priority. My hope would be that if there is an OS CPU bottleneck and the log writer background processes wanted more CPU, I now have the power to give that to them! Another tool in my performance tuning arsenal!

Security Hole?


At this point, my exuberance began to turn into paranoia. I thought, "Perhaps I can increase the priority of an Oracle server process or perhaps any process." If so, that would be a major Oracle Database security hole.

With fingers trembling, I changed the instance parameters to match an Oracle server process and recycled the instance. Below shows the instance parameter change:

SQL> @ipx _high_priority_processes
Database: prod40                                               27-OCT-14 02:52pm
Report:   ipx.sql              OSM by OraPub, Inc.                Page         1
                         Display ALL Instance Parameters

Instance Parameter and Value                       Description          Dflt?
-------------------------------------------------- -------------------- -----
_high_priority_processes            =              High Priority        FALSE
LMS*|LG*|oracleprod40                              Process Name Mask

Below is an operating system perspective using the ps command:

$ ps -eo pid,class,pri,nice,time,args | grep prod40
...
 6360 TS   19   0 00:00:00 ora_psp0_prod40
 6362 RR   41   - 00:00:00 ora_vktm_prod40
 6366 TS   19   0 00:00:00 ora_gen0_prod40
 6382 RR   41   - 00:00:00 ora_lgwr_prod40
 6386 TS   19   0 00:00:00 ora_ckpt_prod40
 6388 RR   41   - 00:00:00 ora_lg00_prod40
 6394 RR   41   - 00:00:00 ora_lg01_prod40
 6398 TS   19   0 00:00:00 ora_reco_prod40
...
 6644 TS   19   0 00:00:00 oracleprod40...
...

OK, that didn't work so how about this?

SQL> @ipx _high_priority_processes
Database: prod40                                               27-OCT-14 02:55pm
Report:   ipx.sql              OSM by OraPub, Inc.                Page         1
                         Display ALL Instance Parameters

Instance Parameter and Value                       Description          Dflt?
-------------------------------------------------- -------------------- -----
_high_priority_processes            =              High Priority        FALSE
LMS*|LG*|*oracle*                                  Process Name Mask

Let's see what happened at the OS.

$ ps -eo pid,class,pri,nice,time,args | grep prod40
...
 6701 RR   41   - 00:00:00 ora_vktm_prod40
 6705 RR   41   - 00:00:00 ora_gen0_prod40
 6709 RR   41   - 00:00:00 ora_mman_prod40
 6717 RR   41   - 00:00:00 ora_diag_prod40
 6721 RR   41   - 00:00:00 ora_dbrm_prod40
 6725 RR   41   - 00:00:00 ora_vkrm_prod40
 6729 RR   41   - 00:00:00 ora_dia0_prod40
 6733 RR   41   - 00:00:00 ora_dbw0_prod40
...
 6927 RR   41   - 00:00:00 ora_p00m_prod40
 6931 RR   41   - 00:00:00 ora_p00n_prod40
 7122 TS   19   0 00:00:00 oracleprod40 ...
 7124 RR   41   - 00:00:00 ora_qm02_prod40
 7128 RR   41   - 00:00:00 ora_qm03_prod40

Oh Oh... That's not good! Now EVERY Oracle background process has a higher priority and my Oracle server process does not.

So my "*" wildcard caused all the Oracle processes to be included. If all the processes a high prioirty, then the log writer processes have no advantage over the others. And to make matters even worse, my goal of increasing the server process priority did not occur.

However, this is actually very good news because it appears this is not an Oracle Database security hole! To me, it looks like the priority parameter is applied during the instance startup for just the background processes. Since my server process was started after the instance was started and for sure not included in the list of background processes, its priority was not affected. Good news for security, not as good of news for a performance optimizing fanatic such as myself.

Should I Ever Increase A Background Process Priority?


Now that we know how to increase an Oracle Database background process priority, when would we ever want to do this? The short answer is probably never. But the long answer is the classic, "it depends."

Let me give you an example. Suppose there is an OS CPU bottleneck and the log writer background processes are consuming lots of CPU while handling all the associated memory management when server process issues a commit. In this situation, performance may benefit by making it easier for the log writer processes to get CPU cycles, therefore improving performance. But don't even think about doing this unless there is a CPU bottleneck. And even then, be very very careful.

In my next block posting, I'll detail an experiment where I changed the log writer background processes priority.

Thanks for reading!

Craig.



Wednesday, October 8, 2014

11 Tips To Get Your Conference Abstract Accepted

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

11 Ways To Get Your Conference Abstract Accepted


This is what happens when your abstract is selected!
Ready for some fun!? It's that time of year again and the competition will be intense. The "call for abstracts" for a number of Oracle Database conferences are about to close.

The focus of this posting is how you can get a conference abstract accepted.

As a mentor, Track Manager and active conference speaker I've been helping DBAs get their abstracts accepted for many years. If you follow my 11 tips below, I'm willing to bet you will get a free pass to any conference you wish in any part of the world.

1. No Surprises! 


Track Manager After A Surprise
The Track Manager wants no surprises, great content and a great presentation. Believe me when I say, they are looking for ways to reduce the risk of a botched presentation, a cancelation or a no show. Your abstract submissions is your first way to show you are serious and will help make the track incredibly awesome.

Tip: In all your conference communications, demonstrate a commitment to follow through.

2. Creative Title.


The first thing everyone sees is the title. I can personally tell you, if the title does not peak my curiosity without sounding stupid, then unless I know the speaker is popular I will not read the abstract. Why do I do this? Because as a Track Manager, I know conference attendees will do the same thing! And as a Track Manager, I want attendees to want to attend sessions in my track.

Tip: Find two people, read the title to them and ask what they think. If they say something like, "What are you going to talk about?" that's bad. Rework the title.

3. Tell A Story


The abstract must tell a compelling story. Oracle conferences are not academic conferences! There needs to be some problem along with a solution complete with drama woven into the story.

Tip: People forget bullet points, but they never forget a good story.

4. Easy To Read


The abstract must be easy to review. The abstract reviewers may have over a hundred abstracts to review. Make it a good quick read for the reviewers and your chances increase.

Tip: Have your computer read your abstract back to you. If you don't say, "Wow!" rework the abstract. 

5. Be A Grown-Up


You can increase the perception you will physically show up and put on a great show at the conference by NOT putting into your abstract emoji, bullet points, your name and title or pushing a product or service. NEVER copy/paste from a powerpoint outline into the abstract or outline. (I've seen people do this!)

Tip: Track Managers do not want to baby sit you. They want an adult who will help make their track great.

6. Submit Introductory Level Abstracts


I finally figured this out a couple years ago. Not everyone is ready for a detailed understanding of cache buffer chain architecture, diagnosis, and solution development. Think of it from a business perspective. Your market (audience) will be larger if your presentation is less technical. If this bothers you, read my next point.

Tip: Submit both an introductory level version and advanced level version of your topic.

7. Topics Must Be Filled


Not even the Track Manager knows what people will submit. And you do not know what the Track Manager is looking for. And you do not know what other people are submitting. Mash this together and it means you must submit more than one abstract. I know you really, really want to present on topic X. But would you rather not have an abstract accepted?

Tip: Submit abstracts on multiple topics. It increases your chances of being accepted.

8. Submit Abstract To Multiple Tracks


This is similar to submitting both an introductory version of your abstract. Here's an example: If there is a DBA Bootcamp track and a Performance & Internals Track, craft your abstract to Bootcamp version has a more foundational/core feel to it. And craft your Performance & Internals version to feel more technical and advanced.

Do not simply change the title and the abstract can not be the same.  If the conference managers or the Track Manager feels you are trying to game the conference, you present a risk to the conference and their track and your abstracts will be rejected. So be careful and thoughtful.

Tip: Look for ways to adjust your topic to fit into multiple tracks.

9. Great Outline Shows Commitment


If the reviewers have read your title and abstract, they are taking your abstract seriously. Now is the time to close the deal by demonstrating you will put on a great show. And this means you already have in mind an organized and well thought out delivery. You convey this with a fantastic outline. I know it is difficult to create an outline BUT the reviewers also know this AND having a solid outline demonstrates to them you are serious, you will show up, and put on a great show.

Tip: Develop your abstract and outline together. This strengthens both and develops a kind of package the reviewers like to see.

10. Learning Objectives Show Value


You show the obvious value of your topic through the learning objectives. Personally, I use these to help keep me focused on my listener, just not what I'm interested in at the moment. Because I love my work, I tend to think everyone also does... not so. I must force myself to answer the question, "Why would a DBA care about this topic?"

Tip: Develop your learning objectives by asking yourself, "When my presentation is over, what do I want the attendees to remember?"

11. Submit About Problems You Solved


Submit on the topics you have personally explored and found fascinating. Every year, every DBA has had to drill deep into at least one problem. This concentrated effort means you know the topic very well. And this means you are qualified to tell others about it! People love to hear from people who are fascinated about something. Spread the good news resulting from a "bad" experience.

Tip: Submit on topics you have explored and are fascinated with.

How Many Abstracts Should I Submit?


It depends on the conference, but for a big North America conference like ODTUG, RMOUG and IOUG I suggest at least four.

Based on what I wrote above, pick three topics, perhaps create both an introductory and advanced version and look to see if it makes sense to submit to multiple tracks. That means you'll probably submit at least four abstracts. It's not as bad as it sounds, because you will only have perhaps three core abstracts. All the others are modifications to fit a specific need. Believe when you receive the acceptance email, it will all be worth it!

See you at the conference!

Craig.


Monday, October 6, 2014

Comparing SQL Execution Times From Different Systems

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Comparing SQL Execution Times From Different Systems


Suppose it's your job to identify SQL that may run slower in the about-to-be-upgrated Oracle Database. It's tricky because no two systems are alike. Just because the SQL run time is faster in the test environment doesn't mean the decision to upgrade is a good one. In fact, it could be disastrous.

For example; If a SQL statement runs 10 seconds in production and runs 20 seconds in QAT, but the production system is twice as fast as QAT, is that a problem? It's difficult to compare SQL runs times when the same SQL resides in different environments.

In this posting, I present a way to remove the CPU speed differences, so an appropriate "apples to apples" SQL elapsed time comparison can be made, thereby improving our ability to more correctly detect risky SQL that may be placed into the upgraded production system.

And, there is a cool, free, downloadable tool involved!

Why SQL Can Run Slower In Different Environments


There are a number of reasons why a SQL's run time is different in different systems. An obvious reason is a different execution plan. A less obvious and much more complex reason is a workload intensity or type difference. In this posting, I will focus on CPU speed differences. Actually, what I'll show you is how to remove the CPU speed differences so you can appropriately compare two SQL statements. It's pretty cool.

The Mental Gymnastics


If a SQL statement's elapsed time in production is 10 seconds and 20 seconds in QAT, that’s NOT an issue IF the production system is twice as fast.

If this makes sense to you, then what you did was mentally adjust one of the systems so it could be appropriately compared. This is how I did it:

10 seconds in production * production is 2 times as fast as QA  = 20 seconds 

And in QA the sql ran in 20 seconds… so really they ran “the same” in both environments. If I am considering placing the SQL from the test environment into the production environment, then this scenario does not raise any risk flags. The "trick" is determining "production is 2 times as fast as QA" and then creatively use that information.

Determining The "Speed Value"


Fortunately, there are many ways to determine a system's "speed value." Basing the speed value on Oracle's ability to process buffers in memory has many advantages: a real load is not required or even desired, real Oracle code is being run at a particular version, real operating systems are being run and the processing of an Oracle buffer highly correlates with CPU consumption.

Keep in mind, this type of CPU speed test is not an indicator of scalability (benefit of adding additional CPUs) in any way shape or form. It is simply a measure of brut force Oracle buffer cache logical IO processing speed based on a number of factors. If you are architecting a system, other tests will be required.

As you might expect, I have a free tool you can download to determine the "true speed" rating. I recently updated it to be more accurate, require less Oracle privileges, and also show the execution plan of the speed test tool SQL. (A special thanks to Steve for the execution plan enhancement!) If the execution plan used in the speed tool is difference on the various systems, then obviously we can't expect the "true speeds" to be comparable.

You can download the tool HERE.

How To Analyze The Risk


Before we can analyze the risk, we need the "speed value" for both systems. Suppose a faster system means its speed rating is larger. If the production system speed rating is 600 and the QAT system speed rating is 300, then production is deemed "twice as fast."

Now let's put this all together and quickly go through three examples.

This is the core math:

standardized elapsed time = sql elapsed time * system speed value

So if the SQL elapsed time is 25 seconds and the system speed value is 200, then the standardized "apples-to-apples" elapsed time is 5000 which is 25*200. The "standardized elapsed time" is simply a way to compare SQL elapsed times, not what users will feel and not the true SQL elapsed time.

To make this a little more interesting, I'll quickly go through three scenarios focusing on identifying risk.

1. The SQL truly runs the same in both systems.

Here is the math:

QAT standardized elapsed time = 20 seconds X 300 = 6000 seconds

PRD standardized elapsed time = 10 seconds X 600 = 6000 seconds

In this scenario, the true speed situation is, QAT = PRD. This means, the SQL effectively runs just as fast in QAT as in production. If someone says the SQL is running slower in QAT and therefore this presents a risk to the upgrade, you can confidently say it's because the PRD system is twice as fast! In this scenario, the QAT SQL will not be flagged as presenting a significant risk when upgrading from QAT to PRD.

2. The SQL runs faster in production.

Now suppose the SQL runs for 30 seconds in QAT and for 10 seconds in PRD. If someone was to say, "Well of course it's runs slower in QAT because QAT is slower than the PRD system." Really? Everything is OK? Again, to make a fare comparison, we must compare the system using a standardizing metric, which I have been calling the, "standardized elapsed time."

Here are the scenario numbers:

QAT standardized elapsed time = 30 seconds X 300 = 9000 seconds
PRD standardized elapsed time = 10 seconds X 600 = 6000 seconds

In this scenario, the QAT standard elapsed time is greater than the PRD standardized elapsed time. This means the QAT SQL is truly running slower in QAT compared to PRD. Specifically, this means the slower SQL in QAT can not be fully explained by the slower QAT system. Said another way, while we expect the SQL in QAT to run slower then in the PRD system, we didn't expect it to be quite so slow in QAT. There must another reason for this slowness, which we are not accounting for. In this scenario, the QAT SQL should be flagged as presenting a significant risk when upgrading from QAT to PRD.

3. The SQL runs faster in QAT.

In this final scenario, the SQL runs for 15 seconds in QAT and for 10 seconds in PRD. Suppose someone was to say, "Well of course the SQL runs slower in QAT. So everything is OK." Really? Everything is OK? To get a better understanding of the true situation, we need to look at their standardized elapsed times.

QAT standardized elapsed time = 15 seconds X 300 = 4500 seconds
PRD standardized elapsed time = 10 seconds X 600 = 6000 seconds 

In this scenario, QAT standard elapsed time is less then the PRD standardized elapsed time. This means the QAT SQL is actually running faster in the QAT, even though the QAT wall time is 15 seconds and the PRD wall time is only 10 seconds. So while most people would flag this QAT SQL as "high risk" we know better! We know the QAT SQL is actually running faster in QAT than in production! In this scenario, the QAT SQL will not be flagged as presenting a significant risk when upgrading from QAT to PRD.

In Summary...


Identify risk is extremely important while planning for an upgrade. It is unlikely the QAT and production system will be identical in every way. This mismatch makes identifying risk more difficult. One of the common differences in systems is their CPU processing speeds. What I demonstrated was a way to remove the CPU speed differences, so an appropriate "apples to apples" SQL elapsed time comparison can be made, thereby improving our ability to more correctly detect risky SQL that may be placed into the upgraded production system.

What's Next?


Looking at the "standardized elapsed time" based on Oracle LIO processing is important, but it's just one reason why a SQL may have a different elapsed time in a different environment. One of the big "gotchas" in load testing is comparing production performance to a QAT environment with a different workload. Creating an equivalent workload on different systems is extremely difficult to do. But with some very cool math and a clear understanding of performance analysis, we can also create a more "apples-to-apples" comparison, just like we have done with CPU speeds. But I'll save that for another posting.

All the best in your Oracle performance work!

Craig.