Monday, December 29, 2014

Can A Background Process Impact A Foreground Process And Its Database Time?

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Can A Background Process Impact A Foreground Process And Its Database Time?


Have you ever heard someone say, "Background processes do not impact foreground processes because they run in the background and in parallel with foreground processes." I've heard this hundreds of times!

While doing some performance research I came across a great example of how an Oracle Database background process can directly and significantly impact a foreground process.

The above quote represents a masterfully constructed lie; it contains both a lie and a truth. The mix of a truth and a lie make understanding the reality of the situation difficult. In this post, I'll explain the truth, delve into the lie and relate it all to foreground process database time.

By the way, I am in no way saying there is something wrong with or incorrect about DB Time. I want to ensure this is clear from the very beginning of this post.

Just so there is no confusion, an Oracle foreground process is sometimes also called a server process or a shadow process. These can terms can be used interchangeably in this post.

The Truth


Clearly background and foreground processes operate in parallel. I don't think any DBA would deny this. As I frequently say, "serialization is death and parallelism is life!" A simple "ps" command will visually show both Oracle background and foreground processes at work. But this in no way implies they do not impact each other's activity and performance.

In fact, we hope they do impact each other! Can you imagine what performance would be with the background processes NOT running in parallel?! What a performance nightmare that would be. But this where the "no impact" lie lives.

The Lie


Most senior DBAs can point to a specific situation where Oracle cache buffer chain latch contention affected multiple foreground sessions. In this situation, foreground sessions were franticly trying to acquire a popular cache buffer chain latch. But this is a foreground session versus foreground session situation. While this is example is important, this post is about when a background process impacts a foreground process.

Have you every committed a transaction and it hangs while the foreground process is waiting on "log file switch (checkpoint incomplete)" or even worse "log file switch (archiving needed)" event? All the foreground process knows is that its statement can't finish because a required log switch has not occurred because a checkpoint is incomplete. What the server process does not know is the checkpoint (CKPT), the database writer (DBWR) and the log writer (LGWR) background processes are involved. There is a good chance the database writer is frantically writing dirty buffers to the database (dbf) files so the LGWR can safely overwrite the associated redo in the next online redo log.

For example, if a server process issued a commit during the checkpoint, it will wait until the checkpoint is complete and the log writer has switched and can write into the next redo log. So, while the log writer background processes is probably waiting on "log file parallel write" and the database writer is burning CPU and waiting on "db file parallel write", the foreground processes are effectively hung.

This is a classic example of how a background process can impact the performance of a foreground process.

A Demonstration Of The Lie


Here's a quick demonstration of the above situation. On an existing database in my lab, I created two 4MB redo logs and dropped all the other redo logs. I started a DML intensive workload. According to the alert.log file, the redo logs where switching every couple of seconds! Take a look at this:
$ tail -f /home/oracle/base/diag/rdbms/prod30/prod30/trace/alert*log
Thread 1 cannot allocate new log, sequence 2365
Checkpoint not complete
  Current log# 4 seq# 2364 mem# 0: /home/oradata/prod30/redoA1.log
Mon Dec 29 11:02:09 2014
Thread 1 advanced to log sequence 2365 (LGWR switch)
  Current log# 5 seq# 2365 mem# 0: /home/oradata/prod30/redoA2.log
Thread 1 cannot allocate new log, sequence 2366
Checkpoint not complete
  Current log# 5 seq# 2365 mem# 0: /home/oradata/prod30/redoA2.log
Thread 1 advanced to log sequence 2366 (LGWR switch)
  Current log# 4 seq# 2366 mem# 0: /home/oradata/prod30/redoA1.log
Thread 1 cannot allocate new log, sequence 2367
Checkpoint not complete
  Current log# 4 seq# 2366 mem# 0: /home/oradata/prod30/redoA1.log
Thread 1 advanced to log sequence 2367 (LGWR switch)
  Current log# 5 seq# 2367 mem# 0: /home/oradata/prod30/redoA2.log
Thread 1 cannot allocate new log, sequence 2368
Checkpoint not complete
  Current log# 5 seq# 2367 mem# 0: /home/oradata/prod30/redoA2.log
Mon Dec 29 11:02:20 2014

Obviously not what you want to see on a production Oracle system! (But my guess many of you have.)

Using my OSM realtime session sampler tool (rss.sql - related blog posting HERE) I sampled the log writer every half a second. (There is only one log writer background process because this is an Oracle 11g database, not an Oracle Database 12c system.) If the log writer session showed up in v$session as an active session, it would be picked up by rss.sql.  Both "ON CPU" and "WAIT" states are collected. Here is a sample of the output.


It's very obvious the log writer is doing some writing. But we can't tell from the above output if the process is impacting other sessions. It would have also been very interesting to sample the database writer also, but I didn't do that. To determine if the background processes are impacting other sessions, I needed to find a foreground session that was doing some commits. I noticed that session 133, a foreground process was busy doing some DML and committing as it processed its work. Just as with the log writer background process, I sampled this foreground process once every 0.5 second. Here's a sample of the output.


Wow. The foreground process is waiting a lot for the current checkpoint to be completed! So... this means the foreground process is being effectively halted until the background processes involved with the checkpoint have finished their work.

This is a great example of how Oracle background processes can impact the performance of an Oracle foreground process.

But let's be clear. Without the background processes, performance would be even worse. Why? Because all that work done in parallel and in the background would have to be done by each foreground process AND all that work would have to be closely controlled and coordinated. And that, would be a performance nightmare!

DB Time Impact On The Foreground Process


Just for the fun of it, I wrote a script to investigate DB Time, CPU consumption, non-idle wait time and the wait time for the "log file switch wait (checkpoint incomplete)" wait event for the foreground process mentioned above (session 133). The script simply gathers some session details, sleeps for 120 seconds, again gathers some session details, calculates the differences and displays the results. You can download the script HERE. Below is the output for the foreground process, session 133.
SQL> @ckpttest.sql 133

Table dropped.

Table created.

PL/SQL procedure successfully completed.

CPU_S_DELTA NIW_S_DELTA DB_TIME_S_DELTA CHECK_IMPL_WAIT_S
----------- ----------- --------------- -----------------
      2.362      117.71      119.973692            112.42

1 row selected.

Here is a quick description of the output columns.

  • CPU_S_DELTA is the CPU seconds consumed by session 133, which is the time model statistic DB CPU.
  • NIW_S_DELTA is the non-idle wait time for session 133, in seconds.
  • DB_TIME_S_DELTA is the DB Time statistic for session 133, which is the time model statistic DB Time.
  • CHECK_IMPL_WAIT_S is the wait time only for event "log file switch (checkpoint incomplete)" for session 133, in seconds.

Does the time fit together as we expect? The "log file switch..." wait time is part of the non-idle wait time. The DB Time total is very close to the CPU time plus the non-idle wait time. Everything seems to add up nicely.

To summarize: Oracle background processes directly impacted the database time for a foreground process.

In Conclusion...


First, for sure Oracle foreground and background processes impact each other...by design for increased performance. Sometimes on real production Oracle Database systems things get messy and work that we hoped would be done in parallel must become momentarily serialized. The log file switch example above, is an example of this.

Second, the next time someone tells you that an Oracle background process does not impact the performance of a foreground process, ask them if they have experienced a "log file switch checkpoint incomplete" situation. Pause until they say, "Yes." Then just look at them and don't say a word. After a few seconds you may see a "oh... I get it." look on their face. But if not, simply point them to this post.

Thanks for reading and enjoy your work!

Craig.




2 comments:

  1. Thanks Craig for this interesting post.
    Keep on writing this way in 2015 !
    Jean-michel, Nemours, France

    ReplyDelete
  2. Craig -

    As we discussed on the phone, the statement that backgrounds don't affect foregrounds belies a complete misunderstanding of the purpose of backgrounds in general and is therefore nonsensical at face value.

    Most backgrounds exist entirely to save foregrounds from doing expensive things like physical I/O on their own. This is especially true in the case of DBWR and LGWR. So when these backgrounds are struggling and seem to be negatively affecting foregrounds, this needs to be properly understood simply as a reduction in the oodles of time foregrounds would otherwise spend doing the entire work on their own, and not as some "extra" imposition of time on the foregrounds.

    JB

    ReplyDelete