Monday, August 18, 2014

What Is Oracle Elapsed Time And Wall Time With A Parallelism Twist

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.


What Is Oracle Elapsed Time And Wall Time With A Parallelism Twist


In this post I'm focusing on Oracle Database SQL elapsed time, adding parallelism into the mix and then revisiting wall time. What initially seems simple can take some very interesting twists!

If you are into tuning Oracle Database systems, you care about time. And if you care about time, then you need to understand the most important time parameters: what they are, their differences, how they relate to each other and how to use them in your performance tuning work.

A couple weeks ago I wrote about Oracle DB Time, non-idle wait time, and server process CPU consumption (DB CPU) time. If you have not read that posting, HERE is the link. It must be a good read because it quickly become my most viewed post ever! In this posting, the focus is SQL elapsed time, parallelism, and again wall time. Enjoy!

Quick Review


In my previous related post, I covered non-idle wait time, DB CPU, and DB Time. Here is a very quick summary of each.

Non-Idle Wait Time occurs when an Oracle process is not consuming CPU, the session pauses (i.e., waits) and Oracle considers the wait time important for performance tuning. An example of a non-idle wait event is direct path read temp. An example of an idle wait event is SQL*Net message from client or pmon timer.

DB CPU is Oracle server/foreground/shadow process CPU consumption. This is not include Oracle background process CPU consumption.

DB Time is DB CPU plus Non-Idle Wait Time. Remember that DB Time does not include background process CPU consumption and Oracle Corporation determines which wait events are considered idle.

Elapsed Time


Elapsed Time (ET) is all DB Time related to a defined task. A "defined task" could be a SQL statement, group of SQL statements, pl/sql procedure, batch job, etc. It is whatever makes sense in your tuning situation.

The elapsed time for a SQL_ID can be found in v$sql. But be careful because this elapsed time is related to "all" the SQL_ID executions. Thankfully, there is an "executions" column in v$sql.


Elapsed time is displayed in a number of areas within an Oracle Database AWR and Statspack report. Looking at the above screen shot, the "top" elapsed time SQL has an elapsed time of 268561 seconds. This means that over the AWR report's snapshot interval, for all this SQL's executions, its total DB Time is 268561 seconds. Said another way, if we were to add up all this SQL's DB CPU and non-idle wait time for all its executions within the snapshot interval, the value should be 268561.

There is a lot of great information provided in the AWR and Statspack SQL reports. For example, because the elapsed time and the CPU time (DB CPU) is shown above, we can calculate the non-idle wait time for the "top" elapsed time SQL ID.

non idle wait time = elapsed time - cpu time
       268465      =    268561    -    96

For the "top" elapsed SQL, its elapsed time 268561 and it's DB CPU is 96 therefore its non-idle wait time is 268465. Wow! This statement has tons of associated wait time compared to CPU consumption time.

But it gets even better! Because the total Elapsed Time and the total number Executions over the snapshot interval is displayed, we can determine the average elapsed time!

average elapsed time = total elapsed time / executions
        746.03       =      268561        /     36

Do not be deceived! The average elapsed time is unlikely what the user is experiencing. Two possibility examples for this deception are skewed elapsed times and parallelism.

For most DBAs this is unexpected. It also causes performance perception problems yet solutions are available to understand what's really going on.

I've spent so much time researching this topic and seen it increase my consulting value, I've posted a number of blog entries on this subject. Plus I created an OraPub Online Institute seminar focused specifically on this subject. It's called Using Skewed Performance Data To Your Advantage. Check it out. I'm really proud how it turned out. I also have a couple of OSM scripts dedicated to this topic, sqlelget[11].sql.

Revisiting Wall Time With A Parallelism Twist


Now it's time to put this all together.

DB CPU is the Oracle server process CPU consumption.

Non-Idle Wait Time (NIWT) is the time when an Oracle process can not consume CPU and must pause and we care about this time.

DB Time is the Oracle server process CPU consumption and all non-idle wait time.

Elapsed Time (ET) is the sum (i.e., all) DB Time related to a task, such as a SQL_ID.

Wall Time is what we hope the user experiences. I'll assume there is no time gap between Oracle and the user, therefore the wall time will equal the user's experience.

Effective Parallelism is the effective number of Oracle parallel slaves or some other form of parallelism, such as designed-in application parallelism. (For simplicity, I'm only going to mention Oracle parallel query.) If Oracle parallel query is not involved, then the effective parallelism is one. If two parallel query slaves are involved, then the effective parallelism will be a little less than 2

Parallelism can reduce wall time because we can simultaneously "burn time" in multiple places. For example, 60 seconds of elapsed time with a process running serially, results in a wall time of 60 seconds. But if we have two parallel query slaves, while the elapsed time (i.e., all the DB Time) is still 60 seconds (plus some overhead time), the wall time will be around 30 seconds (plus some overhead time).

The math is really simple...that is until you factor in scalability (i.e., the overhead), which I won't. If you're interested, read the last chapter of my book, Forecasting Oracle Performance.

Let's simplify this by using some mathematical notation.

DB Time = DB CPU + NIWT

Elapsed Time = Sum of DB Time

Wall Time = Elapsed Time / Effective Parallelism

Pretty straightforward, eh? Below is a short video clip summarizing the key time parameters taken from the OraPub Online Institute seminar, Tuning Oracle Using an AWR Report - Part 2. If you can't see the video, click HERE watch it on YouTube.



Test You Knowledge


True or False? If the total elapsed time is 60 seconds and parallel query is not involved, the total wall time will also be 60 seconds. True

True or False? If the elapsed time per execution is 60 seconds and the wall time is 30 seconds, then parallel query is involved. True

True or False? Bonus question yet very important to understand: If the elapsed time per execution is 60 seconds and two PQ slaves are involved, then the wall time will be 30 seconds.

The last question is false because there is overhead when parallelizing. Parallelism is not free. Because of this, the wall time will hopefully drop to perhaps 35 seconds. That 5 seconds is the parallelization overhead.

Coming Up Next: Video Proof!


While the above may seem correct, I ran some SQL statements and captured the relevant time statistics. There is quite a bit of detail and I ran two different tests, so I'll post that in a week or two.

Thanks for reading,

Craig.

Monday, August 11, 2014

Watch Oracle DB Session Activity With The Real-Time Session Sampler

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Watch Oracle DB Session Activity With My Real-Time Session Sampler


Watching session activity is a great way to diagnose and learn about Oracle Database tuning. There are many approaches to this. I wanted something simple, useful, modifiable, no Oracle licensing
issues and that I could give away. The result is what I call the Oracle Real-Time Session Sampler (OSM: rss.sql).

The tool is simple to use.  Based on a number filtering command line inputs, it repeatedly samples active Oracle sessions and writes the output to a file in /tmp. You can do a "tail -f" on the file to watch session activity in real time!

The rss.sql tool is included in the OraPub System Monitor (OSM) toolkit (v13j), which can be downloaded HERE.

If you simply want to watch a video demo, watch below or click HERE.


The Back-Story


Over the past two months I have been creating my next OraPub Online Institute seminar about how to tune Oracle with an AWR/Statspack report using a quantitative time based approach. Yeah... I know the title is long. Technically I could have used Oracle's Active Session History view (v$active_session_history) but I didn't want anyone to worry about ASH licensing issues. And ASH is not available with Oracle Standard Edition.

The Real-Time Session Sampler is used in a few places in the online seminar where I teach about Oracle session CPU consumption and wait time. I needed something visual that would obviously convey the point I wanted to make. The Real-Time Session Sampler worked perfectly for this.

What It Does


Based on a number of command line inputs, rss.sql repeatedly samples active Oracle sessions and writes the output to file in /tmp. The script contains no dml statements. You can do a "tail -f" on the output file to see session activity in real time. You can look at all sessions, a single session, sessions that are consuming CPU or waiting or both, etc. You can even change the sample rate. For example, once every 5.0 seconds or once every 0.25 seconds! It's very flexible and it's fascinating to watch.

Here is an example of some real output.



How To Use RSS.SQL


The tool is run within SQL*Plus and the output is written to the file /tmp/rss_sql.txt. You need two windows: one to sample the sessions and other other to look at the output file. Here are the script parameter options:

rss.sql  low_sid  high_sid  low_serial  high_serial  session_state  wait_event_partial|%  sample_delay

low_sid is the low Oracle session id.
high_sid is the high Oracle session id.
low_serial is the low Oracle session's serial number.
high_serial is the high Oracle session's serial number.
session_state is the current state of the session at the moment of sampling: "cpu", "wait" or for both "%".
wait_event_partial is when the session is waiting, select the session only with this wait event. Always set this to "%" unless you want to tighten the filtering.
sample_delay is the delay between samples, in seconds.

Examples You May Want To Try


By looking at the below examples, you'll quickly grasp that this tool can be used in a variety of situations.

Situation: I want to sample a single session (sid:10 serial:50) once every five seconds.

SQL>@rss.sql  10 10 50 50 % % 5.0

Situation: I want to essentially stream a single session's (sid:10 serial:50) activity.

SQL>@rss.sql 10 10 50 50 % % 0.125

Situation: I want to see what sessions are waiting for an row level lock while sampling once every second.

SQL>@rss.sql 0 99999 0 99999 wait enq%tx%row% 1.0

Situation: I want to see which sessions are consuming CPU, while sampling once every half second.

SQL>@rss.sql 0 99999 0 99999 cpu % 0.50

Be Responsible... It's Not OraPub's Fault!


Have fun and explore...but watch out! Any time you are sample repeatedly, you run the risk of impacting the system under observation. You can reduce this risk by sampling less often (perhaps once every 5 seconds), by limiting the sessions you want to sample (not 0 to 99999) and by only select sessions in either a "cpu" or "wait" state.

A smart lower impact strategy would be to initially keep a broader selection criteria but sample less often; perhaps once every 15 seconds. Once you know what you want to look for, tighten the selection criteria and sample more frequently. If you have identified a specific session of interest, then you stream the activity (if appropriate) every half second or perhaps every quarter second.

All the best in your Oracle Database tuning work,

Craig.

Tuesday, August 5, 2014

What Is Oracle DB Time, DB CPU, Wall Time and Non-Idle Wait Time

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

What Is Oracle DB Time, DB CPU, Wall Time and Non-Idle Wait Time


If you are into tuning Oracle Database systems, you care about time. And if you care about time, then you need to understand the most important time parameters: what they are, their differences, how they relate to each other and how to use them in your performance tuning work.

The key Oracle Database time parameters are elapsed time, database time (DB Time), non-idle wait time and server process CPU consumption (DB CPU) time.

This first post is pretty basic, yet core fundamental stuff. So in the following two posts I'll introduce elapsed time, add parallelism into the mix and revisit wall time. What initially seems simple can some take very interesting twists!

You probably know that I am all about quantitative Oracle performance analysis. I research, write, teach, and speak about it. I even have an OraPub Online Institute seminar about how to tune your Oracle Database systems from a standard AWR or Statspack report using an Oracle Time Based Analysis (OTBA) framework.

So let's get started!

Wall Time & Run Time


I'll start with Wall Time because that is close (hopefully) to what a user experiences. In fact, if there is no time gap between the Oracle Database and the user, then we can do a little math and figure out what the users are, on average, experiencing. I'll get back to wall time in the next post, where I include elapsed time and parallelism into the equation.

DB CPU


DB CPU is Oracle server/foreground/shadow process CPU consumption. Each Oracle server process gathers its own CPU consumption using the time and/or getrusage C function system call. So unless there is a major screw-up by either the operating system or the Oracle kernel developers, the time will be good... very good. The name DB CPU is taken from the actual statistic name, which is found in both v$sess_time_model and v$sys_time_model.

If you look at any AWR or Statspack report in the "Time Model" section, you will see DB CPU. The value shown will be all server process CPU consumption within the reporting snapshot interval, converted to seconds. (The raw statistic is stored in microseconds.)

Below is an example Time Model Statistics screen shot from a standard AWR report. I've highlighted DB CPU.



If you run one of my OraPub System Monitor (OSM) time related tools like ttpctx.sql or rtpctx.sql you see a CPU time statistic. That contains both the DB CPU (i.e., server process) and "background process cpu" statistics. Here's an example.

SQL> @ttpctx.sql
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER about 11 times.

Database: prod35                                                     31-JUL-14 12:09pm
Report:   ttpctx.sql            OSM by OraPub, Inc.                     Page         1
                       Total Time Activity (142 sec interval)

                                                         Avg Time        Time     Wait
Time Component                           % TT    % WT Waited (ms)       (sec) Count(k)
------------------------------------- ------- ------- ----------- ----------- --------
CPU consumption: Oracle SP + BG procs   95.95    0.00       0.000     347.212        0
PX Deq: Slave Session Stats              1.45   35.74       0.113       5.240       47
library cache: mutex X                   0.58   14.26       0.136       2.090       15
PX Deq: Slave Join Frag                  0.43   10.57       0.067       1.550       23
PX Deq: Signal ACK EXT                   0.29    7.16       0.045       1.050       23
control file parallel write              0.28    7.03      20.600       1.030        0
PX qref latch                            0.27    6.75       0.012       0.990       85
latch free                               0.20    4.91       0.090       0.720        8
log file parallel write                  0.16    4.02      12.826       0.590        0

Non-Idle Wait Time


When an Oracle process can not consume CPU, it will pause. As an Oracle DBA, we know this as wait time. Sometimes a process waits and it's not a performance problem, so we call this Idle Wait Time. Oracle background processes typically have lots of idle wait time. However, when a user is waiting for sometime to complete and way down deep their Oracle server process is waiting to get perhaps a lock or latch, this is Non-Idle Wait Time. Obviously, when tuning Oracle we care a lot about non-idle wait time.

Below is a simple query showing wait event classifications. In this system there are 119 Idle wait events, so all the rest would be classified as non-idle wait events.

Oracle uses a variety of methods to determine wait time. I have a number of postings and educational content available about this. You'll see them if you do an OraPub or blog search for "time".

When working with non-idle wait time, remember the 80/20 rule. Most of the wait time we care about will be contained with in the largest ("top") two to four wait events. Don't waste YOUR time focusing on the 20%.

Here's an example. In the screen shot below, while not shown the total wait time is 1966 seconds.
If you add up the displayed "top" four wait events, their combined wait time is 1857. This is about 95% of all the non-idle wait time. This is a good example demonstrating that most of the wait time is found in the top two to four events.

My OSM toolkit has many wait time related tools. Most start with "sw" for "session wait" but the both ttpctx.sql or rtpctx.sql will contain the non-idle wait time and also CPU consumption. This is a good time to transition into DB Time.

DB Time


DB Time is a time model statistic that is the sum of Oracle process CPU consumption and non-idle wait time. When optimizing Oracle systems we typically focus on reducing "time", though many times database work is also part of the equation. This "time" is essentially DB Time, though sometimes I take control over what I consider idle wait time.

The name DB Time comes from the actual statistic name in both v$sess_time_model and v$sys_time_model.

If you look at any AWR or Statspack report in the "Time Model" section, you will see DB Time.
The DB time value is technically all server process CPU consumption plus the non-idle wait time within the reporting snapshot interval, converted to seconds. (The raw statistic is stored in microseconds.) Surprisingly, Oracle does not include "background cpu time" in the DB Time statistic. There are both good and not so good reasons the background CPU time is not include, but that's a topic for another posting.

A Little Math


We have enough detail to relate DB Time, DB CPU and non-idle wait time together... using a little math.

DB Time = DB CPU + non_idle_wait_time

And of course,

non_idle_wait_time = DB Time - DB CPU

This is important, because there is no single statistic that shows all the non-idle wait time. This must be derived. Shown above is one way to derive the non-idle wait time. Take a look at the AWR report snippet below.

In the Non-Idle Wait Time section above, I stated that the total non-idle wait time was 1966 seconds. I derived this from the Time Model screen shown above. I simply did:

non_idle_wait_time = DB Time - DB CPU
           1966.16 = 4032.03 - 2065.87

Coming Up Next


I wanted to keep this post short, which means I left out the more interesting topics. So in the next post I'll merge into the picture elapsed time along with parallelism and revisit wall time. Then in the third post (that's my guess at this point), I'll actually demonstrate this in two different systems.

Thanks for reading,

Craig.