Friday, January 10, 2014

Creating A Tool: Detailing Oracle Process CPU Consumption

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Detailing an Oracle Database process's CPU consumption is amazing, a lot of fun, and can lead to some “ah ha” moments. To make this posting daily digestible, I’m breaking it up into a multi-part series. Today I’m posting the third and final part entitled, Creating A Tool. Here we go…

In the previous post I ended with a discussion about how to gather an operating system's CPU consumption by function. More specific for our purposes is gathering Oracle process (session) CPU consumption at the Oracle kernel function level.

I Want Something Like This


At this point, we are ready to take the Linux perf tool output and merge that with Oracle wait event data. The output could look something like this:

Time Component                                                secs       %
------------------------------------------------------- ---------- -------
cpu : [.] kcbgtcr                                           29.714   66.87
cpu : [.] kdstf000010100001km                                3.716    8.36
cpu : [.] lnxsum                                             3.541    7.97
cpu : [?] sum of funcs consuming less than 2% of CPU ti      2.393    5.38
cpu : [.] kaf4reasrp0km                                      2.180    4.91
wait: latch: cache buffers chains                            2.100    4.73

I suspect you're not surprised that the above output was, in fact, taken from an existing tool. The tool is called "fulltime.sh" and is the result of a collaborative effort with myself and Frits Hoogland. You can download the free tool here. If the link doesn't work, just do an OraPub.com search for "fulltime" and you'll see it.

How The Tools Works


Obviously the tool collects both process CPU consumption and also Oracle wait event data. But we wanted some usage flexibility, so it's a little more interesting than that. Also, the real trick is outputting the perf data into a comma delimited file using "perf report -t,". It's the "t" option that is key here. Oracle can easily read the coma delimited file as an external table and then combine that with the wait event data collected from the v$sess_time_model view.

Here is the basic idea:

Help user identify the PID to profile
Initial setup
Loop
  get oracle wait times (snap 0)
  get oracle CPU time (snap 0)
  start oracle kernel cpu details collection
  sleep x
  get oracle wait times (snap 1)
  get oracle CPU time (snap 1)
  stop oracle kernel cpu collection
  do some cool math and combine results
  display results
End Loop

The looping capability easily allows for a single collection or multiple collection. The multiple collection option displays much like the "top" tool with a default cycle duration of three seconds.

If you're running on a virtual machine, make sure to change the shell script variable PERF_SAMPLE_METHOD to '-e cpu-clock' instead of the default '-e cycles'. If you look near the top of the fulltime.sh script, you will see a comment about this and also where to make the change.

Before you run the script, take a quick look at the script, especially the top third. There is some good information about the script, default settings, and usage details.

How To Run The Script


There are two input usages. Let's call them basic and advanced.

The BASIC USAGE helps you find the Oracle process ID, sets some defaults for you, and works on a three second cycle until you control-C. For example, let's say you don't know the OS process but you do know the Oracle SID or perhaps you know the machine name or user name. Then you're in luck because the fulltime.sh script will display this information and then prompt you for the OS PID! Here is an example of you can expect if you simply enter, "./fulltime.sh" and selected process 60505.


For each cycle, you would see something like this:


Notice that both Oracle and OS process details are shown in conjunction with the date and time. Plus the SQL statement the process is running at the end of the sample period! Then the total time is shown along with the two high level components, that is, CPU consumption and Oracle wait time. Finally, the details are shown. Notice the details clearly identify the time component as either CPU or wait time.

If the time is tagged CPU then the Oracle kernel function is displayed (thank you perf) along with it's inferred time (based on both perf and v$sess_time_model). If the time is tagged Oracle wait time then the wait event name is displayed along with the wait time (based on v$session_event).

It's important to understand that if we ran a standard wait event report, the output would be similar but with NO CPU information. The wait events detail would still be displayed. This tool simply adds value by incorporating the CPU consumption.

The ADVANCED USAGE gives us full control. The advanced usage requires the OS process ID, the cycle duration, and the number of cycles. The advanced option makes it simple to get a single long cycle or multiple shorter cycles. For example, if I wanted to watch process 1234 in 5 second intervals, I could do this:
$ ./fulltime.sh 1234 5 9999
Or suppose I wanted a single 60 second sample for process 5432. I would enter this:
$ ./fulltime.sh 5432 60 1
The output will look exactly like the output from basic usage. You are just not prompted or given information to help you pick the OS PID.

Here is a short video so you can watch this in action!


Fulltime.sh - Report both Oracle Wait and CPU time details from OraPub on Vimeo.

Why This Is So Important


Look at the example screen shot below and ask yourself, "Is there an IO bottleneck?"


If the CPU information was not combined with the wait time details, I suspect 90% of Oracle DBAs would say there is an IO problem. They would see the top wait event db file sequential read then conclude there is an IO issue. However, as the fulltime.sh script clearly shows, CPU consumption is much larger than Oracle wait time, providing us with a greater opportunity to reduce the total time and devise additional spot-on solutions. Also looking at the report's "total time" breakdown, it clearly shows CPU consumption is 97% of the total time. I wouldn't be surprised if there was a raging CPU bottleneck!

So Where Are We In This Quest?


Now that you have access to the fulltime.sh script and know how to use it, you're going to start asking, "Just what is kcbgtcr anyways?!" In my next posting I'll address this.

But until then, get the fulltime.sh script, log into your non-production yet active Linux box and give it a go! If it's not production, profile the Oracle log writer or database writer background process or perhaps an active server process!

Feeling dangerous yet? (Always!)

Here are the key resource links:

1. Presentation: Detailing Oracle CPU Consumption. Download HERE.
2. Tool: Oracle wait time and kernel function CPU monitor. Download the fulltime.sh script HERE.

Thanks for reading,

Craig.