Friday, January 10, 2014

Creating A Tool: Detailing Oracle Process CPU Consumption

Detailing an Oracle Database process's CPU consumption is amazing, a lot of fun, and can lead to some “ah ha” moments. To make this posting daily digestible, I’m breaking it up into a multi-part series. Today I’m posting the third and final part entitled, Creating A Tool. Here we go…

In the previous post I ended with a discussion about how to gather an operating system's CPU consumption by function. More specific for our purposes is gathering Oracle process (session) CPU consumption at the Oracle kernel function level.

I Want Something Like This

At this point, we are ready to take the Linux perf tool output and merge that with Oracle wait event data. The output could look something like this:

Time Component                                                secs       %
------------------------------------------------------- ---------- -------
cpu : [.] kcbgtcr                                           29.714   66.87
cpu : [.] kdstf000010100001km                                3.716    8.36
cpu : [.] lnxsum                                             3.541    7.97
cpu : [?] sum of funcs consuming less than 2% of CPU ti      2.393    5.38
cpu : [.] kaf4reasrp0km                                      2.180    4.91
wait: latch: cache buffers chains                            2.100    4.73

I suspect you're not surprised that the above output was, in fact, taken from an existing tool. The tool is called "" and is the result of a collaborative effort with myself and Frits Hoogland. You can download the free tool here. If the link doesn't work, just do an search for "fulltime" and you'll see it.

How The Tools Works

Obviously the tool collects both process CPU consumption and also Oracle wait event data. But we wanted some usage flexibility, so it's a little more interesting than that. Also, the real trick is outputting the perf data into a comma delimited file using "perf report -t,". It's the "t" option that is key here. Oracle can easily read the coma delimited file as an external table and then combine that with the wait event data collected from the v$sess_time_model view.

Here is the basic idea:

Help user identify the PID to profile
Initial setup
  get oracle wait times (snap 0)
  get oracle CPU time (snap 0)
  start oracle kernel cpu details collection
  sleep x
  get oracle wait times (snap 1)
  get oracle CPU time (snap 1)
  stop oracle kernel cpu collection
  do some cool math and combine results
  display results
End Loop

The looping capability easily allows for a single collection or multiple collection. The multiple collection option displays much like the "top" tool with a default cycle duration of three seconds.

If you're running on a virtual machine, make sure to change the shell script variable PERF_SAMPLE_METHOD to '-e cpu-clock' instead of the default '-e cycles'. If you look near the top of the script, you will see a comment about this and also where to make the change.

Before you run the script, take a quick look at the script, especially the top third. There is some good information about the script, default settings, and usage details.

How To Run The Script

There are two input usages. Let's call them basic and advanced.

The BASIC USAGE helps you find the Oracle process ID, sets some defaults for you, and works on a three second cycle until you control-C. For example, let's say you don't know the OS process but you do know the Oracle SID or perhaps you know the machine name or user name. Then you're in luck because the script will display this information and then prompt you for the OS PID! Here is an example of you can expect if you simply enter, "./" and selected process 60505.

For each cycle, you would see something like this:

Notice that both Oracle and OS process details are shown in conjunction with the date and time. Plus the SQL statement the process is running at the end of the sample period! Then the total time is shown along with the two high level components, that is, CPU consumption and Oracle wait time. Finally, the details are shown. Notice the details clearly identify the time component as either CPU or wait time.

If the time is tagged CPU then the Oracle kernel function is displayed (thank you perf) along with it's inferred time (based on both perf and v$sess_time_model). If the time is tagged Oracle wait time then the wait event name is displayed along with the wait time (based on v$session_event).

It's important to understand that if we ran a standard wait event report, the output would be similar but with NO CPU information. The wait events detail would still be displayed. This tool simply adds value by incorporating the CPU consumption.

The ADVANCED USAGE gives us full control. The advanced usage requires the OS process ID, the cycle duration, and the number of cycles. The advanced option makes it simple to get a single long cycle or multiple shorter cycles. For example, if I wanted to watch process 1234 in 5 second intervals, I could do this:
$ ./ 1234 5 9999
Or suppose I wanted a single 60 second sample for process 5432. I would enter this:
$ ./ 5432 60 1
The output will look exactly like the output from basic usage. You are just not prompted or given information to help you pick the OS PID.

Here is a short video so you can watch this in action! - Report both Oracle Wait and CPU time details from OraPub on Vimeo.

Why This Is So Important

Look at the example screen shot below and ask yourself, "Is there an IO bottleneck?"

If the CPU information was not combined with the wait time details, I suspect 90% of Oracle DBAs would say there is an IO problem. They would see the top wait event db file sequential read then conclude there is an IO issue. However, as the script clearly shows, CPU consumption is much larger than Oracle wait time, providing us with a greater opportunity to reduce the total time and devise additional spot-on solutions. Also looking at the report's "total time" breakdown, it clearly shows CPU consumption is 97% of the total time. I wouldn't be surprised if there was a raging CPU bottleneck!

So Where Are We In This Quest?

Now that you have access to the script and know how to use it, you're going to start asking, "Just what is kcbgtcr anyways?!" In my next posting I'll address this.

But until then, get the script, log into your non-production yet active Linux box and give it a go! If it's not production, profile the Oracle log writer or database writer background process or perhaps an active server process!

Feeling dangerous yet? (Always!)

Here are the key resource links:

1. Presentation: Detailing Oracle CPU Consumption. Download HERE.
2. Tool: Oracle wait time and kernel function CPU monitor. Download the script HERE.

Thanks for reading,


If you enjoy my blog, I suspect you'll get a lot out of my courses; Oracle Performance Firefighting,  Advanced Oracle Performance Analysis, and my new seminar, Go Faster: Make Oracle Work For You! I teach these classes around the world multiple times each year. For the latest schedule, go to . I also offer on-site training and consulting services.

P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com. Another option is to send an email to OraPub's general email address, which is currently info@ orapub. com.