Thursday, April 28, 2011

Giving Twitter A Try...@CShallahamer

I'm always looking for better ways to reach DBAs in less intrusive and obnoxious ways. Twitter may be an answer...maybe not. But I'm willing to give it a try!

So the next month, I'll be tweeting about new blog entries, tools, research, training schedules and discounts, etc. If you want to follow me on Twitter, my account is @CShallahamer.

Thanks for reading,

Craig.

Friday, April 22, 2011

Core vs Thread...CPU Utilization - Part 1

What's the Big Deal?

This blog series was motivated for two reasons. First, I want to demonstrate how one can calculate a database server's CPU utilization straight from Oracle v$ views. Second, over the past few years I've seen a couple examples where database servers have exhibited more power then the number of CPU cores could provide, which implies threads are indeed providing some extra power...so some investigation seems warranted.

This is a two part blog entry. This first part will detail how you can calculate the OS CPU utilization straight from v$ views along with the fundamentals of utilization. (Which can be widely applied to performance analysis.) In part two, I will present the results of my experiments contrasting CPU cores versus threads...and of course you'll be able to download and run the experimental scripts yourself!

Threads versus Cores

When you get down to what actually services work, it is a CPU core. Threads and hyper-threads work must be ultimately processed by a core. An exception would be if threads are truly simultaneously processed (and this can be tested). Capacity planners will tell you that threads don't scale. They will tell you that once each thread is occupied, their advantages decline. Unless you have experimental evidence leading you differently, when predicting the CPU power of a database server, the conservative (some say honest and ethical) way to determine the CPU capacity is to use cores, not threads, or hyper-threads. As a simple internet search will demonstrate, there are many published references. I found a great reference that any DBA will understand on Dell's web-site here.

Calculating OS CPU Utilization

On most Oracle systems it is a simple matter to compute the OS CPU utilization based on the 10g view v$osstat. The key to understanding utilization is it is simply requirements divided by capacity...end of story. Requirements can take on many forms such as CPU time consumed, IOPS performed, memory currently held, etc. Capacity can also take on many forms such as available CPU time, sustained IOPS available, physical memory, etc. Over an interval of time, if I know the CPU time consumed and I know the number of CPU cores on the database server, I can easily calculate the average database server CPU utilization over the interval. Here's a basic utilization formula:

U = R / C

Where;

U is utilization
R is requirements
C is capacity

I need to mention that when, referring to a CPU subsystem the capacity is the number of cores multiplied by the interval. For example, over a 1 minute interval a single core box can provide a maximum of 60 seconds of CPU. Over a 2 minute interval a single core box can provide a maximum of 120 seconds of CPU. Over a 2 minute interval a dual core box can provide a maximum of 240 seconds of CPU. I think you get the pattern, which is C = cores X interval. Using a more realistic example, over a one hour interval, a 16 core database server can provide up to 57600 seconds of CPU power; 16 cores X 1 hour X 60 min/hour X 60 sec/min = 57600 core-seconds.

Let's put this together resulting in the CPU utilization. Suppose over a one-hour interval an 8 core database server was busy 10080 seconds. In other words, the CPU subsystem was required to work 10080 seconds, so R = 10080 sec. Over a one-hour interval 8 cores can provide a maximum of 28800 seconds of CPU power; C = 8 cores X  1 hour X 60 min/hour X 60 sec/min = 28800 core-seconds. Therefore, the average CPU utilization over the interval is 0.35 or 35%; 10080 seconds / 28800 seconds.

But where did I get the 10080 seconds figure?

We can get the seconds of CPU consumed directly from v$osstat. The v$osstat view was introduced in Oracle 10g. It contains a dizzying array of confusing statistics that seem to change from platform to platform and from release to release. The statistic busy_time is the total CPU consumed by all operating system processes in hundreds of a seconds, that is, centi-seconds. For example, if the busy time is 123456 then since the operating system (not the Oracle instance) has started, all operating system processes (Oracle and everything else) have consumed 1234.56 seconds of CPU. We are not making a statement about the speed of a CPU, but simply the processes consumed 1234.56 seconds of CPU since the server was last rebooted.

Where do we get the number of CPU cores or threads?

For our purposes today, we want to know the number of CPU cores and sometimes threads. While not perfect I'm sure, if you reference any core related v$osstat statistic that does not contain the word thread, and take the maximum value, that will be the number of CPU cores. I realize this is not real scientific, but it seems to work. (And a quick check using sar, top, or asking the sysadmin also helps build confidence!)

Let's do it!

The pl/sql block below will determine the average OS CPU utilization over a 60 second interval. You can easily change the interval time.

set tab on
set serveroutput on
declare
  interval_sec      integer;
  busy_sec_t0       integer;
  busy_sec_t1       integer;
  cpu_cores_t0      integer;
  cpu_cores_t1      integer;
  cpu_cores_avg     number;
  requirements_sec  number;
  capacity_sec      number;
  utilization       number;
begin
  interval_sec := 60;

  select (a.value/100), b.value
  into   busy_sec_t0, cpu_cores_t0
  from   ( select value from v$osstat where stat_name='BUSY_TIME' ) a,
         ( select value from v$osstat where stat_name='NUM_CPUS' ) b ;

  dbms_lock.sleep(interval_sec);

  select (a.value/100), b.value
  into   busy_sec_t1, cpu_cores_t1
  from   ( select value from v$osstat where stat_name='BUSY_TIME' ) a,
         ( select value from v$osstat where stat_name='NUM_CPUS' ) b ;

  requirements_sec := busy_sec_t1 - busy_sec_t0 ;
  cpu_cores_avg    := (cpu_cores_t1 + cpu_cores_t0) / 2 ;
  capacity_sec     := interval_sec * cpu_cores_avg ;
  utilization      := requirements_sec / capacity_sec ;

  dbms_output.put_line('OS CPU Utilization Calculation...');
  dbms_output.put_line('REQUIREMENTS----------');
  dbms_output.put_line('..busy_sec_t0 (sec)    :'||busy_sec_t0);
  dbms_output.put_line('..busy_sec_t1 (sec)    :'||busy_sec_t1);
  dbms_output.put_line('..requirements (sec)   :'||requirements_sec);
  dbms_output.put_line('CAPACITY----------');
  dbms_output.put_line('..interval (sec)       :'||interval_sec);
  dbms_output.put_line('..CPU cores            :'||cpu_cores_avg);
  dbms_output.put_line('..capacity (sec)       :'||capacity_sec);
  dbms_output.put_line('-------');
  dbms_output.put_line('Utilization (avg)      :'||round(utilization,2));
end;
/
Note: You will need to check your v$osstat CPU core statistic name as there is a good chance it will be different. Also, if you see the statistic avg_busy_time do NOT use that. (How Oracle determines the average time of something over an interval is beyond me.) Instead, always use the busy_time statistic.

On my Linux system the results where:

SQL> /
OS CPU Utilization Calculation...
REQUIREMENTS----------
..busy_sec_t0 (sec)    :307954
..busy_sec_t1 (sec)    :308015
..requirements (sec)   :61
CAPACITY----------
..interval (sec)       :60
..CPU cores            :4
..capacity (sec)       :240
-------
Utilization (avg)      :.25

PL/SQL procedure successfully completed.
This should match the average CPU utilization from an OS command, such as sar, vmstat, or top. Why? Because the v$osstat view pulls its data directly from the virtual filesystem /proc...and so does sar, vmstat, and top. For details, refer to the Operating System Contention chapter's Monitoring CPU Activity section (starting on page 110) in my Oracle Performance Firefighting book.

What's Next?

Most Oracle systems, using CPU cores to determine the OS CPU capacity works perfectly. But over the years, I have seen a few systems where it appears the CPU capacity is actually greater than the number of CPU cores could physically provide (in this universe). This implies threads are providing more power and a better CPU capacity calculation will include a mix of CPU cores and threads. This calls for an experiment...one that we all can run on our Oracle systems. So in part two of this series I will present the tool to gather the data, share some experimental results, and draw some conclusions.

Thanks for reading!

Craig.