Thursday, February 28, 2013

How many CPU cores do I really have?




The operating system statistics view v$osstat can be misleading with regard to CPU cores. It's not that the information is incorrect; it's, well... let's say troubling. If I ask ten people to email a sample AWR report, I'm likely to see CPU core-like statistics such as CPU_SOCKETS, NUM_CPUS, VCPU, LCPU, CPU_THREADS, and probably a variety of other names. Wow… what a mess!

But I'd still like to know, because it's important for my work, for two reasons:

First, it helps me understand how high the CPU utilization can go before performance starts to degrade. Based on queuing theory, the more processes a system can service simultaneously, the higher the average utilization the system can sustain before performance begins to degrade. I write about this in the Operating System section of my Oracle Performance Firefighting book.

Second, I always check what the math and my observations indicate against what the OS administrator and OS commands (such as vmstat, sar, top) tell me. Paranoid perhaps, but doing Oracle work for 20-plus years has taught me a few things...

Call It a Server, Not a Core, Lcpu, Thread, etc.


To avoid the entire discussion about which provides the processing power, core or thread, let's simply call the unit of processing power a "server." Why? Two reasons. First, because it provides CPU service to processes, so it truly is a "server." Second, that's what capacity planners call something that services transactions: a server. In fact, its symbol is M (capital "m").

By the way, it is very easy to determine, on your system, what provides the true CPU processing power (cores, threads, or something else). I blogged about this in June of 2011.

So the question is, how many "servers" does your database host contain? That's what this posting is all about.

If you recall from my previous posting, I demonstrated two ways to calculate CPU utilization. Both follow the classic formula: requirements divided by capacity. But the capacity is where the two approaches differ.

Capacity Calculation Using "Servers"


With the "servers" approach, the capacity is simply the number of servers multiplied by the snapshot interval. So a two-server (think two cores) host over a 60-minute period can provide a maximum of 120 minutes, or 7200 seconds, of CPU power.

Here's the utilization formula using the capacity approach:

U = R / C

where:

R = CPU consumption over the interval (seconds)
C = CPU "servers" * interval (seconds)

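For example, looking at a real AWR report covering a 60-minute interval, the Operating System Statistics section shows a BUSY_TIME of 1913617, an IDLE_TIME of 7159367, and NUM_CPUS of 24. The busy and idle times are reported in hundredths of a second, so the busy time is 19136.17 seconds.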

Therefore, the average CPU utilization over the interval is:

U = 19136.17 / ( 24 * 60 * 60 ) = 0.221 = 22%
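The same arithmetic can be sketched in a few lines of Python. This is only an illustration, not part of any Oracle tooling: the function and variable names are my own, and it assumes BUSY_TIME is in hundredths of a second, as in the AWR example above.

```python
# Utilization using the "servers" capacity approach: U = R / C.
# Assumption: BUSY_TIME from v$osstat / AWR is in hundredths of a second.
CENTISECONDS_PER_SECOND = 100


def utilization_from_servers(busy_time_cs, servers, interval_seconds):
    """Return utilization as a fraction (0.0 to 1.0)."""
    busy_seconds = busy_time_cs / CENTISECONDS_PER_SECOND  # R, in seconds
    capacity_seconds = servers * interval_seconds          # C, in seconds
    return busy_seconds / capacity_seconds


# Numbers from the AWR example: 24 "servers" over a 60-minute interval.
u_servers = utilization_from_servers(
    busy_time_cs=1913617, servers=24, interval_seconds=60 * 60
)
print(f"U = {u_servers:.3f}")  # U = 0.221
```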

Capacity Calculation Using Busy and Idle Time


In my previous posting I introduced using only v$osstat's BUSY_TIME and IDLE_TIME values to calculate the average CPU utilization over the snapshot interval. Here's the formula:

U = R / C = BUSY_TIME / ( BUSY_TIME + IDLE_TIME )

Using the numbers from the example above:

U = 1913617 / ( 1913617 + 7159367 ) = 0.211 = 21%
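Here is a minimal Python sketch of the busy and idle time calculation, using the same AWR numbers. The function name is hypothetical. Because the time unit cancels out, the raw values can be used without converting them to seconds.

```python
# Utilization using only busy and idle time: U = busy / (busy + idle).
# Both values must be in the same unit; the unit itself cancels out.


def utilization_from_busy_idle(busy_time, idle_time):
    """Return utilization as a fraction (0.0 to 1.0)."""
    return busy_time / (busy_time + idle_time)


# Raw BUSY_TIME and IDLE_TIME values from the AWR example.
u_busy_idle = utilization_from_busy_idle(busy_time=1913617, idle_time=7159367)
print(f"U = {u_busy_idle:.3f}")  # U = 0.211
```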

Yes, the two utilization calculation results don't match perfectly but they are very close… close enough.

Calculating the Number of "Servers"


Notice that in the busy and idle time capacity calculation there is no reference to the number of servers. Suppose you don't trust the v$osstat CPU core-like statistics or are simply not sure which one is important. In other words, you want to understand the effective number of CPU "servers." Using the two utilization formulas and some algebra we can figure this out!

Making sure to use the same unit of time, here are two capacity calculations:

C = servers * interval
C = busy_time + idle_time

Let's put them together and solve for "servers".

servers * interval = busy_time + idle_time

servers = ( busy_time + idle_time ) / interval

OK… but does this really work? Let's give it a try! (I'm going to use seconds as my unit of time.)

effective servers = ( 19136.17 + 71593.67 ) / ( 60 * 60 ) = 25.2

The math tells us that based on the collected data, on average the system is operating with effectively 25 "servers." I know in this situation there are physically 24 CPU cores, so we're pretty close.
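The "effective servers" calculation is easy to script. The sketch below uses names I made up for illustration and assumes the busy and idle times arrive in hundredths of a second, as they do in the AWR example.

```python
# Effective servers = (busy_time + idle_time) / interval, all in seconds.
# Assumption: the raw busy and idle times are in hundredths of a second.
CENTISECONDS_PER_SECOND = 100


def effective_servers(busy_time_cs, idle_time_cs, interval_seconds):
    """Return the effective number of CPU "servers" over the interval."""
    total_seconds = (busy_time_cs + idle_time_cs) / CENTISECONDS_PER_SECOND
    return total_seconds / interval_seconds


# Numbers from the AWR example, over a 60-minute interval.
servers = effective_servers(
    busy_time_cs=1913617, idle_time_cs=7159367, interval_seconds=60 * 60
)
print(f"effective servers = {servers:.1f}")  # effective servers = 25.2
```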

What to Do With AIX


While this "effective servers" formula has proven its worth on many systems, I find that it often does not work well in an AIX environment. Sometimes it does, but not always. So do the math and compare it with vmstat or some other AIX-based tool.

The Take-Aways


The big one:

servers = ( busy_time + idle_time ) / interval

Personally, I never initially trust the v$osstat statistics related to the number of CPUs. I always check with the OS administrator and also run a simple OS command like top or sar, or do a "cat /proc/stat". You don't want to be thinking and working with 12 "servers" when the administrator is thinking 24 "servers."
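If you want to script that sanity check, here is one possible sketch. It compares the effective server count from the formula with a reported count and flags a large gap. The helper name and the 10% tolerance are my own assumptions, not an established rule, and os.cpu_count() only reports the logical CPUs of the machine the script runs on, so it is useful only when run on the database host itself.

```python
import os


def servers_disagree(effective, reported, tolerance=0.10):
    """Return True when the effective count differs from the reported
    count by more than the given fraction (an arbitrary 10% default)."""
    return abs(effective - reported) > tolerance * reported


effective = 25.2  # result of the "effective servers" formula above

# Compare with the counts two different people might have in mind.
print(servers_disagree(effective, reported=24))  # False
print(servers_disagree(effective, reported=12))  # True

# Logical CPUs the OS reports on this machine; may be None if unknown.
logical_cpus = os.cpu_count()
if logical_cpus is not None:
    print(f"OS reports {logical_cpus} logical CPUs")
```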

For me, knowing the number of CPU "servers" is important. And since I never blindly trust the v$osstat CPU statistics, this is a very fast and reliable way (so far at least) to check my work.

Thanks for reading!

Craig.






