In my last blog entry, Cores vs Threads: Util Difference...Part 2, I analyzed seven data sets to discover if there is a difference between calculating OS CPU utilization based on v$osstat and CPU cores versus running vmstat. All but one sample set (the AIX sample set, AG1) clearly showed there was no significant difference.
But I had a concern. I wrote,
The only concern I have is none of the sample sets were gathered from a system with the CPU utilization greater than 65%. I would like to see some sample sets with a CPU constrained system. It is possible if the utilization differences are highly skewed and the residual slope is not flat (discussed below), the utilization difference (i.e., the gap) could become increasingly larger.
The week after I posted that entry, a willing participant (who I'll call AB) gathered and sent me data from a production Oracle Solaris system that peaked at 100% CPU utilization! So I was very excited to analyze the results. You can view the entire results in a single PDF file by clicking here.
The results were even more pronounced than the other six "no utilization difference" sample sets; clearly there was no significant difference between the utilization calculations. Why no difference? Read on...
The correlation between the two utilization sources is a jaw-dropping 0.99985. The statistical hypothesis test resulted in a p-value of 0.815, clearly exceeding our threshold of 0.05, so we cannot reject the null hypothesis that the two sample sets are not statistically different.
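If you want to check the correlation on your own samples, here is a minimal sketch in plain Python. The function name `pearson_r` and the sample values are made up for illustration; they are not the AB1 data, and this covers only the correlation, not the hypothesis test.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical utilization samples (fractions of capacity), NOT the AB1 data.
oracle_util = [0.42, 0.55, 0.71, 0.88, 0.97]
vmstat_util = [0.41, 0.56, 0.70, 0.89, 0.97]

r = pearson_r(oracle_util, vmstat_util)
print(f"correlation: {r:.5f}")
```

A value very close to 1 says the two sources move together; it does not by itself say they are equal, which is why the residual analysis matters as well.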
Figure 1 above shows the Oracle utilization as red squares and the vmstat utilization as blue dots...but you can't see any blue dots because the Oracle red squares are in front of them...the utilizations are visually exactly the same! Even with the utilization at 90% and above, there is no visual difference...amazing.
Figure 2 above is the residual graph plotting the utilization difference versus Oracle utilization. Notice the difference is usually less than 1% and, just as important, the error does not increase as the utilization increases (an increasing error is common when forecasting Oracle performance). In fact, the slope of the trend line is 0.00285, which is essentially flat. What this means is detailed in the previous blog entry; search for "trend line slope".
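The trend line slope is just an ordinary least-squares fit of the utilization difference against the Oracle utilization. Here is a rough sketch of that calculation. I am assuming the difference is taken as Oracle minus vmstat (the direction is my assumption), and `residual_slope` plus the numbers are hypothetical, not the AB1 data.

```python
def residual_slope(oracle_util, vmstat_util):
    """Least-squares slope of (oracle - vmstat) plotted against oracle utilization.

    Assumption: the residual is oracle minus vmstat.
    """
    if len(oracle_util) != len(vmstat_util) or len(oracle_util) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    residuals = [o - v for o, v in zip(oracle_util, vmstat_util)]
    n = len(oracle_util)
    mean_x = sum(oracle_util) / n
    mean_y = sum(residuals) / n
    sxx = sum((x - mean_x) ** 2 for x in oracle_util)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(oracle_util, residuals))
    return sxy / sxx

# Hypothetical percentages: a constant 1-point gap gives a flat trend line.
flat = residual_slope([10, 50, 90], [9, 49, 89])

# Hypothetical percentages: a gap that widens as the system gets busier.
growing = residual_slope([10, 50, 90], [10, 49, 88])

print(flat, growing)
```

A slope near zero means the gap does not widen as the CPU subsystem gets busier. A clearly positive or negative slope is the warning sign.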
The AB1 data set clearly showed there was no real difference in the utilization calculations.
This is yet another data set demonstrating in many environments using v$osstat and CPU cores to calculate CPU utilization is a valid alternative to running vmstat. The best way to determine if it's OK in your environment is to simply gather a little data and plot some points.
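To make "gather a little data" concrete, here is a minimal sketch of the arithmetic once you have two v$osstat snapshots. It assumes BUSY_TIME is reported in hundredths of a second summed over all processors, and that the core count comes from NUM_CPU_CORES; verify both on your release and platform. The function name `cpu_utilization` and the sample numbers are made up.

```python
def cpu_utilization(busy_start_cs, busy_end_cs, interval_seconds, cpu_cores):
    """Utilization = CPU time consumed / CPU time available over the interval.

    Assumptions: the busy_* values are v$osstat BUSY_TIME snapshots in
    centiseconds, and cpu_cores is the NUM_CPU_CORES value.
    """
    if interval_seconds <= 0 or cpu_cores <= 0:
        raise ValueError("interval and core count must be positive")
    busy_seconds = (busy_end_cs - busy_start_cs) / 100.0  # centiseconds -> seconds
    capacity_seconds = interval_seconds * cpu_cores       # CPU seconds available
    return busy_seconds / capacity_seconds

# Hypothetical snapshots taken 60 seconds apart on a 32-core system.
util = cpu_utilization(busy_start_cs=1_000_000,
                       busy_end_cs=1_096_000,
                       interval_seconds=60,
                       cpu_cores=32)
print(f"utilization: {util:.1%}")
```

Collect the vmstat utilization over the same intervals, then plot one series against the other to see whether they agree on your system.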
I still would like to see more AIX data sets. I suspect (and the single AIX AG1 data set demonstrated this) the way AIX calculates utilization is different than using v$osstat and CPU cores. And as the CPU subsystem gets busier, the utilization difference increases. More data is needed before I can say more about this though.
As I mentioned in my previous blog entry, I think the more intriguing question is why there can be a difference in utilization calculations. I would have posted that entry this week, but teaching last week in Boston (see pictures here) destroyed my voice and I was so tired at night I had no time to complete the entry...so stay tuned!
Thanks for reading!
If you enjoy my blog, I suspect you'll get a lot out of my courses: Oracle Performance Firefighting and Advanced Oracle Performance Analysis. I teach these classes around the world multiple times each year. For the latest schedule, click here. I also offer on-site training and consulting services.
P.S. If you want me to respond to a comment or have a question, please feel free to email me directly at craig@orapub.com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. Another option is to send an email to OraPub's general email address, which is currently firstname.lastname@example.org.