<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-4169710303065679169</id><updated>2012-01-25T14:35:54.409-08:00</updated><category term='oracle index'/><category term='DBTA'/><category term='instrumentation'/><category term='segment header'/><category term='quantitative model'/><category term='non-parametric significant test'/><category term='exponential distribution'/><category term='user calls'/><category term='hash'/><category term='average'/><category term='consistent reads'/><category term='operations research'/><category term='latches'/><category term='experimental design'/><category term='insert'/><category term='population standard deviation'/><category term='pread64'/><category term='forecasting oracle performance'/><category term='v$sysstat'/><category term='cursor'/><category term='location test'/><category term='sql elapsed time'/><category term='poisson distribution'/><category term='strace'/><category term='readv'/><category term='commit rate'/><category term='mean'/><category term='log normal distribution'/><category term='v$session_event'/><category term='sql net round trip'/><category term='v$osstat'/><category term='hashing'/><category term='disperse latch contention'/><category term='put_line'/><category term='instance statistics'/><category term='queue time'/><category term='child latch'/><category term='visualization'/><category term='physical read total IO requests'/><category term='hot buffer'/><category term='threads'/><category term='mathematica'/><category term='library cache'/><category term='undo'/><category term='scalability'/><category term='index dump'/><category term='unexpected'/><category 
term='commit'/><category term='exponential'/><category term='duration'/><category term='fetch'/><category term='statistical distribution'/><category term='concurrency'/><category term='buffer header'/><category term='cache buffer chain latch contention'/><category term='v$sqlstats'/><category term='uniform distribution'/><category term='oracle wait interface'/><category term='oracle'/><category term='batch'/><category term='poisson'/><category term='v$event_histogram'/><category term='sweet spot'/><category term='parallelization'/><category term='advanced oracle performance analysis'/><category term='cbc'/><category term='child cursor'/><category term='consistent changes'/><category term='arrival rate'/><category term='count mismatch'/><category term='kruskal-Wallis'/><category term='wait event times'/><category term='oracle performance analysis'/><category term='CBC latch'/><category term='batch commit'/><category term='statistics'/><category term='number of servers'/><category term='requirements'/><category term='msolver'/><category term='correlation'/><category term='read consistency'/><category term='vmstat'/><category term='service time'/><category term='median'/><category term='capacity'/><category term='skewness'/><category term='log file sync'/><category term='OraPub Wait Event Distribution Analysis Tool'/><category term='cache buffer chain'/><category term='training schedule'/><category term='least recently used'/><category term='sql_id'/><category term='serialization'/><category term='batch commit size'/><category term='total_waits'/><category term='response time'/><category term='graph'/><category term='mutex latch spinning sched_yield serialization oracle'/><category term='chain of undo'/><category term='oracle performance training'/><category term='insert rate'/><category term='unit of work time based analysis'/><category term='singular latch contention'/><category term='oracle training'/><category term='buffer'/><category term='wolfram'/><category 
term='v$system_event'/><category term='time per work'/><category term='skew'/><category term='arrival pattern'/><category term='mutex'/><category term='index root block'/><category term='sql trace'/><category term='standard deviation'/><category term='limits'/><category term='normal distribution'/><category term='latch access pattern'/><category term='elapsed time'/><category term='batch size'/><category term='inter-arrival time'/><category term='v$sesstat'/><category term='altering M'/><category term='hot block'/><category term='histogram'/><category term='sample dispersion'/><category term='oracle performance firefighting'/><category term='recursive sql'/><category term='io read occur'/><category term='time based performance analysis'/><category term='run queue'/><category term='database trends magazine'/><category term='parse'/><category term='execute'/><category term='altering service time'/><category term='oracle performance'/><category term='precision'/><category term='SLA'/><category term='plan_hash_value'/><category term='latch acquisition pattern'/><category term='write list'/><category term='log normal'/><category term='bind variable'/><category term='response time analysis'/><category term='altering arrival rate'/><category term='fitness test'/><category term='cores'/><category term='commit time'/><category term='latch wait pattern'/><category term='msolve'/><category term='cpu utilization'/><category term='pattern'/><category term='latch'/><category term='parallel streams'/><category term='model'/><category term='consistent gets'/><category term='utilization'/><category term='teaching schedule'/><category term='queuing theory'/><category term='busy_time'/><category term='distribution'/><category term='sampling'/><category term='wait events'/><title type='text'>A   W i d e r    V i e w</title><subtitle type='html'>Experimentations and ruminations on Oracle performance management.</subtitle><link rel='http://schemas.google.com/g/2005#feed' 
type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>44</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-3749769762116478143</id><published>2011-12-09T10:16:00.000-08:00</published><updated>2011-12-09T10:40:12.154-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='latch'/><category scheme='http://www.blogger.com/atom/ns#' term='child latch'/><category scheme='http://www.blogger.com/atom/ns#' term='hash'/><category scheme='http://www.blogger.com/atom/ns#' term='pattern'/><category scheme='http://www.blogger.com/atom/ns#' term='buffer header'/><category scheme='http://www.blogger.com/atom/ns#' term='cache buffer chain'/><category scheme='http://www.blogger.com/atom/ns#' term='buffer'/><category scheme='http://www.blogger.com/atom/ns#' term='cbc'/><title type='text'>Singular CBC Latch Acquisition Pattern Diagnosis</title><content type='html'>&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Quick Introduction&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In my &lt;a 
href="http://shallahamer-orapub.blogspot.com/2011/11/cbc-latch-diagnosis-acquisition.html"&gt;previous posting on this topic&lt;/a&gt; I focused on determining cache buffer chain (CBC) wait time &lt;i&gt;acquisition patterns: disperse or singular&lt;/i&gt;. A &lt;i&gt;disperse&lt;/i&gt;&amp;nbsp;pattern occurs when many CBC latches are active. In contrast, there are situations when only one or a few CBC child latches are extremely active, hence the&amp;nbsp;&lt;i&gt;singular&lt;/i&gt;&amp;nbsp;pattern. In my previous posting,&amp;nbsp;I also listed some disperse pattern solutions.&lt;br /&gt;&lt;br /&gt;In this posting I'm focusing on the additional diagnostic steps needed before a specific solution can be determined &lt;i&gt;with a singular wait acquisition pattern&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;Important: To understand what I write below you need a basic understanding of Oracle buffer cache internals, in particular the buffer, latch, child latch, cache buffer chain, and buffer header. These are described (including nice diagrams) in my posting &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2010/09/buffer-cache-visualization-and-tool.html"&gt;here&lt;/a&gt;&lt;/b&gt;. In that posting a visualization tool is also used which you can download for free &lt;b&gt;&lt;a href="http://resources.orapub.com/SearchResults.asp?Search=bc+visual"&gt;here&lt;/a&gt;&lt;/b&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Situations with singular CBC acquisition patterns&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Two of the most common situations in which a popular buffer can stress a single CBC latch are:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;First, when a single block contains some application reference information and is not being changed often. 
The popular &lt;b&gt;buffer&lt;/b&gt; makes its &lt;b&gt;hash/cache buffer chain&lt;/b&gt; popular, and in turn the &lt;b&gt;CBC child latch&lt;/b&gt; covering that chain. If concurrency is high enough, this can become an issue. However, if the buffer is being changed often, we'd likely see a buffer busy wait.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The second situation involves a popular index root block. Any time an index is accessed, its single root block must be accessed first, which repeatedly forces the question, "Is this index root block in the buffer cache?". If the concurrency becomes high enough, eventually you'll see CBC latch contention associated with a single CBC child latch.&lt;br /&gt;&lt;br /&gt;There are other singular CBC access situations. Next I'll present how to determine the acquisition pattern.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Diagnosing&amp;nbsp;the acquisition pattern&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We need to determine if the CBC child latch access pattern is disperse or singular. One way to do this is shown in the SQL below, which is essentially my &lt;b&gt;&lt;a href="http://osmtoolkit.com/latchchild.sql.html"&gt;latchchild.sql&lt;/a&gt;&lt;/b&gt; OSM script. The script collects access details for latch number 150 (the CBC latch on this system; the number varies by Oracle release, so confirm it in &lt;b&gt;v$latch&lt;/b&gt;) over a 300 second period. 
Run the script a few times (i.e., collect a few sample sets) to ensure what you think is occurring actually is occurring.&lt;/div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;def latch=150&lt;br /&gt;def sleep=300&lt;br /&gt;drop table op_interim&lt;br /&gt;/&lt;br /&gt;create table op_interim as&lt;br /&gt;select addr,gets,sleeps&lt;br /&gt;from   v$latch_children&lt;br /&gt;where  latch# = &amp;amp;latch&lt;br /&gt;/&lt;br /&gt;exec dbms_lock.sleep(&amp;amp;sleep);&lt;br /&gt;select t1.addr,&lt;br /&gt;       t1.gets-t0.gets delta_gets,&lt;br /&gt;       t1.sleeps-t0.sleeps delta_sleeps&lt;br /&gt;from   op_interim t0,&lt;br /&gt;       (&lt;br /&gt;         select addr,gets,sleeps&lt;br /&gt;         from   v$latch_children&lt;br /&gt;         where  latch# = &amp;amp;latch&lt;br /&gt;       ) t1&lt;br /&gt;where  t0.addr = t1.addr&lt;br /&gt;order by 3,2&lt;br /&gt;&lt;/code&gt;/&lt;/pre&gt;&lt;pre&gt;&lt;/pre&gt;&lt;pre&gt;ADDR             DELTA_GETS     DELTA_SLEEPS&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;-------- ------------------ ----------------&lt;br /&gt;...&lt;br /&gt;76C33818            19,8038                3&lt;br /&gt;76C644A8             8,1535                4&lt;br /&gt;76C7759C             9,6993                4&lt;br /&gt;76C2D1CC            14,5096                4&lt;br /&gt;76E03FEC            14,9355                4&lt;br /&gt;76CDD394            16,0718                4&lt;br /&gt;76C68904            18,2300                4&lt;br /&gt;76D69374             6,5250                5&lt;br /&gt;76D7800C            13,3134                5&lt;br /&gt;76DA2650            15,6578                5&lt;br /&gt;76C02B88            15,8293                5&lt;br /&gt;76CE59E0            14,5169                7&lt;br /&gt;76C7B9F8            15,8243                7&lt;br /&gt;76CFF120             6,5228                9&lt;br /&gt;76CFCF30            14,8187               11&lt;br /&gt;76D38668            96,6345               62&lt;br 
/&gt;76CDF508            98,1384               96&lt;br /&gt;76DAF26C          1,96,2752              187&lt;br /&gt;&lt;br /&gt;1024 rows selected.&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 1. A 300 second CBC child latch acquisition activity collection SQL and report. Notice the last three child latches are the most active and have the most sleeps (which we see as "waits").&amp;nbsp;&lt;/b&gt;&lt;/div&gt;&lt;br /&gt;Figure 1 shows that three CBC child latches received a disproportionately high amount of activity, especially the 76DAF26C child latch. Therefore, we'll focus on the three child latches with addresses 76DAF26C, 76CDF508, and 76D38668.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Determine the hot buffer(s) and buffer header(s)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;More precisely, we need to determine the hot buffer headers that are on the CBC chains protected by the unusually active CBC child latch(es). The &lt;b&gt;x$bh&lt;/b&gt; performance table contains information about each buffer header that has an associated buffer in the buffer cache. (&lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2010/09/buffer-cache-visualization-and-tool.html"&gt;More about buffer headers&lt;/a&gt;&lt;/b&gt;) There are four columns of particular interest to us:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;HLADDR&lt;/b&gt;&amp;nbsp;is the "hash latch address", hence the column name &lt;b&gt;hladdr&lt;/b&gt;. 
This is a foreign key to the &lt;b&gt;v$latch_children&lt;/b&gt; view's &lt;b&gt;addr&lt;/b&gt; column.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;FILE#&lt;/b&gt; is the buffer header's (also Oracle block and buffer) file number that links to many views and tables, such as &lt;b&gt;dba_data_files.file_id&lt;/b&gt; and &lt;b&gt;dba_segments.header_file&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;DBABLK&lt;/b&gt; is the buffer header's (also Oracle block and buffer) block number, which may be referenced in &lt;b&gt;dba_segments.header_block&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;TCH&lt;/b&gt; is the buffer header's touch count, which is a popularity indicator...with a few twists I'll detail below. In every Oracle release I have checked (including 11.2), the &lt;b&gt;tch&lt;/b&gt; column is not in &lt;b&gt;v$bh&lt;/b&gt;, only &lt;b&gt;x$bh&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;Keep in mind that each child latch will normally "cover" multiple cache buffer chains (perhaps &amp;gt;100) and each chain can have zero or more associated buffer headers (average chain length is usually 0 to 1). And of course each buffer header is related to a cached Oracle block residing in the buffer cache.&lt;br /&gt;&lt;br /&gt;In the report shown in Figure 1, we saw which CBC child latches are relatively busy:&amp;nbsp;76DAF26C, 76CDF508, and 76D38668. Now we need to know which CBC chains and buffers are related to our relatively active CBC child latches. By querying &lt;b&gt;x$bh&lt;/b&gt; we can easily determine the buffer headers associated with a given CBC child latch. We can also get a clue as to the buffer header's popularity. 
The SQL below is one such query.&lt;/div&gt;&lt;pre&gt;&lt;code&gt;SQL&amp;gt; l&lt;br /&gt;  1  select hladdr, file#, dbablk, tch&lt;br /&gt;  2  from   x$bh&lt;br /&gt;  3  where  hladdr in ('76DAF26C','76CDF508','76D38668')&lt;br /&gt;  4* order by 4&lt;br /&gt;SQL&amp;gt; /&lt;br /&gt;&lt;br /&gt;HLADDR        FILE#     DBABLK        TCH&lt;br /&gt;-------- ---------- ---------- ----------&lt;br /&gt;...&lt;br /&gt;76D38668          1      70197          5&lt;br /&gt;76D38668          1      39365          5&lt;br /&gt;76DAF26C          1     117328        185&lt;br /&gt;76CDF508          1     117329        186&lt;br /&gt;&lt;br /&gt;47 rows selected.&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;As I wrote many years ago in my &lt;a href="http://resources.orapub.com/SearchResults.asp?Search=touch+count+algorithm"&gt;touch count algorithm paper&lt;/a&gt; and detailed in my &lt;a href="http://resources.orapub.com/Oracle_Performance_Firefighting_Book_p/ff_book.htm"&gt;Oracle Performance Firefighting book&lt;/a&gt;, Oracle uses the "touch count" algorithm to essentially tag popular buffers. There is a little twist though... a buffer's touch count can get reset to zero. So to truly determine the popular buffers, we must &lt;i&gt;repeatedly sample&lt;/i&gt; &lt;b&gt;x$bh&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;The SQL below can be used to repeatedly sample &lt;b&gt;x$bh&lt;/b&gt; and find the most popular buffers given their CBC latch address. If you look closely at the SQL, you'll notice it collects and stores 300 &lt;b&gt;x$bh&lt;/b&gt; samples at 1 second intervals. If you use a larger sleep time, you'll want to increase the number of samples collected. 
The final select statement reports the key &lt;b&gt;tch&lt;/b&gt; based popularity statistics.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;drop table op_interim;&lt;br /&gt;create table op_interim (hladdr raw(4), file# number, block# number, tch number);&lt;br /&gt;declare&lt;br /&gt;  i number;&lt;br /&gt;begin&lt;br /&gt;  for i in 1..300&lt;br /&gt;  loop&lt;br /&gt;    insert into op_interim &lt;br /&gt;      select hladdr,file#, dbablk, tch&lt;br /&gt;      from   x$bh&lt;br /&gt;      where  hladdr in ('76DAF26C','76CDF508','76BE93CC');&lt;br /&gt;    dbms_lock.sleep(1);&lt;br /&gt;  end loop;&lt;br /&gt;end;&lt;br /&gt;/&lt;br /&gt;select hladdr, file#, block#,&lt;br /&gt;       count(*) count, min(tch) min, median(tch) med,&lt;br /&gt;       round(avg(tch)) avg, max(tch) max&lt;br /&gt;from   op_interim&lt;br /&gt;group by hladdr, file#, block#&lt;br /&gt;order by 7&lt;br /&gt;/&lt;/code&gt;&lt;/pre&gt;&lt;pre&gt;&lt;/pre&gt;&lt;pre&gt;HLADDR    FILE#     BLOCK#      COUNT        MIN        MED        AVG        MAX&lt;/pre&gt;&lt;pre&gt;&lt;code&gt;-------- ------ ---------- ---------- ---------- ---------- ---------- ----------&lt;br /&gt;...&lt;br /&gt;76BE93CC      1      39364        300          6          6          6          6&lt;br /&gt;76BE93CC      1      69730        300         13         14         14         14&lt;br /&gt;76CDF508      1     117329        600          1         36         64        181&lt;br /&gt;76DAF26C      1     117328        300         70        125        125        180&lt;br /&gt;76BE93CC      4    1552339        300         76        131        131        186&lt;br /&gt;&lt;br /&gt;42 rows selected.&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;The statistics that I'm most interested in are the median (MED) and the maximum (MAX). 
And I'm hoping there is a clear buffer header, or very few buffer headers, that are far more active than the rest (i.e., hot).&lt;br /&gt;&lt;br /&gt;As a side note, you may have noticed that one of the COUNT values is 600 while the others are 300. This could have occurred because there are two buffer headers related to buffer 1,117329. Perhaps one could be the current mode (CU) buffer and the other a consistent read (CR) buffer. The above SQL could be improved by adding the state column to more uniquely identify a buffer header by its &lt;b&gt;file#&lt;/b&gt;, &lt;b&gt;block#&lt;/b&gt;, and &lt;b&gt;state&lt;/b&gt;...but I digress.&lt;br /&gt;&lt;br /&gt;Based on the above SQL output, we know&amp;nbsp;the hot buffers (file#, block#) are: (1,117328) and (4,1552339). You could also argue to include 1,117329 but this buffer header is not consistently hot because its median is much lower than the other two. But in all honesty, if this were a real production system and because the above SQL could be improved, I would investigate. Now we need to understand why these two buffer headers are so popular.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Determine the hot buffer details&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I will investigate the two most popular buffers. But honestly, my focus in this posting is on the second buffer, 4,1552339.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;u&gt;Investigating Buffer 1, 117328&lt;/u&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;To determine the segment's name and type, I'm going to use my &lt;b&gt;dba_extents&lt;/b&gt; based OSM script, &lt;b&gt;&lt;a href="http://osmtoolkit.com/objfb.sql.html"&gt;objfb.sql&lt;/a&gt;&lt;/b&gt;. 
The SQL is a little tricky, so you may want to check it out.&amp;nbsp;Let's first look at block 1,117328.&lt;/div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;SQL&amp;gt; @objfb 1     117328&lt;br /&gt;&lt;br /&gt;Database: prod18                                               18-NOV-11 09:45am&lt;br /&gt;Report:   objfb.sql            OSM by OraPub, Inc.                Page         1&lt;br /&gt;            Object Details For A Given File #(1) and block #(117328)&lt;br /&gt;&lt;br /&gt;File number    :1&lt;br /&gt;Block number   :117328&lt;br /&gt;Owner          :SYSTEM&lt;br /&gt;Segment name   :OP_LOAD_PARAMS&lt;br /&gt;Segment type   :TABLE&lt;br /&gt;Tablespace     :SYSTEM&lt;br /&gt;File name      :/u01/oradata/prod18/system01.dbf&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;OK, so we're dealing with a table. But this is strange because the &lt;b&gt;op_load_params&lt;/b&gt; table is used to interactively change the load intensity of my workload generator tool. (You can download an &lt;a href="http://resources.orapub.com/SearchResults.asp?Search=opload"&gt;older version here&lt;/a&gt;. I haven't posted the latest version... just lazy I guess.)&lt;br /&gt;&lt;br /&gt;When I'm investigating hot &lt;i&gt;table&lt;/i&gt; buffers, I also check if the buffer is the segment header block (header blocks contain special stuff...sorry to be so non-specific... but I digress). 
To determine if the buffer is a segment header block, I ran the below code snippet:&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;SQL&amp;gt; l&lt;br /&gt;  1  select owner, segment_name, segment_type&lt;br /&gt;  2  from   dba_segments&lt;br /&gt;  3  where  header_file=&amp;amp;hdr_file&lt;br /&gt;  4*   and  header_block=&amp;amp;hdr_block&lt;br /&gt;SQL&amp;gt; /&lt;br /&gt;Enter value for hdr_file: 1&lt;br /&gt;Enter value for hdr_block: 117328&lt;br /&gt;&lt;br /&gt;OWNER      SEGMENT_NAME              SEGMENT_TY&lt;br /&gt;---------- ------------------------- ----------&lt;br /&gt;SYSTEM     OP_LOAD_PARAMS            TABLE&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;Since a row was returned, this popular buffer header is indeed the &lt;b&gt;op_load_params&lt;/b&gt; table header block! This is not what I expected and not the focus of this blog posting... so I'll move on. But if this was a production system, you better believe I would figure it out!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;u&gt;Investigating Buffer 4, 1552339&lt;/u&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Now let's turn our attention to the other hot buffer header, namely buffer header&amp;nbsp;4,1552339. First I'll determine the object type by running my &lt;b&gt;&lt;a href="http://osmtoolkit.com/objfb.sql.html"&gt;objfb.sql&lt;/a&gt;&lt;/b&gt; script.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;SQL&amp;gt; @objfb 4    1552339&lt;br /&gt;&lt;br /&gt;Database: prod18                                               18-NOV-11 09:51am&lt;br /&gt;Report:   objfb.sql            OSM by OraPub, Inc.                
Page         1&lt;br /&gt;            Object Details For A Given File #(4) and block #(1552339)&lt;br /&gt;&lt;br /&gt;File number    :4&lt;br /&gt;Block number   :1552339&lt;br /&gt;Owner          :MG&lt;br /&gt;Segment name   :SPECIAL_CASES_BOGUS&lt;br /&gt;Segment type   :INDEX&lt;br /&gt;Tablespace     :USERS&lt;br /&gt;File name      :/u01/oradata/prod18/users01.dbf&lt;br /&gt;&lt;br /&gt;1 row selected.&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;i&gt;Very&lt;/i&gt; interesting... an index. So we know this hot buffer header is an index block. I'm willing to bet it's the index's &lt;i&gt;root block&lt;/i&gt;! Why? Because every time an index is accessed, its root &lt;b&gt;block&lt;/b&gt; &lt;b&gt;buffer header&lt;/b&gt;&amp;nbsp;(and buffer) is also accessed. And a very active index root block buffer header can cause its &lt;b&gt;hash chain&lt;/b&gt; to be very active, which can cause problems when a process attempts to acquire the hash &lt;b&gt;chain's child latch&lt;/b&gt;. But how can we tell if an index block is the root block?&lt;br /&gt;&lt;br /&gt;As I wrote in my &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/11/are-you-sure-its-index-root-block.html"&gt;November 11, 2011 posting&lt;/a&gt;&lt;/b&gt;, we can expect an index's root block to have a block number that is one greater than its segment header block. If you look closely at the SQL below, the block number I entered is one less than the popular block (1552338 = 1552339-1). If this 1552339 block is indeed an index root block, then its segment header will have a block number of 1552338... 
let's check it out!&lt;br /&gt;&lt;pre&gt;&lt;code&gt;SQL&amp;gt; /&lt;br /&gt;Enter value for hdr_file: 4&lt;br /&gt;Enter value for hdr_block: 1552338&lt;br /&gt;&lt;br /&gt;OWNER      SEGMENT_NAME              SEGMENT_TY&lt;br /&gt;---------- ------------------------- ----------&lt;br /&gt;MG         SPECIAL_CASES_BOGUS       INDEX&lt;br /&gt;&lt;br /&gt;1 row selected.&lt;br /&gt;&lt;br /&gt;SQL&amp;gt; l&lt;br /&gt;  1  select owner, segment_name, segment_type&lt;br /&gt;  2  from   dba_segments&lt;br /&gt;  3  where  header_file=&amp;amp;hdr_file&lt;br /&gt;  4*   and  header_block=&amp;amp;hdr_block&lt;br /&gt;SQL&amp;gt; &lt;/code&gt;&lt;/pre&gt;&lt;div&gt;Fantastic! A row was returned, which means the popular buffer header is indeed the index's root block!&lt;br /&gt;&lt;br /&gt;So the initial diagnosis is complete and now we need a solution.&amp;nbsp;&lt;i&gt;A poor solution&lt;/i&gt; would be to increase the number of CBC latches. Adding CBC latches does indeed significantly help during a disperse CBC latch acquisition pattern because each latch covers fewer chains. But when there exists a singular CBC acquisition pattern, the additional latches won't help much. What we need to do is &lt;i&gt;somehow make the popular index root block buffer header less popular&lt;/i&gt;.&amp;nbsp;One solution is to hash partition the index, which effectively creates multiple root blocks because each partition has a root block. 
At this point we have our diagnosis and a solution.&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This posting has a number of objectives:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;How to identify CBC child latch acquisition patterns: disperse or singular&lt;/li&gt;&lt;li&gt;How to determine the hot buffer headers related to specific CBC child latches&lt;/li&gt;&lt;li&gt;What information is needed to further diagnose the hot buffer header&lt;/li&gt;&lt;li&gt;Focus on the situation (diagnosis and solution) when the hot buffer header is an index root block&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;This is one of my more complicated (and perhaps confusing) posts because to really understand what I'm getting at,&amp;nbsp;you must have a good understanding of Oracle buffer cache internals.&lt;br /&gt;&lt;br /&gt;Personally, the hot index root block situation is particularly satisfying. It's a very real example of a case where the top wait event is "latch: cache buffers chains", yet simply increasing the number of CBC child latches will not significantly improve performance. But with just a couple of extra diagnostic steps, we can nail the core Oracle contention area.&lt;br /&gt;&lt;br /&gt;It took a while to get there but I hope you&amp;nbsp;&lt;a href="http://www.youtube.com/watch?v=S5hgIuv3LVE"&gt;enjoyed the ride&lt;/a&gt;!&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Thanks for reading!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;If you enjoyed this blog entry, I suspect you'll get a lot out of my courses: &lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;Oracle Performance Firefighting&lt;/a&gt; and &lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt;. I teach these classes around the world multiple times each year. 
For the latest schedule, &lt;a href="http://resources.orapub.com/Default.asp"&gt;click here&lt;/a&gt;. I also offer on-site training and consulting services.&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond with a comment or you have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. Another option is to send an email to OraPub's general email address, which is currently orapub.general@gmail .com.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-3749769762116478143?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/3749769762116478143/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/11/singular-cbc-latch-acquisition-pattern.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/3749769762116478143'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/3749769762116478143'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/11/singular-cbc-latch-acquisition-pattern.html' title='Singular CBC Latch Acquisition Pattern Diagnosis'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-3135643722426537684</id><published>2011-12-05T08:43:00.001-08:00</published><updated>2011-12-06T07:59:43.152-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='DBTA'/><category scheme='http://www.blogger.com/atom/ns#' term='database trends magazine'/><category scheme='http://www.blogger.com/atom/ns#' term='unit of work time based analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='oracle performance firefighting'/><category scheme='http://www.blogger.com/atom/ns#' term='oracle performance analysis'/><title type='text'>Important article published in DB Trends magazine</title><content type='html'>Greetings,&lt;br /&gt;&lt;br /&gt;This is an unusual post because it's rare my work is published by traditional media. The magazine &lt;i&gt;DB Trends and Applications&lt;/i&gt; (DBTA) published an article I wrote entitled, &lt;i&gt;Uniting Operations Research With Time-Based DB Performance Analysis&lt;/i&gt;. If you receive the printed magazine it's on page 28 of the December 2011 issue but you can also &lt;b&gt;&lt;a href="http://www.dbta.com/Articles/Columns/A-Wider-View/Uniting-Operations-Research-with-Time-Based-DB-Performance-Analysis-79153.aspx"&gt;read it on-line here&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;What's the article about? 
In the short article, I introduce&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;i&gt;Unit of Work Time Based Analysis as&amp;nbsp;&lt;/i&gt;&lt;i&gt;the intersection of Oracle performance firefighting and Oracle forecasting and predictive analysis.&lt;/i&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;This intersection is one of the main themes in my two-day class, &lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;&lt;i&gt;Advanced Oracle Performance Analysis&lt;/i&gt;&lt;/a&gt;. Understanding and applying the content unlocks deep performance insights and allows you to achieve a much higher level of performance analysis. Why? Because you can objectively compare various performance solutions both numerically and visually. And this comparison can be done from a very high and abstract level (think: pictures) down to a very detailed operations research mathematical level (think: formulas). It's powerful.&lt;br /&gt;&lt;br /&gt;It's no coincidence my three classes are similarly titled:&amp;nbsp;&lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;Oracle Performance Firefighting&lt;/a&gt;, &lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt;, and finally &lt;a href="http://training.orapub.com/content_forecasting.asp"&gt;Oracle Forecasting &amp;amp; Predictive Analysis&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Because I rarely teach my forecasting class, I want to mention that I will be offering my &lt;i&gt;&lt;b&gt;Oracle Forecasting &amp;amp; Predictive Analysis&lt;/b&gt;&lt;/i&gt; class in &lt;a href="http://www.training.orapub.com/location.asp#FRA"&gt;Frankfurt, Germany, February 6 - 8, 2012&lt;/a&gt;. 
I just finished teaching this class in Santa Clara, CA last week and we all had a great time for sure!&lt;br /&gt;&lt;br /&gt;My firefighting courses will also be taught in &lt;a href="http://www.training.orapub.com/location.asp#SWE"&gt;Sweden the week of January 30&lt;/a&gt; and also the &lt;a href="http://www.training.orapub.com/location.asp#SFO"&gt;week of February 27 in Santa Clara, California&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I hope you enjoy the article! And if you have any questions, feel free to email me at orapub.general@orapub.com.&lt;br /&gt;&lt;br /&gt;All the best in your performance endeavors!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-3135643722426537684?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/3135643722426537684/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/12/important-article-published-in-db.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/3135643722426537684'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/3135643722426537684'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/12/important-article-published-in-db.html' title='Important article published in DB Trends magazine'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-8310536389047430843</id><published>2011-11-18T13:55:00.001-08:00</published><updated>2011-11-22T10:08:34.384-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='singular latch contention'/><category scheme='http://www.blogger.com/atom/ns#' term='latch acquisition pattern'/><category scheme='http://www.blogger.com/atom/ns#' term='cache buffer chain'/><category scheme='http://www.blogger.com/atom/ns#' term='latch access pattern'/><category scheme='http://www.blogger.com/atom/ns#' term='cbc'/><category scheme='http://www.blogger.com/atom/ns#' term='cache buffer chain latch contention'/><category scheme='http://www.blogger.com/atom/ns#' term='latch wait pattern'/><category scheme='http://www.blogger.com/atom/ns#' term='disperse latch contention'/><title type='text'>CBC Latch Diagnosis &amp; Acquisition Patterns</title><content type='html'>&lt;br /&gt;Cache buffer chain (CBC) latch contention is a common top Oracle wait event. There are a number of interrelated causes but also a number of solutions. The trick is to properly diagnose the problem, which results in a short list of solutions.&lt;br /&gt;&lt;br /&gt;The CBCs are created as a hashing structure and are primarily used to determine if a block currently resides in the buffer cache. (&lt;a href="http://shallahamer-orapub.blogspot.com/2010/09/buffer-cache-visualization-and-tool.html"&gt;More-&amp;gt;&amp;gt;&lt;/a&gt;) As you can imagine, even the smallest Oracle systems ask, "Is a block in the buffer cache?" a ga-zillion times each day. 
If CBC access continues to intensify, at some point the time to acquire the desired CBC latch will be a performance problem.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Is there a CBC issue?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When CBC latch contention is raging, your system is likely to have a crippling CPU bottleneck because the application SQL is concurrently and repeatedly checking if specific blocks reside in the buffer cache. If the answer to, "Is the block in the buffer cache?" is usually, "Yes" then IO reads are minimized and memory structure access is maximized...hence the crippling CPU bottleneck and CBC latch contention.&lt;br /&gt;&lt;br /&gt;The OraPub System Monitor (&lt;a href="http://resources.orapub.com/OSM_OraPub_System_Monitor_p/osm.htm"&gt;OSM&lt;/a&gt; and &lt;a href="http://osmtoolkit.com/"&gt;OSM&lt;/a&gt;) script I use to interactively determine the overall time situation is &lt;b&gt;&lt;a href="http://osmtoolkit.com/rtpctx.sql.html"&gt;rtpctx.sql&lt;/a&gt;&lt;/b&gt;. Here's an example of a 707 second interval.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-edgdVKPBcp0/TsbVO59b9ZI/AAAAAAAAASw/iOeVoH4oUlo/s1600/rtsys+broad+SS.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="228" src="http://2.bp.blogspot.com/-edgdVKPBcp0/TsbVO59b9ZI/AAAAAAAAASw/iOeVoH4oUlo/s320/rtsys+broad+SS.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 1. Typical CBC latch contention result based on rtsysx.sql, response time report.&lt;/b&gt;&lt;/div&gt;&lt;br /&gt;In the &lt;b&gt;rtsysx.sql&lt;/b&gt; output shown in Figure 1, the "% WT" column shows the percentage of wait time by wait event over the report interval. The "% RT" column shows the percentage of the total response time (CPU time and Wait time). 
The values in Figure 1 are typical when there is a very serious CBC latch contention issue. An AWR or Statspack report will tell a similar story; the top wait event being "latch: cache buffers chains" and most of the system's CPU resources being consumed by Oracle.&lt;br /&gt;&lt;br /&gt;While there are a number of causes for CBC latch contention, I tend to see &lt;i&gt;two CBC acquisition patterns&lt;/i&gt;. The first pattern is when many CBC latches are very active, that is, the access pattern is very &lt;i&gt;disperse&lt;/i&gt;. The second pattern is when a &lt;i&gt;single&lt;/i&gt; CBC latch is very active. So once you know there is a significant CBC issue, the next step is to determine the acquisition pattern characteristic. Read on!&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Determining CBC Wait Pattern&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To determine the CBC wait pattern, you can run a very simple script like this:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;select p1 latch_addr,&lt;br /&gt;       p2 latch_#&lt;br /&gt;from   v$session&lt;br /&gt;where  status    = 'ACTIVE'&lt;br /&gt;  and  wait_time = 0&lt;br /&gt;  and  event     = 'latch: cache buffers chains'&lt;br /&gt;/&lt;br /&gt;&lt;br /&gt;LATCH_ADDR    LATCH_#&lt;br /&gt;---------- ----------&lt;br /&gt;1993834384        150&lt;br /&gt;1993834384        150&lt;br /&gt;1993834384        150&lt;br /&gt;1993834384        150&lt;br /&gt;&lt;br /&gt;4 rows selected.&lt;/code&gt;&lt;/pre&gt;Notice that all four sessions are sleeping (i.e., they are posting the wait event) while trying to acquire the same CBC latch (note the latch address is identical). While the above snippet and the result are interesting, you could easily be misled by this single sample. A more statistically sound method is to gather multiple samples. 
Using my OSM script, &lt;b&gt;&lt;a href="http://osmtoolkit.com/latchchild.sql.html"&gt;latchchild.sql&lt;/a&gt;&lt;/b&gt;, we can take a sample each second for statistical analysis. The &lt;b&gt;latchchild.sql&lt;/b&gt; script essentially does this:&lt;br /&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;def latch=150&lt;br /&gt;def sleep=300&lt;br /&gt;drop table op_interim&lt;br /&gt;/&lt;br /&gt;create table op_interim as&lt;br /&gt;select addr,gets,sleeps&lt;br /&gt;from   v$latch_children&lt;br /&gt;where  latch# = &amp;amp;latch&lt;br /&gt;/&lt;br /&gt;exec dbms_lock.sleep(&amp;amp;sleep);&lt;br /&gt;select t1.addr,&lt;br /&gt;       t1.gets-t0.gets delta_gets,&lt;br /&gt;       t1.sleeps-t0.sleeps delta_sleeps&lt;br /&gt;from   op_interim t0,&lt;br /&gt;       (&lt;br /&gt;         select addr,gets,sleeps&lt;br /&gt;         from   v$latch_children&lt;br /&gt;         where  latch# = &amp;amp;latch&lt;br /&gt;       ) t1&lt;br /&gt;where  t0.addr = t1.addr&lt;br /&gt;order by 3,2&lt;br /&gt;/&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;Below is some actual output. The "delta" columns are simply the difference between the beginning and ending values for &lt;b&gt;gets&lt;/b&gt; and &lt;b&gt;sleeps&lt;/b&gt;. 
Notice there is not a massive gap between the &lt;b&gt;delta_gets&lt;/b&gt; and &lt;b&gt;delta_sleeps&lt;/b&gt;&amp;nbsp; and there is not a single (or a few) latch that is significantly more active than the others.&amp;nbsp;This would be classified as&amp;nbsp;&lt;i&gt;dispersed CBC latch contention&lt;/i&gt;.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;pre&gt;&lt;code&gt;ADDR             DELTA_GETS     DELTA_SLEEPS&lt;br /&gt;-------- ------------------ ----------------&lt;br /&gt;...&lt;br /&gt;76D1ABC4            30,2356                5&lt;br /&gt;76D23210            30,9631                5&lt;br /&gt;76C8413C            32,4284                5&lt;br /&gt;76C75428            23,2780                6&lt;br /&gt;76CFCFAC            24,7324                6&lt;br /&gt;76DB79B0            24,7332                6&lt;br /&gt;76BFE7A8            25,3808                6&lt;br /&gt;76DB9C98            28,1330                6&lt;br /&gt;76C534AC            32,3395                6&lt;br /&gt;76D0BCC0            33,7938                6&lt;br /&gt;76C17DF0            20,3694                7&lt;br /&gt;76C04DF4            24,0050                7&lt;br /&gt;76DE64CC            29,5872                7&lt;br /&gt;76BF5EF0            23,2782                8&lt;br /&gt;76D05864            27,4886                8&lt;br /&gt;&lt;br /&gt;1024 rows selected.&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;To really grasp the situation, a visual histogram based on the sleeps is very useful.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-dFmclG9Kge0/TsbXqniw4vI/AAAAAAAAAS4/2JpZfr6-4PM/s1600/Basic+Stats+1c+broad.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="190" src="http://1.bp.blogspot.com/-dFmclG9Kge0/TsbXqniw4vI/AAAAAAAAAS4/2JpZfr6-4PM/s320/Basic+Stats+1c+broad.png" width="320" 
/&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 2. Histogram of the number of CBC latch addresses and their respective sleep activity.&lt;/b&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;Figure 2 is a standard histogram I copied from the statistical analysis Mathematica notebook (which you can download below). Just over 600 CBC latches have zero sleeps while only two CBC latches had eight sleeps.&amp;nbsp;Notice that while there are differences in the number of sleeps, we don't see a pattern with a massive jump like: 0,0,0,1,2,3,5,6,7,1021. Again, this is an example of &lt;i&gt;dispersed CBC latch contention&lt;/i&gt;. But sometimes the situation is not very dispersed, but singular towards just a couple or perhaps even one single CBC latch!&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-4z336anP-2Y/TsbYHAoWckI/AAAAAAAAATE/x-j8Ip_nGpA/s1600/Basic+Stats+1c+single.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="190" src="http://3.bp.blogspot.com/-4z336anP-2Y/TsbYHAoWckI/AAAAAAAAATE/x-j8Ip_nGpA/s320/Basic+Stats+1c+single.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 3.&amp;nbsp;&lt;/b&gt;&lt;b&gt;Histogram of the number of CBC latch addresses and their respective sleep activity.&lt;/b&gt;&lt;/div&gt;Figure 3 is the result, as we'll see, of three very popular buffers, each related to a different buffer chain.&amp;nbsp;While nearly 1000 CBC latches have zero sleeps (far left vertical bar in histogram), just as important is the obvious large &lt;b&gt;delta_sleeps&lt;/b&gt; gap near the most active CBC latches. For some people, the histogram tells a better story, but for others the numeric snippet below better captures the situation. 
(I personally like to use both.)&lt;br /&gt;&lt;pre&gt;&lt;code&gt;ADDR             DELTA_GETS     DELTA_SLEEPS&lt;br /&gt;-------- ------------------ ----------------&lt;br /&gt;...&lt;br /&gt;76C33818            19,8038                3&lt;br /&gt;76C644A8             8,1535                4&lt;br /&gt;76C7759C             9,6993                4&lt;br /&gt;76C2D1CC            14,5096                4&lt;br /&gt;76E03FEC            14,9355                4&lt;br /&gt;76CDD394            16,0718                4&lt;br /&gt;76C68904            18,2300                4&lt;br /&gt;76D69374             6,5250                5&lt;br /&gt;76D7800C            13,3134                5&lt;br /&gt;76DA2650            15,6578                5&lt;br /&gt;76C02B88            15,8293                5&lt;br /&gt;76CE59E0            14,5169                7&lt;br /&gt;76C7B9F8            15,8243                7&lt;br /&gt;76CFF120             6,5228                9&lt;br /&gt;76CFCF30            14,8187               11&lt;br /&gt;76D38668            96,6345               62&lt;br /&gt;76CDF508            98,1384               96&lt;br /&gt;76DAF26C          1,96,2752              187&lt;br /&gt;&lt;br /&gt;1024 rows selected.&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;What is important to us is that there is a substantial &lt;i&gt;delta_sleeps&lt;/i&gt; gap separating the top three CBC latches from the rest of the pack.&lt;br /&gt;&lt;br /&gt;I'm hoping you can see the differences in these access patterns. 
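&lt;br /&gt;&lt;br /&gt;One rough way to put a number on that gap is to show each child latch's share of the total sleeps. The query below is only a sketch and is not part of &lt;b&gt;latchchild.sql&lt;/b&gt;: it assumes the &lt;b&gt;op_interim&lt;/b&gt; table and the &lt;b&gt;latch&lt;/b&gt; substitution variable from the outline above already exist and that the sample interval has elapsed, and the &lt;b&gt;pct_of_sleeps&lt;/b&gt; column name is purely illustrative.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;-- Sketch only: assumes op_interim holds the first snapshot (see the outline above)&lt;br /&gt;-- and that latch was set with "def latch=..." for your release.&lt;br /&gt;select addr,&lt;br /&gt;       delta_sleeps,&lt;br /&gt;       -- share of all sleeps in the interval owned by this child latch&lt;br /&gt;       round(100 * ratio_to_report(delta_sleeps) over (), 1) pct_of_sleeps&lt;br /&gt;from   (&lt;br /&gt;         select t1.addr,&lt;br /&gt;                t1.sleeps - t0.sleeps delta_sleeps&lt;br /&gt;         from   op_interim t0,&lt;br /&gt;                v$latch_children t1&lt;br /&gt;         where  t1.latch# = &amp;amp;latch&lt;br /&gt;           and  t0.addr   = t1.addr&lt;br /&gt;       )&lt;br /&gt;where  delta_sleeps &amp;gt; 0   -- ignore idle child latches&lt;br /&gt;order by delta_sleeps desc&lt;br /&gt;/&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;With a singular pattern I would expect the first few rows to own most of the sleeps; with a disperse pattern the percentages should be small and fairly even.&lt;br /&gt;&lt;br /&gt;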
I would suggest running the &lt;b&gt;&lt;a href="http://osmtoolkit.com/latchchild.sql.html"&gt;latchchild.sql &lt;/a&gt;&lt;/b&gt;script on one of your systems to see this for yourself.&lt;br /&gt;&lt;br /&gt;If you want to see the &lt;b&gt;latchchild.sql&lt;/b&gt; output (latchchild.txt) and the statistical details for the above CBC activity, here are the links:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Disperse CBC latch contention (&lt;a href="http://filezone.orapub.com/Research/201111_CbcPatterns/lcDisperseFull.txt"&gt;latchchild.txt&lt;/a&gt;, &lt;a href="http://filezone.orapub.com/Research/201111_CbcPatterns/BasicStats1cDisperse.pdf"&gt;PDF&lt;/a&gt;, &lt;a href="http://filezone.orapub.com/Research/201111_CbcPatterns/BasicStats1cDisperse.nb"&gt;Mathematica Notebook&lt;/a&gt;)&lt;/li&gt;&lt;li&gt;Singular CBC latch contention&amp;nbsp;(&lt;a href="http://filezone.orapub.com/Research/201111_CbcPatterns/lcSingularFull.txt"&gt;latchchild.txt&lt;/a&gt;, &lt;a href="http://filezone.orapub.com/Research/201111_CbcPatterns/BasicStats1cSingular.pdf"&gt;PDF&lt;/a&gt;, &lt;a href="http://filezone.orapub.com/Research/201111_CbcPatterns/BasicStats1cSingular.nb"&gt;Mathematica Notebook&lt;/a&gt;)&lt;/li&gt;&lt;/ol&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Solutions for Disperse CBC Latch Contention&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Figure 2 and the output snippet directly above Figure 2 show a typical example of what you'll likely see when many CBC latches are active enough to cause a significant performance problem. Here's the likely situation: There is an intense CBC latch situation along with a raging CPU bottleneck, and you can probably easily see the heavy logical IO (v$sysstat, session logical reads) SQL as well. 
There are a number of solutions, with some of them listed below.&lt;br /&gt;&lt;br /&gt;An &lt;b&gt;Oracle focused&lt;/b&gt; solution is to increase the number of CBC latches by increasing the hidden instance parameter, &lt;b&gt;&lt;a href="http://osmtoolkit.com/ipcbc.sql.html"&gt;_db_block_hash_latches&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;An &lt;b&gt;application focused&lt;/b&gt; solution is to find the most logical IO intensive SQL and reduce the LIOs by executing it less often or tuning it. Either way, your objective is to reduce the LIOs generated during times of critical performance.&lt;br /&gt;&lt;br /&gt;An &lt;b&gt;operating system focused&lt;/b&gt; solution is to increase CPU resources by removing CPU consuming processes if possible, adding more CPU cores, or increasing CPU speed.&lt;br /&gt;&lt;br /&gt;There are of course other solutions, but I think you get the idea.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Solutions for Singular CBC Latch Contention&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Figure 3 and the output snippet directly below it are typical when there is intense CBC contention focused on one or perhaps a few CBC latches. When this is the situation, additional diagnosis is needed to determine specifically why the intense singular activity is occurring. &lt;i&gt;This is the topic of my next posting...&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The Cache buffer chain (CBC) structure is used to answer the question, "Is this block in the buffer cache?" At some point, this question can get asked enough to cause significant performance problems known as CBC latch contention.&amp;nbsp;While there are a number of causes for CBC latch contention, I tend to see two CBC acquisition patterns: The first pattern is when many CBC latches are very active, that is, the access pattern is very disperse. 
The second pattern is when a single CBC latch is very active. So once you know there is a significant CBC issue, the next step is to determine the&amp;nbsp;acquisition&amp;nbsp;pattern characteristic.&lt;br /&gt;&lt;br /&gt;In this posting I focused on how to determine the CBC latch contention&amp;nbsp;acquisition&amp;nbsp;pattern: disperse or singular. I then presented some disperse CBC latch contention solutions. In my next posting I'll focus on additional steps to diagnose singular CBC latch contention, two common situations, and some possible solutions.&lt;br /&gt;&lt;br /&gt;Thanks for reading!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;If you enjoyed this blog entry, I suspect you'll get a lot out of my courses; &lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;Oracle Performance Firefighting&lt;/a&gt; and &lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt;. I teach these classes around the world multiple times each year. For the latest schedule, &lt;a href="http://resources.orapub.com/Default.asp"&gt;click here&lt;/a&gt;. I also offer on-site training and consulting services.&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond with a comment or you have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. 
Another option is to send an email to OraPub's general email address, which is currently orapub.general@gmail .com.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-8310536389047430843?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/8310536389047430843/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/11/cbc-latch-diagnosis-acquisition.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/8310536389047430843'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/8310536389047430843'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/11/cbc-latch-diagnosis-acquisition.html' title='CBC Latch Diagnosis &amp; Acquisition Patterns'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-edgdVKPBcp0/TsbVO59b9ZI/AAAAAAAAASw/iOeVoH4oUlo/s72-c/rtsys+broad+SS.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-9089583009045565800</id><published>2011-11-11T13:15:00.001-08:00</published><updated>2011-11-14T10:01:23.740-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='index dump'/><category scheme='http://www.blogger.com/atom/ns#' term='oracle performance'/><category scheme='http://www.blogger.com/atom/ns#' term='segment header'/><category scheme='http://www.blogger.com/atom/ns#' term='hot block'/><category scheme='http://www.blogger.com/atom/ns#' term='cache buffer chain'/><category scheme='http://www.blogger.com/atom/ns#' term='oracle index'/><category scheme='http://www.blogger.com/atom/ns#' term='cache buffer chain latch contention'/><category scheme='http://www.blogger.com/atom/ns#' term='index root block'/><category scheme='http://www.blogger.com/atom/ns#' term='hot buffer'/><title type='text'>Are you sure it's the index root block?</title><content type='html'>&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Situation&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Suppose you want to check if a specific Oracle block is an index root block. Why? Here are two very real situations. You notice a specific block is very active and want to know if it's an index root block. Even more commonly, there is a very active cache buffer chain latch related to a specific block/buffer and you want to know if this hot buffer is an index root block. 
Besides these very real examples, it's also an interesting journey into Oracle internals!&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Folklore States...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Some very respectable blogs and a simple test I ran indicate an index root block is the block after its segment header block.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-1l5bjMITY9o/TsFUaqD1MJI/AAAAAAAAASo/qgHxRNjaAM4/s1600/Index+Seg+Pic.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="222" src="http://4.bp.blogspot.com/-1l5bjMITY9o/TsFUaqD1MJI/AAAAAAAAASo/qgHxRNjaAM4/s320/Index+Seg+Pic.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;i&gt;Figure 1. Diagram of an Oracle index segment, highlighting the index root block.&lt;/i&gt;&lt;/div&gt;Figure 1 is a diagram of an Oracle index segment. If it wasn't for the index root block, Figure 1 would be a good diagram for any Oracle segment. The light blue colored block is the segment header block. Notice the orange colored index root block follows the segment header block.&lt;br /&gt;&lt;br /&gt;As mentioned above, folklore says if the segment is indeed an index, then the orange block will be the index root block.&amp;nbsp;And not just now, but&lt;i&gt; for the life of the index&lt;/i&gt;! Wow...&amp;nbsp;This is a pretty strong statement and one that needs to be tested. So that's what I did and what this posting is all about.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;It's Kind of Complicated&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We need to determine if the block following an index segment header block is the index root block... for always and forever until the index is dropped. 
First, just dump the index and locate the root block's data block address (DBA). Second, get the DBA for the block following the index segment header block. And finally, compare them. If they match, then we have shown a situation where the block following the index segment header block is indeed the index root block.&amp;nbsp;So let's do that.&lt;br /&gt;&lt;br /&gt;Once we get the &lt;b&gt;object_id&lt;/b&gt; from &lt;b&gt;dba_objects&lt;/b&gt;, here's how to dump an index:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;alter session set events 'immediate trace name treedump level :ObjectId';&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;And here's a snippet of the trace file from near the top.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;...&lt;br /&gt;----- begin tree dump&lt;br /&gt;branch: 0x4c5461 5002337 (0: nrow: 8, level: 2)&lt;br /&gt;   branch: 0x4c575e 5003102 (-1: nrow: 141, level: 1)&lt;br /&gt;      leaf: 0x4c5462 5002338 (-1: nrow: 96 rrow: 96)&lt;br /&gt;      leaf: 0x4c63b7 5006263 (0: nrow: 78 rrow: 78)&lt;br /&gt;...     &lt;br /&gt;      leaf: 0x4c554d 5002573 (139: nrow: 100 rrow: 100)&lt;br /&gt;   branch: 0x4c63c7 5006279 (0: nrow: 213, level: 1)&lt;br /&gt;      leaf: 0x4c629d 5005981 (-1: nrow: 88 rrow: 88)&lt;br /&gt;      leaf: 0x4c554e 5002574 (0: nrow: 60 rrow: 60)&lt;br /&gt;      leaf: 0x4c62a0 5005984 (1: nrow: 54 rrow: 54)&lt;br /&gt;...&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;The first/top mentioned "branch" block is the index's root block. In this case, the index root block has a data block address (DBA) of 5002337. Now let's get the data block address for the block after the index's segment header block. 
But first we need to get the file number and block number of the index segment header block.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;SQL&amp;gt; select header_file, header_block&lt;br /&gt;  2  from dba_segments&lt;br /&gt;  3  where segment_name = 'CH_6_IRB_I';&lt;br /&gt;&lt;br /&gt;HEADER_FILE HEADER_BLOCK&lt;br /&gt;----------- ------------&lt;br /&gt;          1       808032&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;Now let's get the data block address (DBA) for the block just following the header block. We must remember to add one to the header block number, so the block number we are interested in is 808033.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;SQL&amp;gt; select dbms_utility.make_data_block_address(1,808033) from dual;&lt;br /&gt;&lt;br /&gt;DBMS_UTILITY.MAKE_DATA_BLOCK_ADDRESS(1,808033)&lt;br /&gt;----------------------------------------------&lt;br /&gt;                                       5002337&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;Do you see it? The DBA just above (with the header block + 1) matches the first/top "branch" block's DBA&amp;nbsp;(5002337)&amp;nbsp;from the index trace file! So now we know how to check if the block following the index's segment header block is truly the index root block.&lt;br /&gt;&lt;br /&gt;Now the question becomes, does it &lt;i&gt;&lt;b&gt;always&lt;/b&gt;&lt;/i&gt; remain this way?&amp;nbsp;For example, what if we create the table, create the index, and then insert rows into the table? Or what if we create the table, then insert rows, and finally create the index? If that's not enough, how about this: What if the index grows and splits? Or how about if we delete all the table's rows and then insert rows until the index splits? Or how about if we truncate the related table? 
As you can see, there are an infinite number of possibilities and there is no way we can test all of them.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Experimental Setup&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I created a number of tests that could be repeatedly run and easily modified and extended. There are two related scripts. The driving script is a SQL script called &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201111_IdxRtBlk/doRbExpr.sql"&gt;doRbExpr.sql&lt;/a&gt;&lt;/b&gt;, which takes a single argument, called the &lt;i&gt;prefix&lt;/i&gt;. This &lt;i&gt;prefix&lt;/i&gt; is the beginning of the name of all the objects the script creates. This allows you to quickly and easily re-run the script without first removing all the objects from the previous run. The second script, &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201111_IdxRtBlk/getIdxRtBlk.sql"&gt;getIdxRtBlk.sql&lt;/a&gt;&lt;/b&gt;, retrieves the index root block's DBA from both the data dictionary and by dumping the index, and then nicely displays them so you can easily see if there is a difference. I also show the index depth (blevel) as an added test to help ensure I'm looking at the current statistics.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Experimental Results&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201111_IdxRtBlk/doRbExpr.txt"&gt;Click here to see the results&lt;/a&gt;&lt;/b&gt;. As you can see, in every case the DBA of the index segment header block plus one matches the index trace file's root block. I have rerun this test many times, and the results are always the same.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;What Does This Prove?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Actually the experiments prove very little, yet they yield a tremendous value. 
The experiments clearly and repeatedly demonstrate that I have &lt;b&gt;&lt;i&gt;not&lt;/i&gt;&lt;/b&gt; found a way to disprove that an index root block is the block immediately following its segment header block. All it would take is just one of my experiments to break the "block after" rule... but I could not break the rule! &amp;nbsp;If you can devise a situation to break the rule, please let me know and I'll post it.&lt;br /&gt;&lt;br /&gt;So next time you need to check if a particular block is an index root block, simply get its segment header file and block number, add one to the block number, and compare. In my opinion, that's much easier and faster than dumping the index, parsing it, etc.&lt;br /&gt;&lt;br /&gt;Thanks for reading!&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;If you enjoyed this blog entry, I suspect you'll get a lot out of my courses; &lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;Oracle Performance Firefighting&lt;/a&gt; and &lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt;. I teach these classes around the world multiple times each year. For the latest schedule, &lt;a href="http://resources.orapub.com/Default.asp"&gt;click here&lt;/a&gt;. I also offer on-site training and consulting services.&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond with a comment or you have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. 
Another option is to send an email to OraPub's general email address, which is currently orapub.general@gmail .com.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-9089583009045565800?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/9089583009045565800/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/11/are-you-sure-its-index-root-block.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/9089583009045565800'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/9089583009045565800'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/11/are-you-sure-its-index-root-block.html' title='Are you sure it&apos;s the index root block?'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-1l5bjMITY9o/TsFUaqD1MJI/AAAAAAAAASo/qgHxRNjaAM4/s72-c/Index+Seg+Pic.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-6682657574196543606</id><published>2011-10-11T18:31:00.000-07:00</published><updated>2011-10-11T18:31:00.591-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='wait event times'/><category scheme='http://www.blogger.com/atom/ns#' term='skew'/><category scheme='http://www.blogger.com/atom/ns#' term='OraPub Wait Event Distribution Analysis Tool'/><category scheme='http://www.blogger.com/atom/ns#' term='log normal distribution'/><category scheme='http://www.blogger.com/atom/ns#' term='v$event_histogram'/><title type='text'>Understanding Wait Event Time Patterns</title><content type='html'>&lt;span class="Apple-style-span" style="font-size: large;"&gt;What's The Point?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If the average&amp;nbsp;&lt;b&gt;db file sequential read&lt;/b&gt;&amp;nbsp;is 20ms, it is likely the typical value is something more like 5ms.&amp;nbsp;If you look at your system's wait event time distributions, you will see (details below) that an &lt;i&gt;average&lt;/i&gt; Oracle wait event time is not likely to be the &lt;i&gt;typical&lt;/i&gt; wait time and the distribution of wait times is not likely to be normally distributed. So when you approach your IO team about the 20ms sequential read time they may look at you strangely and can honestly say, "When we watch the IO times, we see nothing like this." This posting takes a closer look at the typical wait event time, wait event time distributions/patterns, how YOU can determine the typical wait times (plus other statistical data), and plot a histogram... all based on a standard Statspack or AWR report. ...good stuff!&lt;br /&gt;&lt;br /&gt;I first need to mention that many Oracle IO requests do not result in a wait event occurrence. 
I blogged about this in my "&lt;a href="http://shallahamer-orapub.blogspot.com/2011/01/io-read-wait-occurrence-mismatch-part-2.html"&gt;IO Read Wait Occurrence Mismatch - Part 2&lt;/a&gt;" posting back in January of 2011.&amp;nbsp;So that in itself can be enough to drive a wedge between you and the IO team.&lt;br /&gt;&lt;br /&gt;In this posting, I'm going to focus on wait event times, not why the IO subsystem response times may not match Oracle IO related wait event times. It's an interesting discussion for sure, but not the focus of this posting.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Journey&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Way back on November 20, 2010 I posted an entry entitled, &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2010/11/average-challenge-part-1.html"&gt;The Average Challenge... Part 1&lt;/a&gt;&lt;/b&gt; focused on why using the statistical average to describe common Oracle performance happenings can lead to a gross misunderstanding of reality. To demonstrate my point, I operating system traced (Linux: &lt;b&gt;strace&lt;/b&gt;) server processes, the log writer, and the database writer. I created histograms based on the data collected and posted those graphics in the blog entry. In this posting, I will present below how &lt;b&gt;you&lt;/b&gt; can do the same thing using data from &lt;b&gt;v$event_histogram&lt;/b&gt; and from an AWR or Statspack report.&lt;br /&gt;&lt;br /&gt;The challenge is, averages are usually reported to us and they are easy to calculate, so we tend to use them. Worse though, when we say something like, "The average multi-block read wait time is 20ms," most people immediately assume most values are pretty close to 20ms and that a value is just as likely to be greater than the average as less than the average. That is far from reality. Not even close. This means we are effectively misleading people by failing to communicate properly. 
And as a consultant and a teacher, it really really really bothers me when I'm not being clear or correctly understood.&lt;br /&gt;&lt;br /&gt;All this thinking about averages led me on a fantastic quest into some very interesting and surprisingly practical study. First, I needed to gain a better understanding of statistical distributions. To document my research, I posted a blog entry entitled, &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/important-statistical.html"&gt;Important Statistical Distributions...Really&lt;/a&gt;&lt;/b&gt;.&amp;nbsp;I also spent some time investigating &lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/sql-statement-elapsed-times.html"&gt;"average" SQL elapsed times&lt;/a&gt; and also &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/sql-arrival-patterns-and-impact.html"&gt;SQL arrival rate patterns&lt;/a&gt;&lt;/b&gt;. The results were stunning, disappointing, and enlightening all at once! Currently, I am presenting some of the results of this research in my conference presentation, &lt;b&gt;&lt;a href="http://resources.orapub.com/SearchResults.asp?Search=sql+elapsed"&gt;SQL Elapsed Time Analysis&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;At this point in this journey, it's time to blog about wait times. Not a specific wait event, but the actual wait times and how to describe what's actually occurring. Knowing that saying the average wait time is Xms can be seriously misunderstood, the practicality of this is huge. I am no longer satisfied with speaking about averages. We need to get a much better understanding of what the typical value is and understand the wait time pattern (i.e., distribution). 
&lt;u&gt;That's exactly what this blog entry is all about; understanding wait time patterns, how to get good data, understanding the data, how to numerically and visually show the data, and how to better communicate what the data actually means.&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Data Source&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Oracle does us a big favor by capturing wait event times and placing them into buckets or bins of a specific time range. For example, there are bins from 0ms to 1ms and from &amp;gt;1ms to 2ms, from &amp;gt;2ms to 4ms, and on up in powers of two. The good news is the data is being automatically captured, but the bad news is:&lt;br /&gt;&lt;br /&gt;1. The bin size can be quite large. For example, a bin size of &amp;gt;8ms to 16ms. That's a bin size of 8ms! Plus much of the really interesting times center around 10ms.&lt;br /&gt;&lt;br /&gt;2. The bin sizes change, that is, they increase in powers of two. This makes visualizing the distribution (think: histogram) of the wait times extremely difficult for us humans. That's why I created a tool to help us accurately visualize the data... more below.&lt;br /&gt;&lt;br /&gt;For every wait event, Oracle stores the wait event occurrence in the 10g performance view &lt;b&gt;v$event_histogram&lt;/b&gt;. The view is very straightforward; here are a couple of links (&lt;b&gt;&lt;a href="http://osmtoolkit.com/swhist.sql.html"&gt;swhist.sql &lt;/a&gt;&lt;/b&gt;and &lt;b&gt;&lt;a href="http://osmtoolkit.com/swhistx.sql.html"&gt;swhistx.sql&lt;/a&gt;&lt;/b&gt;) &amp;nbsp;to tools in my &lt;b&gt;&lt;a href="http://osmtoolkit.com/"&gt;OSM Toolkit&lt;/a&gt;&lt;/b&gt; that pull directly from this view. In addition to directly pulling from the source &lt;b&gt;v$event_histogram&lt;/b&gt; view, more recent Statspack and AWR reports provide this information in varying levels of completeness. 
The formatting is a little bizarre, but with a little practice you can grab the histogram bin values necessary to create the statistics and the histogram we need.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/--qhXzCn_Iz8/TpIC4A9lKgI/AAAAAAAAAR8/0aqZK7wPN4c/s1600/WE+Hist+All+SS.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://1.bp.blogspot.com/--qhXzCn_Iz8/TpIC4A9lKgI/AAAAAAAAAR8/0aqZK7wPN4c/s320/WE+Hist+All+SS.jpg" width="206" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 1. Example of Oracle wait time occurrences placed in time-based bins from an AWR report.&lt;/b&gt;&lt;/div&gt;&lt;br /&gt;Figure 1 above is an example of &lt;b&gt;v$event_histogram&lt;/b&gt; data for the wait event, &lt;b&gt;cursor: pin S wait on X&lt;/b&gt;. (my ref: 20110425_0800_0900.html) &amp;nbsp;If you look at the top part of Figure 1 you'll notice the average wait time is 5,111ms, which is 5.111 seconds! So what is the typical value and what does the distribution look like? To answer these questions, I created a &lt;i&gt;Mathematica&lt;/i&gt; based tool that uses the bin inputs from &lt;b&gt;v$event_histogram&lt;/b&gt;, Statspack, or an AWR report. How to get the tool and use the tool is explained next.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Finding the Typical Value and Plotting the Histogram&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I created a nice little tool called &lt;i&gt;OraPub's Wait Event Time Distribution Analysis Tool&lt;/i&gt;, which can be downloaded and used for free &lt;b&gt;&lt;a href="http://resources.orapub.com/SearchResults.asp?Search=wait+event+distribution+tool"&gt;HERE&lt;/a&gt;&lt;/b&gt;. 
The interface is not as clean as I'd like, but to make the tool freely available without you having to license &lt;i&gt;Mathematica&lt;/i&gt;, this is what I had to do.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Retrieving the Data from the AWR report&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The AWR event histogram data is provided in percentages. For example, if you look closely at Figure 1, you'll see that 6.4% of the wait event occurrences for the &lt;b&gt;cursor: pin S wait on X&lt;/b&gt; event were between 8ms and &amp;lt;= 16ms. It's a little strange the first few times you retrieve the wait time data, but you'll get the hang of it. For the event, &lt;b&gt;cursor: pin S wait on X&lt;/b&gt; in Figure 1, the bin data is as follows (0ms to &amp;lt;=1ms, 1ms to &amp;lt;=2ms, 2ms to &amp;lt;=4ms, ..., 4096ms to &amp;lt;=8192): 0, 0, 0, 0, 6.4, 19.4, 7.9, 3.7, 5.9, 8.1, 3.2, 1.1, 2.3, 5.0, 33.7, &amp;nbsp;and 3.3. Remember that these values are percentages and should add up to 100, that is, 100%. The situation shown in Figure 1 is indeed an unusual distribution of values. Most wait event occurrences tend to be heavy near the beginning (e.g., 1ms) and extremely light at the end (e.g., 8sec).&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Using the Analysis Tool&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Data Entry&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Figure 2 below shows most of the data entry area expanded along with the data entered. 
This version of my tool doesn't provide entry for the final 3.3% of the wait occurrences, which are wait times between 8192ms and &amp;lt;= 16384ms.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-unSfBSA2ukA/TpIJ2_OwbnI/AAAAAAAAASA/X3B1HNfiq3U/s1600/Tool+SS+1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://2.bp.blogspot.com/-unSfBSA2ukA/TpIJ2_OwbnI/AAAAAAAAASA/X3B1HNfiq3U/s320/Tool+SS+1.png" width="278" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 2. Partial data entry area of analysis tool.&lt;/b&gt;&lt;/div&gt;Figure 2 above shows the upper portion of the data entry area of the tool. Rather than using the slider bar, data entry is much easier if you click the "+" sign for all the entry areas and then actually type in the values. Every time you press the "tab" key the tool will recalculate, so try to avoid this. For the &lt;b&gt;Total Waits&lt;/b&gt; entry, start with 100 and then go to 500 or 1000 for the final graphs. Do NOT enter the actual number of waits that occurred as this will take too long to render the histograms. Plus once you get over a couple hundred samples, the results will be basically the same.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Statistical Output&lt;/b&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-64n-8HQxJrs/TpILBzdRN-I/AAAAAAAAASE/fffd5lg7YIg/s1600/Tool+SS+2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://4.bp.blogspot.com/-64n-8HQxJrs/TpILBzdRN-I/AAAAAAAAASE/fffd5lg7YIg/s320/Tool+SS+2.png" width="188" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 3. 
Statistical data and three of the possible five histograms based on Figure 1 and Figure 2 data.&lt;/b&gt;&lt;/div&gt;Figure 3 above shows some key statistics (e.g., median) and three of the five possible histograms. The first two histograms did not render. The first &lt;i&gt;$Aborted&lt;/i&gt; histogram has the bin size set to 1ms and it was not possible to show over 8000 histogram bars! The second &lt;i&gt;$Aborted&lt;/i&gt; histogram has the bin size set to 2ms and again, it was not possible to show over 5000 histogram bars.&lt;br /&gt;&lt;br /&gt;Let's look at the resulting statistics. Figure 3 shows the following statistics:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Tot Pct&lt;/b&gt; is the total percentage of wait time entered. You may recall the tool did not have a data entry field for wait times between 8 and 16 seconds, which in this case was 3.3% of the values. Since the wait event percentages we entered add up to about 97%, it appears we entered the data correctly.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Average&lt;/b&gt; is the statistical mean. This should be close to the average wait time reported in the AWR report. The AWR report shows the average wait time is 5111ms and, from the data I entered combined with the limitation in bin details, the tool determined the average was 4792ms. That's a 6% difference. Considering the crazy event time dispersion, that doesn't bother me.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Median&lt;/b&gt; is the statistical median, which when there is a single mode (i.e., histogram peak), is usually the typical value. The value is 413ms, which is over 4 seconds &lt;i&gt;&lt;u&gt;less&lt;/u&gt;&lt;/i&gt; than the average! Wow, what a difference and this is why taking the time to do this can be very worthwhile! I will write more about this below.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Std Dev&lt;/b&gt; is the standard deviation. 
This gives us an idea of the data dispersion, but when the data is not normally distributed (think: bell curve), which this data set is not, this can be very valuable in communicating the data value dispersion.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;p-Value&lt;/b&gt; is the statistical p-Value comparing the data with a normal distribution. Basically, if the p-Value is greater than 0.05 then the data is likely normally distributed (think: bell curve). I have yet to see wait event times be normally distributed. It's no surprise then, even at 5 decimal places (which you can't tell from the display), the p-Value rounds to zero.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Histograms&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Figure 3 above also shows the resulting histograms. Usually, five histograms will plot. The first three are standard histograms, but with different bin sizes. For the first and second, which aborted, I set bin sizes of 1ms and 2ms respectively. Usually, it is useful to see the distribution at this level of detail. Usually, the&amp;nbsp;wait event times are relatively short and I want to see the details at the short duration times, hence my setting the bin sizes to 1ms and 2ms.&amp;nbsp;However, in this case the interesting times are much larger than one or two milliseconds. For the third histogram, &lt;i&gt;Mathematica&lt;/i&gt; automatically sets the bin size. Usually all three histograms are created.&lt;br /&gt;&lt;br /&gt;The fourth histogram is a &lt;i&gt;Probability Histogram&lt;/i&gt;. In Figure 3, the Probability Histogram tells us that about 60% of the values occurred within the first bin, which is between 0ms and &amp;lt;= 1,667ms (5000ms/3 bars), that is between 0 and &amp;lt;= 1.7 seconds.&lt;br /&gt;&lt;br /&gt;The fifth histogram is the &lt;i&gt;Cumulative Count Histogram&lt;/i&gt;. 
This is just another way to get an understanding of how the data is distributed; based on the number of wait occurrences, not the percentage.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Analysis of Our Data&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Now let's use the tool to learn something interesting, and yes useful, about the &lt;b&gt;cursor: pin S wait on X&lt;/b&gt; wait times. First, notice the mean and median are very different. This is typical. It is common for the median (usually the "typical value") to be less than half of the average. In this case it is a factor &amp;nbsp;of 10. The average is around 5 seconds and the typical value is around 1/2 second... a massive difference.&lt;br /&gt;&lt;br /&gt;This is important: Notice that the AWR report (shown in Figure 1) and the subsequent data entered into the tool (not shown in Figure 2) show that 33.7% of the wait time occurrences are between 8 and 16 seconds. In our minds, it is common to think there is a massive buildup around 8 to 16 seconds. But there is not! We need to remember that 33.7% of the wait occurrences are &lt;i&gt;spread, that is dispersed, over 8 seconds&lt;/i&gt;! Not over 1ms, 2ms, or 4ms as near the left portion of the histogram. This is why it is so important to SEE the data in a histogram. And the difference between the mean and median also helps accentuate this.&lt;br /&gt;&lt;br /&gt;Sixty percent of the wait occurrences occur within 1ms. That's pretty good. However, what is disturbing is if the wait does not stop at 1ms, it can end up lasting over 2 seconds. This indicates there is a problem because the way Oracle latches and mutexes are designed to work is for most of the sleeps (that is what mutex/latch wait time is, "sleep" time) to quickly reduce as the wait time increases. I don't want to get into Oracle serialization and mutex internals, but I will say this is a problem that should not be ignored. 
(I realize that was a gross understatement.)&lt;br /&gt;&lt;br /&gt;To summarize, instead of saying, "The average wait time is about 5 seconds and we need to look into this." which horribly simplifies the complexity of the situation, we could say something like this: "While CPU consumption and single block buffer IO reads must first be addressed, there is an unusual and potentially significant issue regarding library cache serialization. If it weren't troubling enough that the average wait time is around 5 seconds, typically 60% of the time the waits are less than 1ms, yet if the wait is not satisfied within 1ms, the wait time can easily be between 8 and 16 seconds. This intense "get it now or never" situation can result in an extremely volatile performance situation and could be the result of an Oracle mutex bug..." Oh yeah... I would also show the standard histogram as well. (That's the first histogram shown in Figure 3.)&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;In Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is just one example where Oracle wait event times are not normally distributed and therefore speaking in terms of "average" miscommunicates what is really occurring. 
When you need to have a deeper understanding of the timing situation and visually see the situation, you can use &lt;b&gt;v$event_histogram&lt;/b&gt;&amp;nbsp;data combined with &lt;b&gt;&lt;a href="http://resources.orapub.com/SearchResults.asp?Search=wait+event+distribution+tool"&gt;OraPub's Wait Event Time Distribution Analysis Tool&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;Have fun and all the best in your Oracle performance endeavors!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;If you enjoyed this blog entry, I suspect you'll get a lot out of my courses; &lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;Oracle Performance Firefighting&lt;/a&gt; and &lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt;. I teach these classes around the world multiple times each year. For the latest schedule, &lt;a href="http://resources.orapub.com/Default.asp"&gt;click here&lt;/a&gt;. I also offer on-site training and consulting services.&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond with a comment or you have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. 
Another option is to send an email to OraPub's general email address, which is currently orapub.general@gmail .com.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-6682657574196543606?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/6682657574196543606/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/10/understanding-wait-event-time-patterns.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/6682657574196543606'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/6682657574196543606'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/10/understanding-wait-event-time-patterns.html' title='Understanding Wait Event Time Patterns'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/--qhXzCn_Iz8/TpIC4A9lKgI/AAAAAAAAAR8/0aqZK7wPN4c/s72-c/WE+Hist+All+SS.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-4757888390875908319</id><published>2011-09-07T03:13:00.000-07:00</published><updated>2011-09-07T03:13:00.185-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sql elapsed time'/><category 
scheme='http://www.blogger.com/atom/ns#' term='quantitative model'/><category scheme='http://www.blogger.com/atom/ns#' term='unit of work time based analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='time per work'/><category scheme='http://www.blogger.com/atom/ns#' term='response time'/><title type='text'>Anticipating SQL Elapsed Times</title><content type='html'>&lt;span class="Apple-style-span" style="font-size: large;"&gt;Getting More Practical&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Many of my postings can be considered quite theoretical. But in this posting I'm going to make a clear application of quantitatively modeling response time and understanding why Oracle systems behave as Operations Research queuing theory states (which was &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/08/why-tuning-oracle-works-and-modeling-it.html"&gt;the topic of my previous posting&lt;/a&gt;&lt;/b&gt;).&lt;br /&gt;&lt;br /&gt;In this posting I'm building from two recent posts. In July I demonstrated that &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/07/cbc-latches-cpu-consumption-and-wait.html"&gt;tuning Oracle can reduce the CPU and the total time it takes to process a single LIO&lt;/a&gt;&lt;/b&gt;. In August I demonstrated that a &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/08/why-tuning-oracle-works-and-modeling-it.html"&gt;standard Operations Research queuing theory quantitative model can be used to anticipate the time it takes to process a single LIO&lt;/a&gt;&lt;/b&gt;. Now it's time to go from micro to macro!&lt;br /&gt;&lt;br /&gt;In this posting I will develop a simple SQL statement elapsed time model that takes two input parameters; the time to process a single piece of work (e.g., 1 ms per logical IO) and also the number of pieces of work to process (e.g., 1000 logical IOs). 
As you will see, what we know from our personal experience will play out in this quantitative model; adding confidence that Oracle systems do in fact behave as Operations Research queuing theory would have us believe.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Very Simple SQL Elapsed Time Model&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In my July posting, I introduced a very simple SQL statement elapsed time model. The model shows that the time to run a SQL statement can be represented as the amount of work to be processed multiplied by the time it takes to process each piece of work. (This is a very simplistic and limited model.) For example, if a SQL statement must process 1000 LIOs and each LIO takes 1.0 ms, then the elapsed time will be 1,000 ms.&lt;br /&gt;&lt;br /&gt;To make this a little more elegant, let's create an equation.&lt;br /&gt;&lt;br /&gt;E = Work X Time per work&lt;br /&gt;&lt;br /&gt;where:&lt;br /&gt;&lt;br /&gt;&lt;i&gt;E&lt;/i&gt; is the elapsed time (e.g., 1 second)&lt;br /&gt;&lt;i&gt;Work&lt;/i&gt; is the amount of work to be completed (e.g., 1000 LIOs)&lt;br /&gt;&lt;i&gt;Time per Work&lt;/i&gt; is the time it takes to process a single piece of work (e.g., 1.0 ms/LIO)&lt;br /&gt;&lt;br /&gt;This simple (and limited) model indicates we can reduce a statement's elapsed time by reducing either or both the amount of work to be processed or the time to process a single piece of work. Before we get into the actual experiment, let's first review each of these terms.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;Important&lt;/u&gt;: This is key: I define&amp;nbsp;&lt;i&gt;response time&lt;/i&gt;&amp;nbsp;as the time to process a single piece of work.&amp;nbsp;&lt;i&gt;Elapsed time&lt;/i&gt;&amp;nbsp;is the time to process many pieces of work.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Representing and quantifying work.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The amount of work to be processed is very straightforward. 
We can measure this by gathering statistics from &lt;b&gt;v$sesstat&lt;/b&gt;, SQL tracing, and perhaps other ways. How we decide to represent work is typically based on what constrains the SQL statement elapsed time. For example, if a SQL statement must process 1M LIOs and 1K PIOs (block reads) and there is a raging CPU bottleneck, choosing LIOs to represent the work is probably a good choice. But if there is a raging IO read bottleneck, then choosing PIOs is probably a better choice. This topic is known as &lt;i&gt;choosing the unit of work&lt;/i&gt; and was presented in &lt;a href="http://shallahamer-orapub.blogspot.com/2011/08/why-tuning-oracle-works-and-modeling-it.html"&gt;my previous posting's section &lt;i&gt;Unit of Work Time&lt;/i&gt;&lt;/a&gt;. I have discussed this in many of my blog postings and cover it extensively in my &lt;b&gt;&lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt;&lt;/b&gt; class.&lt;br /&gt;&lt;br /&gt;While I'm not going to detail my experimental results in this posting, as you might expect, if you reduce the amount of work to be processed by 50% you are likely to see a similar reduction in elapsed time. However, I do present experimental evidence of this occurring in my conference presentation, &lt;i&gt;&lt;b&gt;&lt;a href="http://resources.orapub.com/SearchResults.asp?Search=sql+elapsed+time"&gt;SQL Elapsed Time Analysis&lt;/a&gt;&lt;/b&gt;&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Quantifying Time per Work.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Our models work very nicely if we classify SQL elapsed time into either CPU consumed or non-idle wait time. As you may know, it is rather simple to gather this time from either the instance statistic (&lt;b&gt;v$sesstat&lt;/b&gt;) or the time system model (&lt;b&gt;v$sess_time_model&lt;/b&gt;) views. 
To determine the time to process a single piece of work, we simply divide the total time (i.e., cpu + non-idle wait time) by the total amount of work (e.g., 1000 LIOs). This can be done for a specific SQL statement, a process, or over an interval of time (think: Statspack/AWR).&lt;br /&gt;&lt;br /&gt;In &lt;a href="http://shallahamer-orapub.blogspot.com/2011/08/why-tuning-oracle-works-and-modeling-it.html"&gt;my previous posting&lt;/a&gt;, I detailed how tuning Oracle reduced the response time for a LIO and how that response time reduction occurred as Operations Research queuing theory states.&lt;br /&gt;&lt;br /&gt;In this posting, I'm going to focus on the impact of response time on a SQL statement's elapsed time. Specifically, I want to experimentally and demonstrably see if &lt;i&gt;reducing response time&lt;/i&gt; by 50% will reduce the elapsed time of a SQL statement also by 50%... just as the above simple elapsed time model indicates.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Experimental Setup&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For this posting, I am going to use the exact same data set I used in my previous blog entry. While I didn't present this in the previous posting, in addition to gathering the response time related metrics, I also gathered the elapsed time for a very LIO dependent SQL statement. All the experimental data is included in the &lt;i&gt;Mathematica&lt;/i&gt; analysis notebook and can be viewed and downloaded from my previous posting. The data collection script can also be viewed on-line from the previous posting. 
&lt;a href="http://shallahamer-orapub.blogspot.com/2011/08/why-tuning-oracle-works-and-modeling-it.html"&gt;Link to previous posting.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;But to summarize the situation:&amp;nbsp;I created a massive CPU bottleneck by having a number of Oracle sessions run SQL where all their blocks reside in the buffer cache, that is the SQL is logical IO (&lt;b&gt;v$sysstat: session logical reads&lt;/b&gt;) dependent. I gathered 30 ten minute samples with the CBC latches set to 256 and then to 32768. During these ten minute collection periods, I sampled the elapsed time of a specific LIO dependent SQL statement. &amp;nbsp;With 256 CBC latches, around 23 elapsed times were collected. With 32768 CBC latches, around 71 elapsed times were collected. The difference in the number of elapsed time samples was the result of the SQL completing sooner when there were 32768 CBC latches.&amp;nbsp;The SQL elapsed times were gathered&amp;nbsp;using the&amp;nbsp;&lt;b&gt;&lt;a href="http://resources.orapub.com/SQL_Elapsed_Time_Sampler_p/sqlesampler.htm"&gt;OraPub SQL Elapsed Time Sampler&lt;/a&gt;&lt;/b&gt;&amp;nbsp;(beta 3c).&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Experimental Results Analyzed&lt;/span&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-RxOQT1OJtfA/TlLgb_tfXyI/AAAAAAAAAR0/DvVmR_0PiUo/s1600/Fig1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="63" src="http://1.bp.blogspot.com/-RxOQT1OJtfA/TlLgb_tfXyI/AAAAAAAAAR0/DvVmR_0PiUo/s320/Fig1.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 1.&lt;/b&gt;&lt;/div&gt;Figure 1 contains a summary of the experimental results. 
The &lt;i&gt;Instance Rt (ms/lio)&lt;/i&gt; is simply the &lt;i&gt;total&lt;/i&gt; time (CPU time plus non-idle wait time) divided by the total number of LIOs processed during &lt;i&gt;the sample interval&lt;/i&gt;.&amp;nbsp;The results are stunningly straightforward: By adding additional CBC latches to combat the massive CBC latch contention, the instance LIO response time decreased by 85% and the elapsed time for the specific LIO dependent SQL statement being monitored decreased by 86%. That is truly amazing!&lt;br /&gt;&lt;br /&gt;Will this always occur? For this particular SQL statement probably YES, but for other SQL statements, it depends. The key is understanding what resource is in short supply and understanding if the SQL statement is dependent on this resource. Because there was massive CBC latch contention caused by an intense LIO workload and limited CPU power, I knew that SQL statements whose performance was massively affected by LIO would therefore also have their elapsed time highly impacted by LIO response time. If I had selected a statement that was all about physical IO or inserts or updates, and not LIO, its performance (i.e., elapsed time) would likely not have been affected.&lt;br /&gt;&lt;br /&gt;This may sound very theoretical but you probably can relate this type of situation to your personal Oracle performance experiences. Relating the OS bottleneck to both the Oracle instance situation and the application SQL is extremely important to more fully understanding the performance situation and also to derive multiple appropriate solutions. 
For example, if there is a massive IO read bottleneck, you can expect SQL that is heavily IO read centric to be affected while LIO dependent SQL will likely not be affected.&amp;nbsp;I discuss this in the &lt;i&gt;Methods &amp;amp; Madness&lt;/i&gt; chapter (1) in my book entitled, &lt;a href="http://resources.orapub.com/Oracle_Performance_Firefighting_Book_p/ff_book.htm"&gt;Oracle Performance Firefighting&lt;/a&gt;. This posting helps build a quantitative framework for what we have experienced in real production Oracle systems.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;In Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Understanding the experimental results should at a minimum add confidence to what you have likely already experienced but hopefully also opens the door towards anticipating SQL statement elapsed times when tuning either Oracle (reducing response time) or tuning a SQL statement (reducing work to process).&lt;br /&gt;&lt;br /&gt;In this short posting, I've demonstrated that a heavily LIO dependent SQL statement's elapsed time was reduced by nearly the same percentage as the LIO response time reduction. The LIO response time reduction was due to adding additional CBC latches.&lt;br /&gt;&lt;br /&gt;A broader application of this posting is that we can increase SQL statement performance by tuning it (i.e., reduce the pieces of work it must process) and also by tuning Oracle (i.e., reduce LIO response time). While not mentioned in this entry, it should make sense that an OS centric solution would be to use faster CPUs as this would also reduce LIO response time! Notice, we have a solution from an Oracle, an application, and an OS perspective. 
I call this approach the &lt;i&gt;OraPub 3-Circle Method&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;Thanks for reading!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;If you enjoyed this blog entry, I suspect you'll get a lot out of my courses; &lt;b&gt;&lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;Oracle Performance Firefighting&lt;/a&gt;&lt;/b&gt; and &lt;b&gt;&lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt;&lt;/b&gt;.  I teach these classes around the world multiple times each year. For  the latest schedule, click here. I also offer on-site training and  consulting services.&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond with a comment or you have a question,  please feel free to email me directly at craig@orapub .com. I use a  challenge-response spam blocker, so you'll need to open the challenge  email and click on the link or I will not receive your email. Another  option is to send an email to OraPub's general email address, which is  currently orapub.general@gmail .com.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-4757888390875908319?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/4757888390875908319/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/09/anticipating-sql-elapsed-times.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/4757888390875908319'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/4757888390875908319'/><link rel='alternate' type='text/html' 
href='http://shallahamer-orapub.blogspot.com/2011/09/anticipating-sql-elapsed-times.html' title='Anticipating SQL Elapsed Times'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-RxOQT1OJtfA/TlLgb_tfXyI/AAAAAAAAAR0/DvVmR_0PiUo/s72-c/Fig1.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-7691898491870539050</id><published>2011-08-23T10:14:00.000-07:00</published><updated>2011-08-23T10:58:33.991-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='location test'/><category scheme='http://www.blogger.com/atom/ns#' term='quantitative model'/><category scheme='http://www.blogger.com/atom/ns#' term='latches'/><category scheme='http://www.blogger.com/atom/ns#' term='queuing theory'/><category scheme='http://www.blogger.com/atom/ns#' term='queue time'/><category scheme='http://www.blogger.com/atom/ns#' term='advanced oracle performance analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='operations research'/><category scheme='http://www.blogger.com/atom/ns#' term='cache buffer chain'/><category scheme='http://www.blogger.com/atom/ns#' term='altering service time'/><category scheme='http://www.blogger.com/atom/ns#' term='response time'/><title type='text'>Why tuning Oracle works and modeling it</title><content type='html'>Have you ever wondered why tuning Oracle improves performance? There are of course obvious answers, but then there are the deeper answers. More profound answers. It's like answering the question, "Why is the sky blue?" 
Sure, you can say it's because the sun's light rays are scattered when they hit the Earth's atmosphere. But then why does scattering the light rays turn the sky blue? And it goes on and on. It can be just like that with Oracle performance.&lt;br /&gt;&lt;br /&gt;Last month I blogged about &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/07/cbc-latches-cpu-consumption-and-wait.html"&gt;CBC latches, CPU consumption, and wait time&lt;/a&gt;&lt;/b&gt;. In that posting I demonstrated that by adding cache buffer chain (CBC) latches to a CBC latch constrained system the CPU consumption per logical IO decreased.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;In this posting I want to demonstrate how a change in CPU consumed per logical IO causes a corresponding change in the time it takes to process a logical IO...just as&amp;nbsp;Operations Research queuing theory states.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Note: When I write, "tuning Oracle" I am referring to altering instance parameters that do not influence the optimizer to change a SQL statement's execution path. 
For this posting, I'm typically referring to instance parameters that alter the number of cache buffer chains and latches.&lt;br /&gt;&lt;br /&gt;For many of you, this posting will be immensely satisfying because we will have quantified and modeled the Oracle system, taken a tuning solution and quantitatively observed and understood why it altered the system, and then we observed the result closely matched our quantitative model.&amp;nbsp;If this still seems overly theoretical, in the next blog entry you will see how we can use this understanding to anticipate the impact on a SQL statement's elapsed time!&lt;br /&gt;&lt;br /&gt;In my previous&amp;nbsp;&lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/07/cbc-latches-cpu-consumption-and-wait.html"&gt;CBC latches, CPU consumption, and wait time&lt;/a&gt;&lt;/b&gt;&amp;nbsp;posting I defined and used a few terms that &lt;i&gt;must&lt;/i&gt; be understood before this blog posting will make any sense. The terms are unit of work, service time, queue time, response time, arrival rate, and elapsed time. Response time is the time to complete a single unit of work and elapsed time is the time to process multiple units of work. If this is somewhat confusing, please refer to that previous blog entry.&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Experimental Setup&lt;/span&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;To meet my objectives &amp;nbsp;I created an experiment that is easily repeatable. 
I created a&amp;nbsp;massive CPU bottleneck by having a number of Oracle sessions run SQL where all their blocks reside in the buffer cache, that is, the SQL is logical IO (&lt;b&gt;v$sysstat: session logical reads&lt;/b&gt;) dependent. I gathered 30 ten-minute samples with the CBC latches set to 256 and then to 32768. During these ten-minute collection periods, I sampled the elapsed time of a specific LIO dependent SQL statement. &amp;nbsp;With 256 CBC latches, around 23 elapsed times were collected. With 32768 CBC latches, around 71 elapsed times were collected. The difference in the number of elapsed time samples was the result of the SQL completing sooner when there were 32768 CBC latches.&lt;br /&gt;&lt;br /&gt;If you look closely at my data collection script, you can easily see how I captured, stored, and retrieved the performance data.&amp;nbsp;You can download the&amp;nbsp;&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201108_rt_E_change/DataCollectionScript1f.txt"&gt;data collection script here&lt;/a&gt;&lt;/b&gt;. 
The SQL elapsed times were gathered&amp;nbsp;using the&amp;nbsp;&lt;b&gt;&lt;a href="http://resources.orapub.com/SQL_Elapsed_Time_Sampler_p/sqlesampler.htm"&gt;OraPub SQL Elapsed Time Sampler&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;You can download the raw data text files (&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201108_rt_E_change/Final_256.txt"&gt;256 latches&lt;/a&gt;&lt;/b&gt;,&amp;nbsp;&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201108_rt_E_change/Final_32768.txt"&gt;32768 latches&lt;/a&gt;&lt;/b&gt;) and the Mathematica analysis notebook (&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201108_rt_E_change/TuningAnalysis1a.pdf"&gt;PDF&lt;/a&gt;&lt;/b&gt;,&amp;nbsp;&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201108_rt_E_change/TuningAnalysis1a.nb"&gt;notebook&lt;/a&gt;&lt;/b&gt;).&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Concepts/Terms Quickly Reviewed&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Unit of Work Time&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Current Oracle performance analysis focuses much on the time involved (CPU plus non-idle wait time) related to SQL statement completion, process completion, or an Oracle instance over a specified interval (think: Statspack/AWR). That's great and is a fantastic analysis leap forward from ratio analysis and wait event analysis because it better reflects what a user is experiencing and it includes both wait time and CPU consumption. But to unite Oracle time based analysis with Operations Research queuing theory, we need the time related to a specific piece (or unit) of work. When we do this, we gain the advantages of our Oracle analysis plus all the years of proven Operations Research! Yeah... it's a big deal.&lt;br /&gt;&lt;br /&gt;There are many ways to describe the work being processed in an Oracle system. 
When we say, "The LIO workload is unusually high today" we are relating performance to the LIO workload. Or how about, "Parsing is hammering performance!" or "Disk reads are intense and really slow today and it's affecting some very key SQL statements." Each of these statements relates system performance to a type of work; namely logical IO (session logical reads), parsing (parse count (hard)), and disk reads (physical reads).&lt;br /&gt;&lt;br /&gt;We can use this natural way of relating work and performance in very profound ways.&amp;nbsp;What I'm going to show you is how to quantify these performance statements and then demonstrate how tuning Oracle changed the underlying Operations Research queuing theory parameters and then in my next posting how this affects SQL elapsed times.&lt;br /&gt;&lt;br /&gt;&lt;div style="font-weight: normal; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;How Oracle Tuning Reduces CPU per Unit Of Work&lt;/b&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Think of it like this: Acquiring a latch or mutex consists of repeatedly checking a memory address (which consumes CPU) and possibly sleeping (which can be implemented in a number of ways). If there are 100 sessions requesting a latch and there is only one latch, you can see there will be a lot more spinning and sleeping than if there were 100 latches. 
By increasing the number of latches, we are effectively reducing the number of spins involved to process a LIO, which translates into reducing the CPU involved to process a LIO (on average).&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Let's get quantitative. For example, if over a one hour period Oracle processes consumed 1,000 seconds of CPU time while processing 5,000,000 logical IOs, then the average CPU time to process a logical IO is 0.20 ms/lio.&lt;/div&gt;&lt;br /&gt;Here are some additional terms quickly defined:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Response Time&lt;/b&gt; (Rt or R) is the time it takes to process a &lt;b&gt;single&lt;/b&gt; unit of work. Queuing theory states that response time is service time (defined below) plus queue time (defined below).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Service Time&lt;/b&gt; (St or S) is the CPU consumed to process a single unit of work. We get this data from &lt;b&gt;v$sys_time_model&lt;/b&gt;, summing the&amp;nbsp;&lt;b&gt;DB CPU&lt;/b&gt; and&amp;nbsp;&lt;b&gt;background cpu time&lt;/b&gt;&amp;nbsp;statistics. For those of you who are familiar with service time, while I don't detail this in this blog entry, Oracle service time, that is the CPU it takes to process a unit of work, is nearly constant regardless of the arrival rate... just as the theory indicates.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Queue time&lt;/b&gt; (Qt or Q) is the non-idle wait time related to processing a single unit of work. We get this data from &lt;b&gt;v$system_event&lt;/b&gt;. For those of you familiar with queue time, when response time increases, it is because the queue time increases, not because service time increases... 
and Oracle systems behave like the theory indicates.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Arrival Rate &lt;/b&gt;(L) is the number of units of work that arrive into the Oracle system per unit of time. For example, 120 physical IOs per second or 120 pio/sec. In a stable system, the arrival rate will equal the workload, which is why I commonly use the word &lt;i&gt;workload&lt;/i&gt;. This is to avoid introducing yet another term and confusing people. The symbol &lt;b&gt;L&lt;/b&gt; is used because the arrival rate is always depicted using the Greek symbol lambda.&lt;br /&gt;&lt;br /&gt;Now that I've covered the experimental setup and the key terms and concepts, let's take a look at the actual experimental results.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Experimental Results Analyzed&lt;/span&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;The objective of the posting is to&amp;nbsp;demonstrate that tuning Oracle by adding CBC latches in a CPU bound system with significant CBC latch contention:&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;/div&gt;&lt;ol&gt;&lt;li style="text-align: left;"&gt;Reduces the CPU consumed per logical IO (service time),&lt;/li&gt;&lt;li style="text-align: left;"&gt;Reduces response time as Operations Research queuing theory states.&lt;/li&gt;&lt;/ol&gt;&lt;div style="text-align: left;"&gt;&lt;b&gt;The Drop in CPU Consumed per Logical IO.&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;As I demonstrated in my &lt;b&gt;&lt;a 
href="http://shallahamer-orapub.blogspot.com/2011/07/cbc-latches-cpu-consumption-and-wait.html"&gt;CBC latches, CPU consumption, and wait time&lt;/a&gt;&lt;/b&gt; posting, in a system that is CPU constrained experiencing massive CBC latch contention, one of the possible solutions is to increase the number of CBC latches. This causes a decrease in the CPU consumed while processing a LIO, that is the service time (CPU ms/lio or simply ms/lio). (This solution will only work if CBC latch access is not specific to a few CBC latches. &lt;b&gt;&lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;Why?&lt;/a&gt;&lt;/b&gt;)&amp;nbsp;This blog posting's experiment also easily demonstrates this phenomenon.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-MP1_nrdrUlc/TlPRu7LB9xI/AAAAAAAAAR4/s31MnFzLtrk/s1600/Fig1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="41" src="http://2.bp.blogspot.com/-MP1_nrdrUlc/TlPRu7LB9xI/AAAAAAAAAR4/s31MnFzLtrk/s400/Fig1.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;b&gt;Figure 1.&lt;/b&gt;&lt;/div&gt;Figure 1 above shows the Operations Research queuing theory parameter results. Notice the 72% decrease in average service time when the number of latches was increased from 256 to 32768. 
&amp;nbsp;Numerically, it looks like a very real decrease in service time!&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-5HV6vIwbWDo/TlKsTahQIaI/AAAAAAAAARc/p1tfC_saMvo/s1600/Fig2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="193" src="http://3.bp.blogspot.com/-5HV6vIwbWDo/TlKsTahQIaI/AAAAAAAAARc/p1tfC_saMvo/s320/Fig2.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: center;"&gt;&lt;b&gt;Figure 2.&lt;/b&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Figure 2 above is a histogram of the service times. The red-like color bars are the sample service times when there were 32768 CBC latches and the blue-like bars are the sample times when there were 256 latches. Visually, it looks like when adding CBC latches the service time decrease is very significant!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;Just to make sure that statistically the service times are significantly different, I performed a significance test. Since the sample distributions were not normally distributed (obvious with the blue-like bars), a location significance test was performed. &lt;i&gt;Mathematica&lt;/i&gt; chose the Kruskal-Wallis test and the resulting P-value was 44.3x10^-19, which is far below my chosen alpha of 0.05. Therefore, statistically there is a significant difference (and in this case a decrease) in the service times. You can view all these details in the Mathematica notebook and its PDF. 
The link is provided in the &lt;i&gt;Experimental Setup&lt;/i&gt; section above.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;b&gt;Response Time Decreases as Queuing Theory States&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In this experiment I captured both the CPU time (service time, St, S) and the non-idle wait time (queue time, Qt, Q) related to a LIO. This is the time it takes to process a LIO (CPU time plus non-idle wait time), which can be called the response time (Rt, R). Referring once again to Figure 1 above, notice the response time dropped 85% from 0.0633 ms/lio (w/256 CBC latches) down to 0.0093 ms/lio (32768 CBC latches). As with service time, I performed a significance test and the P-value was 3.0x10^-11. The histogram looks very much like Figure 2. You can see the histogram in the Mathematica files (link in &lt;i&gt;Experimental Setup&lt;/i&gt; section above.)&lt;br /&gt;&lt;br /&gt;That's all good, but this section is really focused on asking the question, "Is this decrease in response time consistent with queuing theory?" Read on!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Develop a Simple Response Time Model&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;To answer this question, I'm going to develop a very simple quantitative response time model based on the Oracle system when it was configured with only 256 CBC latches. 
The classic Operations Research queuing theory response time model for a CPU subsystem is:&lt;br /&gt;&lt;br /&gt;R = S / ( 1 - ( L*S/M)^M )&lt;br /&gt;&lt;br /&gt;where:&lt;br /&gt;&lt;br /&gt;R is the response time (ms/lio)&lt;br /&gt;S is the service time (ms/lio)&lt;br /&gt;L is the arrival rate (lio/ms)&lt;br /&gt;M is the number of effective servers (will be close to the number of CPU cores or &lt;a href="http://shallahamer-orapub.blogspot.com/2011/06/cores-vs-threadspart-3.html"&gt;perhaps threads in an AIX system&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;Referring to Figure 1 above, notice we have values for all variables except M, the number of effective servers. In a CPU subsystem, M is the number of CPU cores or perhaps threads. Since we have real data, we can derive the number of &lt;i&gt;effective&lt;/i&gt; servers. If the system is CPU bound, the number of effective servers is typically pretty close to the number of actual servers (i.e., CPU cores). Let's check it out!&lt;br /&gt;&lt;br /&gt;You cannot solve for M using standard algebra... it won't work. &lt;a href="http://www.wolframalpha.com/input/?i=R+%3D+S+%2F+%28+1+-+%28+L*S%2FM%29%5EM+%29%2C+solve+M"&gt;Even Mathematica's WolframAlpha will tell you this&lt;/a&gt;! What is needed is some iterative process that converges on M. In 2010 I created a simple web application, which anyone can access on-line, to solve for M. 
I call it the OraPub M-Solver and here is the URL:&amp;nbsp;&lt;b&gt;&lt;a href="http://filezone.orapub.com/cgi-bin/msolve.cgi"&gt;http://filezone.orapub.com/cgi-bin/msolve.cgi&lt;/a&gt;&lt;/b&gt; If you search Google for "msolver" and especially "orapub msolver" it will be the top result.&lt;br /&gt;&lt;br /&gt;Placing the values from our system into OraPub's M-Solver, you will see what is shown in Figure 3.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-uAhIA1W7Mu4/TlKvkiyIc8I/AAAAAAAAARg/ucrbUvCTeK4/s1600/Fig3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="214" src="http://2.bp.blogspot.com/-uAhIA1W7Mu4/TlKvkiyIc8I/AAAAAAAAARg/ucrbUvCTeK4/s320/Fig3.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 3.&lt;/b&gt;&lt;/div&gt;Press the submit button to solve for M and in a few seconds you will receive what is shown in Figure 4.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-9tGtLauuHm4/TlKwHHeL4mI/AAAAAAAAARk/XVcz-_sXHrE/s1600/Fig4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="279" src="http://2.bp.blogspot.com/-9tGtLauuHm4/TlKwHHeL4mI/AAAAAAAAARk/XVcz-_sXHrE/s320/Fig4.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 4.&lt;/b&gt;&lt;/div&gt;Figure 4 shows M at 4.598. There are four physical CPU cores in this system... not bad, and a very typical difference. 
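&lt;br /&gt;&lt;br /&gt;For readers who want to see what such an iterative search can look like, here is a minimal Python sketch that converges on M by bisection. It is not the M-Solver's actual code (its internals are not described in this posting). The function names are hypothetical, and the inputs are the rounded 256-latch values quoted in this posting (S = 0.0313 ms/lio, R = 0.0633 ms/lio, L = 126.851 lio/ms), so the result can only approximate the M-Solver's answer.&lt;br /&gt;&lt;pre&gt;
```python
def response_time(service_time, arrival_rate, servers):
    """Modeled CPU response time: R = S / (1 - (L*S/M)^M)."""
    utilization = arrival_rate * service_time / servers
    return service_time / (1.0 - utilization ** servers)


def solve_effective_servers(observed_r, service_time, arrival_rate, iterations=200):
    """Bisect for the M whose modeled R matches the observed R.

    Assumes observed_r is greater than service_time, so a solution exists.
    """
    # Just above L*S the modeled response time is enormous.
    low = arrival_rate * service_time * (1.0 + 1e-9)
    # Arbitrary upper bound (an assumption): here the modeled R is about S.
    high = 1024.0
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if response_time(service_time, arrival_rate, mid) > observed_r:
            low = mid    # modeled R too high, so more servers are needed
        else:
            high = mid   # modeled R at or below the target, so fewer suffice
    return (low + high) / 2.0


# Rounded values quoted in this posting for the 256 CBC latch configuration.
m = solve_effective_servers(observed_r=0.0633, service_time=0.0313, arrival_rate=126.851)
print(round(m, 2))
```
&lt;/pre&gt;The search works because, for M greater than L*S, the modeled response time falls steadily as M grows, so each halving of the bracket keeps the solution inside it. With the rounded inputs above, the result should land at roughly 4.6, in line with Figure 4.&lt;br /&gt;&lt;br /&gt;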
(While I'm not going to go down this path, notice at the bottom of the Figure 4 there is a link to plot the resulting response time curve.)&amp;nbsp;As Figure 4 shows, we now have all the variable values for the response time formula; M, L, S, Q, and R.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Testing the Response Time Model&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The question before us is, does the change in the service time (S) produce a corresponding change in the response time (R) as queuing theory states? Let's check!&lt;br /&gt;&lt;br /&gt;Placing the modified service time (S) into our response time (R) formula along with the initial arrival rate (L) and effective servers (M):&lt;br /&gt;&lt;br /&gt;R = S / ( 1 - ( L*S/M)^M )&lt;br /&gt;&amp;nbsp; &amp;nbsp; =&lt;a href="http://www.wolframalpha.com/input/?i=+0.0087205+%2F+%28+1+-+%28+126.851*0.0087205%2F4.59756%29%5E4.59756+%29"&gt; 0.0087205 / ( 1 - ( 126.851*0.0087205/4.59756)^4.59756 )&lt;/a&gt;&lt;br /&gt;&amp;nbsp; &amp;nbsp;=&amp;nbsp;0.008733&lt;br /&gt;&lt;br /&gt;Our model anticipates the response time to be 0.008733 ms/lio. The experimentally observed response time was 0.0093257 ms/lio. That's really close! As Figure 5 shows, the difference is only 6.4%.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-mbD3jbGw1zE/TlLDzGzBICI/AAAAAAAAARo/C7MxQvnxoUA/s1600/Fig5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-mbD3jbGw1zE/TlLDzGzBICI/AAAAAAAAARo/C7MxQvnxoUA/s1600/Fig5.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 5.&lt;/b&gt;&lt;/div&gt;Figure 5 shows that when additional CBC latches were added and only incorporating the service time change into our response time model, the predicted response time differed only 6.4%. 
Considering the simplicity of our model, this is outstanding!&lt;br /&gt;&lt;br /&gt;You may have noticed that in Figure 1 when the additional latches were added and the system stabilized, the arrival rate increased by 175%. To be correct (and fair) to our response time model, we need to account for this change in the arrival rate. As most of you know, when we increase the arrival rate the resulting response time can also increase. So to be fair, we need to incorporate the arrival rate increase into our model.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-1D-lnTCjQE8/TlLHwtKiEuI/AAAAAAAAARw/bXCKm2TKcrY/s1600/Fig6.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-1D-lnTCjQE8/TlLHwtKiEuI/AAAAAAAAARw/bXCKm2TKcrY/s1600/Fig6.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 6.&lt;/b&gt;&lt;/div&gt;Figure 6 shows the results when incorporating both the change in service time (S) and arrival rate (L) into the response time model. In this case, our prediction was off by 10%. Again, considering the simplicity of our model (which can be greatly enhanced as I discuss in my &lt;b&gt;&lt;a href="http://training.orapub.com/content_forecasting.asp"&gt;Oracle Forecasting &amp;amp; Predictive Analysis&lt;/a&gt;&lt;/b&gt; course), this is outstanding!&lt;br /&gt;&lt;br /&gt;Very cool, eh? 
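&lt;br /&gt;&lt;br /&gt;The two model checks above can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the service time, initial arrival rate, effective servers, and observed response time are the numbers quoted in this posting, but the tuned arrival rate is an assumption derived from the rounded "175% increase" (126.851 x 2.75), since the exact value behind Figure 6 is not quoted in the text. The function and variable names are hypothetical.&lt;br /&gt;&lt;pre&gt;
```python
def response_time(service_time, arrival_rate, servers):
    """Modeled CPU response time: R = S / (1 - (L*S/M)^M)."""
    utilization = arrival_rate * service_time / servers
    return service_time / (1.0 - utilization ** servers)


def percent_difference(predicted, observed):
    """Absolute difference relative to the observed value, as a percentage."""
    return abs(predicted - observed) / observed * 100.0


M = 4.59756                   # effective servers, from the M-Solver result
S_TUNED = 0.0087205           # ms/lio, service time with 32768 CBC latches
L_INITIAL = 126.851           # lio/ms, arrival rate with 256 CBC latches
L_TUNED = L_INITIAL * 2.75    # ASSUMPTION: rounded 175% increase applied
R_OBSERVED = 0.0093257        # ms/lio, observed with 32768 CBC latches

# Check 1: only the service time change is fed into the model.
r_service_only = response_time(S_TUNED, L_INITIAL, M)
# Check 2: both the service time and the arrival rate changes are fed in.
r_service_and_arrivals = response_time(S_TUNED, L_TUNED, M)

for label, predicted in (
    ("S only", r_service_only),
    ("S and L", r_service_and_arrivals),
):
    diff = percent_difference(predicted, R_OBSERVED)
    print(f"{label}: predicted {predicted:.6f} ms/lio, off by {diff:.1f}%")
```
&lt;/pre&gt;Under these assumptions the first prediction comes out near 0.0087 ms/lio, about 6% below the observed value, and the second near 0.0103 ms/lio, about 10% above it.&lt;br /&gt;&lt;br /&gt;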
What we have seen is that by tuning Oracle we have reduced the time it takes to process a logical IO (response time) and this reduction is as our classic CPU queuing theory based model indicates.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;To Summarize...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The main point of this posting is to demonstrate that when we tuned Oracle by adding additional CBC latches, we effectively altered the Oracle kernel code path making it more efficient AND the resulting LIO response time changed as Operations Research queuing theory states!&lt;br /&gt;&lt;br /&gt;In a little more detail, this is what occurred:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;There was an intense CPU bottleneck along with raging CBC latch contention.&lt;/li&gt;&lt;li&gt;We observed the CPU time to process a single LIO (S) was 0.0313 ms and the total time to process a LIO (R) was 0.0633 ms.&lt;/li&gt;&lt;li&gt;We increased the number of CBC latches from 256 to 32768.&lt;/li&gt;&lt;li&gt;We restarted the system and let it stabilize.&lt;/li&gt;&lt;li&gt;We observed the CPU time to process a single LIO (S) decreased by 72%, the arrival rate (L) increased by 175%, and the total time to process a LIO (R) decreased by 85%.&lt;/li&gt;&lt;li&gt;Our response time model predicted, with the decrease in CPU time to process a LIO (S) and also the increase in the arrival rate (L), a response time (R) that was 10% greater than what actually occurred.&lt;/li&gt;&lt;/ol&gt;For many of you, this will be immensely satisfying because we have quantified and modeled an Oracle system, taken a tuning solution and quantitatively observed the resulting change and understood why it altered the system, and then we demonstrated the observed result closely matched our quantitative model.&lt;br /&gt;&lt;br /&gt;If this seems overly theoretical, in the next blog entry you'll see how we can use this information to anticipate the impact on a SQL statement's elapsed 
time!&lt;br /&gt;&lt;br /&gt;Thanks for reading!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;If you enjoyed this blog entry, I suspect you'll get a lot out of my courses; &lt;b&gt;&lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;Oracle Performance Firefighting&lt;/a&gt;&lt;/b&gt; and &lt;b&gt;&lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt;&lt;/b&gt;. I teach these classes around the world multiple times each year. For the latest schedule, click here. I also offer on-site training and consulting services.&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond with a comment or you have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. Another option is to send an email to OraPub's general email address, which is currently orapub.general@gmail .com.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-7691898491870539050?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/7691898491870539050/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/08/why-tuning-oracle-works-and-modeling-it.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/7691898491870539050'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/7691898491870539050'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/08/why-tuning-oracle-works-and-modeling-it.html' title='Why 
tuning Oracle works and modeling it'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-MP1_nrdrUlc/TlPRu7LB9xI/AAAAAAAAAR4/s31MnFzLtrk/s72-c/Fig1.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-9195115724944754093</id><published>2011-08-03T08:40:00.000-07:00</published><updated>2011-08-03T08:40:34.482-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sql trace'/><category scheme='http://www.blogger.com/atom/ns#' term='instrumentation'/><category scheme='http://www.blogger.com/atom/ns#' term='precision'/><category scheme='http://www.blogger.com/atom/ns#' term='sql_id'/><category scheme='http://www.blogger.com/atom/ns#' term='elapsed time'/><category scheme='http://www.blogger.com/atom/ns#' term='sampling'/><title type='text'>True SQL Elapsed Times... gathering</title><content type='html'>&lt;span class="Apple-style-span" style="font-size: large;"&gt;Getting laughed at is no fun...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here's the situation: You run a Statspack or AWR report, determine the key SQL statements, and based on the total elapsed time and the number of executions you determine their average elapsed times. You then tell your user's you can see for their key SQL statement the average elapsed time is X. Then you notice they are kind of snickering because while it does sometimes take this long, it usually doesn't. And in their minds, your reputation and their trust in your skills sinks a little lower. 
If this situation makes you feel uncomfortable, then read on!&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Average elapsed times do not tell us much.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you have been following my blogs, you may recall that last February (2011) I blogged about &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/sql-statement-elapsed-times.html"&gt;SQL Statement Elapsed Times&lt;/a&gt;&lt;/b&gt;. One of the key takeaways was &lt;i&gt;SQL statement elapsed times are not normally distributed&lt;/i&gt;. That is, for a given statement, if you gather a bunch of elapsed time samples, there will not be an equal number of samples below and above the average. Said another way, that nice bell curve histogram most people envision when we say, "The statement takes an average of X seconds to run" is not close to the truth. Not &lt;i&gt;even&lt;/i&gt; close.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Since SQL statement elapsed times are not normally distributed, knowing the average elapsed time is not all that useful and can easily mislead people.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Collecting real elapsed times.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To demonstrate this, in the February blog posting I created a procedure to gather elapsed times for a given &lt;b&gt;sql_id&lt;/b&gt; and &lt;b&gt;plan_hash_value&lt;/b&gt;. The problem was (and still is) the procedure can consume an entire CPU core while collecting data! Perhaps this is OK when gathering experimental data (sometimes), but obviously it's not going to be acceptable when you're performance firefighting on a CPU bottlenecked system and need to get a truer understanding of the elapsed times without hammering your system in the process. 
But if you'd like, you can download this free tool &lt;b&gt;&lt;a href="http://resources.orapub.com/SearchResults.asp?Search=sql+distribution"&gt;here&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;For months now, I have been planning on creating a better SQL elapsed time collector. A tool that ever-so-lightly collects elapsed times for a given statement's &lt;b&gt;sql_id&lt;/b&gt;. At the time of this blog entry, a number of DBAs are beta testing and I have some preliminary data from some of them already.&lt;br /&gt;&lt;br /&gt;But you may be thinking: What about other methods of gathering elapsed times? There are other methods and below I'll discuss both instrumentation and tracing. And then, I'll compare them with the new tool that's in beta as I type.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;What about instrumentation and tracing?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Both instrumentation and tracing can produce very good elapsed time samples. When using instrumentation, the SQL you are interested in must (obviously) be instrumented. Not the module, action, user, or chunk of code, but the actual SQL statement. This is what I'm talking about:&lt;br /&gt;&lt;br /&gt;Get time T0&lt;br /&gt;Run SQL statement&lt;br /&gt;Get time T1&lt;br /&gt;Save elapsed time as T1-T0&lt;br /&gt;&lt;br /&gt;Tracing is another option that produces solid elapsed times. To prove this to myself, I enabled tracing for a specific &lt;b&gt;sql_id&lt;/b&gt; like this:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;alter system set events 'sql_trace [sql:5hy19uf6q4unx]';&lt;br /&gt;exec dbms_lock.sleep(60);&lt;br /&gt;alter system set events 'sql_trace off';&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;I also wrote a Bourne shell script trace file processor, which given the &lt;b&gt;sql_id&lt;/b&gt; will produce all the &lt;b&gt;sql_id&lt;/b&gt; elapsed times contained within the trace file. 
It was written for Linux based on an Oracle 11.2 trace file, but I suspect it can be easily modified if necessary. &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201107_true_sql_E_gathering/GetElapsedTraceV4.txt"&gt;Click here to view the shell script&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;To really prove to myself that these methods worked, I performed an experiment. You can &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201107_true_sql_E_gathering/MasterExperiment.txt"&gt;download the master experiment script here&lt;/a&gt;&lt;/b&gt;. I created a CPU bottleneck (top wait event was latch: cache buffer chain), instrumented the SQL and ran it as I just mentioned above (i.e., get time t0, etc.) while also tracing just that statement. I did the entire experiment twice; each time with a different average sleep time between the executions for the SQL statement I was gathering elapsed times for. The sleep times were log normal distributed, which is much more realistic than constant (i.e., uniform) or normally distributed sleep times. Each of the two sample periods was around 5 minutes. 
The experimental setup worked wonderfully.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-tcZAMJf-KqQ/Ti30cUHtIwI/AAAAAAAAAQI/i6ky4O82fsY/s1600/Table+Low+Trace+Instr.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="55" src="http://3.bp.blogspot.com/-tcZAMJf-KqQ/Ti30cUHtIwI/AAAAAAAAAQI/i6ky4O82fsY/s320/Table+Low+Trace+Instr.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 1.&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-odWWibYehWM/Ti33ypjndKI/AAAAAAAAAQQ/so-CIvGBkcs/s1600/Table+Norm+Trace+Instr.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="55" src="http://3.bp.blogspot.com/-odWWibYehWM/Ti33ypjndKI/AAAAAAAAAQQ/so-CIvGBkcs/s320/Table+Norm+Trace+Instr.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 2.&lt;/b&gt;&lt;/div&gt;Figure 1 shows a summary of data collected when the sleep time averaged around 2 seconds and Figure 2 data was collected when the sleep times averaged around 1.8 seconds. The load was also a little more intense during the Figure 2 collection, hence the elapsed times are slightly longer. Within Figure 1 and Figure 2, notice how similar the captured elapsed times are! Statistically speaking, there is no difference between them (alpha=0.05, p-value 0.980) This means both methods will yield the same test results... 
you pick which one works the best for you!&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-SaQNl0QZNME/Ti35E7hw_UI/AAAAAAAAAQU/n6qJbkcUgHE/s1600/Hist+Norm+Trace+Instr.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="195" src="http://1.bp.blogspot.com/-SaQNl0QZNME/Ti35E7hw_UI/AAAAAAAAAQU/n6qJbkcUgHE/s320/Hist+Norm+Trace+Instr.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 3.&lt;/b&gt;&lt;/div&gt;Figure 3 really shows the story! Figure 3 above contains two smoothed histograms based on the data in Figure 2 above. Notice the lines are difficult to distinguish. This means both the instrumentation and SQL tracing elapsed time strategies produced nearly identical results. And that is what the statistics also told us.&lt;br /&gt;&lt;br /&gt;The point of Figure 1, Figure 2, and Figure 3 is &lt;i&gt;both instrumentation and SQL tracing produce good, and essentially the same, elapsed time data.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;That's great if your application is instrumented or you can (or want to) enable SQL tracing... even at the &lt;b&gt;sql_id&lt;/b&gt; level. But perhaps this is not a production system option? Is there another option? Read on...&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Another option: The OraPub Elapsed Time Sampler&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you need good SQL statement elapsed times and the SQL of interest is not instrumented and tracing is not a viable option, then consider the OraPub Elapsed Time Sampler. The sampling overhead is minuscule, it is simple to use, it is customizable, and it is amazingly accurate. 
&lt;b&gt;&lt;a href="http://resources.orapub.com/product_p/SQLESampler.html"&gt;Here is the link to the tool's web-page&lt;/a&gt;&lt;/b&gt;.&amp;nbsp; As the web-site states (at this time), if you would like to beta test the tool, please email me at orapub.general@gmail.com. Once the beta testing is completed, the tool will be available on my web-site, but for a few dollars.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Little sampling overhead.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;All performance tools place some overhead on the systems they are monitoring. I decided to actually observe and honestly report the overhead. The product incorporates several strategies to reduce the overhead. In the initial beta version there are also three precision options, which impact the gathering overhead. The precision options are low, normal, and high.&lt;br /&gt;&lt;br /&gt;When the product is looking for the specific SQL to complete, the CPU impact is around 1.2%, 1.2% (not a typo) and 20% on a &lt;i&gt;single core&lt;/i&gt; for the low, normal, and high precision options respectively. &lt;i&gt;This means if you have a 4 core server and are gathering at the low precision, the collection server process would be consuming 0.3% of the CPU resources (0.3% of total CPU = 1.2% of one core / 4 cores). &lt;/i&gt;When the tool is looking for the SQL to monitor, the CPU overhead is 12%, 56%, and 100% on &lt;i&gt;a single CPU core&lt;/i&gt; for the low, normal and high precision options respectively. This means if you have a 4 core server and are gathering at the low precision, the collection server process would be consuming 3% of the CPU resources (3% of total CPU = 12% of one core / 4 cores).&lt;br /&gt;&lt;br /&gt;I typically use the low precision setting because, as you'll see below, even at this level the results are stunningly accurate. Keep in mind, these numbers are for the initial beta version (3c) and I'm still working to reduce the overhead. 
I am also planning on creating a super-low precision/impact setting.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Simple to use.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;First you get the &lt;b&gt;sql_id&lt;/b&gt; you want to monitor. (If you found a problem SQL statement in a Statspack or AWR report, the &lt;b&gt;sql_id&lt;/b&gt; will be right there in the report.) Then determine the sampling precision and duration. And finally, execute the sampling like this:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;exec op_sample_elapsed_v3.sample(600,'5hy19uf6q4unx','low','none','key');&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;When the sampling is complete, simply query from the sample data table, like this:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;SQL&amp;gt; select elapsed_time_s from op_elapsed_samples;&lt;br /&gt;&lt;br /&gt;ELAPSED_TIME_S&lt;br /&gt;--------------&lt;br /&gt;      1.767666&lt;br /&gt;      1.518642&lt;br /&gt;      1.518561&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;pre&gt;&lt;code&gt;      ...&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;b&gt;Simple to install.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;You create the sample data table and create a single package. Installation done!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Amazingly accurate.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This is the really cool part! My core objectives were low overhead and to be accurate "enough." As I mentioned above, the overhead is virtually zero. It's easy to collect accurate data when the elapsed times are long, perhaps greater than 20 seconds. 
But for statements just a second or two long...it becomes very difficult to maintain the balance of low overhead and accuracy.&lt;br /&gt;&lt;br /&gt;You can download all the experimental data, some of what I show in this blog entry: NORMAL precision data and analysis (&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201107_true_sql_E_gathering/Elapsed_1aNormal.pdf"&gt;PDF&lt;/a&gt;&lt;/b&gt;, &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201107_true_sql_E_gathering/Elapsed_1a_Normal.nb"&gt;Mathematica notebook&lt;/a&gt;&lt;/b&gt;) and LOW precision data and analysis (&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201107_true_sql_E_gathering/Elapsed_1aLow.pdf"&gt;PDF&lt;/a&gt;&lt;/b&gt;, &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201107_true_sql_E_gathering/Elapsed_1a_Low.nb"&gt;Mathematica notebook&lt;/a&gt;&lt;/b&gt;). You can also &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201107_true_sql_E_gathering/MasterExperiment.txt"&gt;view/download my master experiment text file&lt;/a&gt;&lt;/b&gt;, which is what I copied and pasted from when I ran the experiment.&lt;br /&gt;&lt;br /&gt;The tool's collected elapsed times are extremely accurate. While I didn't mention this above (would have been a distraction), while gathering the sample data summarized in Figure 1, Figure 2, and Figure 3, I was also gathering elapsed time samples using this tool... I just didn't show that data in those figures. 
(sneaky, I know) Here is the data in both table and histogram format.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-SnUt97VaUCM/Ti4BqC9a7iI/AAAAAAAAAQY/v8DViUh2hmE/s1600/Table+Normal+All.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="73" src="http://2.bp.blogspot.com/-SnUt97VaUCM/Ti4BqC9a7iI/AAAAAAAAAQY/v8DViUh2hmE/s320/Table+Normal+All.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 4. Normal precision and overhead.&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-XtPqAIO386E/Ti4B0OHNdGI/AAAAAAAAAQc/ZJaGBTVLnA8/s1600/Hist+Normal+All.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="196" src="http://4.bp.blogspot.com/-XtPqAIO386E/Ti4B0OHNdGI/AAAAAAAAAQc/ZJaGBTVLnA8/s320/Hist+Normal+All.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 5. Normal precision and overhead.&lt;/b&gt;&lt;/div&gt;Figure 4 and Figure 5 contain all the data from a 5 minute sample interval at the &lt;i&gt;normal&lt;/i&gt; precision level. The sleep time between the SQL statement I was looking for was around 2 seconds (log normal distributed). As you probably inferred by looking at Figure 4 and Figure 5, statistically there is no difference between the elapsed time gathering methods. The smoothed histogram (Figure 5) clearly shows there is virtually no difference in the collection methods.&lt;br /&gt;&lt;br /&gt;But what about the low precision setting? After all, the lowest precision setting places a near undetectable load on the system. 
Figure 6 and Figure 7 show the results.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-Os9dbuYnjZs/Ti4EJ3heCBI/AAAAAAAAAQg/F6S8OE9BANw/s1600/Table+Low+all.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="73" src="http://4.bp.blogspot.com/-Os9dbuYnjZs/Ti4EJ3heCBI/AAAAAAAAAQg/F6S8OE9BANw/s320/Table+Low+all.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 6. Low precision and overhead.&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-Y8q6Dg4qVKo/Ti4EaOvxcnI/AAAAAAAAAQk/tklT3_0PZ9M/s1600/Hist+Low+All.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="196" src="http://1.bp.blogspot.com/-Y8q6Dg4qVKo/Ti4EaOvxcnI/AAAAAAAAAQk/tklT3_0PZ9M/s320/Hist+Low+All.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 7. Low precision and overhead.&lt;/b&gt;&lt;/div&gt;By looking at Figure 4, Figure 5, Figure 6 and Figure 7 you probably inferred that any of the collection methods will work fine and produce the same results (statistically speaking, alpha 0.05). Yes, this is correct!&lt;br /&gt;&lt;br /&gt;To summarize the &lt;i&gt;OraPub SQL Elapsed Time Sampler&lt;/i&gt; option: If your SQL is not instrumented and SQL tracing is not a production option and you have the spending authority of around a box of candy, then this product should satisfy your requirements. 
In fact, any of the precision settings will produce accurate results along with a shockingly (or perhaps refreshingly) low sampling overhead.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Again, &lt;a href="http://resources.orapub.com/product_p/SQLESampler.html"&gt;here's the link to the tool's web page&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Send me your data!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you would like, you can email me your elapsed time data and I'll run it through a &lt;i&gt;Mathematica&lt;/i&gt; notebook, which will crank out a number of graphs and tables. Usually the results are very informative and immensely satisfying.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;In Summary...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;My objective in this blog entry was not to push my new tool (really). I was just as interested in understanding the accuracy and similarity of all three collection methods (instrumentation, tracing, and OraPub's sampling tool). As I mentioned at the top, it's easy to get the SQL statement average elapsed time...but that can be very misleading and not all that helpful. What is needed are good elapsed time samples. This blog entry presents three ways to get really, really good elapsed times with relatively low overhead and at virtually no cost.&lt;br /&gt;&lt;br /&gt;Thanks for reading!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;If you enjoyed this blog entry, I suspect you'll get a lot out of my courses; &lt;b&gt;&lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;Oracle Performance Firefighting&lt;/a&gt;&lt;/b&gt; and &lt;b&gt;&lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt;&lt;/b&gt;. I teach these classes around the world multiple times each year. For the latest schedule, click here. 
I also offer on-site training and consulting services.&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond with a comment or you have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. Another option is to send an email to OraPub's general email address, which is currently orapub.general@gmail .com.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-9195115724944754093?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/9195115724944754093/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/08/true-sql-elapsed-times-gathering.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/9195115724944754093'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/9195115724944754093'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/08/true-sql-elapsed-times-gathering.html' title='True SQL Elapsed Times... 
gathering'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-tcZAMJf-KqQ/Ti30cUHtIwI/AAAAAAAAAQI/i6ky4O82fsY/s72-c/Table+Low+Trace+Instr.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-6086147956701844529</id><published>2011-07-27T13:52:00.000-07:00</published><updated>2011-07-27T13:52:53.354-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oracle training'/><category scheme='http://www.blogger.com/atom/ns#' term='oracle performance training'/><title type='text'>Focus on training...rest of 2011</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-BE3AkIO-5Jk/Ti9BS8gNEhI/AAAAAAAAAQs/U62JD6aqYBE/s1600/40.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-BE3AkIO-5Jk/Ti9BS8gNEhI/AAAAAAAAAQs/U62JD6aqYBE/s1600/40.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;This posting is &lt;i&gt;very&lt;/i&gt; unusual. There is no grand experiment I'm performing, no exciting statistical analysis, and no profound Oracle insights uncovered. However, if you are thinking about getting some training before the year is over and there is some budget left, then perhaps one of OraPub's courses is the perfect fit. 
If that's a possibility for this year or next, then read on!&lt;br /&gt;&lt;br /&gt;In this posting, I'm going to summarize each of my courses and also list the locations that have been confirmed for the remainder of 2011. There is also a lot of course&amp;nbsp;information online at &lt;b&gt;&lt;a href="http://training.orapub.com/"&gt;OraPub's training web-site&lt;/a&gt;&lt;/b&gt;. For reference, you can see &lt;b&gt;&lt;a href="http://resources.orapub.com/OraPub_Class_Pictures_and_Videos_s/74.htm"&gt;pictures from my past courses&lt;/a&gt;&lt;/b&gt; and also &lt;b&gt;&lt;a href="http://resources.orapub.com/articles.asp?id=148"&gt;student testimonials&lt;/a&gt;&lt;/b&gt; online. If you have any questions, please feel free to email me at orapub.general@gmail.com.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-MhvBJGbTJqk/Ti9BgJQyaCI/AAAAAAAAAQw/aSe8O_nY7LY/s1600/51.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-MhvBJGbTJqk/Ti9BgJQyaCI/AAAAAAAAAQw/aSe8O_nY7LY/s1600/51.jpg" /&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Locations.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;While I will be speaking at various user groups and conferences, my training courses will be offered in Atlanta, Chicago, Salt Lake City, and Portland (Oregon). The first half of 2012 I'm hoping to offer my courses in Europe and also in Brazil. If you live in these areas, let me know if you want my training in your country. The more I hear from people the better the chance I can bring my training to you!&amp;nbsp;&lt;b&gt;&lt;a href="http://training.orapub.com/reg/shop/category.asp?catid=2&amp;amp;sortby=itemno"&gt;Registration is open&lt;/a&gt;&lt;/b&gt; for each of the course offerings shown below.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;September - Atlanta, GA&lt;/b&gt;&lt;br /&gt;Sep 19-21. Oracle Performance Firefighting&lt;br /&gt;Sep 22-23. 
Advanced Oracle Performance Analysis&lt;br /&gt;&lt;br /&gt;&lt;b&gt;October - Chicago, IL area&lt;/b&gt;&lt;br /&gt;Oct 10-12. Oracle Performance Firefighting&lt;br /&gt;Oct 13-14. Advanced Oracle Performance Analysis&lt;br /&gt;&lt;br /&gt;&lt;b&gt;November - Salt Lake City, UT&lt;/b&gt;&lt;br /&gt;Nov 14-16. Oracle Performance Firefighting&lt;br /&gt;Nov 17-18. Advanced Oracle Performance Analysis&lt;br /&gt;&lt;br /&gt;&lt;b&gt;November - Either San Francisco area or &amp;nbsp;Washington, DC area (contact me!)&lt;/b&gt;&lt;br /&gt;Nov 30- Dec 2. Oracle Forecasting &amp;amp; Predictive Analysis&lt;br /&gt;&lt;br /&gt;&lt;b&gt;December - Portland, OR&lt;/b&gt;&lt;br /&gt;Dec 12-14. Oracle Performance Firefighting&lt;br /&gt;Dec 15-16. Advanced Oracle Performance Analysis&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;My Courses.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/-wMUc_zjf7-0/Ti9DtRMrvlI/AAAAAAAAARA/0DkoSL-GNDc/s1600/21.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-wMUc_zjf7-0/Ti9DtRMrvlI/AAAAAAAAARA/0DkoSL-GNDc/s1600/21.jpg" /&gt;&lt;/a&gt;I have three core courses: Oracle Performance Firefighting, Advanced Oracle Performance Analysis, and finally Oracle Forecasting &amp;amp; Predictive Analysis. What my students generally tell me about my courses is they are based on solid research and experience. I'm able to take very technical and diverse topics and make them practical in their performance work. Students also like my teaching style. If you have seen my presentations at a user group or at a conference, you have an idea of my teaching style.&lt;br /&gt;&lt;br /&gt;Here is a summary for each course. 
For more information, &lt;b&gt;&lt;a href="http://training.orapub.com/content.asp"&gt;click here&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Oracle Performance Firefighting.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-5xrkRlvemVU/Ti9Bq0rqIgI/AAAAAAAAAQ0/ryiec8lGY5Q/s1600/8.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-5xrkRlvemVU/Ti9Bq0rqIgI/AAAAAAAAAQ0/ryiec8lGY5Q/s1600/8.jpg" /&gt;&lt;/a&gt;The Firefighting course is my foundational course that has been running since 1999! My firefighting course is all about quick and methodical diagnosis, deep Oracle internals, resolution strategies, and learning to communicate your recommendations. While it's easy to think there are other courses available like this, in reality there are not. My students tell me that what sets my course apart is a true response time analysis that is presented and practiced in class, analysis perspectives (Oracle, application, and OS) combined into a single consistent story, the in-class case studies, and the way I teach. If I state something that I can't demonstrate with an experiment then I will tell you so. It's a wonderful class, and I love teaching it.&lt;br /&gt;&lt;br /&gt;The class is three days in length. For more information, &lt;b&gt;&lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;click here&lt;/a&gt;&lt;/b&gt;. 
If you have taken the course within the past four years and would like to take it again, there is a 50% refresher discount.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Advanced Oracle Performance Analysis.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-QoHs0taCnqk/Ti9CToP7MnI/AAAAAAAAAQ8/75-zpT2GyGc/s1600/47.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-QoHs0taCnqk/Ti9CToP7MnI/AAAAAAAAAQ8/75-zpT2GyGc/s1600/47.jpg" /&gt;&lt;/a&gt;The Advanced Analysis course is two days and is offered after the Firefighting course providing DBAs with a full week of intensive training. This class is relatively new and of all my courses, it has undergone the most radical change over the past two years. If your management is not satisfied with the "just trust me, I know what I'm doing" statement and wants an honest understandable story about the situation, your solution, visual and numeric data, and &lt;u&gt;the expected outcome&amp;nbsp;for each solution&lt;/u&gt; then this class is for you. The Firefighting course produces a list of technically viable solutions that make sense. The Advanced Analysis course enables you to confidently rank and have an honest and knowledgeable conversation about which solution should be implemented first AND to state the expected impact on the system, users, and even individual SQL statements. Yeah... it's pretty cool and very practical. 
One of the highest praises I receive from my students about this class is that I'm able to take performance theory and make it very practical and useful.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Oracle Forecasting &amp;amp; Predictive Analysis.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/-xKcdghp7b5Y/Ti9B2EcZLzI/AAAAAAAAAQ4/KkZdgFoY-_4/s1600/58.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-xKcdghp7b5Y/Ti9B2EcZLzI/AAAAAAAAAQ4/KkZdgFoY-_4/s1600/58.jpg" /&gt;&lt;/a&gt;The Predictive Analysis course is very different from my other courses. Both the Firefighting and the Advanced Analysis courses assume you're in the middle of a roaring performance firefight. In stark contrast, the Predictive Analysis course assumes performance is currently acceptable and you're looking into the future; asking questions about &lt;i&gt;when&lt;/i&gt; performance will become unacceptable, &lt;i&gt;where&lt;/i&gt; it will be at greatest risk, and how can &lt;i&gt;risk be mitigated&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;If you have read my Forecasting Oracle Performance book you'll have a good idea about the course contents, but the course goes much further than the book. This is especially true in the modeling area. A significant portion of the course is spent developing and working with models based in MS-Excel.&lt;br /&gt;&lt;br /&gt;While many DBAs and their managers feel "forecasting" is the responsibility of some other group, every DBA has been asked questions like, "We are adding 50 more users to the accounting application. That's not going to be a problem, right?" That's a predictive question! Or, when managing virtualization, requirements still need to be forecasted for budget and planning purposes. Certainly you need to ensure there is enough capacity to meet the expected requirements. 
So, forecasting is just as important as it ever has been.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/-wughJNgPYBA/Ti9BB2WE81I/AAAAAAAAAQo/pgtuE2r-kAg/s1600/30.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-wughJNgPYBA/Ti9BB2WE81I/AAAAAAAAAQo/pgtuE2r-kAg/s1600/30.jpg" /&gt;&lt;/a&gt;For most Predictive Analysis students, the course opens up a whole new world of IT and enables them to not only do their existing job better but can jump start them into a new branch of expertise. If you have never taken my forecasting class before, I hope one day you can get the chance.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Questions?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you have any questions about my training courses, please feel free to email me directly. The best email is OraPub's general, which is orapub.general@gmail.com.&lt;br /&gt;&lt;br /&gt;Thanks for reading and I look forward to hearing from you.&lt;br /&gt;&lt;br /&gt;Craig.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-6086147956701844529?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/6086147956701844529/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/07/focus-on-trainingrest-of-2011.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/6086147956701844529'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/6086147956701844529'/><link rel='alternate' type='text/html' 
href='http://shallahamer-orapub.blogspot.com/2011/07/focus-on-trainingrest-of-2011.html' title='Focus on training...rest of 2011'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-BE3AkIO-5Jk/Ti9BS8gNEhI/AAAAAAAAAQs/U62JD6aqYBE/s72-c/40.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-1931454229838943604</id><published>2011-07-15T08:03:00.000-07:00</published><updated>2011-07-15T08:03:27.067-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='latch'/><category scheme='http://www.blogger.com/atom/ns#' term='queue time'/><category scheme='http://www.blogger.com/atom/ns#' term='hashing'/><category scheme='http://www.blogger.com/atom/ns#' term='cache buffer chain'/><category scheme='http://www.blogger.com/atom/ns#' term='elapsed time'/><category scheme='http://www.blogger.com/atom/ns#' term='service time'/><category scheme='http://www.blogger.com/atom/ns#' term='response time'/><category scheme='http://www.blogger.com/atom/ns#' term='CBC latch'/><title type='text'>CBC latches, CPU consumption, and wait time</title><content type='html'>&lt;span class="Apple-style-span" style="font-size: large;"&gt;What's this about?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When a system is experiencing severe cache buffer chain (CBC) latch contention and many of the child latches are active, one of the common Oracle focused solutions is to increase the number of CBC latches. 
This typically yields some benefit because both the CPU time and the non-idle wait time per SQL statement execution decrease. But why are CPU time and non-idle wait time reduced? This is what this blog entry is all about...&lt;br /&gt;&lt;br /&gt;If you just want the "bottom line," scroll down to the last section.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Getting a time perspective.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For the experimental results to really make sense, we need to look at performance from a &lt;i&gt;time&lt;/i&gt; and also a &lt;i&gt;unit of work time&lt;/i&gt; perspective.&lt;br /&gt;&lt;br /&gt;One way to categorize the time related to processing a piece of Oracle work (e.g., &lt;b&gt;buffer get&lt;/b&gt;, SQL statement) is to place the time into two buckets: CPU time and non-idle Oracle wait time (i.e., wait time). For example, if on average a SQL statement consumes 1 second of CPU time and 2 seconds of wait time per execution, then the elapsed time is 3 seconds.&lt;br /&gt;&lt;br /&gt;For reference, a &lt;b&gt;buffer get&lt;/b&gt; is sometimes called a &lt;i&gt;logical IO&lt;/i&gt;&amp;nbsp;or, for short, LIO or lio. This statistic can be gathered from &lt;b&gt;v$sesstat&lt;/b&gt; and &lt;b&gt;v$sysstat&lt;/b&gt; with the statistic name&amp;nbsp;&lt;b&gt;session logical reads&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;It turns out that the CPU consumption per piece of Oracle work is very consistent regardless of the workload intensity. (This is a key subject in my Oracle performance courses and I will not get into the details in this blog entry.) What continues to slow performance is that, as the workload intensity increases, the wait time increases.&lt;br /&gt;&lt;br /&gt;So it follows that there are three fundamental ways to reduce the time it takes to process a buffer get, a SQL statement, or a group of SQL statements. 
You can either reduce the amount of work required (i.e., workload intensity), the CPU consumption, or the non-idle Oracle wait time. Increasing parallelism is another option, but I'll save that discussion for another time.&lt;br /&gt;&lt;br /&gt;When we talk about time, it needs to be related to a task, like the time related to driving to work, mowing the lawn, having a conversation with someone, or completing a SQL statement. These tasks are actually made up of many small &lt;i&gt;movements&lt;/i&gt;&amp;nbsp;or &lt;i&gt;pieces of work&lt;/i&gt;. When referring to driving to work, a &lt;i&gt;movement&lt;/i&gt; that makes practical sense could be a single tire revolution. When referring to a SQL statement, a practical &lt;i&gt;movement&lt;/i&gt; could be a buffer get. I commonly call a &lt;i&gt;movement&lt;/i&gt; a &lt;i&gt;&lt;b&gt;unit of work&lt;/b&gt;&lt;/i&gt; or a piece of work.&lt;br /&gt;&lt;br /&gt;Let's focus on a buffer get as the movement, that is, the unit of work. The elapsed time of a SQL statement can be &lt;i&gt;expressed&lt;/i&gt; as the sum of the time for doing all the necessary buffer gets, that is, all the movements expressed as buffer gets. For example, if a SQL statement's elapsed time is 3 seconds and 10000 buffer gets are required to complete the statement, then the average time to complete each buffer get is 0.0003 seconds, or 0.3 ms per buffer get.&lt;br /&gt;&lt;br /&gt;One of the advantages of working at the &lt;i&gt;unit of work&lt;/i&gt; level is that we can take advantage of all the research, theory, and mathematics related to &lt;i&gt;&lt;b&gt;Operations Research&lt;/b&gt;&lt;/i&gt;. In my next blog entry, I will demonstrate how this blog's experimental results match Operations Research.&lt;br /&gt;&lt;br /&gt;As mentioned above, there are three fundamental ways to reduce the time it takes to process a piece of work, that is, a unit of work. 
We can reduce the CPU it takes to process the piece of work, reduce the wait time related to the piece of work, or reduce the number of pieces of work we must process.&amp;nbsp;For example, suppose a SQL statement's elapsed time is 3 seconds, 10000 buffer gets are required to complete the statement, 2 seconds of CPU are consumed, and there is 1 second of Oracle non-idle wait time. With just these bits of information, we can generalize and quantify the performance situation like this:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;10000 buffers were processed over the 3 second elapsed time. This means the workload was 3333.333 buffers/second or 3.333 buffers/ms. (10000/3 buffers/sec X 1/1000 sec/ms = 3.333 buffers/ms)&lt;/li&gt;&lt;li&gt;It took 2 seconds of CPU to process all 10000 buffers. This means the CPU time required to process a single buffer was 0.0002 seconds or 0.0002 sec/buffer or 0.200 ms/buffer. (2/10000 seconds/buffer X 1000/1 ms/sec = 0.200 ms/buffer)&lt;/li&gt;&lt;li&gt;There was 1 second of non-idle Oracle wait time involved when processing the 10000 buffers. This means the wait time associated with processing each buffer was 0.0001 seconds or 0.0001 sec/buffer or 0.100 ms/buffer. (1/10000 seconds/buffer X 1000/1 ms/sec = 0.100 ms/buffer)&lt;/li&gt;&lt;/ul&gt;Therefore, if we can reduce the workload, the CPU time, or the wait time, then the elapsed time will likely be reduced. To reduce the words, let's create an equation.&lt;br /&gt;&lt;br /&gt;E = Work X Time per Work&lt;br /&gt;&lt;br /&gt;where:&lt;br /&gt;&lt;br /&gt;&lt;i&gt;E&lt;/i&gt; is the elapsed time (e.g., 3 seconds)&lt;br /&gt;&lt;i&gt;Work&lt;/i&gt; is the amount of work to be completed (e.g., 10000 buffer gets)&lt;br /&gt;&lt;i&gt;Time per Work&lt;/i&gt; is the CPU and the wait time associated with processing a single piece of work. 
(e.g., 0.0003 seconds per buffer get)&lt;br /&gt;&lt;br /&gt;Taking this a step further:&lt;br /&gt;&lt;br /&gt;E = Work X ( CPU time per work + wait time per work )&lt;br /&gt;&lt;br /&gt;And there we have it! We can reduce the elapsed time (E) by reducing the work (Work), the CPU time per unit of work, or the wait time per unit of work.&amp;nbsp;While this may seem rather theoretical (and it is to some degree), as I will detail below, this is important to understanding why adding CBC latches to a system that is experiencing severe CBC latch contention can improve performance.&lt;br /&gt;&lt;br /&gt;A warning: While this is a very simple way to model elapsed time and valid in many cases, it is also limited...much like a paper airplane models a commercial jet or a benchmark models a real production Oracle system. But just as with many models, we can learn a tremendous amount about a specific area of interest in a very complicated system...that for practical purposes is impossible to replicate.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;How does this relate to adding CBC latches?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Note: This is probably the most important section in this blog, so please read it carefully.&lt;br /&gt;&lt;br /&gt;Answer: Because when we add CBC latches, Oracle does not have to spin as many times and/or sleep as many times when acquiring a latch. Spinning on a latch consumes CPU; therefore, reducing spinning reduces CPU consumption, which reduces CPU consumption per unit of work. Sleeping less often reduces non-idle Oracle wait time (&lt;b&gt;latch: cache buffer chains&lt;/b&gt;), which reduces wait time per unit of work. Said another way, when we spin less, we consume less CPU. And when we sleep less, we wait less.&lt;br /&gt;&lt;br /&gt;Another way of looking at this is to understand that when a process spins, it is actually executing lines of Oracle code. 
For example, suppose each spin executes 5 lines of code and to process a buffer it is currently taking around 4000 spins. However, through Oracle tuning it is now only taking 100 spins to process a buffer. This means the lines of code executed went from 20000 (5 X 4000) down to 500 (5 X 100). Since each line of code executed consumes CPU, we have purposefully and truly reduced the CPU required to process a buffer get and consequently also a buffer get dependent SQL statement's elapsed time. But it gets better! If a process spins less, it is also more likely to sleep less, reducing wait time.&lt;br /&gt;&lt;br /&gt;From a conceptual perspective, think of it like this: If there are more latches available yet the number of sessions competing for the latches remains the same, a session will be competing with fewer sessions when attempting to acquire the latch. In addition to this, a session is less likely to be asking for a latch that another process has already acquired. This results in less spinning (CPU reduction) and less sleeping (wait time reduction).&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;But is this what really happens in Oracle systems?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Short answer: Yes. To demonstrate this, an experiment is needed.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Experimental Design.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I created a system with a severe cache buffer chain load. The workload consisted of many queries in which all the blocks reside in the buffer cache. The server consists of a single 4 core Intel CPU running Red Hat Linux and Oracle 11.2.&lt;br /&gt;&lt;br /&gt;I actually did two experiments. They are related but with one key difference. 
One only alters the number of CBC latches (Experiment 1) and the second alters both the CBC latches and the number of chains (Experiment 2).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Downloads.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Here are the download links:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201107_cbc_time/LIO_St_Analysis1b.txt"&gt;Click here&lt;/a&gt;&lt;/b&gt;, to download and view online the text file containing the various data collection and reporting scripts.&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201107_cbc_time/E1_CBC_Analysis_1d.pdf"&gt;Click here&lt;/a&gt;&lt;/b&gt;, to download and view online the PDF file of the &lt;i&gt;Mathematica&lt;/i&gt; notebook for Experiment 1.&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201107_cbc_time/E2_CBC_Analysis_1d.pdf"&gt;Click here&lt;/a&gt;&lt;/b&gt;, to download and view online the&amp;nbsp;PDF&amp;nbsp;file of the &lt;i&gt;Mathematica&lt;/i&gt; notebook for Experiment 2.&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201107_cbc_time/E1_CBC_Analysis_1d.nb"&gt;Click here&lt;/a&gt;&lt;/b&gt;, to download the source &lt;i&gt;Mathematica&lt;/i&gt; file for Experiment 1.&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201107_cbc_time/E2_CBC_Analysis_1d.nb"&gt;Click here&lt;/a&gt;&lt;/b&gt;, to download the source &lt;i&gt;Mathematica&lt;/i&gt; file for Experiment 2.&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/201107_cbc_time/AnalysisPack.zip"&gt;Click here&lt;/a&gt;&lt;/b&gt;, to download all the above five file in a single zip file.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-weight: normal;"&gt;&lt;b&gt;Experimental &lt;u&gt;Design -&amp;nbsp;&lt;/u&gt;&lt;/b&gt;&lt;/span&gt;&lt;u&gt;Experiment 1&lt;/u&gt;: Altering the CBC latches&lt;/b&gt;&lt;br /&gt;&lt;br 
/&gt;The workload consisted of 20 processes running queries in which all the blocks reside in the buffer cache. This created a massive CPU bottleneck with an OS CPU run queue consistently between 12 and 20 with the CPU utilization pegged at 100%.&lt;br /&gt;&lt;br /&gt;I was disappointed I could not simply reduce the number of CBC latches to as low as I wanted. This would allow a very nice trend to develop...but Oracle is not interested in my experiments...they want to ensure a DBA does not make a CBC change that will clearly, as we'll see, hurt performance.&lt;br /&gt;&lt;br /&gt;For this experiment I altered the number of CBC latches (parameter, &lt;b&gt;_db_block_hash_latches&lt;/b&gt;); 1024 (minimum Oracle would allow), 2048, 4096, 8192, 16384, and 32768. For each CBC latch setting I gathered 90 samples at 180 seconds each. This means 540 samples were gathered for a total of 97,200 seconds...but this does not include the instance restart time, stabilization time, etc. ...I gathered lots of samples!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-weight: normal;"&gt;&lt;b&gt;Experimental &lt;u&gt;Design -&amp;nbsp;&lt;/u&gt;&lt;/b&gt;&lt;/span&gt;&lt;u&gt;Experiment 2&lt;/u&gt;: Altering both CBC latches and Cache Buffer Chains&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The workload consisted of 12 processes running queries in which all the blocks reside in the buffer cache. This created a massive CPU bottleneck with an OS CPU run queue consistently between 5 and 12 with the CPU utilization hovering around 94% to 99%. 
The bottleneck intensity was not nearly as severe as in &lt;i&gt;Experiment 1&lt;/i&gt; and probably more realistic than the Experiment 1 bottleneck.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;By setting the number of cache buffer &lt;i&gt;chains&lt;/i&gt; below 1024 (parameter, &lt;b&gt;_db_block_hash_buckets&lt;/b&gt;), I was also able to set the number of CBC latches below 1024 (parameter, &lt;b&gt;_db_block_hash_latches&lt;/b&gt;).&amp;nbsp;It turns out that Oracle will respect the number of chains setting (although it will round up to the next power of two) and then allow me to set the number of CBC latches to the number of chains. For example, if I set the number of chains and latches to 100, when the instance is restarted the actual number of chains and latches will be 256. And&amp;nbsp;if I set the number of chains and latches to 300, when the instance is restarted the actual number of chains and latches will be 512.&lt;br /&gt;&lt;br /&gt;For this experiment I altered the number of CBC latches and chains to: 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, and 65536. For each CBC latch setting I gathered 60 samples at 180 seconds each. This means 540 samples were gathered for a total of 97,200 seconds...but this does not include the instance restart time, stabilization time, etc. ...again, I gathered lots of samples.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Experimental Results &lt;u&gt;Summary - Experiment 1&lt;/u&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;With each increase in the number of latches, both the CPU and wait time &lt;i&gt;per buffer get&lt;/i&gt; decreased. (The decrease is statistically significant with an alpha of 0.05.) While the CPU time per buffer get decreased around 4%, the wait time decreased by between 19% and 55%, resulting in a total time per buffer get decrease of between 11% and 28%.&lt;br /&gt;&lt;br /&gt;To keep things simple and to the point, I only included four figures below. 
You can view/download the entire statistical analysis, which contains around 30 graphs and all data samples, by clicking on the "view PDF for Experiment 1" link in &lt;i&gt;The Downloads&lt;/i&gt; section above. Here are the details...&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-2yPSP1X_Ft4/Tgui9mwyyTI/AAAAAAAAAPQ/Ku4cWg3ndMA/s1600/Table+latches+basic+stats.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="77" src="http://3.bp.blogspot.com/-2yPSP1X_Ft4/Tgui9mwyyTI/AAAAAAAAAPQ/Ku4cWg3ndMA/s400/Table+latches+basic+stats.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 1.&lt;/b&gt;&lt;/div&gt;Figure 1 above is a statistical summary of the experimental results. Here is a quick description of the columns.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;CBC latches&lt;/b&gt; is the number of latches during the sample gathering.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Avg L&lt;/b&gt; is the average number of buffer gets processed per millisecond. &lt;b&gt;L&lt;/b&gt; stands for &lt;i&gt;Lambda&lt;/i&gt;, which is the common symbol for the arrival rate.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Avg St&lt;/b&gt; is the average CPU time consumed per buffer get processed. This is also called the service time, hence St. This is calculated as the total CPU time consumed divided by the total number of buffer gets during the sample interval of 180 seconds.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Avg Qt&lt;/b&gt; is the average non-idle wait time per buffer get. This is also called the queue time, hence Qt. This is calculated as the total non-idle wait time divided by the total number of buffer gets during the sample interval of 180 seconds.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Avg Rt&lt;/b&gt; is the time to process a single buffer get. This is also called the response time, hence Rt. 
It is calculated as simply the service time plus the queue time, that is, the CPU time plus the non-idle wait time per buffer get.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-fBsCBsX8DZM/TgujMYNAGnI/AAAAAAAAAPU/wDgQeLd7pd8/s1600/Plot+Rt+vs+latches.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="201" src="http://2.bp.blogspot.com/-fBsCBsX8DZM/TgujMYNAGnI/AAAAAAAAAPU/wDgQeLd7pd8/s320/Plot+Rt+vs+latches.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 2.&lt;/b&gt;&lt;/div&gt;Figure 2 above shows the CPU time (blue line) and the wait time added to that (red-like line) per buffer get versus the number of latches. While there is an initial drop in the CPU time per buffer get, the only significant decrease occurs when going from 1024 latches to 2048 latches. In this experimental system, Oracle was not able to achieve additional efficiencies by increasing the number of CBC latches. As I mentioned above, during the entire experiment the average CPU utilization was 100% and the run queue was usually well above 12.... which is over 3X the number of CPU cores!&lt;br /&gt;&lt;br /&gt;Figure 3 below is a &lt;i&gt;response time graph&lt;/i&gt; based on our experimental data (shown in Figure 1 above) integrated with queuing theory. The three plotted points are based entirely on our sample data's arrival rate (buffer get per ms, column &lt;b&gt;Avg L&lt;/b&gt;) and response time (CPU time and wait time ms per buffer get, column &lt;b&gt;Avg Rt&lt;/b&gt;) for 1024 latches (blue point), 2048 latches (red point), and 4096 latches (orange point). 
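&lt;br /&gt;&lt;br /&gt;For those who want to play with the shape of such a curve, here is a minimal sketch. It assumes a commonly used multi-server approximation, Rt = St / (1 - U^M) with U = L X St / M, which may not be the exact model behind the response time graph. The service time and arrival rates are hypothetical values, not the experimental data, and the function names are just for illustration.&lt;br /&gt;
&lt;pre&gt;
```python
import math


def response_time(arrival_rate, service_time, servers):
    """Rt = St / (1 - U**M); an approximation assumed for this sketch."""
    utilization = arrival_rate * service_time / servers
    if utilization >= 1.0:
        raise ValueError("utilization must stay below 100% for a stable queue")
    return service_time / (1.0 - utilization ** servers)


def queue_time(arrival_rate, service_time, servers):
    """Qt = Rt - St, the wait time portion of the response time."""
    return response_time(arrival_rate, service_time, servers) - service_time


CORES = 4           # matches the 4 core server used in this experiment
SERVICE_TIME = 0.2  # hypothetical CPU ms per buffer get

for arrival_rate in (5, 10, 15, 18):  # hypothetical buffer gets per ms
    rt = response_time(arrival_rate, SERVICE_TIME, CORES)
    print(f"L={arrival_rate} lio/ms  Rt={rt:.4f} ms/lio")
```
&lt;/pre&gt;
In this sketch, lowering the service time (the CPU per buffer get) pulls the flat part of the curve down and pushes the steep part to the right, since the arrival rate at which utilization reaches 100% is M / St.&lt;br /&gt;&lt;br /&gt;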
When we integrate key Oracle performance metrics with queuing theory, we can create the classic response time curve, which is what you see in Figure 3 below.&amp;nbsp;(This is one of the topics, including constructing the graph below, that we delve into in my&amp;nbsp;&lt;b&gt;&lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt;&lt;/b&gt;&amp;nbsp;course. This is also introduced in the last chapter of my book, &lt;b&gt;&lt;a href="http://resources.orapub.com/Oracle_Performance_Firefighting_Book_p/ff_book.htm"&gt;Oracle Performance Firefighting&lt;/a&gt;&lt;/b&gt;.)&lt;br /&gt;&lt;br /&gt;In contrast to the typical "big bar" graph, which shows total time over an interval or snapshot, the response time graph shows the &lt;i&gt;time required to complete a single unit of work&lt;/i&gt;.&amp;nbsp;In our case, the single unit of work is a single buffer get. The response time is the sum of both the CPU time and the wait time to process a single buffer get.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-zaX9MkS1A2s/Tg3X9iVOaCI/AAAAAAAAAPc/Izcje6GYpY4/s1600/CBC+Rt+Compare+SS.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="195" src="http://2.bp.blogspot.com/-zaX9MkS1A2s/Tg3X9iVOaCI/AAAAAAAAAPc/Izcje6GYpY4/s320/CBC+Rt+Compare+SS.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 3.&lt;/b&gt;&lt;/div&gt;Notice that the CPU time per buffer get only significantly drops from the blue line to the red line. This is easier to see when looking at the flat lines near the left of the graph and when zooming into the graph. The much larger response time drop occurs because the wait time per buffer get decreases. 
Therefore, based on our experimental data, a small decrease in CPU per buffer get results in a much larger decrease in the wait time and also the resulting response time.&amp;nbsp;While the details are out of scope for this blog entry, queuing theory states that when the CPU per unit of work decreases, the response time curve drops and shifts to the right... and this is exactly what we see in Figure 3!&lt;br /&gt;&lt;br /&gt;Also, notice that the blue dot is further to the left than both the red and orange dots. If the workload had not increased when the number of latches was increased, the response time improvement would have been much more dramatic. However, the system was allowed to stabilize and, in this case, more work flowed through the system. This workload increase&amp;nbsp;diminishes the response time improvement...&amp;nbsp;which probably makes the business very happy and possibly the users as they "get more work done."&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-klBrQGPN1TY/TgujR1QrcdI/AAAAAAAAAPY/4ub9u2F0cWI/s1600/Hist+by+Rt.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="171" src="http://2.bp.blogspot.com/-klBrQGPN1TY/TgujR1QrcdI/AAAAAAAAAPY/4ub9u2F0cWI/s320/Hist+by+Rt.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 4.&lt;/b&gt;&lt;/div&gt;Figure 4 above is simply a histogram containing the response times for all 90 samples in each latch sample set (1024, 2048, etc.). The far left histogram is related to 32768 latches and the far right histogram is related to 1024 latches. While sample set five (16384 latches, counting right to left) does not look that different from sample set four (8192 latches), statistically there is a difference. And as you might expect then, there is a statistically significant difference between each sample set's CPU time plus wait time per buffer get. 
If you really want to dig into the statistics, which are actually pretty cool for this experiment,&amp;nbsp;click on the "view PDF for Experiment 1" link in&amp;nbsp;&lt;i&gt;The Downloads&lt;/i&gt;&amp;nbsp;section above.&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;Experimental Results &lt;u&gt;Summary - Experiment 2&lt;/u&gt;&lt;/b&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;With each increase in the number of cache buffer latches &lt;i&gt;and&lt;/i&gt; chains,&amp;nbsp;all but the final test comparing 32768 and 65536 latches (and chains) clearly show&amp;nbsp;that both the CPU and wait time per buffer get decreased. (The decrease is statistically significant with an alpha of 0.05.) And the decrease is dramatic, especially when the number of chains and latches is relatively low.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;For me, Experiment 2 is more dramatic (and more personally satisfying) than &lt;i&gt;Experiment 1&lt;/i&gt;. The experimental setup is also different. First, I reduced the number of load processes from 20 to 12. While there was still a severe and clear CPU bottleneck and intense CBC latch contention, it wasn't nearly as ridiculously intense as in &lt;i&gt;Experiment 1&lt;/i&gt;. Second, I was also able to reduce the number of CBC latches down to 256. 
This allows us to see the impact of adding latches when there are initially relatively few.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Why the number of cache buffer (CB) chains is related to performance.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;It's actually pretty simple. The cache buffer chain structure is accessed to determine if a block exists in the buffer cache. Therefore, each block cached in the buffer cache must be represented in the cache buffer chain structure. Oracle chose a hashing algorithm and associated memory structure to enable extremely consistent fast searches (usually). To drastically simplify, the hashing structure includes a number of chains. Since the number of buffers in the buffer cache is constant (let's forget about Oracle's ability to dynamically resize the buffer cache...) as the number of chains increase, then the chain length decreases (on average). Therefore, one way to increase the chain length and &lt;i&gt;also increase search time&lt;/i&gt;&amp;nbsp;(bad for performance) is to decrease the number of chains. Of course you would never do this in a production system, without a very, very good reason.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;And it follows that increasing the number of chains will decrease search time, because the average chain length will be shorter. This is true and Oracle Corporation knows it. 
So much so, the default number of chains is greater than the number of buffers, resulting in an average chain length of less than one. However, if we severely limit the number of CB chains causing longer chains and concurrency issues, like I have done in this experiment, as the number of chains is increased we should see a significant performance improvement.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Additionally, keeping one CBC latch per chain ensures processes will not be competing for different chains protected by the same CBC latch! This is called &lt;i&gt;false contention&lt;/i&gt;.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;If you would like to visually see how the CBC structure works and also the buffer cache in general, I have created a free interactive visual tool you can experiment with.&amp;nbsp;&lt;b&gt;&lt;a href="http://resources.orapub.com/SearchResults.asp?Search=bc+visual"&gt;You can download it here&lt;/a&gt;&lt;/b&gt;. It's pretty cool! I also blogged about the initial release (think of this as the user guide) &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2010/09/buffer-cache-visualization-and-tool.html"&gt;here&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Now let's go a little deeper and relate the experimental results with the architecture and memory structures.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;To keep things simple and to the point, I only included four figures plus two others for those who want a little more detail. 
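&lt;br /&gt;&lt;br /&gt;Before getting to the figures, here is a small sketch of the chain length and latch sharing ideas described above. This is only an illustration under stated assumptions, not Oracle's actual algorithm: the 196,608 buffer count is hypothetical (roughly a 1.5GB cache with 8KB blocks), the round-robin chain-to-latch mapping is made up for the example, and the function names are hypothetical.&lt;br /&gt;
&lt;pre&gt;
```python
from collections import Counter


def average_chain_length(num_buffers, num_chains):
    """Average buffers per chain, assuming the hash spreads buffers evenly."""
    return num_buffers / num_chains


def max_chains_per_latch(num_chains, num_latches):
    """Most chains any single latch protects, under a made-up round-robin mapping."""
    latch_load = Counter(chain % num_latches for chain in range(num_chains))
    return max(latch_load.values())


NUM_BUFFERS = 196608  # hypothetical: 1.5GB cache / 8KB blocks

for chains, latches in [(256, 256), (4096, 4096), (262144, 8192)]:
    print(
        f"chains={chains} latches={latches} "
        f"avg chain length={average_chain_length(NUM_BUFFERS, chains):.2f} "
        f"max chains per latch={max_chains_per_latch(chains, latches)}"
    )
```
&lt;/pre&gt;
With only 256 chains, the average chain in this sketch holds 768 buffers. With 262144 chains the average drops below one (0.75), although each of the 8192 latches then covers 32 chains, which is where false contention can come from.&lt;br /&gt;&lt;br /&gt;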
You can view/download the entire statistical analysis, which contains well over 30 graphs and all data samples, by&amp;nbsp;clicking on the "view PDF for Experiment 2" link in&amp;nbsp;&lt;i&gt;The Downloads&lt;/i&gt;&amp;nbsp;section above.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-hO5gpSlH8Qc/Th78P33MYAI/AAAAAAAAAPg/veS_Cm_5nas/s1600/E2+Table+Latches+basic+stats.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="82" src="http://2.bp.blogspot.com/-hO5gpSlH8Qc/Th78P33MYAI/AAAAAAAAAPg/veS_Cm_5nas/s320/E2+Table+Latches+basic+stats.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: center;"&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;Figure 5.&lt;/b&gt;&lt;/div&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Figure 5 above is a statistical summary of the experimental results.&amp;nbsp;Figure 5 clearly shows that with each increase (except the last) in CB chains and latches, both the CPU time and wait time per buffer get decrease. However, it is equally obvious that the improvement diminishes as the number of CB chains and latches increases. On my experimental system, the performance improvement, while statistically significant, pretty much becomes a non-issue once there are 4096 CB latches and chains. Interestingly, with the 1.5GB buffer cache I used, without my specifically setting the number of CBC latches and chains, Oracle automatically created 8192 CBC latches and 262144 CB chains. 
Not bad!&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-WHPhPteyEag/Th78wEvFF5I/AAAAAAAAAPk/y9sy6_GtkgI/s1600/E2+Plot+Rt+vs+latches.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="199" src="http://3.bp.blogspot.com/-WHPhPteyEag/Th78wEvFF5I/AAAAAAAAAPk/y9sy6_GtkgI/s320/E2+Plot+Rt+vs+latches.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;Figure 6.&lt;/b&gt;&lt;/div&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Figure 6 above shows the CPU time (blue line) and the wait time added to that (red-like line) per buffer get versus the number of latches; the sum is the response time per buffer get. I did not include the last two sample sets because they diminish our view of the key area of the graph (far left). There was a dramatic drop in both CPU consumption and wait time per buffer get (i.e., response time) going from 256 to 512 to 1024 latches and buckets. After that the benefit quickly diminishes, yet it remains significant until we have more than 4096 latches and buckets.&lt;br /&gt;&lt;br /&gt;In this experimental system, Oracle was clearly able to achieve additional efficiencies as the number of CBC latches and buckets was increased up to 4096.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;Figure 7 below is a response time graph based on our experimental data (Figure 5) and queuing theory. I explained this graph in some detail above related to Figure 3.
In &lt;i&gt;Experiment 2&lt;/i&gt;, the blue line is based on our 256 latches and chains data, the red line on our 512 latches and chains data, and the orange line on our 1024 latches and chains data. The results are similar to &lt;i&gt;Experiment 1&lt;/i&gt; in pattern, but the drop in the CPU time per buffer get, the drop in the wait time per buffer get, and the resulting workload when the system stabilized are much more dramatic. I suspect this is due to the fact that the initial number of latches was significantly lower (256 compared to 1024), the number of chains was significantly lower (256 compared to probably over 100,000), and the load was not so ridiculously intense as in &lt;i&gt;Experiment 1&lt;/i&gt;.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-9XPhVvf5gj0/Th7_dkxm8wI/AAAAAAAAAPs/cV5ni7zNZdI/s1600/E2+CBC+Rt+Compare+SS.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="197" src="http://3.bp.blogspot.com/-9XPhVvf5gj0/Th7_dkxm8wI/AAAAAAAAAPs/cV5ni7zNZdI/s320/E2+CBC+Rt+Compare+SS.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: center;"&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;Figure 7.&lt;/b&gt;&lt;/div&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Looking closely at Figure 7 above, notice that for each increase in latches and buckets (going from the blue to the red to the orange line), the CPU consumption &lt;i&gt;and&lt;/i&gt; wait time per buffer get (i.e., response time) continued to decrease (the plotted point) while the arrival rate
(i.e., the workload and labeled&amp;nbsp;&lt;b&gt;Avg L&lt;/b&gt;&amp;nbsp;in Figure 5) continued to increase. This means the system was able to process more work AND process it faster! From my years of consulting, this is very characteristic of what happens in real production systems.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: center;"&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-efiz5fZ_9WI/Th8wkih_OYI/AAAAAAAAAP8/D674ZS5N5cg/s1600/E2+Hist+Rt+Side-by-side.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="181" src="http://3.bp.blogspot.com/-efiz5fZ_9WI/Th8wkih_OYI/AAAAAAAAAP8/D674ZS5N5cg/s320/E2+Hist+Rt+Side-by-side.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;b&gt;Figure 8.&lt;/b&gt;&lt;/div&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Figure 8 above is simply a histogram containing all 90 samples for each of the nine latch sample sets (going right to left: 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, and 65536). You can easily see the dramatic differences in response time and how they diminish as the number of latches and chains increases! All but the final response time set (65536 latches and chains) differed significantly from the previous lower latch setting.
(details below)&lt;br /&gt;&lt;br /&gt;If you really want to dig into the statistics, which is actually pretty cool for this experiment, click&amp;nbsp;on the "view PDF for Experiment 2" link in&amp;nbsp;&lt;i&gt;The Downloads&lt;/i&gt;&amp;nbsp;section above.&lt;br /&gt;&lt;br /&gt;Significance Details (for those who can't get enough)&lt;br /&gt;&lt;br /&gt;If you are fascinated with statistical significance testing or just plain confused about how a P-Value relates to reality, take a look at the two figures below: Figure 9 and Figure 10. For background, if two sample sets match exactly (basically comparing a sample set to itself) and a significance test is performed, the resulting P-Value will be 1.0. If the P-Value is less than 0.05 (a standard threshold or cutoff value) then we will say there is a statistically significant difference between our sample sets. Also, while many of the sample sets were not normally distributed, the response times for sample sets 7, 8, and 9 (16384, 32768, and 65536 latches and chains respectively) are indeed normally distributed. &lt;i&gt;Mathematica&lt;/i&gt; automatically chose the &lt;i&gt;T test&lt;/i&gt; for the significance test (specifically, what it calls K-Sample T).&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-djBd8erb_ac/Th8BKRmCljI/AAAAAAAAAP4/VMjjcUiEigw/s1600/E2+Hist+Rt+Side-by-side.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="96" src="http://3.bp.blogspot.com/-djBd8erb_ac/Th8BKRmCljI/AAAAAAAAAP4/VMjjcUiEigw/s320/E2+Hist+Rt+Side-by-side.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 9.
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Figure 10.&lt;/b&gt;&lt;/div&gt;Both Figure 9 and Figure 10 are smoothed histograms of different sample sets based on the experimental data's CPU time plus wait time per buffer get, that is, the response time (ms/lio). The difference between the two figures is that Figure 9 is based on the response time samples from sample sets 7 and 8 (16384 and 32768 CBC latches and&amp;nbsp;chains) whereas Figure 10 is based on the response time samples from sample sets 8 and 9 (32768 and 65536 CBC latches and chains). We can visually see there is a clear difference between sample sets 7 and 8 (Figure 9) but there does not appear to be a difference between sample sets 8 and 9 (Figure 10). But visual analysis alone can bring us to an incorrect conclusion.&lt;br /&gt;&lt;br /&gt;In our two cases, the visual comparison leads us to the &lt;i&gt;correct&lt;/i&gt; conclusion because the P-Value comparing&amp;nbsp;sample sets 7 and 8 (Figure 9) is 0.0000 whereas the P-Value comparing sample sets 8 and 9 (Figure 10) is 0.64605. &amp;nbsp;The P-Value for Figure 9 is clearly below our 0.05 cutoff, and therefore our sample sets are deemed different. However, the P-Value for Figure 10 is clearly above our 0.05 cutoff, and therefore statistically there is no real difference between sample sets 8 and 9. If you look closely at Figure 5, Figure 6, Figure 7, and Figure 8 above we can also infer this, but doing the actual significance test is always better... especially if you're going to stand up in front of management and make a statement.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Just "Give me the facts!" summary...bottom line&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;1.
A CPU-bottlenecked, CBC-constrained Oracle system's performance will likely be improved by increasing the number of CBC latches beyond Oracle's default values... up to a point.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The bottom line for my experimental system is this: With a clear and very intense CPU bottleneck along with severe CBC latch contention, the CPU time and non-idle wait time per buffer get decreased as the number of CBC latches increased and also as the number of CB latches and chains increased... up to a point.&lt;br /&gt;&lt;br /&gt;The bottom line for any Oracle system is this:&amp;nbsp;With a clear and very intense CPU bottleneck along with severe CBC latch contention, the CPU time and non-idle wait time per buffer get decreased as the number of CBC latches was increased beyond the Oracle default value. But the benefit diminished as the number of CBC latches continued to increase.&amp;nbsp;This statement is based on my experiments and field observations and is in no way a performance guarantee.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;2. Performance improves because&amp;nbsp;when we add CBC latches, Oracle does not have to spin as many times (CPU consumption reduced) and/or sleep as many times (wait time reduced) when acquiring a CBC latch.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;3. If a SQL statement is constrained by CBC latch contention and buffer gets, then as the time to process a buffer get decreases, so will the SQL statement's elapsed time.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;4. Oracle tuning is required for optimal performance.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This experiment should reaffirm that tuning Oracle can make a significant performance difference. Of course tuning SQL, running SQL less often, and increasing OS capacity can increase performance.
But if you want &lt;i&gt;optimal&lt;/i&gt; performance, with very little effort in this case I would argue, you need to be able to tune Oracle as well.&lt;br /&gt;&lt;br /&gt;I hope you enjoyed this blog entry as much as I did when creating it!&lt;br /&gt;&lt;br /&gt;Thanks for reading,&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;If you enjoyed this blog entry, I suspect you'll get a lot out of my courses; &lt;b&gt;&lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;Oracle Performance Firefighting&lt;/a&gt;&lt;/b&gt; and &lt;b&gt;&lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt;&lt;/b&gt;. I teach these classes around the world multiple times each year. For the latest schedule, click here. I also offer on-site training and consulting services.&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond with a comment or you have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. 
Another option is to send an email to OraPub's general email address, which is currently orapub.general@gmail .com.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-1931454229838943604?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/1931454229838943604/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/07/cbc-latches-cpu-consumption-and-wait.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/1931454229838943604'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/1931454229838943604'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/07/cbc-latches-cpu-consumption-and-wait.html' title='CBC latches, CPU consumption, and wait time'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-2yPSP1X_Ft4/Tgui9mwyyTI/AAAAAAAAAPQ/Ku4cWg3ndMA/s72-c/Table+latches+basic+stats.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-5498395531656179804</id><published>2011-06-23T06:00:00.000-07:00</published><updated>2011-06-23T06:00:22.674-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='consistent 
changes'/><category scheme='http://www.blogger.com/atom/ns#' term='non-parametric significant test'/><category scheme='http://www.blogger.com/atom/ns#' term='kruskal-Wallis'/><category scheme='http://www.blogger.com/atom/ns#' term='read consistency'/><category scheme='http://www.blogger.com/atom/ns#' term='consistent reads'/><category scheme='http://www.blogger.com/atom/ns#' term='chain of undo'/><category scheme='http://www.blogger.com/atom/ns#' term='undo'/><category scheme='http://www.blogger.com/atom/ns#' term='consistent gets'/><title type='text'>Impact of Consistent Reads</title><content type='html'>&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Purpose&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For sure there is a cost for read consistency. But is that cost significant? Can users feel the difference? Does it require significant computing resources? Like many things in Oracle, perhaps this entire discussion is more academic and just makes for interesting presentations and internals discussions... Delving into these issues is what this blog entry is all about!&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Some Internals Background&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you study Oracle internals you will eventually learn that by default, query results reflect the committed state of the database system when the query started. (Oracle can also provide this capability at the transaction level.) This implies that if a needed buffer has a "change date" (think: SCN)&amp;nbsp;&lt;b&gt;AFTER&lt;/b&gt; the query started or &amp;nbsp;&lt;b&gt;BEFORE&lt;/b&gt; the query started yet contains uncommitted data,&amp;nbsp;Oracle must rebuild an image of the buffer containing committed data consistent with the query start time.&lt;br /&gt;&lt;br /&gt;But it can get more complicated.
In addition to the above, Oracle can cache multiple images of blocks, representing the block at various times (think: SCN) in both the buffer cache and a server process's PGA. Oracle can decide to use any appropriate buffer to fulfill a read consistent query. This is important to understand, because it is common to see consistent read activity when the data you are querying was committed well before the query began or, even more surprising, after the instance has been cycled! This is why you can see &lt;b&gt;cr&lt;/b&gt; (consistent read) activity when running a &lt;b&gt;tkprof&lt;/b&gt; even though you know there is no DML occurring related to the table. If you really want to dig into this, start by looking at a handful of &lt;b&gt;&lt;a href="http://download.oracle.com/docs/cd/E11882_01/server.112/e17110/stats002.htm#i375475"&gt;instance statistics&lt;/a&gt;&lt;/b&gt; starting with "consistent".&lt;br /&gt;&lt;br /&gt;I need to interject a couple of definitions. The rebuilt or copied buffers are commonly called cloned buffers because they are a clone of the buffer at a point in time. When a cloned buffer is &lt;i&gt;created&lt;/i&gt;, at a minimum, the instance statistics&amp;nbsp;&lt;b&gt;consistent changes&lt;/b&gt;&amp;nbsp;and &lt;b&gt;consistent gets&lt;/b&gt;&amp;nbsp;are incremented. If an already buffered clone is accessed, the &lt;b&gt;consistent gets&lt;/b&gt; statistic is incremented, and probably other consistent-related instance statistics as well.&lt;br /&gt;&lt;br /&gt;Even with multiple Oracle instances, every block that is buffered has a single changeable latest and greatest version in the entire database system, called the current buffer or the &lt;b&gt;cu&lt;/b&gt; buffer. If you have done a &lt;b&gt;tkprof&lt;/b&gt; and see values in the &lt;b&gt;cu&lt;/b&gt; column you know your SQL has accessed &lt;b&gt;cu&lt;/b&gt; buffers.&lt;br /&gt;&lt;br /&gt;The performance implications of clones are massive.
For example and to summarize, to make a clone a free buffer must be secured, the &lt;b&gt;cu&lt;/b&gt; buffer is copied into the cloned (&lt;b&gt;cr&lt;/b&gt;) buffer,&amp;nbsp;the most recent buffer change undo address is retrieved from the clone's&amp;nbsp;&lt;b&gt;itl&lt;/b&gt;, and the server process checks to see if the undo block is in the buffer cache. If the undo block is not in the buffer cache, the server process must find a free buffer for it, make an IO request (and probably wait a bit), place the block into the buffer cache, update various internal structures, and finally apply the undo to the cloned buffer. Now suppose after applying the undo to the cloned buffer it is discovered that additional undo is required! Then this process cycles again and potentially again and again. This "again and again" is following what is called a "chain of undo."&lt;br /&gt;&lt;br /&gt;So you can see that creating a lot of &lt;b&gt;cr&lt;/b&gt; buffers will increase both CPU and IO requirements. There is a cost for read consistency. But is that cost significant? Or perhaps it's more academic and just makes for interesting presentations and internals discussions... Let's find out!&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Experimental Design&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;You can download the entire, somewhat documented script &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CR_Analysis/CR_AnalysisScript1a.txt"&gt;HERE&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;The experiment is actually quite simple. I created and loaded a table with 18M rows and committed. (The actual experimental 18M table did have DML and commits against it before the experiment began.) To buffer the data, I ran a full table scan query on the table five times. I then took my initial sample and called this my baseline sample, since updates had not yet occurred.
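&lt;br /&gt;&lt;br /&gt;A sample like this can be sketched as a single snapshot query, run immediately before and after the test query so the two results can be subtracted. This is only an illustration, not the downloadable script: the column aliases and the &lt;b&gt;wait_class&lt;/b&gt; filter for non-idle waits are assumptions, while the views and statistic names are the data sources named in this entry.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;-- illustrative snapshot (assumed aliases); times are in microseconds&lt;br /&gt;select systimestamp as sample_time,&lt;br /&gt;       (select sum(value) from v$sys_time_model&lt;br /&gt;         where stat_name in ('DB CPU', 'background cpu time')) as cpu_us,&lt;br /&gt;       (select sum(time_waited_micro) from v$system_event&lt;br /&gt;         where wait_class != 'Idle') as wait_us,&lt;br /&gt;       (select value from v$sysstat&lt;br /&gt;         where name = 'consistent gets') as consistent_gets&lt;br /&gt;from   dual;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;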
The baseline&amp;nbsp;query required no&amp;nbsp;&lt;b&gt;cr&lt;/b&gt;&amp;nbsp;buffers to be created as a result of uncommitted data or of buffers changing after the query started. However, as I mentioned in the internals discussion above, Oracle can keep older versions of the buffer and when they are accessed they count as consistent reads. This indeed occurred: even after an update completed, the session committed, and the instance was cycled, Oracle recorded consistent read activity when the table was queried.&lt;br /&gt;&lt;br /&gt;Before the second, third, and fourth loops were run, in a different Oracle session I updated 1M rows twice, but did not commit. It's important to notice the same 1M rows are being updated twice. This will likely force a "chain of undo" which will intensify read consistent activity.&lt;br /&gt;&lt;br /&gt;As you can see below, when I "run the query" I actually run the full stats collection and actual query 60 times, providing me with 60 samples for each of the four loops.&lt;br /&gt;&lt;br /&gt;I am expecting to see an increasingly larger elapsed time gap between the baseline query and each subsequent query.&lt;br /&gt;&lt;br /&gt;So the statistics collection and query looping looks something like this:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;load data...(18M rows)&lt;/code&gt;&lt;/pre&gt;&lt;pre&gt;&lt;code&gt;cache data: select sum(update_seq) from rc_test; 5X&lt;br /&gt;for i in 1..4 loop&lt;/code&gt;&lt;/pre&gt;&lt;pre&gt;&lt;code&gt;  for sample in 1..60 loop&lt;br /&gt;    gather time t0 performance data &lt;br /&gt;    select sum(update_seq) into bogus from rc_test;&lt;br /&gt;    gather time t1 performance data&lt;br /&gt;    calculate deltas and store&lt;br /&gt; &amp;nbsp;end loop&lt;/code&gt;&lt;/pre&gt;&lt;pre&gt;&lt;code&gt;  pause this script and do the update in another session&lt;br /&gt;end loop&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;The performance data gathered was:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;time stamp&lt;/b&gt; based on the timestamp data type for maximum precision.&lt;/li&gt;&lt;li&gt;&lt;b&gt;instance CPU consumption.&lt;/b&gt; I want to know the impact on the entire instance, not just my session. The data source is the &lt;b&gt;v$sys_time_model,&lt;/b&gt;&amp;nbsp;statistics "DB CPU" and "background cpu time".&lt;/li&gt;&lt;li&gt;&lt;b&gt;instance non-idle wait time&lt;/b&gt;. Again, I wanted to know the impact on the entire instance, not just my session. The data source is &lt;b&gt;v$system_event&lt;/b&gt;.&lt;/li&gt;&lt;li&gt;&lt;b&gt;consistent gets&lt;/b&gt;. The number of &lt;b&gt;cr&lt;/b&gt; buffers accessed in the entire instance. My sessions were the only sessions connected, as I was in a very controlled environment.
The data source is &lt;b&gt;v$sysstat&lt;/b&gt; and the statistic is "consistent gets".&lt;/li&gt;&lt;/ul&gt;The update, which is performed in a different Oracle session, looks like this:&lt;/div&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;declare&lt;br /&gt;  cntr number;&lt;br /&gt;begin&lt;br /&gt;for cntr in 1..2&lt;br /&gt;loop&lt;br /&gt;  update rc_test&lt;br /&gt;  set    update_stamp=sysdate&lt;br /&gt;  where  rownum&amp;lt;1000000;&lt;br /&gt;end loop;&lt;br /&gt;end;&lt;br /&gt;/&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;You may notice I updated the same row set twice. As I stated above, this can cause a "chain of undo" to occur which increases consistent read intensity.&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Experimental Results (Summary)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The results are very clear, although a little more statistically tricky than I anticipated. First the results and then I'll provide some numeric and visual details.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;There was both a visual and a statistically significant difference between the initial baseline query (sample set 1) and each query run after the three updates (sample sets 2, 3, 4). This means consistent read statements take longer to run. The non-baseline samples were at least 1.5X slower than our baseline! Consistent read activity clearly made an elapsed time impact that users will notice.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;There is also another key takeaway that is not so apparent.&amp;nbsp;There is a real (statistically significant) difference between each of the samples. Not just between the baseline and the others but between each of the four sample sets.
This means that each time I ran the 1M update twice, it caused a real slowdown in the query.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-VKMDZWUCTIk/Te0UD2twPeI/AAAAAAAAAOo/w-BpjpUsGVU/s1600/Results+Table.prn.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="96" src="http://1.bp.blogspot.com/-VKMDZWUCTIk/Te0UD2twPeI/AAAAAAAAAOo/w-BpjpUsGVU/s320/Results+Table.prn.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 1.&lt;/b&gt;&lt;/div&gt;While you can &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CR_Analysis/CR_Final_Results.txt"&gt;download all the raw experimental data here&lt;/a&gt;&lt;/b&gt;, Figure 1 shows some of the numeric results. The columns should be self-explanatory with the exception of the "P Value". This is the result of the statistical significance test (details in the following section). If the P Value is less than 0.05 then we can say there is a real difference between the baseline sample set and the other sample set. It's important to understand that the P Values are between the baseline sample and the other sample set. For example, between set 1 and set 2, set 1 and set 3, and set 1 and set 4.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-wNaK2Ea_qHE/Te0Z_7jdsAI/AAAAAAAAAOs/R_Wq_osRNGs/s1600/Elapsed+Hists.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="192" src="http://3.bp.blogspot.com/-wNaK2Ea_qHE/Te0Z_7jdsAI/AAAAAAAAAOs/R_Wq_osRNGs/s320/Elapsed+Hists.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 2.&lt;/b&gt;&lt;/div&gt;Figure 2 above is a smooth histogram of each of the four sample sets' &lt;i&gt;elapsed times&lt;/i&gt;.
Our baseline sample set appears on the far left, then to the right set 2, set 3, and finally set 4. Visually it appears there is a clear and real difference between these sample sets...and the statistics agree!&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-Z6lGuAbof5E/Te0f1XE141I/AAAAAAAAAO0/oDpQlri3rK8/s1600/CR+Hist+2+3+4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="192" src="http://1.bp.blogspot.com/-Z6lGuAbof5E/Te0f1XE141I/AAAAAAAAAO0/oDpQlri3rK8/s320/CR+Hist+2+3+4.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;Figure 3.&lt;/b&gt;&lt;/div&gt;Figure 3 above is a &lt;i&gt;consistent read&lt;/i&gt; histogram containing sample sets 2, 3, and 4. Sample set 1's (our baseline) average consistent read value of 308390 is dwarfed by the other samples. So much so that when the histogram includes all four sample sets, because of the required scaling, the histogram is visually worthless. This implies that visually there is a real difference between the baseline sample set and the others. And as Figure 3 shows, clearly there is also a visual difference between each of the other sample sets as well...and the statistics agree!&lt;br /&gt;&lt;br /&gt;The actual statistical analysis was more difficult than it may appear. If you are interested, read the next section. Otherwise, just skip to the &lt;i&gt;Conclusion&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Experimental Results (Detailed)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;By looking at the numerical and visual experimental results, it appears that when consistent&amp;nbsp;buffers are involved, query performance (our query to be exact) was indeed negatively impacted.
But visually comparing values and histograms is not enough to make a strong statement. In fact, it's rather weak. So I needed to perform a statistical hypothesis test.&lt;br /&gt;&lt;br /&gt;My plan was to perform a simple &lt;b&gt;&lt;a href="http://en.wikipedia.org/wiki/Student's_t-test"&gt;t-test&lt;/a&gt;&lt;/b&gt; comparing sample set 1 (our baseline) to sample set 2, set 1 to set 3, and set 1 to set 4. But the t-test has a key requirement that our data clearly fails to meet.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-XkP17wb6a1M/Te0iMeCLnnI/AAAAAAAAAO8/eCb-1Ns93ug/s1600/CR+BL+Hist.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="120" src="http://3.bp.blogspot.com/-XkP17wb6a1M/Te0iMeCLnnI/AAAAAAAAAO8/eCb-1Ns93ug/s200/CR+BL+Hist.png" width="200" /&gt;&lt;/a&gt;&lt;a href="http://1.bp.blogspot.com/-GlvFk0yvmoM/Te0iHRtIt5I/AAAAAAAAAO4/bDIpPjsftrs/s1600/Elapsed+BL+Hist.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="120" src="http://1.bp.blogspot.com/-GlvFk0yvmoM/Te0iHRtIt5I/AAAAAAAAAO4/bDIpPjsftrs/s200/Elapsed+BL+Hist.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 4. &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Figure 5.&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;As you'll recall, each sample set consists of 60 samples.&amp;nbsp;Figure 4 above is the histogram for our baseline consistent reads and Figure 5 is the histogram for our baseline elapsed times.&amp;nbsp;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;A key t-test requirement is that our sample set must be normally distributed.
Figure 4 looks extremely &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/important-statistical.html"&gt;exponential&lt;/a&gt;&lt;/b&gt; and Figure 5 is some inverse normal distribution looking monstrosity-thing! Oh boy... talk about disappointment! After doing some research I found a non-parametric significance test needed to be performed. While I summarize this in the actual &lt;i&gt;Mathematica&lt;/i&gt; notebook (notebook file, pdf file) there is a very nice explanation of non-parametric significance testing &lt;b&gt;&lt;a href="http://www.statsoft.com/textbook/nonparametric-statistics"&gt;here&lt;/a&gt;&lt;/b&gt;.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;Thankfully &lt;i&gt;Mathematica&lt;/i&gt; has built-in functions for this type of significance testing. In fact, I can tell it to choose what it thinks to be the most appropriate statistical test and do the test. It chose the &lt;b&gt;&lt;a href="http://en.wikipedia.org/wiki/Kruskal-Wallis"&gt;Kruskal-Wallis test&lt;/a&gt;&lt;/b&gt;, which was one of the tests mentioned in the URL above.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;To show a real difference exists between the sample sets, the P-Value needed to be less than 0.05.&amp;nbsp;All the significance tests I performed resulted in a P-Value rounded to 0.0000. (I'm not joking.) The tests were between set 1 and set 2, set 1 and set 3, set 1 and set 4, and also between set 2 and set 3, and set 3 and set 4. For every test, statistically there is a real difference between the sample sets.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;Just in case you're wondering, I too thought a consistent 0.0000 value was perhaps too good to be true. So I did a significance test between set 1 and set 1, which should result in a P-Value of 1. And it did!
So I'm comfortable with the results.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;I must say, though, that I am not a statistician, but based on my research and &lt;i&gt;Mathematica&lt;/i&gt;, I believe I did the correct statistical significance test and represented the results correctly. If you think I made a mistake, please email me at OraPub's general email (orapub@comcast.net) and I'll be more than happy to make any required changes.&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Conclusion&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Many DBAs don't realize how resource intensive building and accessing consistent read buffers are. But once the Oracle internals are understood, it becomes clear that intensive consistent read activity&amp;nbsp;can significantly increase both CPU and IO consumption, which results in an elapsed time increase. But is this really true, and if there is an increase, does it make a real performance difference?&lt;br /&gt;This experiment demonstrated that:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Consistent read buffer access significantly increases query elapsed time.&lt;/li&gt;&lt;li&gt;An increase in the "chain of undo" also significantly increases consistent read access and elapsed time.&lt;/li&gt;&lt;li&gt;The elapsed time difference could definitely be felt by a user, as our simple test showed a 1.5X elapsed time increase.&lt;/li&gt;&lt;/ol&gt;One important and final comment: My experiment did not include the creation of the consistent read buffers (statistic &lt;b&gt;consistent changes&lt;/b&gt;), but only the access of already created cloned buffers. I ran a few other tests just to be sure the consistent read statistics did not include &lt;b&gt;consistent changes&lt;/b&gt;...in fact, it was usually zero. 
So if my experiment shows a significant elapsed time increase when the consistent read buffers have already been created, we know that creating and accessing consistent read buffers will result in even longer elapsed times.&lt;br /&gt;&lt;br /&gt;To minimize the impact of consistent reads, keep your SQL tuned, commit often (but not so often that you cause frequent commit issues!), and try not to query blocks that another transaction is changing.&lt;br /&gt;&lt;br /&gt;Thanks for reading.&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;If you enjoy my blog, I suspect you'll get a lot out of my courses; &lt;b&gt;&lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;Oracle Performance Firefighting&lt;/a&gt;&lt;/b&gt; and &lt;b&gt;&lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt;&lt;/b&gt;. I teach these classes around the world multiple times each year. For the latest schedule, &lt;b&gt;&lt;a href="http://training.orapub.com/default.asp"&gt;click here&lt;/a&gt;&lt;/b&gt;. I also offer on-site training and consulting services.&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. 
Another option is to send an email to OraPub's general email address, which is currently orapub@comcast .net.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-5498395531656179804?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/5498395531656179804/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/06/impact-of-consistent-reads.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/5498395531656179804'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/5498395531656179804'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/06/impact-of-consistent-reads.html' title='Impact of Consistent Reads'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-VKMDZWUCTIk/Te0UD2twPeI/AAAAAAAAAOo/w-BpjpUsGVU/s72-c/Results+Table.prn.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-4114857148078325718</id><published>2011-06-10T07:45:00.000-07:00</published><updated>2011-06-10T07:45:36.377-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='requirements'/><category scheme='http://www.blogger.com/atom/ns#' 
term='utilization'/><category scheme='http://www.blogger.com/atom/ns#' term='cpu utilization'/><category scheme='http://www.blogger.com/atom/ns#' term='capacity'/><category scheme='http://www.blogger.com/atom/ns#' term='threads'/><category scheme='http://www.blogger.com/atom/ns#' term='v$osstat'/><category scheme='http://www.blogger.com/atom/ns#' term='vmstat'/><category scheme='http://www.blogger.com/atom/ns#' term='cores'/><title type='text'>Cores vs Threads...Part 3</title><content type='html'>&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Why This Is Very Important&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As CPU subsystems become more complex and employ various methods to utilize multiple CPU cores and multiple threads per core, determining the CPU consumption requirements, capacity, and the resulting utilization is not entirely clear. But it is important we understand what the utilization tells us about our system. As an Oracle DBA, Iʼm OK with not knowing the specific OS utilization calculations, but Iʼm &lt;i&gt;&lt;b&gt;not&lt;/b&gt;&lt;/i&gt; OK with blindly stating a utilization figure without understanding what that means in relation to performance. Hence my quest...&lt;br /&gt;&lt;br /&gt;This is the third and final (I hope) posting about differences in the operating system CPU utilization when determined by &lt;b&gt;vmstat&lt;/b&gt; or by using the&amp;nbsp;&lt;b&gt;v$osstat&lt;/b&gt; CPU core approach.&amp;nbsp;If you've been following this blog series&amp;nbsp;you know there can be a statistically significant utilization difference and if there is, it can increase as the CPU subsystem gets busier. Also, seven out of the eight samples from production Oracle systems that I analyzed (some of them extremely busy systems) showed no real difference between the utilization method. 
However, the one AIX sample (AG1) clearly showed a difference as the CPU utilization increased over 40%. Figure 1 below is the AG1 scatter plot of the utilizations versus the sample interval.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-aq5QGwdcKLk/Tef0wHKN8gI/AAAAAAAAAOY/qej3txfWNMk/s1600/AG1Scatter.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="221" src="http://1.bp.blogspot.com/-aq5QGwdcKLk/Tef0wHKN8gI/AAAAAAAAAOY/qej3txfWNMk/s320/AG1Scatter.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 1.&lt;/b&gt;&lt;/div&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Background&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For your reference, the &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/04/core-vs-threadcpu-utilization-part-1.html"&gt;initial blog posting (April 22, 2011)&lt;/a&gt;&lt;/b&gt; presented utilization in general, showed how to gather CPU utilization purely from v$ views, and stated that on a few occasions I have seen the utilizations from &lt;b&gt;vmstat&lt;/b&gt; and &lt;b&gt;v$osstat&lt;/b&gt; differ significantly. The &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/05/cores-vs-threads-util-differencespart-2.html"&gt;second blog posting (May 13, 2011)&lt;/a&gt;&lt;/b&gt;&amp;nbsp;presented the experimental results and the subsequent analysis from seven production Oracle systems. As mentioned above, all but one of the samples (AG1) showed no statistical difference between the utilizations (Oracle core based vs &lt;b&gt;vmstat&lt;/b&gt; based). But I mentioned a concern I had: none of the samples were running with the utilization over 65%. Based on my comments, a reader contacted me and ended up gathering data on his system that was running between 90% and 100% CPU utilization. 
The analysis of this data set (AB1) was posted in the &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/05/cores-vs-threads-util-differencepart2b.html"&gt;second "B" posting on May 26, 2011&lt;/a&gt;&lt;/b&gt;. Like most of the other samples, this very busy CPU subsystem showed no difference between the utilizations.&lt;br /&gt;&lt;br /&gt;This final posting (at this point I think it is anyways) is focused clearly on &lt;i&gt;how could&lt;/i&gt; &lt;b&gt;vmstat&lt;/b&gt; and Oracle core based CPU utilizations result in a different value. And if so, is this something to be concerned about? If this is something you're interested in...Read on!&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Deeper Into Requirements and Capacity&lt;/span&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;There is a very good reason why there can be a difference. But for my explanation to make any sense, we need to understand how utilization is calculated and especially so when threads are involved. Utilization is simply requirements divided by capacity. Mathematically this can be represented as:&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;U = R / C&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;It's important to understand that we are typically looking at a slice, interval, or snapshot of time. 
For example, 15 minutes or 1 hour. This requires two samples from our data source. Don't mean to insult anyone here, but the value we need is calculated by subtracting the initial value from the final value resulting in the delta or difference. This delta is what we typically (but not always) use in the calculation.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;Requirements.&lt;/b&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Requirements is simply how much time CPU resources are being expended, used, and consumed. It does not matter if the consumer is a thread or a process. If it is consuming CPU resources then this time counts as "busy time." We can see this "busy time" via&amp;nbsp;&lt;b&gt;v$osstat&lt;/b&gt;. It is also encapsulated in&amp;nbsp;&lt;b&gt;vmstat&lt;/b&gt;. On all L/Unix systems&amp;nbsp;(exception: HPUX)&amp;nbsp;we can see the busy time in the&amp;nbsp;&lt;b&gt;/proc&lt;/b&gt;&amp;nbsp;filesystem. I present this in my book, &lt;b&gt;&lt;a href="http://resources.orapub.com/Oracle_Performance_Firefighting_Book_p/ff_book.htm"&gt;Oracle Performance Firefighting&lt;/a&gt;&lt;/b&gt; but here is a nice &lt;b&gt;&lt;a href="http://www.linux.com/archive/feature/126718"&gt;link to an on-line source&lt;/a&gt;&lt;/b&gt;. 
Mathematically, we can represent the requirements as:&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;R = busy time &amp;nbsp;=&amp;nbsp;interval time X&amp;nbsp;number of CPU power consumers&lt;br /&gt;&lt;br /&gt;If we differentiate between cores and threads, requirements can be something like:&lt;br /&gt;&lt;br /&gt;R (thread based) = busy time = interval time X number of threads consuming CPU resources&lt;br /&gt;R (core based) &amp;nbsp;= busy time = interval time X number of cores consuming CPU resources&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;As DBAs it is not our decision whether the operating system reports busy time a specific way. Someone else made that decision for us. ...it is what it is. But as we'll see below, the decision makes a profound difference in the final calculated utilization---for both &lt;b&gt;vmstat&lt;/b&gt; and &lt;b&gt;v$osstat&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;Capacity.&lt;/b&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Capacity is the power available; 100 CPU seconds, 64 cores, 128 threads, etc. As with requirements, we typically need the power available over a period of time, that is, a time slice, interval, or snapshot of time. 
This is actually very easy to calculate and is simply the number of whatever is supplying the power (e.g., cores) multiplied by the snapshot interval time.&lt;br /&gt;&lt;br /&gt;Suppose over a one hour interval the CPU subsystem contains 8 cores and each core has 2 threads, for a total of 16 threads.&amp;nbsp;As with the requirements, we can represent the CPU power capacity from either a core or thread perspective.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;C = interval time X power supplying units (cores) = 60 minutes X 8 cores = 480 core minutes&lt;br /&gt;C = interval time X power supplying units (threads) = 60 minutes X 8 cores X 2 threads/core = 960 thread minutes.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;As with the requirement's busy time, as DBAs it is not our decision whether the operating system reports busy time a specific way. Someone else made the decision for us.&amp;nbsp;But as we'll see below, the decision makes a profound difference in the final calculated utilization---for both&amp;nbsp;&lt;b&gt;vmstat&lt;/b&gt;&amp;nbsp;and&amp;nbsp;&lt;b&gt;v$osstat&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;Dark Matter: Cores vs Threads&lt;/b&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;Before we combine requirements and capacity to derive utilization, it is important to understand there are differences in how CPU subsystems process at the core and thread level. 
It's even more important to understand how this occurs on your production systems.&lt;br /&gt;&lt;br /&gt;Suppose a process,&amp;nbsp;when run by itself on an idle system, takes 30 seconds to complete. This is the "wall time" or "elapsed time." Here is an example Bourne shell command sequence that places an incredibly intense CPU load on a single CPU core or thread.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;echo "scale=12;for(x=0;x&amp;lt;39999999;++x){1+1.243;}" | bc &amp;gt;/dev/null&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;On my Linux system, the above command takes about 78 seconds to complete.&lt;br /&gt;&lt;br /&gt;Now... Consider the elapsed time of the above command when a bunch of the commands are launched at the same time!&lt;br /&gt;&lt;br /&gt;If &lt;i&gt;cores&lt;/i&gt; are what is providing the true CPU power, then with a 2 core CPU subsystem we will observe this type of relationship between the number of concurrently launched processes and elapsed time (seconds): (1,30), (2,30), (3,60), (4,60), (5,90), (6,90), (7,120), etc. Essentially, when a process completes, a core becomes available and the next process begins. This perfect elapsed time sequencing assumes the OS makes no optimizations.&lt;br /&gt;&lt;br /&gt;If &lt;i&gt;threads&lt;/i&gt; are what is providing the true CPU power, then with a 2 core but 2 threads/core (4 total threads) CPU subsystem we will observe this type of relationship between the number of concurrently launched processes and elapsed time (seconds): (1,30), (2,30), (3,30), (4,30), (5,60), (6,60), (7,60), (8,60), (9,90), etc. Essentially, when a process completes, a thread becomes available and the next process begins. 
Again, this perfect elapsed time sequencing assumes the OS makes no optimizations.&lt;br /&gt;&lt;br /&gt;When threads get involved,&amp;nbsp;the resulting elapsed times are not so straightforward and can be much more complicated to anticipate and&amp;nbsp;very, very operating system specific.&lt;br /&gt;&lt;br /&gt;One of my colleagues did some testing. (my ref: JB 9-May-2011)&amp;nbsp;He used an 8 core box with each core having 2 threads. His results showed the CPU subsystem was operating more core-based than thread-based. Why? Because when 8 processes that run serially in 30 seconds were simultaneously launched, they finished in about 30 seconds, yet when 9 processes were simultaneously launched the elapsed time jumped to 60 seconds...meaning the 16 threads were unable to truly process more than 8 processes simultaneously. The only way he could know this is to perform an actual test (more about this below).&lt;br /&gt;&lt;br /&gt;However, an IBM employee who specializes in Oracle emailed me (my ref: DM 7-June-2011) and wrote, "SMT on Power Systems allows for true simultaneous execution of up to 4 SMT threads (on Power7) in the same clock cycle." An AWR report added some support to his well presented and thought out claim. The&amp;nbsp;&lt;b&gt;v$osstat&lt;/b&gt;&amp;nbsp;&lt;b&gt;busy_time&lt;/b&gt; statistic was clearly thread based because the &lt;b&gt;busy_time&lt;/b&gt; (285888 secs) was greater than the core based capacity (131942 core secs = 34.36 min X 60 sec/min X 64 cores). From a core-based perspective and on his system, there is no way 64 cores can provide more than 131942 seconds of CPU power over the 34.36 minute interval. Threads must be involved.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Complicated...but seriously practical and necessary.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;It can get even more complicated with virtual machines, vpars, lpars, and on and on. 
With all this complication it is easy to lose sight of our goal: gathering reliable and understandable values for OS CPU requirements, capacity, and utilization. If I can't do this, then I cannot make a simple statement such as, "From an OS perspective, CPU utilization indicates the CPU subsystem is the bottleneck." So while this may seem pretty academic, it has serious practical performance management implications.&lt;br /&gt;&lt;br /&gt;The best way to tell what is occurring on your systems is to gather some performance data. Here's how...&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Gathering The Data&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I became so frustrated with all the complications and possible complications that it was obvious the only way to get a firm grasp on reality was to gather some data on a real system. So I created a basic shell script that tracked the relationship between the number of simultaneously launched processes and their final elapsed time. I also gathered and displayed the OS CPU utilization and CPU run queue (both based on &lt;b&gt;vmstat&lt;/b&gt;).&lt;br /&gt;&lt;br /&gt;Please.... I need to write this: Do not run this script on your production system if you care about production system performance. The script is designed to suck every bit of CPU power out of your database server. Running this on a test box with the same OS and CPU architecture as your production systems should produce the results you are looking for.&lt;br /&gt;&lt;br /&gt;While I wrote the script in Linux and &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP3/ScaleTest_sh.txt"&gt;you can view it on-line here&lt;/a&gt;&lt;/b&gt;, it would be a simple matter to make the &lt;b&gt;vmstat&lt;/b&gt; column parsing adjustments, and potentially a few other changes, for another operating system. The system I gathered data from has a single 4 core CPU with no threads. 
Here are the Linux details:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;[oracle@fourcore ~]$ cat /proc/version&lt;br /&gt;Linux version 2.6.18-164.el5PAE (mockbuild@ca-build10.us.oracle.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Thu Sep 3 02:28:20 EDT 2009&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-cHhmApF1T3Q/TfD30Jo8XGI/AAAAAAAAAPE/90DGEZQKHpE/s1600/coreVsThred+ListPlot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="184" src="http://4.bp.blogspot.com/-cHhmApF1T3Q/TfD30Jo8XGI/AAAAAAAAAPE/90DGEZQKHpE/s320/coreVsThred+ListPlot.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 2.&lt;/b&gt;&lt;/div&gt;Figure 2 above shows the elapsed time (or wall time) to complete X number of processes when they are all launched simultaneously in the background. It is obvious that up to four processes complete at pretty much the same time (and Figure 3 below details the numeric results). However, once 5 processes are launched simultaneously, the elapsed time takes a significant jump. This indicates that the system I gathered data from has 4 CPU cores and no threads, and is "core powered." ...and it does have a single CPU with 4 cores and no threads.&lt;br /&gt;&lt;br /&gt;This is important to understand. Let's generalize this a bit:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Where C is the total number of CPU cores, a core power focused system will complete C number of processes launched simultaneously at pretty much the same time. Once the number of processes is greater than C, the all-process completion time will increase. (This is what we see occurring in Figure 2 above, where C is 4 and the jump occurs at 5 processes.)&lt;/li&gt;&lt;li&gt;Where T is the total number of CPU threads, a thread power focused system will complete T number of processes launched simultaneously at pretty much the same time. 
Once the number of processes is greater than T, the all-process completion time will increase.&lt;/li&gt;&lt;li&gt;My observations have shown that CPU subsystems without threads tend to be completely core powered. (As you would expect and what we see in Figure 2.) However, CPU subsystems with threads can be either more core or more thread focused. This is when understanding what a utilization value means becomes more complicated.&lt;/li&gt;&lt;/ul&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-1modwGcfdEA/TfD5pqSxrEI/AAAAAAAAAPI/IEin9vdijLY/s1600/coreVsThreadGrid.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="189" src="http://1.bp.blogspot.com/-1modwGcfdEA/TfD5pqSxrEI/AAAAAAAAAPI/IEin9vdijLY/s320/coreVsThreadGrid.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 3.&lt;/b&gt;&lt;/div&gt;Figure 3 above shows the numerical experimental results. I performed a statistical significance test between the elapsed time sample sets when there were 4 processes launched simultaneously and when there were 5 processes launched simultaneously. Statistically they are indeed different. The significance test was a little tricky because the elapsed time sample sets are not normally distributed. While I won't get into the details in this blog entry, if you are interested you can &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP3/coreVsThread1a.pdf"&gt;view the Mathematica notebook PDF output here&lt;/a&gt;&lt;/b&gt;...with all the details. You can also download the &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP3/coreVsThread1a.nb"&gt;actual Mathematica notebook here&lt;/a&gt;&lt;/b&gt;. 
The &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP3/cpuThreadTestResults.txt"&gt;raw experimental results can be downloaded here&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Why the Possible Utilization Difference&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you are still tracking with me (thanks for reading by the way), this next step should be simple. If requirements can be represented as either core-seconds or thread-seconds, and if capacity can be represented as core-seconds or thread-seconds, we have a simple two by two matrix. As long as both the requirements and capacity are core or thread based, we should be OK (if the OS can evenly distribute all the work). But if they don't match the utilization is going to be either under or over reported (at least from a DBA perspective).&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-l7BikxMmidE/Te_66cTzSWI/AAAAAAAAAPA/33zn2_58DHE/s1600/util+prob+maxtrix.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-l7BikxMmidE/Te_66cTzSWI/AAAAAAAAAPA/33zn2_58DHE/s1600/util+prob+maxtrix.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 4.&lt;/b&gt;&lt;/div&gt;Figure 4 above is based on a hypothetical system: Over a one hour interval, the 4 core system (each core has 2 threads) was busy from a core perspective at 1000 seconds but from a thread perspective 2000 seconds.&lt;br /&gt;&lt;br /&gt;The point of the Figure 4 matrix is not the busy time or the core or the threads. 
Rather, the point is that if the requirements and capacity are not in the same units, that is, cores or threads, the resulting utilization will be either under reported (e.g., 21%) or over reported (e.g., 83%), whereas the "true" utilization in Figure 4 is 42%.&lt;br /&gt;&lt;br /&gt;More practically, if I assume the &lt;b&gt;busy_time&lt;/b&gt; in &lt;b&gt;v$osstat&lt;/b&gt; is core based (but it is not) and calculate the capacity based on the number of cores, the resulting utilization will be higher than reality. In Figure 4 above, that would be the 83% cell. This is what could have happened on the AIX system as shown in Figure 1. Very experienced AIX DBAs are likely to have observed this utilization difference.&lt;br /&gt;&lt;br /&gt;Sometimes it is easy to spot the problem. For example, one colleague I mentioned above (my ref: DM 7-June-2011) emailed me a partial AWR report from his AIX system. Over the 2062 second interval, the &lt;b&gt;busy_time&lt;/b&gt; was reported to be 285888 seconds and the CPU subsystem consisted of 64 cores with a total of 256 threads. The core based capacity is then 131968 core seconds (131968 = 2062 X 64)... but the busy time was 285888 seconds, so the core based utilization is 217%. Woops! The thread based capacity is 527872 thread seconds (527872 = 2062 X 256). This means the thread based CPU utilization is only 54%. This is much more in line with&amp;nbsp;&lt;b&gt;&lt;a href="http://en.wikipedia.org/wiki/Operations_research"&gt;Operations Research&lt;/a&gt;&lt;/b&gt; reality and how the system was actually performing.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Conclusion&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When we see data as in Figure 1, where there is clearly a difference between &lt;b&gt;v$osstat&lt;/b&gt; core based utilization and some other tool (e.g., &lt;b&gt;vmstat&lt;/b&gt;), it is very likely threads are being used in the calculation. 
How threads are being used in the calculation could easily be different from how I presented it, but my point is NOT to determine exactly how a tool like &lt;b&gt;vmstat&lt;/b&gt; or &lt;b&gt;sar&lt;/b&gt; calculates CPU utilization. My point is:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;There can be a real difference in the reported utilizations.&lt;/li&gt;&lt;li&gt;We need to understand what the OS reported CPU utilization means from a performance perspective.&lt;/li&gt;&lt;li&gt;We need to gather data from our real systems to understand the true CPU requirements, capacity, and utilization. If we do not do this, then stating a utilization figure becomes arbitrary and not all that useful.&lt;/li&gt;&lt;/ol&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Next Steps To Make This Practical&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The first step is&amp;nbsp;to gather &lt;b&gt;v$osstat&lt;/b&gt; data and &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/04/core-vs-threadcpu-utilization-part-1.html"&gt;do the core based utilization math&lt;/a&gt;&lt;/b&gt;. If you see any real difference in CPU utilizations, then compute the utilization using &lt;b&gt;v$osstat&lt;/b&gt; data but using the &lt;i&gt;thread&lt;/i&gt; statistics. If this still doesn't match with &lt;b&gt;vmstat&lt;/b&gt;, then you will need to run a test (like the one I showed in this blog entry above) that launches processes simultaneously in the background and measures their wall time.&lt;br /&gt;&lt;br /&gt;Thanks for reading!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;If you enjoy my blog, I suspect you'll get a lot out of my courses; &lt;b&gt;&lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;Oracle Performance Firefighting&lt;/a&gt;&lt;/b&gt; and &lt;b&gt;&lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt;&lt;/b&gt;. 
I teach these classes around the world multiple times each year. For the latest schedule, &lt;b&gt;&lt;a href="http://training.orapub.com/default.asp"&gt;click here&lt;/a&gt;&lt;/b&gt;. I also offer on-site training and consulting services.&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond to a comment or you have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. Another option is to send an email to OraPub's general email address, which is currently orapub@comcast .net.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-4114857148078325718?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/4114857148078325718/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/06/cores-vs-threadspart-3.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/4114857148078325718'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/4114857148078325718'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/06/cores-vs-threadspart-3.html' title='Cores vs Threads...Part 3'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-aq5QGwdcKLk/Tef0wHKN8gI/AAAAAAAAAOY/qej3txfWNMk/s72-c/AG1Scatter.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-3824502939251058417</id><published>2011-05-26T07:13:00.000-07:00</published><updated>2011-05-26T10:51:14.237-07:00</updated><title type='text'>Cores vs Threads: Util Difference...Part2B</title><content type='html'>&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Purpose&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In my last blog entry, &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/05/cores-vs-threads-util-differencespart-2.html"&gt;Cores vs Threads: Util Difference...Part 2&lt;/a&gt;&lt;/b&gt; I analyzed seven data sets to discover if there is a difference between calculating OS CPU utilization based on &lt;b&gt;v$osstat&lt;/b&gt; CPU cores versus running &lt;b&gt;vmstat&lt;/b&gt;. All but one sample set (that AIX sample set, &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP2/Analysis_AG1.pdf"&gt;AG1&lt;/a&gt;&lt;/b&gt;) clearly showed there was no significant difference.&lt;br /&gt;&lt;br /&gt;But I had a concern. I wrote,&lt;br /&gt;&lt;br /&gt;&lt;i&gt;The only concern I have is none of the sample sets were gathered from a system with the CPU utilization greater than 65%. I would like to see some sample sets with a CPU constrained system. 
It is possible if the utilization differences are highly skewed and the residual slope is not flat (discussed below), the utilization difference (i.e., the gap) could become increasingly larger.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The week after I posted that entry, a willing participant (who I'll call &lt;i&gt;AB&lt;/i&gt;) gathered and sent me data from a production Oracle Solaris system that peaked at 100% CPU utilization! So I was very excited to analyze the results. You can view the complete results in a single PDF file by &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP2/Analysis_AB1.pdf"&gt;clicking here&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Results&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The results were even more pronounced than the other six "no utilization difference" sample sets; clearly there was no significant difference between the utilization calculations. Why no difference? Read on...&lt;br /&gt;&lt;br /&gt;The correlation between the two utilization sources is a jaw-dropping 0.99985. 
The statistical hypothesis test resulted in a p-value of 0.815, clearly exceeding our threshold of 0.05, so we cannot reject the null hypothesis that the two sample sets are not statistically different.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/--TEIbqda66c/Td13UHbDCJI/AAAAAAAAAOI/mBGHHcblC6o/s1600/util+diff.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="221" src="http://4.bp.blogspot.com/--TEIbqda66c/Td13UHbDCJI/AAAAAAAAAOI/mBGHHcblC6o/s320/util+diff.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 1.&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;Figure 1 above shows the Oracle utilization as red squares and the&amp;nbsp;&lt;b&gt;vmstat&lt;/b&gt; utilization as blue dots...but you can't see any blue dots because the Oracle red squares are in front of them...the utilizations are visually exactly the same! Even with the utilization at 90% and above, there is no visual difference...amazing.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-v_jnGYRCFCA/Td13zKbcSPI/AAAAAAAAAOM/uAKob6lNdTI/s1600/residual.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="209" src="http://1.bp.blogspot.com/-v_jnGYRCFCA/Td13zKbcSPI/AAAAAAAAAOM/uAKob6lNdTI/s320/residual.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 2.&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;Figure 2 above is the residual graph plotting the utilization difference versus Oracle utilization. Notice the difference is usually less than 1% and, just as important, the error does not increase as the utilization increases (which is common when forecasting Oracle performance). In fact, the slope of the trend line is 0.00285...flat. 
What this means is detailed in the &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/05/cores-vs-threads-util-differencespart-2.html"&gt;previous blog entry&lt;/a&gt;&lt;/b&gt;; search for "trend line slope".&lt;/div&gt;&lt;br /&gt;The AB1 data set clearly showed there was no real difference in the utilization calculations.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is yet another data set demonstrating that, in many environments, using&amp;nbsp;&lt;b&gt;v$osstat&lt;/b&gt;&amp;nbsp;and CPU cores to calculate CPU utilization is a valid alternative to running&amp;nbsp;&lt;b&gt;vmstat&lt;/b&gt;. The best way to determine if it's OK in your environment is to simply gather a little data and plot some points.&lt;br /&gt;&lt;br /&gt;I still would like to see more AIX data sets. I suspect (and the single AIX AG1 data set demonstrated this) the way AIX calculates utilization is different than using &lt;b&gt;v$osstat&lt;/b&gt; and CPU cores, and that as the CPU subsystem gets busier, the utilization difference increases. More data is needed before I can say more about this though.&lt;br /&gt;&lt;br /&gt;As I mentioned in my previous blog entry, I think the more intriguing question is &lt;i&gt;why can there be a difference in utilization calculations&lt;/i&gt;. 
I would have posted that entry this week, but teaching last week in Boston (&lt;b&gt;&lt;a href="http://filebank.orapub.com/classpics/archived/2011Boston/index.html"&gt;see pictures here&lt;/a&gt;&lt;/b&gt;) destroyed my voice and I was so tired at night I had no time to complete the entry...so stay tuned!&lt;br /&gt;&lt;br /&gt;Thanks for reading!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;If you enjoy my blog, I suspect you'll get a lot out of my courses: &lt;b&gt;&lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;Oracle Performance Firefighting&lt;/a&gt;&lt;/b&gt; and &lt;b&gt;&lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt;&lt;/b&gt;. I teach these classes around the world multiple times each year. For the latest &lt;b&gt;&lt;a href="http://training.orapub.com/default.asp"&gt;schedule, click here&lt;/a&gt;&lt;/b&gt;. I also offer on-site training and consulting services.&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond to a comment or have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. 
Another option is to send an email to OraPub's general email address, which is currently orapub@comcast.net.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-3824502939251058417?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/3824502939251058417/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/05/cores-vs-threads-util-differencepart2b.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/3824502939251058417'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/3824502939251058417'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/05/cores-vs-threads-util-differencepart2b.html' title='Cores vs Threads: Util Difference...Part2B'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/--TEIbqda66c/Td13UHbDCJI/AAAAAAAAAOI/mBGHHcblC6o/s72-c/util+diff.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-3050881483234294849</id><published>2011-05-13T11:43:00.000-07:00</published><updated>2011-06-08T06:54:13.015-07:00</updated><title type='text'>Cores vs Threads: Util Differences...Part 2</title><content type='html'>This is so 
typical... What I think will be a casual research project ends up being so much more. I hope you find this blog entry very enlightening and useful.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Purpose&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In part one of this series I mentioned that on a few rare occasions I have seen a utilization mismatch. That is, there is a difference between the utilization calculated using the busy time and CPU core count from &lt;b&gt;v$osstat&lt;/b&gt;&amp;nbsp;compared to the utilization gathered from the OS&amp;nbsp;&lt;b&gt;vmstat&lt;/b&gt;. While I have not seen this very many times, it does occur and so I wanted to do a more formal investigation to:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Check whether the difference is statistically significant. (It could be the result of randomness.)&lt;/li&gt;&lt;li&gt;Determine whether this is something I need to be concerned about in my performance analysis work, and&lt;/li&gt;&lt;li&gt;Understand how a mismatch could be possible.&lt;/li&gt;&lt;/ul&gt;It's questions like these I wanted to get a firmer grasp on...based on real production Oracle data, not through a thought experiment, talking to the OS administrator or vendor, or reading from a blog or book. So here we go...&lt;br /&gt;&lt;br /&gt;This is a fairly long entry, full of experimental and analysis details. 
If you want to skip all the details, just scroll down to the final section, "Summary" and read only that.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Utilization References&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Throughout this blog entry when I write "Oracle utilization," I am referring to the average CPU utilization calculated over a time interval as follows:&lt;br /&gt;&lt;br /&gt;U = ( &lt;b&gt;busy_time&lt;/b&gt; delta from &lt;b&gt;v$osstat&lt;/b&gt; ) / ( interval time X CPU cores from &lt;b&gt;v$osstat&lt;/b&gt; )&lt;br /&gt;&lt;br /&gt;The CPU cores from &lt;b&gt;v$osstat&lt;/b&gt; can be determined by taking the largest non-thread CPU count-type statistic value. If cores are not mentioned, then use a CPU statistic. I know this is strange, but I've never seen this approach fail (yet).&lt;br /&gt;&lt;br /&gt;When I refer to "OS utilization," I am referring to the average CPU utilization gathered over a time interval using the OS command &lt;b&gt;vmstat&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;If these formulas seem strange, please refer to &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/04/core-vs-threadcpu-utilization-part-1.html"&gt;Part 1&lt;/a&gt;&lt;/b&gt; of this series.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Experiment&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To accomplish my goals I needed utilization related data from real Oracle production systems. So I created a data collection script and asked a bunch of DBAs to run it on their production systems.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Script.&amp;nbsp;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The script is nothing special. It's a simple shell script with some &lt;b&gt;awk&lt;/b&gt; and &lt;b&gt;greps&lt;/b&gt;, a little math and there is even a while loop. The inputs are the number of samples to collect and the snapshot interval time. I gathered and calculated the CPU utilization two different ways. 
First, based on the &lt;b&gt;busy_time&lt;/b&gt; and CPU cores from the data source&amp;nbsp;&lt;b&gt;v$osstat&lt;/b&gt;. Second, I also gathered the CPU utilization from the OS using &lt;b&gt;vmstat&lt;/b&gt;. The results for each sample were written to an output file and lots of details were written to a log file as well. I wrote the script to work on Linux. It is very well documented and explains what is likely to need modification to work on any Unix-like OS. Click to &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP2/cpuutilcompare.sh.txt"&gt;view the Linux&lt;/a&gt;&lt;/b&gt; version.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Participants and Data.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I sent emails to a couple folks who commonly will collect data for me from their production systems. I also sent about fifty other emails to some students from my recent classes (&lt;b&gt;&lt;a href="http://resources.orapub.com/OraPub_Class_Pictures_and_Videos_s/74.htm"&gt;pictures&lt;/a&gt;&lt;/b&gt;). To my thinking, I had a great response. As you personally know, DBAs are extremely busy, plus I'm asking them to run a script in one of their production systems...it's a lot to ask.&lt;br /&gt;&lt;br /&gt;But I did get some data and it is goooood data. As I mentioned, my analysis is based on seven sample sets from HPUX, Linux, AIX, and Solaris environments. There were no outliers and no data was removed. All the data was placed directly in the &lt;i&gt;Mathematica&lt;/i&gt; Notebook and you can view the data by simply looking at the resulting PDFs (see below). Each sample set consisted of 60 samples, so I was well above my chosen minimum of 30. Each sample interval was 15 minutes; long enough to remove any significant collection timing error but not so long as to miss significant utilization swings.&lt;br /&gt;&lt;br /&gt;By the way, I do a lot of these types of experiments. If you want to participate, just send me an email (start by sending to orapub@comcast.net). 
If you desire, I will keep you anonymous, which I have done with everyone in this experiment.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Analysis&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Process&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;To analyze the data, I used &lt;i&gt;Mathematica&lt;/i&gt;. As you can see by looking at the output, it's very clean and allows me to document the analysis in detail within the &lt;i&gt;Mathematica&lt;/i&gt; "Notebook." All the data has been placed into the single &lt;i&gt;Mathematica&lt;/i&gt; notebook. The data is at the bottom of the notebook and it is documented with specifics such as the OS and number of cores and threads. I also provide the participant's reference code (so I can go back and reference all the emails, raw data files, etc.).&lt;br /&gt;&lt;br /&gt;The analysis notebook contains five main sections: Background and Purpose, Data Loading, Basic Statistics, Utilization Difference Analysis, and the Experimental Data. You can view each of the analysis results, on-line in a PDF file, by clicking on the links below.&lt;br /&gt;&lt;br /&gt;1. &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP2/Analysis_AG1.pdf"&gt;AG1&lt;/a&gt;&lt;/b&gt;. AIX, 16 cores, each with 2 threads (32 total threads).&lt;br /&gt;2. &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP2/Analysis_AG2.pdf"&gt;AG2&lt;/a&gt;&lt;/b&gt;. Linux, 4 cores, each with 4 threads (16 total threads).&lt;br /&gt;3. &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP2/Analysis_GO1.pdf"&gt;GO1&lt;/a&gt;&lt;/b&gt;. HPUX, 3 cores, each with 2 threads (6 total threads).&lt;br /&gt;4. &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP2/Analysis_LZ1.pdf"&gt;LZ1&lt;/a&gt;&lt;/b&gt;. HPUX, 32 cores, each with 2 threads (64 total threads).&lt;br /&gt;5. 
&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP2/Analysis_LZ2.pdf"&gt;LZ2&lt;/a&gt;&lt;/b&gt;. HPUX, 32 cores, each with 2 threads (64 total threads).&lt;br /&gt;6. &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP2/Analysis_NN1.pdf"&gt;NN1&lt;/a&gt;&lt;/b&gt;. Solaris, 4 cores, each with 2 threads (8 total threads).&lt;br /&gt;7. &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP2/Analysis_RB1.pdf"&gt;RB1&lt;/a&gt;&lt;/b&gt;. Linux, 8 cores, not sure if threads are being used.&lt;br /&gt;&lt;br /&gt;To download the "analysis pack" (in a single zip file) which contains all the above analysis PDFs, Mathematica notebook (think of the notebook as the source file), and the data collection script,&amp;nbsp;&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP2/AnalysisPack.zip"&gt;click here&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Raw Results&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Figure 1 below is a summary of the results.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-Rc-lB6Ie8fs/TcmwqrXJSRI/AAAAAAAAAOE/gLZ1vF5kIAM/s1600/Sample+Source+Details.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="100" src="http://3.bp.blogspot.com/-Rc-lB6Ie8fs/TcmwqrXJSRI/AAAAAAAAAOE/gLZ1vF5kIAM/s400/Sample+Source+Details.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 1. 
Summary of the analysis.&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;Here is a short explanation of each figure column.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Code&lt;/b&gt; is the code to identify the sample set and to keep the data provider anonymous.&lt;/li&gt;&lt;li&gt;&lt;b&gt;OS&lt;/b&gt; is the operating system.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Cores (total)&lt;/b&gt; is the total number of CPU cores.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Threads (total)&lt;/b&gt; is the total number of threads.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Samples&lt;/b&gt; is the number of experimental samples collected and used in the analysis.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Interval (sec)&lt;/b&gt; is the sample duration for every collected sample.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Utils Correl&lt;/b&gt; is the correlation coefficient, which measures the strength of the relationship between the two utilizations: &lt;b&gt;v$osstat&lt;/b&gt; core based and OS &lt;b&gt;vmstat&lt;/b&gt; based. A high correlation indicates that when one of the utilizations changes, so does the other. A correlation coefficient in the 0.90+ range is very high, so as you can see all of our samples showed a staggeringly high correlation.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Util Max&lt;/b&gt; is the maximum utilization of any sample value.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Avg Util Diff&lt;/b&gt; is the average of the Oracle utilization (&lt;b&gt;v$osstat&lt;/b&gt; core based) minus the OS utilization (&lt;b&gt;vmstat&lt;/b&gt;). Throughout this analysis, the difference is always calculated this way.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Hypo Test&lt;/b&gt; is the result of the hypothesis test (alpha=0.05), testing if the utilization differences are enough to indicate they came from different sources, such as different utilization calculations or even a different database server. All but one of the samples easily passed the hypothesis test. 
There is more information about the hypothesis test below and also in the &lt;i&gt;Mathematica&lt;/i&gt; notebooks.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Histogram Notes&lt;/b&gt; are simply short observations regarding the histogram of the utilization differences.&amp;nbsp;A nice looking "bell curve" indicates the error hovers around the mean and there is not a significant and harmful trend.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Util Scatter Gap&lt;/b&gt;&amp;nbsp;shows the utilization values for all of our samples; both Oracle based and OS based. Therefore, there are 120 datapoints on the graph. This graph allows us to quickly and visually tell if there are significant differences in the utilization calculations and also when they occurred. For example, we can visually tell if these gaps increase as the utilization increases. That is a harmful trend we will want to know about.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Residual Slope&lt;/b&gt; is the slope of the linear trend line based on the utilization error (i.e., difference) as the y-axis and the Oracle based utilization as the x-axis. This is another method to spot a harmful trend. An example of a harmful trend is when the difference between the two utilization sources grows as the utilization increases. If the slope is zero (completely flat) this means regardless of the utilization the difference is always the same. The line could be flat but above zero on the y-axis, which indicates that while the utilization difference is the same regardless of the utilization, the Oracle utilization is always greater than the OS based utilization. Residual graphs are a powerful way to look for harmful trends.&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;Summary of Results&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I could write on and on about the individual samples, but it's not necessary (thankfully). All but one of the samples (AG1) are very similar. 
All the non-AG1 samples visually and mathematically indicate that while there is a little difference in the utilization calculations, it is very small and consistent over the gathered utilization range, and the differences in the utilizations are normally distributed.&lt;br /&gt;&lt;br /&gt;The only concern I have is that none of the sample sets were gathered from a system with the CPU utilization greater than 65%. I would like to see some sample sets with a CPU constrained system. It is possible that if the utilization differences are highly skewed and the residual slope is not flat (discussed below), the utilization difference (i.e., the gap) could become increasingly larger. This stresses the importance of understanding how utilization is calculated and what this truly means for our Oracle production systems. While I will write below about the practical value of the analysis, the &lt;i&gt;How Can This Be?&lt;/i&gt; will be posted in my next entry...it's just too much information for a single blog entry.&lt;br /&gt;&lt;br /&gt;(If you have a very active CPU subsystem, please contact me and I'll help you gather the data and do the analysis for you.)&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Detailed Results (Selected)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I'm going to review in detail two sample sets. The two are very different and will make a nice contrast: LZ1 and AG1.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;LZ1 Sample Set&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I chose to focus on the LZ1 sample set because the CPU subsystem is relatively active (max 66%), it's a massive database server (32 cores, 64 threads), and it is HPUX based. You can view the PDF analysis file by &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/CoreVsThreadP2/Analysis_LZ1.pdf"&gt;clicking here&lt;/a&gt;&lt;/b&gt;. 
But I have extracted some of the graphics and show them below.&lt;br /&gt;&lt;br /&gt;First, there is a strikingly strong and positive correlation (1.000) between the two utilizations. This means when one of the utilizations increases so will the other...nearly always. So while there is a slight difference between the two utilization calculations, they are very highly correlated.&lt;br /&gt;&lt;br /&gt;This slight difference in utilizations is indeed small...infinitesimal is a better word. The average difference is only 0.00028 of a percentage point...basically no difference.&lt;br /&gt;&lt;br /&gt;A tiny utilization calculation difference may not seem or feel significant, but statistically it might be enough to indicate the data sets are the result of two different calculations. This is another way of saying we want to test if our two sample sets are statistically different. If they are different, then we know any difference is not caused by randomness. The difference could be caused by any number of things, but not randomness.&lt;br /&gt;&lt;br /&gt;But it's a little more complicated than usual... We can't expect all the Oracle utilization data values to be normally distributed. Think about it; there could easily be two clusters of values, say around 20% and 50% busy. This would result in a non-normal distribution of utilization values. The same is true for the &lt;b&gt;vmstat&lt;/b&gt; utilization data values. Because of this non-normality, we can't perform a simple t-test.&lt;br /&gt;&lt;br /&gt;All is not lost. According to the &lt;b&gt;central limit theorem&lt;/b&gt;&amp;nbsp;(reference &lt;b&gt;&lt;a href="http://en.wikipedia.org/wiki/Central_limit_theorem"&gt;one&lt;/a&gt;&lt;/b&gt; and &lt;b&gt;&lt;a href="http://www.statisticalengineering.com/central_limit_theorem.htm"&gt;two&lt;/a&gt;&lt;/b&gt;), if the differences in the sample pairs are normally distributed then we can statistically say the samples came from the same population. 
Relating this to utilization, if I gather 100 &lt;b&gt;vmstat&lt;/b&gt; utilization samples (the population), divide the 100 samples into two 50 sample sets, and create a new third sample set based on the differences in the samples ( S51-S1, S52-S2,..., S100-S50 ), then the new sample set will be normally distributed. So cool---this occurs regardless of the distribution of the initial two sample sets. ...I see another blog entry coming...&lt;br /&gt;&lt;br /&gt;So this is what I did: I created a&amp;nbsp;new sample set based on the differences between each collection method (one value for Oracle based utilization and the other value from &lt;b&gt;vmstat&lt;/b&gt;). If the new "differences" sample set is normally distributed, then if any differences in the samples (e.g., S51-S1) do exist, they can be attributed to randomness. On the flip-side, if the new sample set is &lt;i&gt;not&lt;/i&gt; normally distributed, we know the utilization differences are not caused by randomness...but by something else.&lt;br /&gt;&lt;br /&gt;Back to the LZ1 analysis: I performed a significance test (alpha=0.05) to see if the "differences" sample set is so different that randomness could not account for the difference and therefore the difference was caused by something else...like perhaps different utilization calculations. 
As expected, the two sample sets are similar enough (Anderson-Darling test, p=0.950) to indicate any difference is caused by randomness.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-HyTFYeeONQs/Tcl3t_eUvgI/AAAAAAAAANk/-G96dzy7VKA/s1600/LZ1_Hist.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="186" src="http://2.bp.blogspot.com/-HyTFYeeONQs/Tcl3t_eUvgI/AAAAAAAAANk/-G96dzy7VKA/s320/LZ1_Hist.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 2.&lt;/b&gt;&lt;/div&gt;Figure 2 above is the histogram based on the "differences" sample set. The histogram doesn't look perfectly normal; it's kind of flat near the top (&lt;b&gt;&lt;a href="http://wiki.answers.com/Q/What_does_a_negative_kurtosis_mean"&gt;negative kurtosis&lt;/a&gt;&lt;/b&gt;) and the left tail is "fat." However, it's not that bad (I realize that's not real scientific). What I like to see is the error tapering off at the tails, so that it is centered around the average...so we're close.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-0RpQp4VxU7M/TcmMKWfQVUI/AAAAAAAAAOA/ffH_VGhQae0/s1600/LZ1_DiffScatter.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="225" src="http://2.bp.blogspot.com/-0RpQp4VxU7M/TcmMKWfQVUI/AAAAAAAAAOA/ffH_VGhQae0/s320/LZ1_DiffScatter.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;Figure 3.&lt;/b&gt;&lt;/div&gt;Figure 3 above is a scatter plot of the two utilizations, not the "differences." All 60 samples are represented and for each sample there is an Oracle utilization point (red square) and an OS utilization point (blue circle). 
In the figure it is difficult to distinguish the two points because they are almost the same! This means the utilization calculations are pretty much the same. But even more important, the gap is consistently the same regardless of the sample or the utilization.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-0wtY45hueXs/Tcl7J1jaV4I/AAAAAAAAANs/Z9t2s5TrZpI/s1600/LZ1_Residual.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="215" src="http://3.bp.blogspot.com/-0wtY45hueXs/Tcl7J1jaV4I/AAAAAAAAANs/Z9t2s5TrZpI/s320/LZ1_Residual.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 4.&lt;/b&gt;&lt;/div&gt;Figure 4 above is known as a residual plot and is a phenomenal way to spot trends in the error, that is, the differences in the utilization calculations. It is common for mathematical models to work wonderfully at lower utilizations but get increasingly poor as the utilization increases. (We tackle how to remove the error in my &lt;b&gt;&lt;a href="http://training.orapub.com/content_forecasting.asp"&gt;Oracle Forecasting and Predictive Analysis&lt;/a&gt;&lt;/b&gt; course.) The residual graph is one way to visually help us detect if this is occurring. For our purposes, we want to check visually again (like we did with the scatter plot above) if the utilization difference grows with the utilization. If the trend line of the residuals is flat (and it is with a slope of 0.0002331) then the utilization difference does not change based on the utilization...for the samples we selected. You probably can't even see the trend line because it is virtually on the x-axis. 
I could have made the y-axis range smaller and effectively stretched the graph top to bottom, but I want a fair comparison with the AG1 data set I'll detail below.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;To summarize sample set LZ1&lt;/b&gt;, both utilization calculations always resulted in nearly the same value, the difference did not change as the utilization increased, and the hypothesis test showed the differences in the utilization can be explained by randomness.&amp;nbsp;Based on this information, I am comfortable using either utilization calculation until significant CPU queuing begins, which will occur at around 85%. Then I would gather additional data and re-analyze.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;AG1 Sample Set&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The Oracle based utilization peaked at 68%, so we have an active CPU subsystem. The AG1 sample set is unique compared to the others because from multiple perspectives there is a clear difference in the utilization calculations. Just by looking at the average difference, the Oracle &lt;b&gt;v$osstat&lt;/b&gt; core based utilization calculation is 6.8% higher than what &lt;b&gt;vmstat&lt;/b&gt; reports. But it gets more interesting...&lt;br /&gt;&lt;br /&gt;There is a strikingly strong and positive correlation between the two utilization sources. This means when one of the utilizations increases so will the other...nearly always. So while there is a near 7% difference in the utilization sources, they are still very highly correlated.&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;A 7% difference may seem or feel significant, but statistically it might not be a big deal and could have been caused by randomness. 
Just as with the LZ1 sample set, I performed a significance test (alpha=0.05) to see if the utilization difference in each sample pair is significantly different. As we suspected, statistically speaking there is a significant difference&amp;nbsp;(Anderson-Darling test, p=0.040). This means the difference can't be explained purely by randomness and something else is causing the difference...perhaps it's the difference in utilization calculations or something else.&amp;nbsp;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-1w5RcFwuU80/Tcl9uUV3O6I/AAAAAAAAANw/Z8WfS2_69Os/s1600/AG1_Hist.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="182" src="http://2.bp.blogspot.com/-1w5RcFwuU80/Tcl9uUV3O6I/AAAAAAAAANw/Z8WfS2_69Os/s320/AG1_Hist.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: center;"&gt;&lt;b&gt;Figure 5.&lt;/b&gt;&lt;/div&gt;Figure 5 is a histogram of the utilization differences from our sample. Notice on all but two occasions, the Oracle utilization was higher than the OS (&lt;b&gt;vmstat&lt;/b&gt;) utilization. 
We can also visually see the utilization differences are not normally distributed but heavily skewed to the right (skew&amp;gt;0, skew is positive)...in fact it looks &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/important-statistical.html"&gt;log normal&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-3Qj29zWb5y8/TcmFr5bhCrI/AAAAAAAAAN8/MW6w-c3odiQ/s1600/AG1_DiffScatter.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="224" src="http://4.bp.blogspot.com/-3Qj29zWb5y8/TcmFr5bhCrI/AAAAAAAAAN8/MW6w-c3odiQ/s320/AG1_DiffScatter.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 6.&lt;/b&gt;&lt;/div&gt;Figure 6 above is very cool! It clearly shows three things. First, the utilization trends nicely from as low as 10% to over 65%. Second, we can clearly see the high utilization samples result in a larger difference between the utilization sources. And finally, the Oracle utilization (&lt;b&gt;v$osstat&lt;/b&gt; core based)&amp;nbsp;is nearly always greater than the OS &lt;b&gt;vmstat&lt;/b&gt; based&amp;nbsp;utilization. One would assume if the utilization increased further (say 90%) the utilization gap would continue to increase...but how do we know this? 
Read on...&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-kxAqduOlwDg/Tcl9yxMbJxI/AAAAAAAAAN4/yYy2rtDm_Uo/s1600/AG1_Residual.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="214" src="http://3.bp.blogspot.com/-kxAqduOlwDg/Tcl9yxMbJxI/AAAAAAAAAN4/yYy2rtDm_Uo/s320/AG1_Residual.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Figure 7.&lt;/b&gt;&lt;/div&gt;Figure 7 above is the residual plot showing that as the Oracle utilization increases, so does the difference between the utilization calculations. Compared to the other sample residual graphs, the slope looks shockingly steep! The trend line may not appear to match the data. That is because not all the data is shown, to allow for a direct and fair comparison with the other sample sets (e.g., Figure 4 above).&lt;br /&gt;&lt;br /&gt;Let's take a look at what this really means. The linear trend line slope is 0.2563. This means for every single percentage of Oracle utilization increase there is an additional 1/4 of a percentage point difference in the utilization calculations. Figure 7 shows when the Oracle utilization is around 35% (i.e., 0.35) the difference in utilizations is around 3%. If we extend the line to where the Oracle utilization is 60%, the difference in utilizations is expected to be around 13% (just plug the Oracle utilization number into the trend line formula, shown on the analysis PDF file). That is still not horrible, but not great either. The question now becomes, which utilization is the "real" utilization. 
This is the subject for my next blog entry.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;To summarize sample set AG1&lt;/b&gt;, all our tests indicate there is a significant utilization calculation difference between Oracle core-based and OS based CPU utilizations.&amp;nbsp;When the utilization peaked, the Oracle based calculation was around 10% greater than &lt;b&gt;vmstat&lt;/b&gt; showed and this difference is expected to increase as the utilization increases! So we had better decide which utilization source to use or we and everyone we work with could be confused. In the next blog entry I'll drill down into this topic.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Summary&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you are concerned about which utilization method to use, don't worry...do a little analysis. You don't need to do a full-on analysis like I did. Just gather some data (you can even use my data collection script) and create a scatter plot like I did in Figure 3 and Figure 6. If you see a difference of more than around 5% then you may want to determine why the difference exists and also which method is relevant for your work.&lt;br /&gt;&lt;br /&gt;Here are a few take-aways:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Can we use &lt;b&gt;v$osstat&lt;/b&gt; as a source to reliably calculate OS CPU utilization? Absolutely.&lt;/li&gt;&lt;li&gt;Can the &lt;b&gt;v$osstat&lt;/b&gt;&amp;nbsp;core based utilization differ significantly from &lt;b&gt;vmstat&lt;/b&gt;? Absolutely.&lt;/li&gt;&lt;li&gt;How do I know if I need to be concerned? As I mentioned directly above, collect some data and do a quick analysis. If you'd like, I can help QA your conclusions.&lt;/li&gt;&lt;/ul&gt;I think an even more intriguing question is &lt;i&gt;why can there be differences in the utilization calculations&lt;/i&gt; between Oracle and the OS. 
That's the topic of my next blog entry...&lt;br /&gt;&lt;br /&gt;Thanks for reading!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond to a comment or have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. Another option is to send an email to OraPub's general email address, which is currently orapub@comcast.net.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-3050881483234294849?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/3050881483234294849/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/05/cores-vs-threads-util-differencespart-2.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/3050881483234294849'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/3050881483234294849'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/05/cores-vs-threads-util-differencespart-2.html' title='Cores vs Threads: Util Differences...Part 2'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://3.bp.blogspot.com/-Rc-lB6Ie8fs/TcmwqrXJSRI/AAAAAAAAAOE/gLZ1vF5kIAM/s72-c/Sample+Source+Details.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-8745686304520359827</id><published>2011-04-28T06:52:00.000-07:00</published><updated>2011-05-05T15:25:36.879-07:00</updated><title type='text'>Giving Twitter A Try...@CShallahamer</title><content type='html'>I'm always looking for better ways to reach DBAs in less intrusive and obnoxious ways. Twitter may be an answer...maybe not. But I'm willing to give it a try!&lt;br /&gt;&lt;br /&gt;So for the next month, I'll be&amp;nbsp;tweeting about new blog entries, tools, research, training schedules and discounts, etc. If you want to &lt;a href="http://twitter.com/@CShallahamer"&gt;follow me on Twitter, my account is @CShallahamer&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Thanks for reading,&lt;br /&gt;&lt;br /&gt;Craig.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-8745686304520359827?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/8745686304520359827/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/04/giving-twitter-trycshallahamer.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/8745686304520359827'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/8745686304520359827'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/04/giving-twitter-trycshallahamer.html' title='Giving Twitter A Try...@CShallahamer'/><author><name>Craig Shallahamer Founder and President of 
OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-6806881310077955063</id><published>2011-04-22T06:37:00.000-07:00</published><updated>2011-06-08T06:50:34.481-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='requirements'/><category scheme='http://www.blogger.com/atom/ns#' term='cpu utilization'/><category scheme='http://www.blogger.com/atom/ns#' term='capacity'/><category scheme='http://www.blogger.com/atom/ns#' term='threads'/><category scheme='http://www.blogger.com/atom/ns#' term='v$osstat'/><category scheme='http://www.blogger.com/atom/ns#' term='cores'/><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><category scheme='http://www.blogger.com/atom/ns#' term='busy_time'/><title type='text'>Core vs Thread...CPU Utilization - Part 1</title><content type='html'>&lt;span class="Apple-style-span" style="font-size: large;"&gt;What's the Big Deal?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This blog series was motivated by two reasons. First, I want to demonstrate how one can calculate a database server's CPU utilization straight from Oracle v$ views. Second, over the past few years I've seen a couple examples where database servers have exhibited more power than the number of CPU cores could provide, which implies threads are indeed providing some extra power...so some investigation seems warranted.&lt;br /&gt;&lt;br /&gt;This is a two part blog entry. This first part will detail how you can calculate the OS CPU utilization straight from v$ views along with the fundamentals of utilization. (Which can be widely applied to performance analysis.) 
In part two, I will present the results of my experiments contrasting CPU cores versus threads...and of course you'll be able to download and run the experimental scripts yourself!&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Threads versus Cores&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When you get down to what actually services work, it is a CPU core. Thread and hyper-thread work must ultimately be processed by a core. An exception would be if threads are truly simultaneously processed (and this can be tested). Capacity planners will tell you that threads don't scale. They will tell you that once each thread is occupied, their advantages decline. Unless you have experimental evidence leading you differently, when predicting the CPU power of a database server, the conservative (some say honest and ethical) way to determine the CPU capacity is to use cores, not threads or hyper-threads. As a simple internet search will demonstrate, there are many published references. I found a great reference that any DBA will understand on Dell's web-site &lt;a href="http://content.dell.com/us/en/enterprise/d/large-business/thread-cores-which-you-need.aspx"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Calculating OS CPU Utilization&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;On most Oracle systems it is a simple matter to compute the OS CPU utilization based on the 10g view &lt;b&gt;v$osstat&lt;/b&gt;.&amp;nbsp;The key to understanding utilization is that it is simply requirements divided by capacity...end of story. Requirements can take on many forms such as CPU time consumed, IOPS performed, memory currently held, etc. Capacity can also take on many forms such as available CPU time, sustained IOPS available, physical memory, etc. 
Over an interval of time, if I know the CPU time consumed and I know the number of CPU cores on the database server, I can easily calculate the average database server CPU utilization over the interval. Here's a basic utilization formula:&lt;br /&gt;&lt;br /&gt;U = R / C&lt;br /&gt;&lt;br /&gt;Where:&lt;br /&gt;&lt;br /&gt;U is utilization&lt;br /&gt;R is requirements&lt;br /&gt;C is capacity&lt;br /&gt;&lt;br /&gt;I need to mention that, when referring to a CPU subsystem, the capacity is the number of cores multiplied by the interval. For example, over a 1 minute interval a single core box can provide a maximum of 60 seconds of CPU. Over a 2 minute interval a single core box can provide a maximum of 120 seconds of CPU. Over a 2 minute interval a dual core box can provide a maximum of 240 seconds of CPU. I think you get the pattern, which is C = cores X interval. Using a more realistic example, over a one hour interval, a 16 core database server can provide up to 57600 seconds of CPU power; 16 cores X 1 hour X 60 min/hour X 60 sec/min = 57600 core-seconds.&lt;br /&gt;&lt;br /&gt;Let's put this together resulting in the CPU utilization. Suppose over a one-hour interval an 8 core database server was busy 10080 seconds. In other words, the CPU subsystem was required to work 10080 seconds, so R = 10080 sec. Over a one-hour interval 8 cores can provide a maximum of 28800 seconds of CPU power; C = 8 cores X 1 hour X 60 min/hour X 60 sec/min = 28800 core-seconds. Therefore, the average CPU utilization over the interval is 0.35 or 35%; 10080 seconds / 28800 seconds.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;But where did I get the 10080 seconds figure?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;We can get the seconds of CPU consumed directly from&amp;nbsp;&lt;b&gt;v$osstat&lt;/b&gt;. The &lt;b&gt;v$osstat&lt;/b&gt; view was introduced in Oracle 10g. It contains a dizzying array of confusing statistics that seem to change from platform to platform and from release to release. 
The statistic&amp;nbsp;&lt;b&gt;busy_time&lt;/b&gt; is the total CPU consumed by all operating system processes in hundredths of a second, that is, centi-seconds. For example, if the busy time is 123456 then since the operating system (not the Oracle instance) has started, all operating system processes (Oracle and everything else) have consumed 1234.56 seconds of CPU. We are not making a statement about the speed of a CPU, but simply that the processes consumed 1234.56 seconds of CPU since the server was last rebooted.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Where do we get the number of CPU cores or threads?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;For our purposes today, we want to know the number of CPU cores and sometimes threads. While not perfect I'm sure, if you reference any core related &lt;b&gt;v$osstat&lt;/b&gt; statistic that does not contain the word thread, and take the maximum value, that will be the number of CPU cores. I realize this is not real scientific, but it seems to work. (And a quick check using sar, top, or asking the sysadmin also helps build confidence!)&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Let's do it!&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The pl/sql block below will determine the average OS CPU utilization over a 60 second interval. 
You can easily change the interval time.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;set tab on&lt;br /&gt;set serveroutput on&lt;br /&gt;declare&lt;br /&gt;  interval_sec      integer;&lt;br /&gt;  busy_sec_t0       integer;&lt;br /&gt;  busy_sec_t1       integer;&lt;br /&gt;  cpu_cores_t0      integer;&lt;br /&gt;  cpu_cores_t1      integer;&lt;br /&gt;  cpu_cores_avg     number;&lt;br /&gt;  requirements_sec  number;&lt;br /&gt;  capacity_sec      number;&lt;br /&gt;  utilization       number;&lt;br /&gt;begin&lt;br /&gt;  interval_sec := 60;&lt;br /&gt;&lt;br /&gt;  select (a.value/100), b.value&lt;br /&gt;  into   busy_sec_t0, cpu_cores_t0&lt;br /&gt;  from   ( select value from v$osstat where stat_name='BUSY_TIME' ) a,&lt;br /&gt;         ( select value from v$osstat where stat_name='NUM_CPUS' ) b ;&lt;br /&gt;&lt;br /&gt;  dbms_lock.sleep(interval_sec);&lt;br /&gt;&lt;br /&gt;  select (a.value/100), b.value&lt;br /&gt;  into   busy_sec_t1, cpu_cores_t1&lt;br /&gt;  from   ( select value from v$osstat where stat_name='BUSY_TIME' ) a,&lt;br /&gt;         ( select value from v$osstat where stat_name='NUM_CPUS' ) b ;&lt;br /&gt;&lt;br /&gt;  requirements_sec := busy_sec_t1 - busy_sec_t0 ;&lt;br /&gt;  cpu_cores_avg    := (cpu_cores_t1 + cpu_cores_t0) / 2 ;&lt;br /&gt;  capacity_sec     := interval_sec * cpu_cores_avg ;&lt;br /&gt;  utilization      := requirements_sec / capacity_sec ;&lt;br /&gt;&lt;br /&gt;  dbms_output.put_line('OS CPU Utilization Calculation...');&lt;br /&gt;  dbms_output.put_line('REQUIREMENTS----------');&lt;br /&gt;  dbms_output.put_line('..busy_sec_t0 (sec)    :'||busy_sec_t0);&lt;br /&gt;  dbms_output.put_line('..busy_sec_t1 (sec)    :'||busy_sec_t1);&lt;br /&gt;  dbms_output.put_line('..requirements (sec)   :'||requirements_sec);&lt;br /&gt;  dbms_output.put_line('CAPACITY----------');&lt;br /&gt;  dbms_output.put_line('..interval (sec)       :'||interval_sec);&lt;br /&gt;  dbms_output.put_line('..CPU cores            
:'||cpu_cores_avg);&lt;br /&gt;  dbms_output.put_line('..capacity (sec)       :'||capacity_sec);&lt;br /&gt;  dbms_output.put_line('-------');&lt;br /&gt;  dbms_output.put_line('Utilization (avg)      :'||round(utilization,2));&lt;br /&gt;end;&lt;br /&gt;/&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;Note: You will need to check your &lt;b&gt;v$osstat&lt;/b&gt; CPU core statistic name as there is a good chance it will be different. Also, if you see the statistic &lt;b&gt;avg_busy_time&lt;/b&gt; do NOT use that. (How Oracle determines the average time of something over an interval is beyond me.) Instead, always use the &lt;b&gt;busy_time&lt;/b&gt; statistic.&lt;br /&gt;&lt;br /&gt;On my Linux system the results were:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;SQL&amp;gt; /&lt;br /&gt;OS CPU Utilization Calculation...&lt;br /&gt;REQUIREMENTS----------&lt;br /&gt;..busy_sec_t0 (sec)    :307954&lt;br /&gt;..busy_sec_t1 (sec)    :308015&lt;br /&gt;..requirements (sec)   :61&lt;br /&gt;CAPACITY----------&lt;br /&gt;..interval (sec)       :60&lt;br /&gt;..CPU cores            :4&lt;br /&gt;..capacity (sec)       :240&lt;br /&gt;-------&lt;br /&gt;Utilization (avg)      :.25&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed.&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;This should match the average CPU utilization from an OS command, such as sar, vmstat, or top. Why? Because the &lt;b&gt;v$osstat&lt;/b&gt; view pulls its data directly from the virtual filesystem &lt;b&gt;/proc&lt;/b&gt;...and so do sar, vmstat, and top. 
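&lt;br /&gt;&lt;br /&gt;As a quick cross-check of the arithmetic, here is a minimal Python sketch of U = R / C. It is not part of the data collection scripts; the function name cpu_utilization is a hypothetical helper introduced only for illustration, and the inputs are simply the figures quoted in this post (the 8 core one-hour example and the sample run shown above).&lt;br /&gt;&lt;pre&gt;&lt;code&gt;
```python
def cpu_utilization(busy_sec, cores, interval_sec):
    """Average utilization over an interval: U = R / C (hypothetical helper)."""
    capacity_sec = cores * interval_sec   # C = cores X interval
    return busy_sec / capacity_sec        # U = R / C


# Worked example from the text: 8 cores busy 10080 seconds over one hour.
print(cpu_utilization(10080, 8, 3600))

# Sample run above: busy time moved from 307954 to 308015 seconds
# on 4 cores over a 60 second interval.
print(round(cpu_utilization(308015 - 307954, 4, 60), 2))
```
&lt;/code&gt;&lt;/pre&gt;The same function works for any interval, as long as the busy time and the interval are expressed in the same time unit.&lt;br /&gt;&lt;br /&gt;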
For details, refer to the &lt;i&gt;Operating System Contention&lt;/i&gt; chapter's &lt;i&gt;Monitoring CPU Activity&lt;/i&gt; section (starting on page 110) in my &lt;i&gt;&lt;a href="http://resources.orapub.com/Oracle_Performance_Firefighting_Book_p/ff_book.htm"&gt;Oracle Performance Firefighting&lt;/a&gt;&lt;/i&gt; book.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;What's Next?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;On most Oracle systems, using CPU cores to determine the OS CPU capacity works perfectly. But over the years, I have seen a few systems where it appears the CPU capacity is actually greater than the number of CPU cores could physically provide (in this universe). This implies threads are providing more power and a better CPU capacity calculation will include a mix of CPU cores and threads. This calls for an experiment...one that we all can run on our Oracle systems. So in part two of this series I will present the tool to gather the data, share some experimental results, and draw some conclusions.&lt;br /&gt;&lt;br /&gt;Thanks for reading!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond to a comment or have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. 
Another option is to send an email to OraPub's general email address, which is currently orapub@comcast.net.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-6806881310077955063?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/6806881310077955063/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/04/core-vs-threadcpu-utilization-part-1.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/6806881310077955063'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/6806881310077955063'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/04/core-vs-threadcpu-utilization-part-1.html' title='Core vs Thread...CPU Utilization - Part 1'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-1368646375506573522</id><published>2011-03-26T23:54:00.000-07:00</published><updated>2011-03-26T23:54:07.663-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='response time analysis'/><category scheme='http://www.blogger.com/atom/ns#' term='time based performance analysis'/><title type='text'>New IOUG Session: Unit of Work Time Based Analysis</title><content type='html'>A couple of weeks ago 
I was given the opportunity to submit another presentation for the upcoming April 2011 Collaborate/IOUG conference in Orlando, Florida. I am particularly excited about this presentation because in terms of Oracle performance analysis, besides doing experiments, this is my current passion. Plus, over the last few months I have been able to significantly purify and clarify my thinking in this area. I will be posting the conference presentation once it's complete.&lt;br /&gt;&lt;br /&gt;The session number is 484 and I will make the presentation on Wednesday afternoon April 13.&amp;nbsp;While the official submission title is, &lt;i&gt;Unifying Time Based Performance Analysis&lt;/i&gt; a better title would be, &lt;i&gt;Unit of Work Time Based Performance Analysis&lt;/i&gt;. If you're going to be at the conference, I hope you can make the presentation!&lt;br /&gt;&lt;br /&gt;What's it about? If you have attended my &lt;a href="http://training.orapub.com/content_adv_perf_analysis.asp"&gt;Advanced Oracle Performance Analysis&lt;/a&gt; course, read the last chapter in the &lt;a href="http://resources.orapub.com/Oracle_Performance_Firefighting_Book_p/ff_book.htm"&gt;Oracle Performance Firefighting&lt;/a&gt; book, or read my paper &lt;a href="http://resources.orapub.com/SearchResults.asp?Search=evaluate"&gt;Evaluating Alternative Performance Solutions&lt;/a&gt; you will have a pretty good idea about what the presentation is all about. Without specific reference to this topic, I also blogged about this in detail in my four part series entitled, &lt;a href="http://shallahamer-orapub.blogspot.com/2010/05/insert-batch-size-performance-effects.html"&gt;Altering insert commit batch size&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Here is the conference abstract:&lt;br /&gt;&lt;br /&gt;Time based performance analysis typically focuses on a specific process, SQL statement, or time interval (such as an hour or day). 
While this is indeed valuable, switching the focus to the time required to complete a single small unit of work has a number of distinct advantages employed by computing system performance analysts for many years.&lt;br /&gt;&lt;br /&gt;By focusing on an individual unit of work, the time to complete each unit of work can be broken down into the classic performance analysis categories of service time and queue time. The "leap" is made by placing a unit of work's CPU consumption into the service time bucket and non-idle wait time into the queue time bucket. Having completed this transformation, Oracle performance analysis can now benefit from the history of proven methods, theory, and mathematics of classic computing system performance analysis. These benefits include predictive analysis, alternative hardware and architectural analysis, and understanding how Oracle performance solutions will impact performance down to the unit of work up to the elapsed time of a SQL statement or an entire process. This type of analysis also leads to highly communicative visual representations.&lt;br /&gt;&lt;br /&gt;Since this approach is new to many DBAs, data from multiple real Oracle systems is used to clearly demonstrate that indeed real Oracle systems behave surprisingly close to the performance mathematics.&lt;br /&gt;&lt;br /&gt;This presentation will lead the listener step-by-step through the analysis process starting with a standard Statspack/AWR report, transforming the interval time based analysis into a unit of work time based analysis, how to graphically display the situation, and how the impact of performance solutions can be anticipated. 
All the tools and templates presented are freely available and special effort is made to make the analysis very practical.&lt;br /&gt;&lt;br /&gt;Thanks for reading!&lt;br /&gt;&lt;br /&gt;Craig.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-1368646375506573522?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/1368646375506573522/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/03/new-ioug-session-unit-of-work-time.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/1368646375506573522'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/1368646375506573522'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/03/new-ioug-session-unit-of-work-time.html' title='New IOUG Session: Unit of Work Time Based Analysis'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-3505112948356665175</id><published>2011-02-24T09:23:00.000-08:00</published><updated>2011-02-24T09:23:00.863-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='arrival pattern'/><category scheme='http://www.blogger.com/atom/ns#' term='oracle performance'/><category scheme='http://www.blogger.com/atom/ns#' 
term='inter-arrival time'/><category scheme='http://www.blogger.com/atom/ns#' term='unexpected'/><category scheme='http://www.blogger.com/atom/ns#' term='arrival rate'/><category scheme='http://www.blogger.com/atom/ns#' term='fitness test'/><category scheme='http://www.blogger.com/atom/ns#' term='exponential distribution'/><title type='text'>SQL Arrival Patterns and Impact</title><content type='html'>If you have been following my recent posts and especially the post about &lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/sql-statement-elapsed-times.html"&gt;SQL statement elapsed times&lt;/a&gt;,&amp;nbsp;you'll know that unfortunately SQL statement elapsed times do not conform to a normal, poisson, exponential, or even a log normal distribution. That's unfortunate because if a statement does indeed conform, even with limited data (which we can easily obtain), we can make some pretty amazing predictions. But still, I was able to come to some useful conclusions.&amp;nbsp;&lt;b&gt;In this posting&lt;/b&gt; I want to investigate the &lt;i&gt;pattern of SQL statement arrivals&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Why Should We Care?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Note: If the below terms are not familiar to you, I will clearly define them starting in the next section.&lt;br /&gt;&lt;br /&gt;Arrival rates and inter-arrival times are a massively important topic in computing system performance analysis. In the capacity planning and predictive analysis industry it is a given that the pattern of inter-arrival times (time/work: ms/trx) is exponentially distributed and the average arrival rate (work/time) is poisson distributed. [1, 2, 3]&amp;nbsp;This is so ingrained into minds that if&amp;nbsp;I went to a non-Oracle computing performance conference and suggested otherwise, I would immediately be questioned, demeaned, and have pencils thrown at me (or something like that). 
It wouldn't be pretty.&lt;br /&gt;&lt;br /&gt;But to believe science is unscientific, is it not? And what is wrong with challenging assumptions about Newton's Three Laws of Motion, that we live in a closed system, and even perhaps question arrival pattern assumptions? I got flack from the "experts" when I first started speaking about Oracle's wait interface and when to use an index. And even now I still get some flack regarding how to apply classic response time analysis to Oracle time-based analysis. And being called dangerous is actually pretty cool, especially when you look at who's saying it!&lt;br /&gt;&lt;br /&gt;But in all seriousness,&amp;nbsp;my objective in this posting is simply these three things:&lt;br /&gt;&lt;br /&gt;&lt;i&gt;To test and create the ability for others to test; are SQL statement arrival patterns exponential? And if not, do they conform to some known statistical distribution? What can I learn that will help me in my daily performance analysis work?&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;That's my objective, pure and simple.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Plan&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To ease you into this topic, I'm going to start with defining a few key terms with words and pictures. Then I'll summarize the experimental design, how I collected the data, an analysis of three data sets, and then I'll draw some final conclusions. I will also provide the links so you can perform the same tests I have. I hope you enjoy the journey!&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Transaction Arrivals&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Average Arrival Rate&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Let's say you went to your local shopping mall, sat down in a nice cushy chair, and counted the number of people who walked into a specific store over a one minute period. Perhaps 21 people went into the store. 
Don't mean to insult your intelligence here, but this means 21 people arrived at the store over a one minute period. So the &lt;i&gt;arrival rate&lt;/i&gt; is 21/1 people/minute, which is 21/60 people/second, or 0.350 people/second. This is known as an average arrival rate.&lt;br /&gt;&lt;br /&gt;An arrival rate needs a unit of occurrence or work and also a unit of time. Examples of occurrence or work are a transaction, logical IO, physical IO read, physical IO read request, an execution of SQL statement ABC, or procedure customer_add. Examples of time are hours, seconds, etc. While this may seem trivial, when referring to the &lt;i&gt;arrival rate&lt;/i&gt; it is important to always keep the occurrence as the numerator and the time as the denominator. One reason is that the &lt;i&gt;inter-arrival time&lt;/i&gt; is naturally given in time per occurrence, and most of this posting refers specifically to the inter-arrival time. So always listen closely and use the units of occurrence and time carefully.&lt;br /&gt;&lt;br /&gt;Now that we can calculate the average arrival rate, let's take this topic to the next level...inter-arrival times.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Inter-Arrival Times&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This topic really starts with inter-arrival times. The inter-arrival time is simply the time between each arrival. For example, taking a slight yet significant twist from the above shopping mall example, suppose I wrote down &lt;i&gt;the time of each arrival&lt;/i&gt;. Not the number of people that arrived over an interval of time, but the time of each arrival. It's then a simple task to determine the time &lt;i&gt;between&lt;/i&gt; each arrival, that is, the inter-arrival time.
Let's say the shopping mall inter-arrival time (sec/person) data looked like this:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;{49.4135, 24.9792, 9.40843, 6.01274, 4.25393, 20.0842, 6.33593, 79.0265, 34.7396, 75.3667, 35.4444, 36.2891, 46.6725, 1.66213, 21.2452, 1.8656, 19.2686, 7.79427}&lt;/code&gt;&lt;/pre&gt;If I take this data set and paste it into WolframAlpha (I did it &lt;a href="http://www.wolframalpha.com/input/?i=%7B49.4135%2C+24.9792%2C+9.40843%2C+6.01274%2C+4.25393%2C+20.0842%2C+6.33593%2C+79.0265%2C+34.7396%2C+75.3667%2C+35.4444%2C+36.2891%2C+46.6725%2C+1.66213%2C+21.2452%2C+1.8656%2C+19.2686%2C+7.79427%7D"&gt;here&lt;/a&gt;), one of the results is this histogram.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUgvotELBqI/AAAAAAAAAME/8isjse-4kTo/s1600/mall+hist+18.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="153" src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUgvotELBqI/AAAAAAAAAME/8isjse-4kTo/s200/mall+hist+18.gif" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;The industry expects inter-arrival times for computing system transactions to be exponential, but how about the data above? If you have followed my &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/important-statistical.html"&gt;blog entry on statistical distributions&lt;/a&gt;&lt;/b&gt; you'll know that this could be exponential, poisson, or log normal. Even with more samples, by creatively altering the bin sizes I could make the histogram look just about any way I wanted. What we need to do is a statistical hypothesis test. I'm not going to do that here, but I will with the real experimental data below.&lt;br /&gt;&lt;br /&gt;Suppose I sat like a couch potato at the shopping mall until I ended up with 100 samples, not just 18 as above.
The resulting histogram looks like this:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUgxThd1s1I/AAAAAAAAAMI/_lqLijbwar0/s1600/Mall+hist+100+samples.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="208" src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUgxThd1s1I/AAAAAAAAAMI/_lqLijbwar0/s320/Mall+hist+100+samples.gif" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;The two histograms look physically different because the histogram just above was created using &lt;i&gt;Mathematica&lt;/i&gt;... WolframAlpha doesn't like me pasting 100 samples into it. Three points:&amp;nbsp;The average is 19.7, the median is 14.4, and while the distribution looks log normal, it is exponential because I created it that way (sorry...kind of sneaky, I know).&lt;br /&gt;&lt;br /&gt;By industry definition, transaction inter-arrivals should be exponentially distributed. [1, 2, 3] There is also another accepted assumption regarding the inter-arrivals: they are expected to be &lt;i&gt;independent&lt;/i&gt; of each other. Referring to my shopping mall example: If I'm at a mall with my wife and she walks into &lt;a href="http://www.lucy.com/"&gt;Lucy's&lt;/a&gt;, I will likely follow! My arrival was &lt;i&gt;dependent&lt;/i&gt; on her arrival, so our arrivals are clearly not independent. Families walking into restaurants are another example of dependent arrivals.&lt;br /&gt;&lt;br /&gt;With the knowledge of inter-arrivals, dependent vs independent arrivals, and the accepted industry assumption, let's now look at the pattern and the average of inter-arrivals.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Inter-Arrival Average and Pattern&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Let's explore how the arrivals visually appear on a time line.
This is a fantastic way to grasp the fact that it is entirely possible for inter-arrival &lt;i&gt;average times&lt;/i&gt; to be the same but the &lt;i&gt;pattern of arrivals&lt;/i&gt;&amp;nbsp;to be different. The pattern of inter-arrivals above was exponential, but what do that pattern and other patterns look like on a time line?&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUl36nVw5cI/AAAAAAAAAMk/r6sGEOo8OBA/s1600/TimeLine+Plot+2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="243" src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUl36nVw5cI/AAAAAAAAAMk/r6sGEOo8OBA/s320/TimeLine+Plot+2.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;If you need a quick statistical distribution refresher, I blogged about this topic &lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/important-statistical.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The image above shows four different inter-arrival patterns, each with an average inter-arrival time of 5 ms/trx. The &lt;b&gt;top time-line&lt;/b&gt; is a constant inter-arrival time (a perfect uniform distribution), that is, every 5 ms another transaction arrives...if only we were so lucky! The &lt;b&gt;second from top&lt;/b&gt; time-line shows transactions arriving in a normally distributed pattern, with an average inter-arrival time of 5 ms along with some variance. In fact, I have defined the standard deviation to be 2. The &lt;b&gt;third from top&lt;/b&gt;&amp;nbsp;shows transactions arriving in a log normal distributed pattern, with the average inter-arrival time pulled from a normal distribution with an average of 5 ms and a standard deviation of 2. (Confused?
Read &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/important-statistical.html"&gt;this&lt;/a&gt;&lt;/b&gt; blog post.)&amp;nbsp;The fourth from top, that is, the bottom time-line, shows transactions arriving in an exponentially distributed pattern with an average inter-arrival time of 5 ms. As you can see, there is more to arrivals than just the average...the pattern is also very important.&lt;br /&gt;&lt;br /&gt;So important in fact, we can feel the difference! If you worked the counter at a fast food restaurant, which arrival pattern would you prefer? If you chose exponential (bottom line), you're nuts, because sometimes you would be sitting around doing nothing while other times there would be people queued up and glaring at you! Most people (including our users) desire smoothness and predictability, and so a constant (that is, uniform) inter-arrival time is what we like. When we analyze the experimental data, we'll get a picture of real Oracle SQL statement inter-arrival times!&lt;br /&gt;&lt;br /&gt;OK, I'm done with the background information. Now it's time to delve into what is really happening in production (not the lab) Oracle systems.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Experimental Design&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To demonstrate the pattern of SQL statement inter-arrivals I needed to create an experiment. I also wanted it to be easily performed by others in their production environments. I created a data collection script that will gather the inter-arrival times for a specific SQL statement (and specific plan) and record the results in an Oracle table for easy manipulation and retrieval. I also created a &lt;i&gt;Mathematica&lt;/i&gt;-based notepad to perform the statistical analysis. The analysis is essentially a hypothesis test to determine if the collected inter-arrival times conform to a standard statistical distribution: normal, exponential, poisson, or log normal.
Before we analyze the results, I need to describe how the data is collected.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Data Collection&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;How the data is collected is key to this experiment. If I botched anything in this experiment, it would be the data collection. It's tricky to get good inter-arrival times. To simplify the situation, the collection works better under certain circumstances.&amp;nbsp;If the same SQL statement is &lt;i&gt;not&lt;/i&gt; &lt;i&gt;waiting&lt;/i&gt; to be run by multiple sessions, it appears I can adequately determine when a SQL statement arrives by looking at its execution start time, which will also be close to when &lt;b&gt;v$sqlstats&lt;/b&gt; inserts the first row for the statement or updates (i.e., refreshes) its existing row.&lt;br /&gt;&lt;br /&gt;As I demonstrated in my &lt;i&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/01/when-is-vsqlstats-refreshed.html"&gt;When Does V$SQLSTATS Get Refreshed&lt;/a&gt;&lt;/i&gt;&amp;nbsp;posting,&amp;nbsp;for SQL statements the executions column in &lt;b&gt;v$sqlstats&lt;/b&gt; is incremented when the statement begins (worst case when parsing ends). Therefore, when we detect an execution count change, we know the SQL statement has begun and therefore arrived. The arrival time is logged in the &lt;b&gt;op_results_raw&lt;/b&gt; table.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/sql_interarrival/SqlInterArrivalCollection1a.txt"&gt;Click here&lt;/a&gt;&lt;/b&gt;&amp;nbsp;and you can view a text file that introduces the experiment, shows the actual collection and extraction code, and explains step-by-step how to perform the experiment yourself.
I also include some sample data that was taken from one of my test systems.&lt;br /&gt;&lt;br /&gt;The collection procedure samples from&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;in a tight loop (you can insert a delay, however) and when the execution count changes, the time is recorded in the &lt;b&gt;op_results_raw&lt;/b&gt; table. The collection procedure does&amp;nbsp;not query the&amp;nbsp;&lt;b&gt;x$kkssqlstat&lt;/b&gt;&amp;nbsp;fixed table that underlies &lt;b&gt;v$sqlstats&lt;/b&gt; because this may limit your ability (think: security issues) to collect the data. However, if you wanted to use&amp;nbsp;&lt;b&gt;x$kkssqlstat&lt;/b&gt;,&amp;nbsp;all you would need to do is substitute the object name in the collection script and, of course, connect as&amp;nbsp;&lt;b&gt;sys&lt;/b&gt;&amp;nbsp;when you collect the data.&lt;br /&gt;&lt;br /&gt;The collection procedure,&amp;nbsp;&lt;b&gt;get_sql_arrival_rate_prc&lt;/b&gt;, takes four parameters.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;b&gt;sample_secs_in&lt;/b&gt; is the number of seconds to sample from &lt;b&gt;v$sqlstats&lt;/b&gt;.&lt;/li&gt;&lt;li&gt;&lt;b&gt;delay_secs_in&lt;/b&gt; is the number of seconds to sleep between samples. Setting this to zero will give you the best data. However, without a delay the sampling script will likely consume 100% of one of your CPU cores. So be very careful!
Changing the delay parameter from 0 to 1 can make a big difference in the monitoring procedure's CPU consumption.&lt;/li&gt;&lt;li&gt;&lt;b&gt;sql_id_in&lt;/b&gt; is used, in part, to uniquely identify the SQL statement whose arrival pattern you want to understand.&lt;/li&gt;&lt;li&gt;&lt;b&gt;plan_hash_value_in&lt;/b&gt; is used, in part, to uniquely identify the SQL you are interested in.&lt;/li&gt;&lt;/ol&gt;Let's assume I want to sample without any delay for 60 seconds looking only at the SQL statement with a &lt;b&gt;sql_id&lt;/b&gt; of&amp;nbsp;acz1t53gkwa12 and a &lt;b&gt;plan_hash_value&lt;/b&gt; of&amp;nbsp;4269646525. This is one way to set up and then run the procedure:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;def sql_id_in=acz1t53gkwa12&lt;br /&gt;def plan_hash_value_in=4269646525&lt;br /&gt;def sample_duration_secs_in=60&lt;br /&gt;def delay_secs_in=0&lt;br /&gt;&lt;br /&gt;alter session set commit_write="batch,nowait";&lt;br /&gt;set serveroutput on&lt;br /&gt;exec get_sql_arrival_rate_prc(&amp;amp;sample_duration_secs_in,&amp;amp;delay_secs_in,'&amp;amp;sql_id_in',&amp;amp;plan_hash_value_in);&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;To reduce the overhead of inserting the experimental data into an Oracle table, I utilize Oracle's&lt;i&gt; commit write&lt;/i&gt; facility. I reference the &lt;b&gt;commit_write&lt;/b&gt; setting on page 302 in the fourth printing of my book, &lt;a href="http://resources.orapub.com/Oracle_Performance_Firefighting_Book_p/ff_book.htm"&gt;Oracle Performance Firefighting&lt;/a&gt;, and also discuss it in my &lt;a href="http://training.orapub.com/content_firefighting.asp"&gt;performance firefighting&lt;/a&gt; course. This is a perfect use for the facility.&lt;br /&gt;&lt;br /&gt;After the 60 seconds I should have some rows inserted into the&amp;nbsp;&lt;b&gt;op_results_raw&lt;/b&gt; table.
I show a number of short SQL statements in the experimental text file (again, &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/sql_interarrival/SqlInterArrivalCollection1a.txt"&gt;click here to download&lt;/a&gt;&lt;/b&gt;) which, to not bore you to death, I do not show here. But what I'm really interested in is the list of inter-arrival times. For example, in one of my sample runs on an experimental system, here are the first 19 inter-arrival sample times (in seconds).&lt;br /&gt;&lt;pre&gt;&lt;code&gt;1.017546,&lt;br /&gt;1.01921,&lt;br /&gt;1.018855,&lt;br /&gt;1.106451,&lt;br /&gt;1.01858,&lt;br /&gt;1.018235,&lt;br /&gt;1.017525,&lt;br /&gt;1.025062,&lt;br /&gt;1.023676,&lt;br /&gt;1.02267,&lt;br /&gt;1.026292,&lt;br /&gt;1.026526,&lt;br /&gt;1.023412,&lt;br /&gt;1.019862,&lt;br /&gt;1.018121,&lt;br /&gt;1.018232,&lt;br /&gt;1.107351,&lt;br /&gt;1.045038,&lt;br /&gt;1.01754,&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;With only a few values, I can call on my good friend &lt;b&gt;&lt;a href="http://www.wolframalpha.com/"&gt;WolframAlpha&lt;/a&gt;&lt;/b&gt; to quickly and easily create a histogram. All I need to do is remove the trailing comma, enclose the list of values in curly braces, go to &lt;a href="http://www.wolframalpha.com/"&gt;www.wolframalpha.com&lt;/a&gt;, copy and paste in the list, and submit the request. In a couple of seconds, Mr.
WolframAlpha will present me with, among other things, the histogram shown below.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_FEKH6HhYAEI/TS5VKrpgrUI/AAAAAAAAAI4/owxm4uqaXP0/s1600/smallSampleHist.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_FEKH6HhYAEI/TS5VKrpgrUI/AAAAAAAAAI4/owxm4uqaXP0/s1600/smallSampleHist.gif" /&gt;&lt;/a&gt;&lt;/div&gt;What type of distribution does this look like: constant, normal, random, exponential, or some other?&lt;br /&gt;&lt;br /&gt;So that's how the experimental data was gathered and I'm hoping you'll be motivated to do the same. Let's move on to three actual production collections and then make some final conclusions.&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Analysis of Sample Set One&lt;/span&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Data Set: DaveB-OLTP-1&lt;br /&gt;&lt;br /&gt;Note: You can&amp;nbsp;&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/sql_interarrival/SQLArrivalAnal_2b_oltp1.pdf"&gt;download this full analysis in pdf format here&lt;/a&gt;&lt;/b&gt;. It is the Mathematica notepad (printed to PDF) used to analyze the data, including the statistical hypothesis testing and plenty of graphs. I also liberally commented the notepad so both you and I can follow along.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;The SQL statement comes from a production OLTP-intensive Oracle system. Over the 60 second sample period, the SQL statement was executed 321 times, hence we have 321 samples.
Numerically, the average inter-arrival time was 186.8 ms, median 59.1 ms, standard deviation 1384.8 ms, with a minimum value of 0.135 ms and a maximum value of 19182.7 ms. Talk about variance! Notice the median is less than half of the mean. If I were to randomly select one of the sample inter-arrival times, I'm likely to pick a value around the median. Said another way, the inter-arrival time is more likely to be around 59 ms than 187 ms.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_FEKH6HhYAEI/TUl-kSjjZiI/AAAAAAAAAMo/r7V_dnagB7Q/s1600/DaveB-oltp+Hist+100pct.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="133" src="http://3.bp.blogspot.com/_FEKH6HhYAEI/TUl-kSjjZiI/AAAAAAAAAMo/r7V_dnagB7Q/s200/DaveB-oltp+Hist+100pct.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;While clearly not visually exciting, above is the histogram for the &lt;i&gt;entire&lt;/i&gt; data set. (The horizontal axis unit of time is microseconds.) &amp;nbsp;It clearly shows our data is massively dispersed. Why? The only reason the horizontal axis goes out so far to the right is that there are actual sample values out there...just not that many of them! The maximum sample value (19182.7 ms) would be placed near the far right of the horizontal axis, around 19,000,000 microseconds. These far-right samples are not anomalies, but actual sample values, and you'll see that they appear in all three data sample sets...so I'm not going to ignore them. They force me to understand that while the relatively massive inter-arrival times are indeed rare, they are so massive they effectively pull the mean (187 ms) away and to the right of the median (59 ms).
However, I am also interested in the bulk of the data, so I created another histogram.&lt;/div&gt;&lt;div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_FEKH6HhYAEI/TUhm4SimbAI/AAAAAAAAAMc/cBj0jy2AmpA/s1600/DaveB-oltp1+Hist.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="136" src="http://1.bp.blogspot.com/_FEKH6HhYAEI/TUhm4SimbAI/AAAAAAAAAMc/cBj0jy2AmpA/s200/DaveB-oltp1+Hist.png" style="cursor: move;" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Because of the massive dispersion of the data (difference in min and max values), showing all the data limits our ability to see where and how most of the sample values group. The above histogram shows 95% of the data. It's the top 5% (and perhaps less) that contains the massive inter-arrival times. Because there are 321 samples and the above histogram contains the smallest 95% of the values, the above histogram contains 305 samples (perhaps 304...I didn't count).&lt;br /&gt;&lt;br /&gt;Does this inter-arrival sample set conform to the normal, poisson, exponential, or log normal distribution? Visually it sure doesn't look like it! And our hunch is correct. The statistical fitness (hypothesis) test clearly showed the difference between the sample set and each of the listed distributions could not be explained by randomness...so statistically we must assume they are different.
That's a lot of words to simply say, the data does not match any of the tested distributions.&amp;nbsp;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Analysis of Sample Set Two&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Data Set: DaveB-3&lt;br /&gt;&lt;br /&gt;The SQL statement comes from a production OLTP intense Oracle system. Over the 60 second sample period, the SQL statement was executed only 81 times, hence we have only 81 samples. Numerically, the average inter-arrival time was 705.0 ms, median 149.0 ms, standard deviation 1172.1 ms, with a minimum value of 3.67 ms and a maximum value of 5649.2 ms. Again, massive variance! Notice that once again the median is less than half of the mean. The inter-arrival time is more likely to be around 149 ms than 705 ms.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TUhj-7bQLII/AAAAAAAAAMU/iUlY4eJ_H4s/s1600/DaveB-3+Hist.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="136" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TUhj-7bQLII/AAAAAAAAAMU/iUlY4eJ_H4s/s200/DaveB-3+Hist.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;Because the data is so massively dispersed, just as with the previous sample set, I only show the lower 95% of the sample data. This means, the histogram contains 77 samples, not the full 81. This allows us to focus in on the bulk of the data and the ever-interesting (perhaps important) far left histogram bars.&lt;br /&gt;&lt;br /&gt;Does this inter-arrival sample set conform to the normal, poisson, exponential, or log normal distribution? Visually it looks like perhaps we found either an exponential or log normal match! Sorry...our hunch is incorrect. 
The statistical fitness (hypothesis) test clearly showed the differences between the sample set and each of the listed distributions could not be explained by just randomness...so statistically we must assume they are different. That's a lot of words to simply say, the data does not match any of the tested distributions. Bummer. Oh well... on to the third sample set.&lt;br /&gt;&lt;br /&gt;&lt;div style="font: 12.0px Helvetica; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;span class="Apple-style-span" style="font-family: Times; font-size: large;"&gt;Analysis of Sample Set Three&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;Data Set: DaveB-dw1&lt;br /&gt;&lt;br /&gt;The SQL statement comes from a production data warehouse Oracle system. Over the 60 second sample period, the SQL statement was executed 33974 times...yes, this is correct. So we have lots of samples! Numerically, the average inter-arrival time was 1.76 ms, median 0.35 ms, standard deviation 17.79 ms, with a minimum value of 0.093 ms and a maximum value of 2859.76 ms. Again, massive variance! Notice that once again the median is less than half of the mean. The inter-arrival time is more likely to be around 0.35 ms than 1.76 ms.&lt;br /&gt;&lt;br /&gt;Below are five small histograms composed of various percentages of the data. Notice that as the largest values are excluded, we get an interesting glimpse of the majority of the samples.
The histogram just below shows 100% of the data whereas the bottom histogram shows 85% of the data, that is, the smallest 85% of the values.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TUmYErUQA8I/AAAAAAAAAM8/bJ3oOkIXmSQ/s1600/dw1+hist+100pct.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="128" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TUmYErUQA8I/AAAAAAAAAM8/bJ3oOkIXmSQ/s200/dw1+hist+100pct.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_FEKH6HhYAEI/TUmYETfh5iI/AAAAAAAAAM4/5nieAd0JPrY/s1600/dw1+hist+97pct.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="128" src="http://3.bp.blogspot.com/_FEKH6HhYAEI/TUmYETfh5iI/AAAAAAAAAM4/5nieAd0JPrY/s200/dw1+hist+97pct.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TUmYDyd41dI/AAAAAAAAAM0/t6BJln5we4k/s1600/dw1+hist+95pct.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="128" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TUmYDyd41dI/AAAAAAAAAM0/t6BJln5we4k/s200/dw1+hist+95pct.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUmYDStOBuI/AAAAAAAAAMs/cvWOHvZDmSo/s1600/DaveB-dw1+Hist+90pct.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="130"
src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUmYDStOBuI/AAAAAAAAAMs/cvWOHvZDmSo/s200/DaveB-dw1+Hist+90pct.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_FEKH6HhYAEI/TUmYDmzMjdI/AAAAAAAAAMw/wgaYQRhEepA/s1600/dw1+hist+85pct.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="130" src="http://3.bp.blogspot.com/_FEKH6HhYAEI/TUmYDmzMjdI/AAAAAAAAAMw/wgaYQRhEepA/s200/dw1+hist+85pct.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;If you're like me, you're wondering if the 85% histogram conforms to one of our common statistical distributions; perhaps normal? So once again, I performed a goodness of fit hypothesis test comparing the lowest 85% of the sample data values to the normal, poisson, exponential, and log normal distributions. Yet again, they all "failed" the test, which means they are statistically so different that randomness cannot account for the difference. Bummer...&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Predicting the Median Values&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As I was performing the analysis, I noticed that, assuming the sample data is log normal distributed (which it is not), the predicted median value was kind of close to the actual median value. So I thought I would document this to force a more realistic look at the situation.
Below are the actual results from our three data sets.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUm0SrvO8JI/AAAAAAAAANA/pksc0TUBs8g/s1600/Prediected+Median+Values.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="68" src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUm0SrvO8JI/AAAAAAAAANA/pksc0TUBs8g/s320/Prediected+Median+Values.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;What conclusions can be drawn? None...we only have three sample sets. So while I remain hopeful we can reliably predict the median, I simply do not have enough data sets to responsibly act on that hope. So.... please send me your data sets. If I receive enough I will be able to do a solid statistical analysis and post the results.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Conclusions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;It is what it is... all three of our sample sets failed to statistically match the normal, exponential, poisson, or log normal distributions.&amp;nbsp;Certainly these are not the results I would have liked to see. But even so, we can draw some useful conclusions that you can check for yourself. (In fact, if you send me your experimental data, I will run it through my analysis and email you back the results.)&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;u&gt;Don't be fooled.&lt;/u&gt; The Oracle SQL statement inter-arrival times did not statistically conform to the normal, exponential, poisson, or log normal distributions. If someone claims otherwise, ask for the experimental data.&lt;/li&gt;&lt;li&gt;&lt;u&gt;Academically interesting.&lt;/u&gt; The median was always less than half the mean.
If you find the average inter-arrival time is 1 ms, I would feel comfortable going with the assumption that the median is less than half of the mean, that is, below 0.5 ms.&lt;/li&gt;&lt;li&gt;&lt;u&gt;Strange data can be real data.&lt;/u&gt; A very small subset (less than 5%) of the samples are likely to be at least a factor of 10 larger than the mean. They may seem like outliers, but all of our sample sets show these exceptionally large inter-arrival times will occur, which means they are not an anomaly.&lt;/li&gt;&lt;li&gt;&lt;u&gt;Always validate forecasts.&lt;/u&gt; All classic computing system predictive work assumes inter-arrival times to be exponentially distributed. But our data is clearly not...so while we can make predictions, we can see one reason why our predictions are not always spot on! This is just one reason why I stress in my courses (especially my &lt;a href="http://training.orapub.com/content_forecasting.asp"&gt;&lt;b&gt;Oracle Forecasting and Predictive Analysis&lt;/b&gt;&lt;/a&gt; course) the need to validate our forecasts.&lt;/li&gt;&lt;li&gt;&lt;u&gt;Expect the unexpected, just not that often.&lt;/u&gt; In my mind, this is by far &lt;b&gt;the most practical application of this research&lt;/b&gt;.&amp;nbsp;Because inter-arrival times are clearly not constant and vary wildly, it should not surprise us when a non-heavily loaded system experiences an "unexpected" and perhaps brief slowdown. The slowdown may be short-lived, and it will occur, just not that often...but with increasing likelihood as your system approaches the elbow of the response time curve. The way to reduce the likelihood of this slowdown occurring is to, in some way, influence your system to operate at a perhaps surprisingly low utilization for the most limited resource (e.g., CPU, IO, application object).
If you're interested in this topic, I highly recommend Taleb's book, &lt;a href="http://www.amazon.com/gp/product/1400067936?ie=UTF8&amp;amp;tag=orin-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=1400067936"&gt;Fooled by Randomness.&lt;/a&gt;&lt;/li&gt;&lt;/ol&gt;Thanks for reading!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;References regarding arrival patterns:&lt;br /&gt;&lt;br /&gt;1. Gunther, 1998,&amp;nbsp;&lt;i&gt;&lt;a href="http://www.amazon.com/gp/product/059512674X?ie=UTF8&amp;amp;tag=orin-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=059512674X"&gt;The Practical Performance Analyst&lt;/a&gt;&lt;img alt="" border="0" height="1" src="http://www.assoc-amazon.com/e/ir?t=orin-20&amp;amp;l=as2&amp;amp;o=1&amp;amp;a=059512674X" style="border: none !important; margin: 0px !important;" width="1" /&gt;&lt;/i&gt;, pg 88&lt;br /&gt;2. Jain, 1991,&amp;nbsp;&lt;i&gt;&lt;a href="http://www.amazon.com/gp/product/0471503363?ie=UTF8&amp;amp;tag=orin-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=9325&amp;amp;creativeASIN=0471503363"&gt;The Art of Computer Systems Performance Analysis&lt;/a&gt;&lt;img alt="" border="0" height="1" src="http://www.assoc-amazon.com/e/ir?t=orin-20&amp;amp;l=as2&amp;amp;o=1&amp;amp;a=0471503363" style="border: none !important; margin: 0px !important;" width="1" /&gt;&lt;/i&gt;, pg 488, 496, 508&lt;br /&gt;3. Shallahamer, 2007,&amp;nbsp;&lt;i&gt;&lt;a href="http://resources.orapub.com/Forecasting_Oracle_Performance_Book_p/fop_book.htm"&gt;Forecasting Oracle Performance&lt;/a&gt;&lt;/i&gt;, pg 60-61&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond to a comment or have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. 
Another option is to send an email to OraPub's general email address, which is currently orapub@comcast.net.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-3505112948356665175?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/3505112948356665175/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/02/sql-arrival-patterns-and-impact.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/3505112948356665175'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/3505112948356665175'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/02/sql-arrival-patterns-and-impact.html' title='SQL Arrival Patterns and Impact'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_FEKH6HhYAEI/TUgvotELBqI/AAAAAAAAAME/8isjse-4kTo/s72-c/mall+hist+18.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-4909810862902100767</id><published>2011-02-08T06:03:00.000-08:00</published><updated>2011-02-08T06:03:00.632-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='poisson'/><category scheme='http://www.blogger.com/atom/ns#' 
term='v$sqlstats'/><category scheme='http://www.blogger.com/atom/ns#' term='log normal'/><category scheme='http://www.blogger.com/atom/ns#' term='bind variable'/><category scheme='http://www.blogger.com/atom/ns#' term='normal distribution'/><category scheme='http://www.blogger.com/atom/ns#' term='distribution'/><category scheme='http://www.blogger.com/atom/ns#' term='exponential'/><category scheme='http://www.blogger.com/atom/ns#' term='statistical distribution'/><category scheme='http://www.blogger.com/atom/ns#' term='sql_id'/><category scheme='http://www.blogger.com/atom/ns#' term='elapsed time'/><category scheme='http://www.blogger.com/atom/ns#' term='mathematica'/><category scheme='http://www.blogger.com/atom/ns#' term='plan_hash_value'/><title type='text'>SQL Statement Elapsed Times</title><content type='html'>&lt;b&gt;The Situation&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In my performance work I often am presented with a Statspack or AWR report. After performing my 3-Circle Analysis the key SQL statement(s) is easily identified. Obviously my client and I want to know key characteristics about that SQL statement. The standard reports present me with only total values such as elapsed time, executions, physical IO blocks read, CPU consumption, etc. that occurred during the report interval. From an elapsed time perspective, all I can determine is the &lt;i&gt;average&lt;/i&gt; elapsed time...not a whole lot of information and, as I'll detail below, it has limited value and can easily cause miscommunication.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The problem and my desire&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I find that in communicating using only the average SQL elapsed time my audience makes a number of assumptions which are many times incorrect. Plus I simply cannot glean very much from what the standard reports present. This is another way of saying my communication skills need some work and I need more data! 
I want to better understand, describe, and communicate SQL statement elapsed times. So in essence, my desire is to be able to responsibly provide more information with very limited inputs.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The value of a conforming SQL statement&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Now if a SQL statement's elapsed time conforms to a &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/important-statistical.html"&gt;known statistical distribution&lt;/a&gt;&lt;/b&gt;, then with minimal inputs I will be able to responsibly say much more, communicate better, and serve my clients and students better. But even if it does &lt;i&gt;not&lt;/i&gt; statistically conform, if a general pattern develops I may be able to say something like, "The typical elapsed time will most likely be much less than the average." Just saying that will be valuable to me.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;It's not normal&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2010/11/average-challenge-part-1.html"&gt;Last November I blogged&lt;/a&gt;&lt;/b&gt; about how, given an average value, most DBAs will assume a value is just as likely to be greater than or less than the average. This is another, albeit very simplified and not entirely true, way of saying the samples are &lt;b&gt;&lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/important-statistical.html"&gt;normally distributed&lt;/a&gt;&lt;/b&gt;. One of my objectives in the initial "average" blog entry was to demonstrate that Oracle's wait events are not normally distributed. For example, given the average &lt;b&gt;db file scattered read&lt;/b&gt; wait time of 10ms, it is highly unlikely there are just as many waits greater than 10ms as there are less than 10ms. What we clearly saw in that blog entry was that more wait occurrences took less than 10ms than longer than 10ms. 
&lt;b&gt;In this blog entry we'll take a giant leap forward.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Check This Out For Yourself&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I spent quite a bit of time preparing this blog entry. In part because I want others to be able to do exactly what I have done, that is, to check this out for themselves. So I created a tool kit that anyone can use to investigate what I'm writing about on their systems or they can use the tool kit on their test system using sample data the tool kit quickly generates. While all the actual &lt;a href="http://filezone.orapub.com/Research/sql_elapsed_distribution/AnalysisPack.zip"&gt;&lt;b&gt;data samples and the associated analysis tools in this blog entry can be found here&lt;/b&gt;&lt;/a&gt;, the data collection tool kit used to gather the sample data can be downloaded, installed, and used for free. To get the tool kit, go to http://www.orapub.com and then search for "sql distribution". &lt;b&gt;&lt;a href="http://resources.orapub.com/SearchResults.asp?Search=sql+distribution"&gt;Here's a link&lt;/a&gt;&lt;/b&gt; to help you get started.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Experimental Design&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This experiment requires two Oracle users each with their own Oracle SQL*Plus connection: the monitoring user and an application user. You can run this yourself (test or production system) and see the details by downloading and using the tool kit. The monitoring session quickly and repeatedly queries from &lt;b&gt;v$sqlstats&lt;/b&gt; looking for refreshed rows related to a specific SQL statement, which is uniquely identified by its &lt;b&gt;sql_id&lt;/b&gt; and &lt;b&gt;plan_hash_value&lt;/b&gt;. The algorithm creatively (it's pretty cool, check it out) determines when the statement begins, ends, and also collects timing and resource consumption details. 
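&lt;br /&gt;&lt;br /&gt;To make the polling idea concrete, here is a minimal Python sketch of how per-execution samples could be derived from cumulative counters. It is not the tool kit's actual algorithm: the names (Snapshot, Sample, samples_from_snapshots) are hypothetical, the polls are hard-coded rather than queried from &lt;b&gt;v$sqlstats&lt;/b&gt;, and it assumes the execution counter only increases once an execution has completed, which may not match how the real view behaves.
&lt;pre&gt;&lt;code&gt;
```python
from typing import Iterable, List, NamedTuple


class Snapshot(NamedTuple):
    """One poll of the cumulative counters for a single sql_id / plan_hash_value."""
    executions: int       # cumulative execution count at poll time
    elapsed_time_us: int  # cumulative elapsed time, in microseconds


class Sample(NamedTuple):
    executions: int          # executions that finished since the last sample
    wall_ms_per_exec: float  # average elapsed time per execution, in ms


def samples_from_snapshots(snapshots: Iterable[Snapshot]) -> List[Sample]:
    """Turn cumulative counter polls into per-sample elapsed times.

    Hypothetical simplification: a sample is recorded whenever the execution
    count has grown since the last recorded sample. If several executions
    finished in between, their elapsed times are averaged.
    """
    samples = []
    baseline = None
    for current in snapshots:
        if baseline is None:
            baseline = current  # first poll only establishes the starting point
            continue
        new_execs = current.executions - baseline.executions
        if new_execs > 0:
            delta_us = current.elapsed_time_us - baseline.elapsed_time_us
            samples.append(Sample(new_execs, delta_us / new_execs / 1000))
            baseline = current  # the next sample is measured from here
    return samples


# Made-up polls, for illustration only.
polls = [
    Snapshot(10, 5_000_000),
    Snapshot(10, 5_000_000),  # nothing finished between these two polls
    Snapshot(11, 5_992_219),  # one execution finished
    Snapshot(14, 8_768_754),  # three executions finished; they get averaged
]

for sample in samples_from_snapshots(polls):
    print(sample.executions, round(sample.wall_ms_per_exec, 3))
```
&lt;/code&gt;&lt;/pre&gt;
With these made-up polls, the second sample covers three executions, so it reports their average elapsed time rather than three individual values.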
&lt;br /&gt;&lt;br /&gt;The collection approach is limited, however: to simplify the data collection strategy, concurrent executions are averaged when they complete. To mitigate this factor, the chosen SQL statements were somewhat unlikely to be executed concurrently. Also, to reduce the likelihood of bogus data being recorded, in some cases potentially incorrect samples are discarded.&amp;nbsp;My data collection tool is clearly not the ultimate, but it's the best I have at the moment and I have taken specific steps to reduce the likelihood of bogus data being part of the final experimental sample set.&lt;br /&gt;&lt;br /&gt;The results are inserted into the results table,&amp;nbsp;&lt;b&gt;op_sql_sample_tbl&lt;/b&gt;. When the data collection has finished, a simple query formats the results so they can be easily understood and will also feed nicely into a &lt;i&gt;Mathematica&lt;/i&gt; notepad for analysis. The notepad file is also provided in the tool kit.&lt;br /&gt;&lt;br /&gt;Below is an example of what a run result can look like. 
For readability I replaced the real &lt;b&gt;sql_id&lt;/b&gt; and &lt;b&gt;plan_hash_value&lt;/b&gt; with abc&amp;nbsp;and 123 respectively.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;def sql_id_in='abc'&lt;br /&gt;def hash_plan_value_in=123&lt;br /&gt;def collection_period=60&lt;br /&gt;set serveroutput on&lt;br /&gt;exec get_sql_sample_prc(&amp;amp;collection_period,0,'&amp;amp;sql_id_in',&amp;amp;hash_plan_value_in);&lt;br /&gt;&lt;br /&gt;set tab off&lt;br /&gt;set linesize 200&lt;br /&gt;col SQL format a25&lt;br /&gt;col sample format 999999&lt;br /&gt;col PIOR_exe format 999999999.0000&lt;br /&gt;col LIO_exe format 999999999.0000&lt;br /&gt;col CPU_ms_exe format 999999999.0000&lt;br /&gt;col Wall_ms_exe format 999999999.0000&lt;br /&gt;select sql_id||','||plan_hash_value SQL,&lt;br /&gt;       sample_no sample,&lt;br /&gt;       executions execs,&lt;br /&gt;       disk_reads/executions PIOR_exe,&lt;br /&gt;       buffer_gets/executions LIO_exe,&lt;br /&gt;       cpu_time/executions/1000 CPU_ms_exe,&lt;br /&gt;       elapsed_time/executions/1000 Wall_ms_exe&lt;br /&gt;from   op_sql_sample_tbl&lt;br /&gt;order by 1,2;&lt;br /&gt;&lt;br /&gt;SQL      SAMPLE EXECS    PIOR_EXE     LIO_EXE      CPU_MS_EXE     WALL_MS_EXE&lt;br /&gt;-------- ------ ----- ----------- ----------- --------------- ---------------&lt;br /&gt;abc,123       1     1  46071.0000  69023.0000        847.8710        992.2190&lt;br /&gt;abc,123       2     1  46071.0000  69023.0000        830.8740        838.2920&lt;br /&gt;abc,123       3     1  46071.0000  69023.0000        854.8690        855.4500&lt;br /&gt;abc,123       4     1  46071.0000  69023.0000        809.8780        827.9930&lt;br /&gt;abc,123       5     3  46067.0000  69023.0000        840.5387        925.5117&lt;br /&gt;abc,123       6     1  46070.0000  69023.0000        853.8700       1171.6570&lt;br /&gt;abc,123       7     1  46070.0000  69023.0000        845.8720        847.1400&lt;br /&gt;abc,123       8     1  46070.0000  
69023.0000        840.8720       1049.0460&lt;br /&gt;abc,123       9     2  46067.5000  69023.0000        861.8690       1115.8440&lt;br /&gt;abc,123      10     1  46070.0000  69023.0000        849.8710       1372.9310&lt;br /&gt;abc,123      11     1  46070.0000  69023.0000        838.8720        843.6460&lt;br /&gt;abc,123      12     1  46065.0000  69023.0000        852.8700        860.7630&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;Without access to &lt;i&gt;Mathematica&lt;/i&gt; you can quickly create a histogram by selecting the &lt;b&gt;wall_ms_exe&lt;/b&gt; column, placing the values into a single line separated by commas, enclosing the line in curly braces, pasting that into &lt;a href="http://www.wolframalpha.com/"&gt;http://www.wolframalpha.com&lt;/a&gt; and pressing the "=" submit button. Here's what I mean:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;{992.219, 838.292, 855.45, 827.993, 925.511667, 1171.657, 847.14, 1049.046, 1115.844, 1372.931, 843.646, 860.763}&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;Even with this limited sample data set from one of my test systems, you'll get tons of statistical data, including the histogram below, which I snipped from WolframAlpha's on-line result page.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TSy22qLUpAI/AAAAAAAAAH4/V5cFgvKT6rE/s1600/wolframalpha-20110111135857827.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TSy22qLUpAI/AAAAAAAAAH4/V5cFgvKT6rE/s1600/wolframalpha-20110111135857827.gif" /&gt;&lt;/a&gt;&lt;/div&gt;(I'm still amazed WolframAlpha is on-line and for free.) 
I will explain the results in the analysis section below.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Real Experimental Results&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Real Production Data&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;While I could have used data from my in-house test systems, for this experiment I wanted to use real SQL statements from real production Oracle systems. I found two volunteers who, using my tool kit, gathered data for a number of SQL statements. A profound thanks goes out to &lt;a href="http://aberdave.blogspot.com/"&gt;Dave Abercrombie&lt;/a&gt; and Garret Olin, who have both provided me with real performance data, and in the past as well! Dave is also my &lt;i&gt;Mathematica&lt;/i&gt; mentor and statistical &lt;a href="http://filezone.orapub.com/images/yoda.jpg"&gt;Jedi Master&lt;/a&gt;. He also writes a very &lt;a href="http://aberdave.blogspot.com/"&gt;insightful blog&lt;/a&gt; you may want to follow. While both Dave and Garret have helped me in this quest, they are in no way responsible for any mistakes or botches I may have made.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Lots of Production Data&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Even using the snippet of data above along with WolframAlpha you get the gist of the experimental results. But as I demonstrated in my &lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/important-statistical.html"&gt;Important Statistical Distributions blog posting&lt;/a&gt;, looks can be deceiving! Plus it's always nice to get lots of data. I asked Dave and Garret for at least a hundred samples. 
This reduces the risk of anomalous data making a significant impact...not to mention people decrying the conclusions as bogus and insignificant.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Statistical Distributions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Before I get into the analysis of the experimental results, it's extremely important to understand a histogram, five common distributions, the histograms for these distributions, and a few key characteristics of each. These key statistical distributions are uniform, normal, exponential, poisson, and log normal. If you are not familiar with these, please reference &lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/important-statistical.html"&gt;&lt;b&gt;my blog posting about Statistical Distributions&lt;/b&gt;&lt;/a&gt;. That blog entry was originally part of this post, but it grew too large and can stand on its own.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Analysis of Experimental Results&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;My colleagues collected four SQL statements from OLTP systems.&amp;nbsp;I performed statistical fitness tests checking to see if the sample sets, statistically speaking (alpha=0.05) conform to a normal, exponential, poisson, or log normal distribution.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;Statistically speaking, all four sample sets did &lt;/b&gt;&lt;/i&gt;&lt;b&gt;&lt;i&gt;not&lt;/i&gt;&lt;/b&gt;&lt;i&gt;&lt;b&gt; match any of the above distributions.&lt;/b&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;In every case, the log normal distribution was the closest match, but still it was not even close to statistically conforming.&lt;br /&gt;&lt;br /&gt;If you want to check out the statistical work which includes &lt;b&gt;&lt;i&gt;lots&lt;/i&gt;&lt;/b&gt; of comments, math, and graphs, you can download the &lt;i&gt;Mathematica&lt;/i&gt; notepad, a PDF of one of the experiments, data sets, etc. 
by &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/sql_elapsed_distribution/AnalysisPack.zip"&gt;clicking here&lt;/a&gt;&lt;/b&gt;. It's pretty cool.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Initial visual test passed nearly every time&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I want to demonstrate just how easy it is to be deceived. The histogram below was taken from one of the four elapsed time sample sets (ref: Garr 8q, 97% of histogram data shown).&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUrBkKoIBHI/AAAAAAAAANI/EkCcKiwSqTs/s1600/Garr+8q+Hist+97pct.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="211" src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUrBkKoIBHI/AAAAAAAAANI/EkCcKiwSqTs/s320/Garr+8q+Hist+97pct.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;Looking at this histogram, the data looks exponential, perhaps poisson or log normal. This sample data failed to match any of these distributions. Not only did the hypothesis test fail, but it was not even close! My lesson from this is, the next time someone tells me the histogram represents data that is exponential, poisson, or log normal distributed, I will kindly demand to see the fitness (statistical hypothesis) test comparing the experimental data with the stated &lt;i&gt;fitted&lt;/i&gt; data distribution.&lt;br /&gt;&lt;br /&gt;By the way, recall from my &lt;a href="http://shallahamer-orapub.blogspot.com/2011/02/important-statistical.html"&gt;Important Statistical Distributions blog entry&lt;/a&gt; that if the sample data is log normal, you can take the log of each sample, place the results into a histogram, and the resulting histogram will look normal. 
Well...using the data from the above histogram, I did just that and the histogram below was the result!&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUrB3sRIsdI/AAAAAAAAANM/yyykYXltUSk/s1600/Gar+8q+Log+Trans+Hist+97pct.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="216" src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUrB3sRIsdI/AAAAAAAAANM/yyykYXltUSk/s320/Gar+8q+Log+Trans+Hist+97pct.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;This is a visually clear sign that our data set is not log normal distributed! But looking at the experimental data histogram we could not visually determine this. Personally, I find this very interesting...but time constraints did not allow me to delve deeper.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Beware of bind variables!&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The data collection process identifies the statement based on the combination of a &lt;b&gt;sql_id&lt;/b&gt; and &lt;b&gt;plan_hash_value&lt;/b&gt;. That's good because there can be many plans for the same &lt;b&gt;sql_id&lt;/b&gt;. This allows us to differentiate between execution plans! However, there can also be many bind variable combinations applied to the same execution plan! If you see multiple clear histogram peaks (modes), there are likely different bind variable sets. There can also be multiple bind variable sets that result in the same elapsed time. This will effectively &lt;i&gt;stack&lt;/i&gt; a histogram bar.&amp;nbsp;Based on my conversations with the data provider of the image below, he knows there are multiple bind variable sets related to the same execution plan. The histogram below confirmed what he thought was occurring.&lt;br /&gt;&lt;br /&gt;Of the four data sets I received, there was one that I believe is actually far more common than we suspect. 
Its histogram is shown below (ref: Aber 31, 90% of histogram data shown).&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUrDA2M73kI/AAAAAAAAANQ/m-r8oD4j5CU/s1600/Amber+31+Hist+90pct.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="210" src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUrDA2M73kI/AAAAAAAAANQ/m-r8oD4j5CU/s320/Amber+31+Hist+90pct.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;The histogram above shows the elapsed times for a single SQL statement with a single execution plan, yet with what looks to be two or three different bind variable sets. Notice there are at least three clear modes (peaks): near zero, near 10000 ms, and near 70000 ms. The actual average elapsed time is around 57200 ms. Is stating the average elapsed time is 57.2 seconds a good way to communicate the elapsed time? If I add that the standard deviation is 104 seconds and we have 230 samples, it's obvious the elapsed times are all over the place and, of course, never go negative.&lt;br /&gt;&lt;br /&gt;A better way to communicate the analysis is that users will tend to have results returned either around 10000 ms or around 70000 ms, but probably not around the average of 57200 ms. There is a huge difference between 10 seconds and 70 seconds. Depending on your audience, actually showing them this histogram will immediately convey the complexity of the situation. &amp;nbsp;My point is, unless we actually gather the detailed data and plot a histogram, describing the situation in terms of the average can easily mislead your audience.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Profiling&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This also brings up another point. If I'm going to profile the SQL statement to tune it, which bind variable situation am I tuning for? Can I tune for both? The profile is likely to look very different based on the bind variables. 
I used to do a lot of SQL tuning and when I did, I always demanded real bind variables and I made sure everyone knew the bind variables I was using. Even then I knew that, when we go for 100% optimization, there are likely to be big winners...and also big losers. So we've got to be careful. Having a histogram of the actual elapsed time can help us understand the true situation.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Predictive possibilities&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;There is good news and there is bad news...&amp;nbsp;I thought for sure the experimental data would statistically conform to either the exponential, poisson, or log normal distribution...but it wasn't even close. The big disappointment is I cannot predict the median (or other statistics) based only on the average SQL elapsed time. But the results are what they are.&lt;br /&gt;&lt;br /&gt;While the comparison between the data sets and the distribution did not statistically match, in every case, unless bind variables caused a massive difference in the elapsed time, the median is far less than the mean. And if I assume the data is log normal distributed, the predicted median value was higher than the actual median. That is, if I predict the median, the actual median is probably lower. Said another way, if the average is 1000 ms and the predicted median is 50 ms, the actual median is probably less than 50 ms, and that is what the user is more likely to experience.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Pick one of your Statspack/AWR SQL Statements&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now let's apply this to a real life situation. 
Referencing&amp;nbsp;&lt;b&gt;&lt;a href="http://filezone.orapub.com/perf_stats/SP_PDXPROD.txt"&gt;one of my cleansed Statspack reports&lt;/a&gt;&lt;/b&gt; and then looking at the reported top CPU consumer, I see the following statistics during the one hour sample period/interval.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;    CPU                  CPU per             Elapsd                     Old&lt;br /&gt;  Time (s)   Executions  Exec (s)  %Total   Time (s)    Buffer Gets  Hash Value&lt;br /&gt;---------- ------------ ---------- ------ ---------- --------------- ----------&lt;br /&gt;  14456.36        7,000       2.07   86.3  497518.80   1,182,585,272  314328380&lt;/code&gt;&lt;/pre&gt;Over the one hour sample interval this SQL statement was executed 7000 times, which took about 497518.80 seconds in total; that is the wall time or the elapsed time. This means that during this one hour sample period and for this SQL statement, the average elapsed time was &lt;a href="http://www.wolframalpha.com/input/?i=497518.80%2F7000"&gt;71.1&lt;/a&gt; seconds.&lt;br /&gt;&lt;br /&gt;If the statement is using a single bind variable set or the bind variables are not causing wild elapsed time swings, then you would expect the typical elapsed time to be less than 71 seconds and perhaps around half that. We can also expect the elapsed time to occasionally be much longer than 71 seconds.&lt;br /&gt;&lt;br /&gt;If I needed more specific and more useful data, then I would use my data collection script to gather some data, plug it into my &lt;i&gt;Mathematica&lt;/i&gt; analysis notepad, and crank out the histogram.&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Unfortunately, because our analysis could not match the experimental data with a known statistical distribution, we cannot confidently and responsibly make precise predictions about the expected median and other statistics. 
That's too bad, but that's the way it is...for now.&lt;/div&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;If you really want to know...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you sincerely want to understand the elapsed time characteristics of a key SQL statement, then you will need to gather some data. Running the statement a few times from within the application or perhaps using SQL*Plus will get you some results, but that is not what we are looking for here. We want to see the elapsed times for the SQL statement as it occurs in your real production system. The best way to do this is through some automated tool. But what tool?&lt;br /&gt;&lt;br /&gt;Whatever that tool may be, it must gather &lt;i&gt;multiple individual elapsed times&lt;/i&gt; and then draw conclusions. Gathering the number of executions and the total elapsed time, even within a 30 minute interval, will only provide the average...and that's not what we're going for here.&lt;br /&gt;&lt;br /&gt;You &lt;b&gt;&lt;a href="http://resources.orapub.com/SearchResults.asp?Search=sql+distribution"&gt;could use the tool&lt;/a&gt;&lt;/b&gt; that my colleagues and I used to gather the data which I analyzed. But the sampling method (to get the best samples) consumes nearly 100% of a CPU core...not what I would call a production ready product! If I come up with a utility to gather this type of data with minimal overhead I'll be sure to blog about it and offer it to other DBAs.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Conclusions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Based upon my analysis I think it is reasonable to assume that,&amp;nbsp;given a SQL statement (the text, that is, the &lt;b&gt;sql_id&lt;/b&gt;), its elapsed time will not conform to any common statistical distribution. Why? Because for a given SQL statement there can be multiple execution plans and within each execution plan there can be multiple bind variable combinations. 
Even if you can demonstrate the SQL being investigated is running with the same execution plan and with the same bind variable set, as two of my sample sets showed, the elapsed times varied wildly and did not conform to the normal, exponential, poisson, or log normal distribution. While the distribution may look log normal, statistically it didn't pass the hypothesis test.&lt;br /&gt;&lt;br /&gt;Here are a few points that I will take away from this specific work:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;While this experiment did not &lt;i&gt;prove&lt;/i&gt; SQL statement elapsed times do not conform to a standard distribution, it certainly raises the possibility that they do not conform. Why? Because every sample I have analyzed has shown its elapsed time distribution is not statistically normal, poisson, exponential, or log normal distributed. If someone states otherwise, respectfully ask to see their data and subsequent analysis.&lt;/li&gt;&lt;li&gt;Unless a SQL statement is using multiple bind variable sets, I would be very comfortable stating the median is less, and possibly considerably less, than the mean.&amp;nbsp;All our analysis data sets (and my experimental ones also) show the median is far less than the mean, by at least 50%.&lt;/li&gt;&lt;li&gt;Just because a data set visually looks very much like another data set or statistical distribution does not mean it statistically matches...be careful.&lt;/li&gt;&lt;li&gt;If you really want to know the elapsed time characteristic of a key SQL statement, you must actually gather elapsed time samples...and lots of them. If you don't have access to &lt;i&gt;Mathematica&lt;/i&gt;, send me your data and I will gladly perform the analysis for you!&lt;/li&gt;&lt;/ol&gt;I learned a lot from preparing this blog entry: My math skills have certainly increased, especially regarding hypothesis testing. 
My&amp;nbsp;&lt;i&gt;Mathematica&lt;/i&gt;&amp;nbsp;skills have also dramatically increased...this can be seen by viewing the notepad file I developed and used to analyze the data. Developing the data collection tool kit was very satisfying; I've already used it to create another similar tool and I'm considering developing an advanced product version (stay tuned).&amp;nbsp;My SQL elapsed time assumptions were seriously brought into question. I was able to come away with stronger conclusions about the typical SQL elapsed time than I could previously. Not a hunch, but based on a solid statistical analysis.&lt;br /&gt;&lt;br /&gt;Thanks for reading and I look forward to receiving data samples from you!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond to a comment or have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-4909810862902100767?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/4909810862902100767/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/02/sql-statement-elapsed-times.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/4909810862902100767'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/4909810862902100767'/><link rel='alternate' type='text/html' 
href='http://shallahamer-orapub.blogspot.com/2011/02/sql-statement-elapsed-times.html' title='SQL Statement Elapsed Times'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_FEKH6HhYAEI/TSy22qLUpAI/AAAAAAAAAH4/V5cFgvKT6rE/s72-c/wolframalpha-20110111135857827.gif' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-7034607475602592808</id><published>2011-02-02T06:00:00.000-08:00</published><updated>2011-02-06T08:26:08.637-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='uniform distribution'/><category scheme='http://www.blogger.com/atom/ns#' term='poisson distribution'/><category scheme='http://www.blogger.com/atom/ns#' term='normal distribution'/><category scheme='http://www.blogger.com/atom/ns#' term='log normal distribution'/><category scheme='http://www.blogger.com/atom/ns#' term='distribution'/><category scheme='http://www.blogger.com/atom/ns#' term='exponential distribution'/><title type='text'>Important Statistical Distributions...really</title><content type='html'>&lt;span class="Apple-style-span" style="font-size: large;"&gt;Why Would I Ever Blog About This!?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;These past two months I have been doing a tremendous amount of research work. (I love it and it's being poured into my training courses.) 
I have been driven to achieve a deeper understanding of an "average", what it implies, and what it is worth in Oracle performance analysis.&amp;nbsp;As you will soon see in subsequent posts, understanding statistical distributions has been absolutely key to carrying out the research. &amp;nbsp;I also plan to reference this posting in future postings.&lt;br /&gt;&lt;br /&gt;Please, if you have run away from statistics before, I promise this will be different. It's written for the typical Oracle DBA, which means I wrote it for you and therefore I'm really hoping it makes lots of sense and will be useful in your career. &amp;nbsp;So please...read on.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;My Approach&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;My approach is to start by introducing the histogram. I'm doing this because it is a wonderful visual way to compare and contrast experimental sample sets and, of course, statistical distributions. Then I will move on and introduce five different distributions. I didn't just randomly pick five. I chose these five because they are important for our work.&amp;nbsp;For each distribution I start by presenting a simple non-Oracle example. Then I get a little more technical by describing the inputs to create the distribution, for example the average. I also include other interesting tidbits. Here we go...&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Understanding a Histogram&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When presented with a sample set, such as {1.5, 3.2, 2.6, 4.2, 3.8, 2.1, 5.1, 2.6, 6.5, 3.4, 4.2}, a fantastic way to visually grasp the data and get a gut feel for it is to create a histogram. 
The image below was created by simply copying and pasting the previous sample set directly into &lt;a href="http://www.wolframalpha.com/"&gt;WolframAlpha&lt;/a&gt;...give it a try.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_FEKH6HhYAEI/TT94Yi48cVI/AAAAAAAAALE/gRk1hvluyso/s1600/wolframalpha-20110125192617467.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/_FEKH6HhYAEI/TT94Yi48cVI/AAAAAAAAALE/gRk1hvluyso/s1600/wolframalpha-20110125192617467.gif" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;A histogram's &lt;b&gt;vertical axis&lt;/b&gt; is the number of occurrences, which in our case is the number of samples. In our sample set there are 11 samples, which means there are 11 occurrences. Each of these 11 occurrences will be represented somewhere in the histogram.&lt;br /&gt;&lt;br /&gt;The &lt;b&gt;horizontal axis&lt;/b&gt; shows the values of the samples. Scanning our 11 sample values you'll notice the minimum value is 1.5 and the maximum value is 6.5. The horizontal axis must include this range. For our sample set's histogram, the minimum value on the histogram is 1 and the maximum value is 7.&lt;br /&gt;&lt;br /&gt;Our first sample is 1.5 and is represented on this histogram as the bottom (that is, first) occurrence on the first bar from the left. The second sample, 3.2, is represented as the first occurrence on the third bar from the left. Notice there are three samples between 3 and 4, hence that histogram bar is 3 occurrences high. 
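&lt;br /&gt;&lt;br /&gt;If you would rather count than squint at the image, here is a minimal Python sketch of the same tally. Python is simply my choice for illustration (the image above came from WolframAlpha), and the one-unit-wide bins starting at 1 are an assumption based on the axis range and bars described above.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;
```python
import math
from collections import Counter

samples = [1.5, 3.2, 2.6, 4.2, 3.8, 2.1, 5.1, 2.6, 6.5, 3.4, 4.2]

# Assumption: one-unit-wide bins (1-2, 2-3, ... 6-7), matching the 1 to 7 axis described above.
# Each sample lands in the bin named after its whole-number part.
bin_counts = Counter(math.floor(value) for value in samples)

# Print a sideways text histogram: one '#' per occurrence.
for left_edge in range(1, 7):
    print(f"{left_edge}-{left_edge + 1}: {'#' * bin_counts[left_edge]}")

total_occurrences = sum(bin_counts.values())
print("total occurrences:", total_occurrences)
```
&lt;/code&gt;&lt;/pre&gt;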
If you count the number of occurrences in the histogram, you'll notice there are 11, which is also the number of our samples!&lt;br /&gt;&lt;br /&gt;What is interesting in this sample set's histogram is that the bulk of the samples sits toward the left while a somewhat long tail extends to the right (statisticians call this right, or positive, skew).&amp;nbsp;The mean, that is the average, is 3.6 and the median is 3.4. This tells us there are more samples to the left of the mean than to the right of the mean. Recognizing this difference is very important in Oracle performance analysis.&lt;br /&gt;&lt;br /&gt;That's the end of my introduction to histograms. The key is every sample is represented somewhere on the graph. Next I'm going to introduce the five statistical distributions every Oracle performance analyst needs to know about: uniform, normal, exponential, poisson, and finally the log normal distribution.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;&lt;b&gt;The Uniform Distribution&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Every programmer at some point has needed a random number. Most random number generators provide the ability to return a random number between two values, say 0 and 100. There is an important underlying assumption we typically don't think about. This assumption is any number is just as likely to be returned as any other number. Said another way, returning 15 is just as likely as returning 55. Said yet another way, there is no preference toward returning a specific number or a group of numbers. This is another way of saying the distribution of results is uniform...hence the uniform distribution.&lt;br /&gt;&lt;br /&gt;Let's start with a specific quantity and numeric range of random numbers and place them into a histogram. The histogram below is based on a set of only 10 random real numbers between 0 and 100. I also specified there to be 10 histogram bins, that is, groups or buckets. 
I defined the random set of values by providing the minimum and maximum values.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TS3GiLBzSfI/AAAAAAAAAH8/OX90dYTaHGI/s1600/Random10Values.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="141" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TS3GiLBzSfI/AAAAAAAAAH8/OX90dYTaHGI/s200/Random10Values.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;By the way, in the figure above you should be able to count the occurrences, which should add up to the number of samples (10).&amp;nbsp;That doesn't look very random because it contains only 10 values. If I increase the number of samples from 10 to 10000 the histogram looks very different. Below is the histogram containing 10000 random real numbers between 0 and 100.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TS3HehdKA9I/AAAAAAAAAIA/CKZ5kY0n59w/s1600/Random10000Values.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="138" src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TS3HehdKA9I/AAAAAAAAAIA/CKZ5kY0n59w/s200/Random10000Values.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;This is more like it and what we expected to see. A way of interpreting this histogram is that we are about as likely to pick any number between 0 and 100... it's like random! ...that's because the sample set is full of random uniformly distributed numbers.&lt;br /&gt;&lt;br /&gt;The median for a uniform distribution is the same as the mean. 
If you were to sort all the samples and pick the one in the middle, that would be the median and, with enough samples, essentially the mean as well.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Normal Distribution&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;If I asked a group of people to measure the length of a physically present piece of wood down to the millimeter, I would receive a variety of answers. If I took all the results (my sample set) and placed them into a histogram format, the result would be the classic bell curve, which is more formally known as the normal distribution.&lt;br /&gt;&lt;br /&gt;The key thing to remember about a normal distribution is there are just as many samples less than the mean as there are greater than the mean. This is another way of saying the median is equal to the mean.&lt;br /&gt;&lt;br /&gt;A normally distributed set of values is defined by its mean and standard deviation (which is a statistic that tells us about the dispersion of the samples). If I had a really cool random number generator, I could tell it to return a set of numbers that are normally distributed, with a mean of &lt;i&gt;m&lt;/i&gt;, and a standard deviation of &lt;i&gt;s&lt;/i&gt;. Thankfully, I do have a spiffy random number generator like this! I used a &lt;a href="http://reference.wolfram.com/mathematica/ref/NormalDistribution.html"&gt;&lt;i&gt;Mathematica&lt;/i&gt; command&lt;/a&gt;&amp;nbsp;(actually called a symbol, not a command) to generate a sample set used to create the histograms below!&lt;br /&gt;&lt;br /&gt;The histogram below is based on a set of only 10 normally distributed real numbers with a mean of 50 and a standard deviation of 2. 
I also specified there to be 10 bins, that is, groups or buckets.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_FEKH6HhYAEI/TS3KsuI9oOI/AAAAAAAAAIE/iaHK-b6W3TM/s1600/Normal10Values.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="150" src="http://1.bp.blogspot.com/_FEKH6HhYAEI/TS3KsuI9oOI/AAAAAAAAAIE/iaHK-b6W3TM/s200/Normal10Values.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;That doesn't look very normal! That's because there are only 10 samples, but if we increase the number of samples to 10000 it looks more, well...normal.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_FEKH6HhYAEI/TS3KvY2zT3I/AAAAAAAAAIM/1_Bf0hJpLA0/s1600/Normal10000Values10Bins.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="147" src="http://1.bp.blogspot.com/_FEKH6HhYAEI/TS3KvY2zT3I/AAAAAAAAAIM/1_Bf0hJpLA0/s200/Normal10000Values10Bins.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;But still, it's not that smooth because I set the number of bins to 10. If I let &lt;i&gt;Mathematica&lt;/i&gt; automatically set the number of bins, we see the classic looking normal distribution histogram. Remember these three "normal" distribution images have the same mean (average) and standard deviation.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_FEKH6HhYAEI/TS3KvMy9g2I/AAAAAAAAAII/GyBW9cZgbPU/s1600/Normal1000ValuesXBins.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="149" src="http://3.bp.blogspot.com/_FEKH6HhYAEI/TS3KvMy9g2I/AAAAAAAAAII/GyBW9cZgbPU/s200/Normal1000ValuesXBins.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;The median for a normal distribution is the same as the mean. 
If you were to sort all the samples and pick the one in the middle, that would be the median and, with enough samples, essentially the mean as well.&lt;br /&gt;&lt;br /&gt;The normal distribution is often used because it makes things easy, most people know what the histogram looks like, the math is more straightforward, and students are used to working with normally distributed data. People are drawn to symmetry...we want symmetry. If we see something that is not symmetrical, it's like we are forced to understand why...and that takes energy and time. So most people, including research scientists, tend to assume their data is normal. [&lt;i&gt;Log-normal Distributions across the Sciences: Keys and Clues&lt;/i&gt;, BioScience, May 2001] &amp;nbsp;In my &lt;a href="http://resources.orapub.com/Forecasting_Oracle_Performance_Book_p/fop_book.htm"&gt;Forecasting Oracle Performance&lt;/a&gt; book, you will notice I write many times, "...assuming the samples are normally distributed..." By stating this, I am saying the math I'm going to use will work well only when the sample set is normally distributed. If the distribution is not normal, while the math will crank out a result, it will not be as reliable.&lt;br /&gt;&lt;br /&gt;As you dig deeper into Oracle performance analysis, you'll begin to realize that many performance related distributions are indeed not normal.&amp;nbsp;Most people think that most distributions are normal, but as I'm demonstrating in my blog entries there are many situations where this is not true...especially when we're talking about Oracle performance topics.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Exponential Distribution&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Radioactive material decays exponentially. For example, suppose in the next hour a piece of radioactive material has a 10% chance of splitting, therefore having a 90% chance of not splitting. 
Because radioactive decay occurs exponentially, within the second hour there will be a 5% chance the material will split and a 95% chance it will not split. Within the third hour there will be a 2.5% chance the material will split, with a 97.5% chance it will not split. And on and on...unfortunately forever. Here is our sample set in &lt;a href="http://www.wolframalpha.com/input/?i=plot+%7B%7B1%2C5%7D%2C%7B2%2C2.5%7D%2C%7B3%2C1.25%7D%2C%7B4%2C0.625%7D%2C%7B5%2C0.313%7D%2C%7B6%2C0.156%7D%2C%7B7%2C0.078%7D%7D"&gt;WolframAlpha-ready&lt;/a&gt; format, followed by the plot:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;plot {{1,5},{2,2.5},{3,1.25},{4,0.625},{5,0.313},{6,0.156},{7,0.078}}&lt;/code&gt;&lt;/pre&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_FEKH6HhYAEI/TUCpaSsc-JI/AAAAAAAAALQ/QecdTWbjTpE/s1600/wolframalpha-20110126170721083.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="220" src="http://3.bp.blogspot.com/_FEKH6HhYAEI/TUCpaSsc-JI/AAAAAAAAALQ/QecdTWbjTpE/s320/wolframalpha-20110126170721083.gif" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;Crack open a book about queuing theory or computing system performance analysis and you'll see the words exponential distribution. It's one of the phrases that computer folks throw around, only a few can talk about, and very few really understand. So in a few short paragraphs, I'm going to try to explain this&amp;nbsp;(that's the "talk about" part)&amp;nbsp;as clearly as I possibly can, without the aid of a white board and personal classroom interaction. If you want personal classroom interaction, &lt;a href="http://training.orapub.com/"&gt;&lt;b&gt;check out my courses&lt;/b&gt;&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The exponential distribution is like most other distributions in that it is defined by a small set of parameters. 
For the uniform distribution, the sample set definition is x number of random numbers (e.g., integers or reals) between a minimum and maximum value. For the normal distribution, the sample set definition is x number of normally distributed numbers (e.g., integers or reals) having a mean of &lt;i&gt;m&lt;/i&gt; and a standard deviation of &lt;i&gt;s&lt;/i&gt;. For the exponential distribution, the sample set definition is x number of exponentially distributed numbers having an average of &lt;i&gt;m&lt;/i&gt;. (Actually the real input parameter is 1/&lt;i&gt;m&lt;/i&gt;). There is only a single parameter!&lt;br /&gt;&lt;br /&gt;Unlike a uniform or normal distribution, there are more samples less than the mean than greater than the mean!&amp;nbsp;If you understand this point, then it will make sense that if you were to sort all the samples and pick the one in the middle (the median sample) its value would be less than the mean. You can see this visually in the figures below. In the radioactive decay example above, the mean is 1.42 and the median is 0.625.&lt;br /&gt;&lt;br /&gt;Exponentially distributed sample sets are very common in computing system performance analysis. It is a given in capacity planning that the time between transaction arrivals (one of my next blog entries) and also how long it takes to service the transaction (not their waiting or queue time but the actual service) are exponentially distributed.&lt;br /&gt;&lt;br /&gt;The histogram below is based on a set of only 10 exponentially distributed real numbers with a mean of 50. 
I also specified there to be 10 bins, that is, groups or buckets.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_FEKH6HhYAEI/TS3SByNDU-I/AAAAAAAAAIQ/YDp8at5OGW8/s1600/Expo10Values.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="150" src="http://1.bp.blogspot.com/_FEKH6HhYAEI/TS3SByNDU-I/AAAAAAAAAIQ/YDp8at5OGW8/s200/Expo10Values.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;Because there are only 10 samples, the histogram is very awkward looking. But even with only 10 samples, it seems to look different than both the 10 sample uniform and normal distribution histograms shown above. This histogram is also eerily similar to the histogram near the top of this blog based on the 12 sample SQL execution elapsed times...whoops... sorry...that's at the top of my &lt;i&gt;next&lt;/i&gt; blog posting!&lt;br /&gt;&lt;br /&gt;The below two figures are based on the exact same sample set. The only difference is how I defined the histogram bins.&lt;br /&gt;&lt;br /&gt;The figure below is still an exponentially distributed set of values with a mean of 50 but I increased the number of samples from only 10 to 10000.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TS3SCUfI0OI/AAAAAAAAAIU/X8GD7gw7Q5A/s1600/Expo10000Values10Bins.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="149" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TS3SCUfI0OI/AAAAAAAAAIU/X8GD7gw7Q5A/s200/Expo10000Values10Bins.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;It is still chunky looking but that's because I set the number of bins to 10. I wanted to show you this because many times when doing experiments and performance analysis we don't have 10000 samples and so there may only be a few bins. 
Even with only 10 bins, the above pattern or &lt;i&gt;look&lt;/i&gt; is what you might see. Letting &lt;i&gt;Mathematica&lt;/i&gt; set the number of bins, we get the classic looking exponential distribution histogram.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_FEKH6HhYAEI/TS3SDOZWqFI/AAAAAAAAAIY/T0oh_hRYaJ8/s1600/Expo10000ValuesXBins.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="235" src="http://3.bp.blogspot.com/_FEKH6HhYAEI/TS3SDOZWqFI/AAAAAAAAAIY/T0oh_hRYaJ8/s320/Expo10000ValuesXBins.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;Remember, all three of the above histograms have a mean of 50. The only differences are the number of samples and the number of histogram bins.&lt;br /&gt;&lt;br /&gt;Remember, the above two images/histograms are based on the exact same sample set and the only difference is the bin size settings!&lt;br /&gt;&lt;br /&gt;Let's investigate the median a bit. As you'll recall, if we sorted all the samples and picked the middle sample, that would be the median. Looking at the 10000 sample histogram directly above, what do the mean and the median look to be? Well, I told you the mean is 50, so what about the median? Because the median is based on the &lt;i&gt;number&lt;/i&gt; of samples, not their values, in this case the median will be less than the mean. &lt;i&gt;It's like those few large values have pulled the mean to the right and away from the median.&lt;/i&gt; While you can't see this in the above histogram graphic, there are samples with values of 300, 400, and even 500. There are not a lot of them, but they are there and they are effectively pulling the mean toward them!&lt;br /&gt;&lt;br /&gt;While the mean for the above histogram is 50, the median is about 35. 
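&lt;br /&gt;&lt;br /&gt;That "about 35" is no accident: for an exponential distribution the median works out to the mean times ln(2), which here is 50 x 0.693, or roughly 34.7. Below is a minimal Python sketch you can use to check this yourself. Treat it as an illustration under assumptions rather than the tooling behind the images above (those came from &lt;i&gt;Mathematica&lt;/i&gt;); the fixed seed and variable names are my own choices.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;
```python
import math
import random
import statistics

random.seed(42)  # assumption: fixed seed only so that the run is repeatable

mean_value = 50  # mirrors the mean used for the histograms above

# random.expovariate takes the rate, which is 1/mean (the "real input parameter is 1/m" remark).
samples = [random.expovariate(1 / mean_value) for _ in range(10000)]

sample_mean = statistics.mean(samples)
sample_median = statistics.median(samples)

# For an exponential distribution, median = mean * ln(2).
theoretical_median = mean_value * math.log(2)

print("sample mean:", round(sample_mean, 1))
print("sample median:", round(sample_median, 1))
print("theoretical median:", round(theoretical_median, 2))
```
&lt;/code&gt;&lt;/pre&gt;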
So our hunch is correct in that the median is less than the mean.&lt;br /&gt;&lt;br /&gt;While many Oracle performance distributions may look exponential, if we use narrower histogram bins (that is, more of them) they begin to look poisson-like. So it's important we also investigate the poisson distribution.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Poisson Distribution&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Suppose I'm doing a traffic study and need to estimate the average number of vehicles that pass by a specific free-flowing(1) destination each minute. I wake up one delightful morning, get my cup of coffee, arrive at my destination and start counting.&amp;nbsp;Every 30 seconds I record the number of vehicles that pass by. Let's say I did this for 10 minutes, which means I would have 20 samples. The resulting sample set is poisson distributed. 
Assuming I actually did this, here are the actual 20 sample values, followed by the histogram.&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: monospace; white-space: pre;"&gt;{19, 19, 18, 10, 20, 16, 12, 18, 15, 15, 18, 18, 16, 20, 17, 14, 27, 19, 14, 17} &lt;/span&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_FEKH6HhYAEI/TTTE5v9hBLI/AAAAAAAAAI8/r1d20l8wLJ0/s1600/Poisson20Samples17Mean.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="142" src="http://1.bp.blogspot.com/_FEKH6HhYAEI/TTTE5v9hBLI/AAAAAAAAAI8/r1d20l8wLJ0/s200/Poisson20Samples17Mean.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;You can see a similar histogram by copying and pasting the above values (keep the curly braces) into &lt;a href="http://wolframalpha.com/"&gt;WolframAlpha&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;(1) By &lt;i&gt;free-flowing&lt;/i&gt; I mean each vehicle that passes is &lt;i&gt;not&lt;/i&gt; somehow dependent on another vehicle that passes. For example, if there is a traffic jam, an accident, or a stop sign nearby causing the vehicles to bunch up, then the vehicle arrivals would be related. This is another way of saying the arrivals must be independent, which is required for a true poisson distribution. Additionally, the inter-arrival times (i.e., time between each arrival) must be exponentially distributed.&lt;br /&gt;&lt;br /&gt;The poisson histogram image above is very typical. Going left to right, there is a quick build up to the peak and then a slow decline. If we have enough samples then, like the normal distribution, the mean will be very close to the median (in the 20 samples above the mean is 17.1 and the median is 17.5). So even though there are a few large values to the right of the mean, there are enough smaller values to the left of the mean to keep the mean and median "in check" and not straying from each other.&amp;nbsp;As you'll recall from above, this is very different compared to an exponential distribution. 
With an exponential distribution the far left histogram bar is the tallest, that is, it contains the most samples, and hence there is no build up.&lt;br /&gt;&lt;br /&gt;Take a look at the below two histograms. Are they exponential or poisson? It is natural to think the distribution on the left is exponential and the distribution on the right is poisson... But actually, they are both poisson.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_FEKH6HhYAEI/TTTJ1YFpMTI/AAAAAAAAAJA/W18ddmoo_4w/s1600/PoissonCompare1.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="142" src="http://3.bp.blogspot.com/_FEKH6HhYAEI/TTTJ1YFpMTI/AAAAAAAAAJA/W18ddmoo_4w/s200/PoissonCompare1.png" width="200" /&gt;&lt;/a&gt;&lt;a href="http://1.bp.blogspot.com/_FEKH6HhYAEI/TTTJ6Upq5iI/AAAAAAAAAJE/1dkQa7m46qM/s1600/PoissonCompare2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="142" src="http://1.bp.blogspot.com/_FEKH6HhYAEI/TTTJ6Upq5iI/AAAAAAAAAJE/1dkQa7m46qM/s200/PoissonCompare2.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;When comparing the above two images, the image on the left (which contains the wider bins) looks more exponential than poisson. Surprisingly, they are both based on the &lt;i&gt;exact same sample set&lt;/i&gt;! If you look closely, you'll notice the only difference is the histogram bin size. You can also be deceived when there are lots of samples because a poisson distribution can look like a normal distribution! 
Just look at the image below.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_FEKH6HhYAEI/TTTKesqg6iI/AAAAAAAAAJI/FDf7026C-68/s1600/Poisson10000ValuesXBins.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="149" src="http://3.bp.blogspot.com/_FEKH6HhYAEI/TTTKesqg6iI/AAAAAAAAAJI/FDf7026C-68/s200/Poisson10000ValuesXBins.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;I will get more into a poisson process and its distribution in my upcoming blog entry on SQL statement arrival rates. But for now, keep in mind&amp;nbsp;that &lt;b&gt;&lt;i&gt;what may appear to be a normal or exponential distribution could actually be more poisson-like!&lt;/i&gt;&lt;/b&gt;&amp;nbsp;&lt;b&gt;&lt;u&gt;They can be difficult to distinguish&lt;/u&gt;&lt;/b&gt;. The only way to test this is to perform a statistical hypothesis test.&lt;br /&gt;&lt;br /&gt;As I will blog about soon, many Oracle performance distributions may look exponential or poisson, but they fail a statistical hypothesis test. There is yet another, lesser known distribution that many times is the best match for Oracle performance related sample sets. It's called the log normal distribution.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Log Normal Distribution&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;There are times when a &lt;i&gt;skewed&lt;/i&gt; normal distribution occurs. By &lt;i&gt;skew&lt;/i&gt; I mean the tallest histogram bar is not in the middle but to the left or the right. For example, a skew is likely when mean values are near zero, variances are large and perhaps extreme, and the sample values cannot be negative. 
When these types of conditions exist, the log normal distribution may best describe the sample set.&lt;br /&gt;&lt;br /&gt;Personal income, reaction time to snake bites and bee stings, and country GDP and oil field reserves are all supposed to be log normal distributed. Hmm...&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Proving it to myself&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;To prove this to myself, I decided to give this a try using real country GDP and country oil reserve data. With my hands shaking in anticipation, I called once again on &lt;i&gt;Mathematica&lt;/i&gt;. &lt;i&gt;Mathematica&lt;/i&gt; has vast data resources that can be pulled onto my desktop and analyzed. It's scary-amazing. Within minutes I had the GDP and oil reserves for 231 countries at my fingertips. I'm not sure of the year, but that's really not important anyway.&lt;br /&gt;&lt;br /&gt;I then placed the data into a histogram.&amp;nbsp;Regardless of my histogram tweaks, the image always looked exponentially distributed. However, when I performed statistical fitness tests comparing the data samples with the distributions presented in this blog,&amp;nbsp;&lt;i&gt;only&lt;/i&gt; the log normal distribution was statistically similar. None of the other distributions conformed to the actual data. 
So I guess the &lt;i&gt;experts&lt;/i&gt; were correct.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;How to create a log normal data set&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;First, I'll state a simple definition: &lt;u&gt;A sample set is log-normal if log(sample &lt;/u&gt;&lt;i&gt;&lt;u&gt;x&lt;/u&gt;&lt;/i&gt;&lt;u&gt;) is normally distributed.&lt;/u&gt; For example, if I have 100 log-normal samples and I apply log(sample[i]) to each one and then create a histogram from the results, the histogram will be normally distributed!&lt;br /&gt;&lt;br /&gt;Here is how I created a log normal sample set taking a slightly different twist:&amp;nbsp;For each sample &lt;i&gt;x&lt;/i&gt; in a normal distribution with a given mean and standard deviation, apply &lt;i&gt;exp(x)&lt;/i&gt; to it and place the result into another sample set. If you create a histogram on the new &lt;i&gt;exp(x)&lt;/i&gt; samples it will look like this (assuming the normal distribution samples have a mean of 3 and a standard deviation of 1.5):&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_FEKH6HhYAEI/TUHLVSgqvvI/AAAAAAAAALk/XI40rYGltCc/s1600/log+normal+1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="140" src="http://1.bp.blogspot.com/_FEKH6HhYAEI/TUHLVSgqvvI/AAAAAAAAALk/XI40rYGltCc/s200/log+normal+1.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;&lt;i&gt;Figure X.&lt;/i&gt;&lt;/b&gt;&lt;/div&gt;It looks a lot like the exponential and poisson distributions! In fact, based on its two input parameters (mean and standard deviation from its associated normal distribution), it can look like either one...especially if we mess with the histogram bin sizes and number. 
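&lt;br /&gt;&lt;br /&gt;The &lt;i&gt;exp(x)&lt;/i&gt; recipe described above can also be sketched in a few lines of Python. This is only an illustration under assumptions, not the &lt;i&gt;Mathematica&lt;/i&gt; work behind Figure X: the fixed seed, the 1000 sample count, and the variable names are my own choices for the sketch.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;
```python
import math
import random
import statistics

random.seed(7)  # assumption: fixed seed only so that the run is repeatable

# Parameters of the associated normal distribution (same values as in the text above).
normal_mean, normal_sd = 3, 1.5
normal_samples = [random.gauss(normal_mean, normal_sd) for _ in range(1000)]

# Apply exp(x) to every normal sample to build the log normal sample set.
log_normal_samples = [math.exp(x) for x in normal_samples]

# Going the other way: log() of each log normal sample gives back the normal samples.
recovered = [math.log(value) for value in log_normal_samples]

print("median of exp(x) samples:", round(statistics.median(log_normal_samples), 1))
print("mean of log(sample):", round(statistics.mean(recovered), 2))
```
&lt;/code&gt;&lt;/pre&gt;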
I suspect this flexibility is what makes it a relatively good visual match for our experimental data.&lt;br /&gt;&lt;br /&gt;Now this is really cool! Recall the country GDP sample set I mentioned just above in the &lt;i&gt;Proving it to myself&lt;/i&gt;&amp;nbsp;section. Below is the country GDP histogram, with 80 bins and showing 90% of the data.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_FEKH6HhYAEI/TUbeFviXlyI/AAAAAAAAAL0/mcamoN3UbD8/s1600/Country+Raw+Hist.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="135" src="http://1.bp.blogspot.com/_FEKH6HhYAEI/TUbeFviXlyI/AAAAAAAAAL0/mcamoN3UbD8/s200/Country+Raw+Hist.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;If I take the &lt;i&gt;log&lt;/i&gt; of each country's GDP and create a histogram of the result, it looks like this:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUbdKVignmI/AAAAAAAAALw/sgD2Z576TLY/s1600/GDP+trans+Normal+Hist.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="125" src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TUbdKVignmI/AAAAAAAAALw/sgD2Z576TLY/s200/GDP+trans+Normal+Hist.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;Very cool, eh? So while the raw data histogram looks more exponential, when we apply the log function to the data and create a histogram it looks pretty normal...which means visually our country GDP data is log normal distributed. And as I mentioned above, a statistical hypothesis test shows the data is statistically log normal as well.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;How to lie using histograms&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Interestingly, the image below is based on the &lt;i&gt;exact same sample set&lt;/i&gt; as the &lt;b&gt;&lt;i&gt;Figure X&lt;/i&gt;&lt;/b&gt; histogram three images above. 
The only difference is I set the number of bins to 40 and displayed 90% of the data. This is why the horizontal axis only extends to around 70, not over 700.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_FEKH6HhYAEI/TUHLafQVfbI/AAAAAAAAALo/uV5SbrmNhkM/s1600/log+normal+2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="142" src="http://1.bp.blogspot.com/_FEKH6HhYAEI/TUHLafQVfbI/AAAAAAAAALo/uV5SbrmNhkM/s200/log+normal+2.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;Now that's more like it! Notice the three far left bars are not the tallest and also notice how far the tail extends to the right. These are two key identifying characteristics of the log normal distribution. But &lt;i&gt;I have to warn you&lt;/i&gt;, there are many log normal distribution sample sets that do &lt;i&gt;not&lt;/i&gt; have far left bars shorter than the tallest bar (e.g., country GDP data). The only way to really test if your data is log normal is to perform a hypothesis test. I will detail this in the next blog entry.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Predicting the median and mean&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This is pretty cool: The actual data samples in the above histogram have an average of 53 and a median of 19. So there are enough large value samples to effectively pull the mean away from the median. If you recall, for both the normal and poisson distributions, the mean and median are essentially equal.&lt;br /&gt;&lt;br /&gt;Ready to be freaked out? For a log normal distribution, the &lt;i&gt;median&lt;/i&gt; of its samples is supposed to be the constant &lt;i&gt;e&lt;/i&gt; to the power of its normal distribution's average. If you recall above, the log normal sample set was created from normally distributed samples with a mean of 3. Therefore, the median equation is:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;median = e^m = e^3 = 2.718281828^3 = 20.09&lt;/code&gt;&lt;/pre&gt;Whoa! 
Just above I said the actual median was 19, which is very close considering the sample set consists of only 1000 samples.&lt;br /&gt;&lt;br /&gt;Let's try another freak-out thing: the &lt;i&gt;mean&lt;/i&gt; of a log normal distribution is the constant &lt;i&gt;e&lt;/i&gt; raised to the power of its normal distribution's average plus half of its standard deviation squared. Words are messy, so the equation is:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;mean = e^(m+(s^2)/2) = 2.718281828^(3+(1.5^2)/2) = 61.87&lt;/code&gt;&lt;/pre&gt;Again, not perfect (the actual sample mean was 53) but close...although I would have liked it to be closer. I will delve into the application of this in subsequent blog postings.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Shifting sand&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I mentioned above the input parameters to create a log normal distribution are the mean and standard deviation of its associated normal distribution. These two parameters are also sometimes referred to as the &lt;i&gt;scale&lt;/i&gt; and &lt;i&gt;shape&lt;/i&gt; parameters.&amp;nbsp;If I mess with the &lt;i&gt;shape&lt;/i&gt; parameter (i.e., standard deviation), this causes the histogram tail to either contract or extend far to the right.&amp;nbsp;The &lt;i&gt;scale&lt;/i&gt; parameter shifts the tallest histogram bar to the left or right.&lt;br /&gt;&lt;br /&gt;Here's an example. In the image below the darkest red-ish color is the overlap of two histograms. The two histograms are colored purple and pink.&amp;nbsp;The two data sets are only different in their scale parameter. You can see one of the differences is the tallest bar shifted to the right when the scale parameter was increased. 
Perhaps not that interesting, but it will be useful in future blog posts.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TUHNVv64LwI/AAAAAAAAALs/4eHg2Rs7_N0/s1600/log+normal+3mix.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="140" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TUHNVv64LwI/AAAAAAAAALs/4eHg2Rs7_N0/s200/log+normal+3mix.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;The log normal distribution is amazing. The reason I focused so much on it is that it is important for Oracle performance analysis. I haven't demonstrated this yet, but I will in subsequent posts...stay tuned.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Conclusions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As I dig deeper into Oracle performance analysis, I am forced to understand statistical distributions. There is just no way around it. Documenting my research and preparing to analyze experimental data produced the content for this posting. My hope is that I have conveyed a few key take-aways:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;A clear understanding of a histogram.&lt;/li&gt;&lt;li&gt;By changing histogram characteristics (e.g., bin size, number of bins) you can make a sample set look like a desired distribution.&lt;/li&gt;&lt;li&gt;How common statistical distributions relate to our lives.&lt;/li&gt;&lt;li&gt;What common statistical distribution histograms look like.&lt;/li&gt;&lt;/ol&gt;In subsequent blog postings, I will analyze how various Oracle system happenings relate to these common statistical distributions. This will allow us to communicate more confidently, more correctly, and also perhaps make some interesting predictions.&lt;br /&gt;&lt;br /&gt;Thanks for reading!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;P.S. 
If you want me to respond to a comment or have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4169710303065679169-7034607475602592808?l=shallahamer-orapub.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/7034607475602592808/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/02/important-statistical.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/7034607475602592808'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/7034607475602592808'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/02/important-statistical.html' title='Important Statistical Distributions...really'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_FEKH6HhYAEI/TT94Yi48cVI/AAAAAAAAALE/gRk1hvluyso/s72-c/wolframalpha-20110125192617467.gif' height='72' 
width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-4319442933663946302</id><published>2011-01-26T10:34:00.000-08:00</published><updated>2011-02-02T06:50:36.762-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='child cursor'/><category scheme='http://www.blogger.com/atom/ns#' term='visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='hashing'/><category scheme='http://www.blogger.com/atom/ns#' term='library cache'/><category scheme='http://www.blogger.com/atom/ns#' term='cursor'/><category scheme='http://www.blogger.com/atom/ns#' term='mathematica'/><title type='text'>Library Cache Visualization Tool: How To</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_FEKH6HhYAEI/TT9NCmCaZqI/AAAAAAAAAKw/xrGyTYHQ0-Q/s1600/Prod+1+Cool+3.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" height="147" src="http://3.bp.blogspot.com/_FEKH6HhYAEI/TT9NCmCaZqI/AAAAAAAAAKw/xrGyTYHQ0-Q/s200/Prod+1+Cool+3.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;b&gt;Some background&lt;/b&gt;...I think what separates Oracle DBAs is their ability to communicate. If you take the time to study (my &lt;a href="http://training.orapub.com/"&gt;courses&lt;/a&gt; and &lt;a href="http://resources.orapub.com/SearchResults.asp?Search=books"&gt;books&lt;/a&gt; can help), every DBA will eventually become an Oracle technology expert. But this is the point where our uniqueness becomes more interesting, as we each set out on a different career path.&lt;br /&gt;&lt;br /&gt;To improve my teaching I spend countless hours developing stories, role-plays, and various entertaining ways to transfer very complicated topics to DBAs. But what has been frustrating for me was the lack of visualization. 
Pictures, white boards, and flip charts are OK, but I wanted to do something that's visually amazing, flexible, and free. Finally, I discovered that &lt;i&gt;&lt;a href="http://www.wolfram.com/"&gt;Mathematica&lt;/a&gt;&lt;/i&gt; could help me in this quest. So last year I embarked on a journey to visualize aspects of Oracle's technology.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9NOAtvSWI/AAAAAAAAAK0/j_Z9HzVYYs0/s1600/Prod+1+Cool+4.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="200" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9NOAtvSWI/AAAAAAAAAK0/j_Z9HzVYYs0/s200/Prod+1+Cool+4.png" width="155" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;I started with Oracle's buffer cache. Thousands of you &lt;a href="http://resources.orapub.com/SearchResults.asp?Search=math+buffer+cache"&gt;&lt;b&gt;downloaded the tool&lt;/b&gt;&lt;/a&gt;&amp;nbsp;and read the &lt;a href="http://shallahamer-orapub.blogspot.com/2010/09/buffer-cache-visualization-and-tool.html"&gt;&lt;b&gt;associated blog entry&lt;/b&gt;&lt;/a&gt; (its documentation). A few months ago I created an initial visualization tool for Oracle's library cache. The visualizations were jaw-dropping, but the tool was too highly abstracted for my liking, didn't relate enough to production systems, and had no associated documentation. Finally, this has changed...&lt;br /&gt;&lt;br /&gt;Oracle's library cache (LC) is amazing. And viewing it on your PC is even more amazing and very enlightening.&amp;nbsp;I just completed a significant update (still free) that is much more realistic, is based on an 11g library cache dump, and allows you to enter details from your real production Oracle system! Yes, it's very cool! 
You can &lt;a href="http://resources.orapub.com/SearchResults.asp?Search=math+library+cache"&gt;&lt;b&gt;download OraPub's Library Cache Visualization Tool here for free&lt;/b&gt;&lt;/a&gt;.&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9Ng_8q7tI/AAAAAAAAAK4/LmUaJNuXuAk/s1600/Prod+1+Cool+2.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" height="147" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9Ng_8q7tI/AAAAAAAAAK4/LmUaJNuXuAk/s200/Prod+1+Cool+2.png" width="200" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;b&gt;The purpose&lt;/b&gt; of this blog entry is to introduce you to the new version (2g) of OraPub's Library Cache Visualization Tool and also to introduce you (if you desire) to Oracle's library cache. I hope you enjoy this blog entry and the visualization tool!&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Library Cache Introduction&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Within Oracle's shared pool exists the library cache (LC). The library cache can be abstracted and viewed like a classic library's card catalog system. The cards are located by hashing to the correct catalog and then sequentially searching through the catalog looking for the desired card. Each card references a book's location and a few other details.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;It's Searched&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Oracle's LC is searched using a &lt;a href="http://firefightingbook.com/cgi-bin/search_opff.cgi?s1=library&amp;amp;s2=hash"&gt;hashing algorithm&lt;/a&gt;. Suppose our server process is parsing the SQL statement, &lt;b&gt;select * from dual&lt;/b&gt;. The server process attempts to minimize the time and resources required to parse the statement. One strategy employed is to check and see if the SQL statement has already been parsed and its cursor available in the library cache. 
But the server process must first locate the specific cursor reference, just like we must first locate a printed book's reference card.&lt;br /&gt;&lt;br /&gt;The server process will pass the SQL statement text to a &lt;i&gt;hash value generation function&lt;/i&gt; that will transform the text (e.g., &lt;b&gt;select * from dual&lt;/b&gt;) into a hash value (e.g., rT4rif87ujc). The hash value is then passed to a &lt;i&gt;hash function&lt;/i&gt;, which will output a number within a specified range (e.g., 0,...,350000). &amp;nbsp;The output number (e.g., 3216) is a reference to what is called a &lt;i&gt;hash bucket&lt;/i&gt;. If our SQL statement's cursor exists in the library cache it will have a reference that is associated with this hash bucket (e.g., 3216).&lt;br /&gt;&lt;br /&gt;Each hash bucket can have an associated chain. The chain length could be zero (which is actually the most likely on real Oracle systems) or it could be perhaps two or even three references long. The references are more properly called &lt;i&gt;&lt;a href="http://firefightingbook.com/cgi-bin/search_opff.cgi?s1=handle"&gt;handles&lt;/a&gt;&lt;/i&gt;, because they reference a chunk of memory within Oracle's shared pool. Continuing my example, the server process jumps to the hash bucket (e.g., 3216) and begins sequentially scanning its associated hash chain searching for my SQL statement's hash value (e.g.,&amp;nbsp;rT4rif87ujc). 
If&amp;nbsp;it finds the hash value (the cursor reference), that's a soft parse; if not, it must create the cursor plus a bunch of other stuff...this is known as a hard parse.&lt;br /&gt;&lt;br /&gt;The image below is a visual example of an unrealistically simplified LC hashing structure...but it serves our purpose here wonderfully.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_FEKH6HhYAEI/TUBHm3bEOJI/AAAAAAAAALI/EU37BkGk6i8/s1600/LC+Chains+Example.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="312" src="http://1.bp.blogspot.com/_FEKH6HhYAEI/TUBHm3bEOJI/AAAAAAAAALI/EU37BkGk6i8/s320/LC+Chains+Example.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;The LC hashing structure modeled above has 16 hash chains. Most chains contain two references, that is, handles. Suppose in my quest to locate my SQL statement's cursor (hash value, &lt;b&gt;CSR 5&lt;/b&gt;), I hash to bucket&amp;nbsp;&lt;b&gt;LC 3&lt;/b&gt;. The &lt;b&gt;B&lt;/b&gt; is for beginning and the &lt;b&gt;E&lt;/b&gt; represents the chain's end. My server process will now begin to sequentially search the LC hash chain &lt;b&gt;LC 3&lt;/b&gt;&amp;nbsp;looking for my cursor, represented as &lt;b&gt;CSR 5&lt;/b&gt;. It will quickly find it! Now my server process has access to all sorts of details related to the SQL statement and does not have to build the cursor, which is relatively expensive and part of the hard parsing process.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;References and Relationships&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The LC is responsible for maintaining the relationships between objects in the LC...and it's complex. This is why the visualization tool is so cool: we get a glimpse of this complexity. As a quick example, the LC will maintain the relationships between tables and SQL statements (more specifically child cursors). 
This way if the table is altered, Oracle knows which child cursors and cursors to invalidate. This invalidation process will propagate to associated views, procedures, functions, triggers, packages, etc. Oracle tries very hard to limit this propagation as it has some nasty repercussions (that I will not discuss here).&lt;br /&gt;&lt;br /&gt;The LC handles point to memory that contains information about procedures, functions, tables, views, synonyms, cursors, child cursors, and I'm sure there are others. Not only are there references to these nodes, but references to other nodes. For example a parent cursor (&lt;b&gt;select * from customers&lt;/b&gt;) can reference a child cursor (which has a specific execution plan) and a table (&lt;b&gt;customers&lt;/b&gt;). When you add in the connections with procedures and functions it gets pretty crazy. Or how about tables that are referenced in multiple statements or when a cursor is shared by multiple procedures, functions, or packages...&lt;br /&gt;&lt;br /&gt;The image below is an example of the relationship between 2 cursors (parent cursors), their associated 4 child cursors, and their associated 5 tables. As you can see, with only a few objects and especially when sharing occurs, the relationships can become quite complex.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_FEKH6HhYAEI/TUBJVk3d2qI/AAAAAAAAALM/QVaSvwLuSNc/s1600/LC+Objects+Example.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://3.bp.blogspot.com/_FEKH6HhYAEI/TUBJVk3d2qI/AAAAAAAAALM/QVaSvwLuSNc/s320/LC+Objects+Example.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;If I create a table, a couple of SQL statements, a procedure, and I force a couple of child cursors by changing the optimization mode, and finally dump the library cache to a trace file, I am able to diagram their relationships. But it takes a while. 
Mapping more than a couple of objects would quickly turn into a nightmare!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Core Objects&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;As I mentioned, the LC references and contains metadata about Oracle objects. It does not cover all objects defined in Oracle's data dictionary, but only those that have been recently referenced (there are exceptions) and have some cached information. For example, you can &lt;i&gt;&lt;b&gt;not&lt;/b&gt;&lt;/i&gt; get a good idea about the number of tables referenced in the LC by issuing a &lt;b&gt;select count(*) from all_tables&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;You can get a list of all the types or namespaces currently residing in Oracle's LC by issuing this simple command:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;select namespace from v$librarycache order by 1;&lt;/code&gt;&lt;/pre&gt;On my 11g system there were 17 namespaces. The parent cursors and child cursors (more below) are contained within the &lt;i&gt;SQL AREA&lt;/i&gt; namespace.&amp;nbsp;Personally, I find the LC dictionary views inadequate. For me, if I want to get an idea of what is going on in the LC, I dump it to a trace file. Here's how to do that:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;alter session set max_dump_file_size=unlimited;&lt;br /&gt;alter session set events 'immediate trace name library_cache level 10';&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;span class="Apple-style-span" style="font-family: monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-family: Times;"&gt;&lt;span class="Apple-style-span" style="white-space: normal;"&gt;Of course the next question is, "Where the heck is my trace file?" In 10g and earlier, issue a &lt;b&gt;show parameter user_dump_dest&lt;/b&gt; in SQL*Plus. 
In 11g, here's one way to get the full path to the trace file.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;select tracefile from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;If you look closely in the trace file you will eventually see the word &lt;b&gt;FullHashValue&lt;/b&gt; followed by a bunch of text. Look to the right and you will see &lt;b&gt;Type=&lt;/b&gt;. Below is an example from a recent trace file with the hash value and identifier dramatically shortened.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;FullHashValue=9ba Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=395 OwnerIdn=5&lt;/code&gt;&lt;/pre&gt;This object is, obviously, a cursor. If I skip down a few lines I see this:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;Parent Cursor:  sql_id=8zu55nbpz8fdu parent=0x3030fe60 maxchild=2 plk=n ppn=n&lt;/code&gt;&lt;/pre&gt;Notice the &lt;b&gt;maxchild=2&lt;/b&gt; entry. This means this cursor (i.e., parent cursor) has two child cursors. If I were to jump back up a couple of lines in the trace file, I would find:&lt;br /&gt;&lt;pre&gt;&lt;code&gt; Child:  id='0' Table=30310d58 Reference=303103e4 Handle=3030fb9c&lt;br /&gt; Child:  id='1' Table=30310d58 Reference=303105b4 Handle=30303c24&lt;/code&gt;&lt;/pre&gt;These are the references to the two child cursors. Interestingly, if I search for the &lt;b&gt;findme&lt;/b&gt; table reference in the trace file, it will have a reference section and I will see it has references to handles&amp;nbsp;&lt;b&gt;3030fb9c&lt;/b&gt; and &lt;b&gt;30303c24&lt;/b&gt;. So you can see there is a link from the table to the associated child cursors.&lt;br /&gt;&lt;br /&gt;A child cursor is required for each different execution plan of the syntactically identical cursor. 
For example, if I run the code below, one parent cursor and two child cursors will result.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;alter session set optimizer_mode=all_rows;&lt;br /&gt;select * from findme;&lt;br /&gt;alter session set optimizer_mode=first_rows;&lt;br /&gt;select * from findme;&lt;/code&gt;&lt;/pre&gt;This is one reason why if you query from &lt;b&gt;v$sql&lt;/b&gt; you can get more than one row returned.&amp;nbsp;The situation quickly becomes complex when adding even a single procedure that references the &lt;b&gt;findme&lt;/b&gt; table. Each procedure has a reference to the child cursor of any SQL or other procedures and functions contained within it. And all the tables referenced by the procedure's child cursors also point to the procedure itself. It can become confusing very quickly.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Relationship Mapping&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;With just the simple situation outlined above, plus some details I conveniently left out, you can see:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Every child cursor references a parent cursor.&lt;/li&gt;&lt;li&gt;A parent cursor may reference a child cursor, but not always.&lt;/li&gt;&lt;li&gt;Every procedure references the child cursor of SQL, procedures, etc. it calls.&lt;/li&gt;&lt;li&gt;Table references point to all procedures and child cursors that reference it.&lt;/li&gt;&lt;/ul&gt;I think you can see that with just a few objects, the connections will soon look like sub-atomic particles or astronomical objects! But that's what makes this all so interesting...&lt;br /&gt;&lt;br /&gt;For my LC visualization tool I made the decision to not include&amp;nbsp;packages, procedures, functions, and triggers (and other objects). Their inclusion would have significantly increased the complexity of the visualization beyond being useful. If you keep in mind that what you are seeing is probably the least complex situation given the inputs then you'll be fine. 
Even the introduction of a few procedures will significantly increase the node connections.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Tool Introduction and Control&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;You can download OraPub's Library Cache Visualization Tool for free from OraPub's web-site&amp;nbsp;&lt;a href="http://resources.orapub.com/SearchResults.asp?Search=library+cache+visual"&gt;here&lt;/a&gt;. As the tool's page indicates and links to, you will need to download and install &lt;i&gt;Mathematica's&lt;/i&gt; free player. It works just like the Acrobat &lt;i&gt;Reader&lt;/i&gt; in that the reader is free, but to create the document you must license Acrobat, which in this case means &lt;i&gt;Mathematica&lt;/i&gt;. (Note: OraPub has a special relationship with &lt;i&gt;Mathematica's&lt;/i&gt; company, Wolfram and can arrange for a discounted license. Just email OraPub for details.)&lt;br /&gt;&lt;br /&gt;This posting is based on version 2g of OraPub's LC Visualization Tool. 
Other versions will operate similarly, but the images shown below could be somewhat different.&lt;br /&gt;&lt;br /&gt;Note: You can always reset the visualization settings by clicking on the upper right hand plus sign and then selecting &lt;i&gt;Initial Settings&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;When you first run OraPub's LC Visualization Tool, it will look like this:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9BN7xeT2I/AAAAAAAAAJ8/La29LaUldlk/s1600/Initial+Shot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="262" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9BN7xeT2I/AAAAAAAAAJ8/La29LaUldlk/s320/Initial+Shot.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;b&gt;If you just can't wait&lt;/b&gt;, click on&amp;nbsp;&lt;b&gt;Preset&lt;/b&gt; and select &lt;i&gt;Intro 2&lt;/i&gt;. Click on the image and drag your mouse to rotate the image...more on this below.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;When you first run the tool, as shown above, a very simple, very unrealistic, highly abstracted, yet very useful and cool LC visualization appears along with all your control options. Here's a short description of the tool's control mechanisms:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Cursors w/Child Cursors&lt;/b&gt; is the number of &lt;i&gt;parent&lt;/i&gt; cursors that have one or more child cursors.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Cursors wo/Child Cursors&lt;/b&gt; is the number of parent cursors that do not have a child cursor.&amp;nbsp;While we typically think of parent cursors always having at least one child cursor, if you examine a library cache dump you will quickly notice this is not true. 
For demonstrations, I usually set this to zero.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Child Cursors&lt;/b&gt; is the total number of child cursors in the library cache. If you were to execute a SQL statement, dump the LC and examine the trace file, you would find your SQL statement's parent cursor and you would notice it has one child cursor. The number of child cursors must be set to at least the number of &lt;i&gt;Cursors w/Child Cursors&lt;/i&gt;. If not, the tool should automatically reset the number of child cursors equal to the number of parent cursors with children. It is common for a parent cursor to have multiple child cursors.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Unique Table/Views&lt;/b&gt; is the total number of &lt;i&gt;unique&lt;/i&gt; tables and views. While there are other objects such as synonyms and sequences in the LC, for simplicity they are not represented. A real child cursor will reference at least one table, view, synonym, etc. In this tool, a child cursor will reference one or more unique tables/views.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Table/View Share %&lt;/b&gt; is the percentage of unique tables that each child cursor will be associated with. This injects the messy pointer reality that a table will likely be associated with many child cursors. You will notice that when we increase this parameter, the number of memory references dramatically increases. For example, a customer table will likely be associated with literally hundreds of SQL statement child cursors. This tool will evenly (i.e., uniformly) distribute the tables to the child cursors. In a real Oracle system, obviously this will not occur as some tables will be referenced by more child cursors than other tables.&lt;/li&gt;&lt;li&gt;&lt;b&gt;LC Hash Chains&lt;/b&gt; is the number of library cache hash chains. Each LC bucket will have a hash chain, though in production systems most chains will contain no references. 
At the top of an Oracle 11g LC dump, you will notice a very nice table that shows the chain length by the count of chains. You may find that most of your chains have a length of zero. For fast searches we want a chain length of only one. Hashing is a fascinating topic and is very important for Oracle memory management. This is why I gave quite a bit of space on this topic in both the buffer cache and library cache chapters in my book, &lt;a href="http://resources.orapub.com/Oracle_Performance_Firefighting_Book_p/ff_book.htm"&gt;Oracle Performance Firefighting&lt;/a&gt;. For simple demonstration purposes I usually set the number of LC hash chains to around the number of tables and child cursors. In real Oracle systems, there can easily be over 100,000 chains!&lt;/li&gt;&lt;li&gt;&lt;b&gt;To Plot&lt;/b&gt; allows us to focus on only the LC chains, the other objects in the LC cache (cursors, child cursors, and tables/views), or all the objects. This is a great way to study each topic without the complexity of additional memory links. But as you'll see, when objects are fully shown the complexity can be dazzling! Parent cursors without a child cursor are only displayed with the &lt;i&gt;Full&lt;/i&gt;&amp;nbsp;option.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Scale %&lt;/b&gt; allows you to enter production size values for the parent cursors, etc. but only visualize a percentage of them. This is important because a meaningful visualization of a production system must be scaled down or it will take quite a long time to create a meaningless visualization. Before you enter production size values, set the &lt;b&gt;Scale %&lt;/b&gt; to &lt;i&gt;0.1&lt;/i&gt;. Then increase one tenth of a percentage point at a time&amp;nbsp;until the desired visualization appears.&amp;nbsp;Click on the control's plus sign and then click on the just displayed plus sign to jump 1/10 percentage point. This will also convey just how massive and complex Oracle's LC is. 
The scaling algorithm is very crude: the result is simply the given percentage multiplied by the given number of objects. For example, if the number of unique tables is 20000 and the scale percentage is 10%, 2000 tables will be visualized...far too many by the way.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Preset&lt;/b&gt;&amp;nbsp;was implemented so we can quickly display some basic pre-defined visualizations. &lt;u&gt;&lt;b&gt;This is important:&lt;/b&gt;&lt;/u&gt; If you want to modify a preset, you must first set the &lt;b&gt;Preset&lt;/b&gt; to &lt;b&gt;Custom&lt;/b&gt;. If you forget to do this, your changes will immediately be overwritten by the displayed preset values.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Circle Size&lt;/b&gt; allows the two-dimensional (2D) visualization circle sizes to be adjusted. Based on the number of objects displayed and the viewing area, for a visually pleasing image you sometimes will want to adjust the colored circle size.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Visualization&lt;/b&gt; options provide you with a number of ways to view the same exact information. The &lt;i&gt;&lt;b&gt;Point&lt;/b&gt;&lt;/i&gt; option simply shows a single point for every node. This allows lots of information to be displayed two-dimensionally. You can mouse-over each point to see what it is. The &lt;i&gt;&lt;b&gt;2D&lt;/b&gt;&lt;/i&gt; option displays each node as a circle with the node abbreviation text visible (usually visible). This is a fantastic way to learn about the LC. However, with all the memory object sharing and list connections, as the number of displayed items increases, the value of the two-dimensional view quickly diminishes.&amp;nbsp;The &lt;b&gt;&lt;i&gt;3D&lt;/i&gt;&lt;/b&gt; option shows each node as a small sphere and allows you to view lots of information in a more realistic way. The spheres are color coded based on the selected &lt;i&gt;Color Scheme&lt;/i&gt; (next item). The 3D view opens up a whole new way to study the LC. 
But eventually the 3D option can become cluttered and so the &lt;b&gt;&lt;i&gt;3D Wire&lt;/i&gt;&lt;/b&gt; option changes the spheres to points and allows the object name to be displayed on a mouse-over (you have to be zoomed in pretty close though).&lt;/li&gt;&lt;li&gt;&lt;b&gt;Color Scheme&lt;/b&gt; allows you to pick the visualization color scheme. This allows you to pick a more personally pleasing visualization. But even more important, I find that every projector/beamer projects the same color differently. By trying the various color schemes, you should be able to find one that works well. Using the default color scheme of &lt;i&gt;Multi&lt;/i&gt;, parent cursors are blue-ish, child cursors are brown-ish, tables/views are yellow-ish, and the beginning and ending of LC hash chains are dark green.&lt;/li&gt;&lt;/ul&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Using the Tool - First Time&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is what you will see when you first run the tool. (It's&amp;nbsp;the same image as the one shown at the very top.) &amp;nbsp;If you have messed with the settings, just click on the &lt;b&gt;Preset&lt;/b&gt; &lt;i&gt;Intro 1&lt;/i&gt;.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_FEKH6HhYAEI/TT9B3XaIwRI/AAAAAAAAAKA/QyVElb3llak/s1600/Initial+Shot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="262" src="http://3.bp.blogspot.com/_FEKH6HhYAEI/TT9B3XaIwRI/AAAAAAAAAKA/QyVElb3llak/s320/Initial+Shot.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;Notice that each object type has a distinct color and abbreviation. By default, parent cursors are blue-ish and labeled as "CSR". Child cursors are brown-ish and labeled as "CCSR". Tables are yellow-ish and labeled as "TBL". 
Also notice that all the cursor- and table-related settings match what is being displayed.&lt;br /&gt;&lt;br /&gt;There are two key items I want to highlight.&lt;br /&gt;&lt;br /&gt;First, the &lt;b&gt;Table/View Share %&lt;/b&gt; is set to zero. Notice that in the &lt;i&gt;above&lt;/i&gt; visualization no table is shared by more than one child cursor. &lt;i&gt;Below&lt;/i&gt; is the result with sharing set to 100%. You can do this yourself: set the &lt;b&gt;Preset&lt;/b&gt; to &lt;i&gt;Custom&lt;/i&gt; and then click on the &lt;b&gt;Table/View Share %&lt;/b&gt; &lt;i&gt;100&lt;/i&gt; option. To get a good looking image, on my screen I also had to set the &lt;b&gt;Circle Size&lt;/b&gt; to &lt;i&gt;0.15&lt;/i&gt;.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9CVmwntFI/AAAAAAAAAKE/Yj5_O2sf3WM/s1600/Option+1+share+100.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9CVmwntFI/AAAAAAAAAKE/Yj5_O2sf3WM/s320/Option+1+share+100.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;In the above image the tables are 100% shared. Because there are three child cursors (CCSR1,...CCSR3), each table node has three links.&lt;br /&gt;&lt;br /&gt;The second item I would like to highlight is that the LC hash chains are not shown in the above image. This is because the &lt;b&gt;To Plot&lt;/b&gt; option is set to &lt;i&gt;Objects&lt;/i&gt;. If I click on &lt;i&gt;LC Chains&lt;/i&gt;, as the image below shows, only the LC chains and their contents are displayed. 
On my screen, I also increased the &lt;b&gt;Circle Size&lt;/b&gt; back to &lt;i&gt;0.3&lt;/i&gt;.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9CnReb4HI/AAAAAAAAAKI/yVbiq7yjnEc/s1600/Intro+1+Share+100+LC.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="149" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9CnReb4HI/AAAAAAAAAKI/yVbiq7yjnEc/s320/Intro+1+Share+100+LC.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;Interestingly, as a library cache dump indicates and as visualized above, the child cursors are not directly on a hash chain. However, the parent cursors and the tables referenced in a parent cursor are directly associated with a hash chain. Referencing the image above, if a change was made to table &lt;i&gt;TBL 2&lt;/i&gt;, Oracle would end up hashing to LC chain 2 (&lt;b&gt;LC 2B&lt;/b&gt;, the "B" is for beginning and the "E" is for ending) and start sequentially scanning looking for the handle. &lt;i&gt;TBL 2&lt;/i&gt; is the first entry on LC chain 2. Oracle would then invalidate the table, follow the reference from the Table 2 entry to its associated child cursors (CCSR1, CCSR2, CCSR3) and then also to the cursors (CSR 1, CSR 2, CSR3)...invalidating them all!&lt;br /&gt;&lt;br /&gt;In a real production Oracle system, most LC chains are only one object in length. So the above image would never, we hope, actually occur. While not shown, if you increase the number of LC hash chains to a few more than the number of objects you will get a truer looking abstraction.&lt;br /&gt;&lt;br /&gt;Displaying the LC chains and the other objects separately is a wonderful way to learn about LC structures. But we miss out on understanding the complexity. The image below has the &lt;b&gt;To Plot&lt;/b&gt; set to &lt;i&gt;Full&lt;/i&gt;, thereby displaying all the objects with their associated links. 
On my screen, changing the &lt;b&gt;Circle Size&lt;/b&gt; back to &lt;i&gt;0.15&lt;/i&gt; also looked the best.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TT9C8TmZM9I/AAAAAAAAAKM/BWJALSie0YU/s1600/Intro+1+Share+100+Full.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TT9C8TmZM9I/AAAAAAAAAKM/BWJALSie0YU/s320/Intro+1+Share+100+Full.png" width="311" /&gt;&lt;/a&gt;&lt;/div&gt;Because there are only a few objects in this visualization we can pretty much follow all the links. But even now, it's tricky. Plus, there can easily be overlapping links that are not visible from a two-dimensional perspective. Changing the &lt;b&gt;Orientation&lt;/b&gt; (located at the bottom of the tool) can sometimes help when there are overlapping lines. The image below has the &lt;b&gt;Visualization&lt;/b&gt; set to &lt;i&gt;3D&lt;/i&gt;, plus I rotated the image and zoomed in to get a very nice image.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TT9Da3JIkBI/AAAAAAAAAKQ/eNi_AV5mOY0/s1600/Intro+1+Share+100+Full+3D.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="268" src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TT9Da3JIkBI/AAAAAAAAAKQ/eNi_AV5mOY0/s320/Intro+1+Share+100+Full+3D.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;The objects are color coded, so with a limited number of objects we can see exactly what's going on. The green spheres are the beginning and ending points of the LC chains. Using the default &lt;b&gt;Color Scheme&lt;/b&gt;, &lt;i&gt;Multi&lt;/i&gt;, cursors are white-ish, child cursors are brown-ish, and tables/views are yellow-ish. 
For three-dimensional images you may want to change the default &lt;b&gt;Color Scheme&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;You will notice that even with thousands of nodes, &lt;i&gt;Mathematica&lt;/i&gt; typically plots the LC chain begin and end points relatively far away from the mass of nodes. And when the number of nodes increases as well as the number of LC chains (remember most LC chains have a length of only one) the image looks like an exploding star.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Using the Tool - More Complexity&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For this visualization select the &lt;b&gt;Preset&lt;/b&gt;,&amp;nbsp;&lt;i&gt;Intro 2&lt;/i&gt;. This is a more realistic, though still highly abstracted, LC visualization. Immediately your eyes will pick up on a number of things.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9D6BfeoXI/AAAAAAAAAKU/IQkpJ9evTVc/s1600/Intro+2+Default.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="268" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9D6BfeoXI/AAAAAAAAAKU/IQkpJ9evTVc/s320/Intro+2+Default.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;With more LC chains their beginning and end points look like pins in a pin cushion! Having many short LC chains enables very fast searches. The dense center is due to intense table/view sharing and, to a lesser extent, to there still being fewer LC chains than tables and child cursors, so the chains will be longer than zero or one.&lt;br /&gt;&lt;br /&gt;Now change the &lt;b&gt;Preset&lt;/b&gt; to &lt;i&gt;Custom&lt;/i&gt; (don't forget to do this or your changes will be undone) and change the &lt;b&gt;Table/View Share %&lt;/b&gt; from &lt;i&gt;100&lt;/i&gt; to &lt;i&gt;25&lt;/i&gt;, then to &lt;i&gt;50&lt;/i&gt;, then &lt;i&gt;75&lt;/i&gt;, and back to &lt;i&gt;100&lt;/i&gt;. 
Notice that as sharing increases, so does the linking intensity. While sharing resources can save memory, the work of establishing and maintaining the relationships dramatically increases (think: CPU consumption and &lt;a href="http://resources.orapub.com/SearchResults.asp?Search=latch"&gt;serialization control&lt;/a&gt;). Also, the likelihood of hot spots affecting overall performance increases, whereas when the sharing is minimal, widespread hot spot impact is less likely. Below is the image with &lt;b&gt;Table/View Share %&lt;/b&gt; set to &lt;i&gt;50&lt;/i&gt;;&amp;nbsp;I also zoomed into the image and rotated it until I was satisfied with the image.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TT9Ei5rPF-I/AAAAAAAAAKY/0pl7Q9tNyiM/s1600/Intro+2+Share+50.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="268" src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TT9Ei5rPF-I/AAAAAAAAAKY/0pl7Q9tNyiM/s320/Intro+2+Share+50.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;For the image below, I simply zoomed into the center, looking over the shoulder of one of the clusters. 
...and we wonder why it's common to experience library cache latch or mutex contention?...&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_FEKH6HhYAEI/TT9FM9WiN2I/AAAAAAAAAKc/Mt21u5H80jE/s1600/Intro+2+Share+50+Zoom.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="268" src="http://2.bp.blogspot.com/_FEKH6HhYAEI/TT9FM9WiN2I/AAAAAAAAAKc/Mt21u5H80jE/s320/Intro+2+Share+50+Zoom.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;The image below shows plotting&amp;nbsp;(set &lt;b&gt;To Plot&lt;/b&gt;)&amp;nbsp;only the &lt;i&gt;LC chains&lt;/i&gt;&amp;nbsp;with the &lt;b&gt;Visualization&lt;/b&gt; set to&amp;nbsp;&lt;i&gt;Points&lt;/i&gt;.&amp;nbsp;While the &lt;i&gt;2D&lt;/i&gt; visualization is much more interesting, I'm now trying to convey that the hash chain length is still on average more than one (actually more like 2+) and I'm trying to get all this in a small area, hence the use of the &lt;i&gt;Points&lt;/i&gt; visualization.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT36nBih9mI/AAAAAAAAAJw/wMQv98yvbeE/s1600/Intro+2+Chain+Points.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="236" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT36nBih9mI/AAAAAAAAAJw/wMQv98yvbeE/s320/Intro+2+Chain+Points.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;If you were to click on the &lt;i&gt;Objects&lt;/i&gt; or &lt;i&gt;Full&lt;/i&gt; &lt;b&gt;To Plot&lt;/b&gt; option, the image would be very crowded and pretty much worthless. When you see this happen, it's time to think about a &lt;i&gt;3D&lt;/i&gt; view or reduce the scale. Below is the visualization with the &lt;b&gt;Table/View Share %&lt;/b&gt; of &lt;i&gt;50&lt;/i&gt;, a &lt;b&gt;To Plot&lt;/b&gt; of &lt;i&gt;Full&lt;/i&gt; and the &lt;b&gt;Visualization&lt;/b&gt; set to &lt;i&gt;Points&lt;/i&gt;. 
Not really useful, but the symmetry is beautiful!&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9FrDLIUAI/AAAAAAAAAKg/sjMynVPuyP0/s1600/Intro+2+Share+50+Full+Points.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9FrDLIUAI/AAAAAAAAAKg/sjMynVPuyP0/s320/Intro+2+Share+50+Full+Points.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;To make the visualization more meaningful, I set the&amp;nbsp;&lt;b&gt;Visualization&lt;/b&gt;&amp;nbsp;to&amp;nbsp;&lt;i&gt;2D&lt;/i&gt;, reduced the&amp;nbsp;&lt;b&gt;Scale %&lt;/b&gt;&amp;nbsp;to&amp;nbsp;&lt;i&gt;10&lt;/i&gt;&amp;nbsp;and set the&amp;nbsp;&lt;b&gt;Circle Size&lt;/b&gt;&amp;nbsp;to&amp;nbsp;&lt;i&gt;0.15&lt;/i&gt;. This gives me a scaled down version of the LC. Notice that you can see the beginning of the two clusters that were very apparent in the non-scaled 3D visualization. Pretty cool, eh?&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_FEKH6HhYAEI/TT9GJI6aZ6I/AAAAAAAAAKk/nTMaBWUWVCM/s1600/Intro+2+Share+50+Scale+10+Circles.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://3.bp.blogspot.com/_FEKH6HhYAEI/TT9GJI6aZ6I/AAAAAAAAAKk/nTMaBWUWVCM/s320/Intro+2+Share+50+Scale+10+Circles.png" width="245" /&gt;&lt;/a&gt;&lt;/div&gt;I think you get the idea! But what about a real production system?&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Using the Tool - Production Systems&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now that you are familiar with the tool, the next step everyone wants is to plug their real data into the model! 
Please remember the tool only shows some of the objects in the library cache and abstractions are made...but still we can learn a lot plus gain a much higher respect for Oracle's shared pool and library cache...not to mention the kernel developers and architects.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Get some real data&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The first thing we need to do is extract some key information about the current state of the library cache. As an example, after I dumped the LC as demonstrated above (way above), I then ran a simple shell script which parses through the file and summarizes the elements we need. You can download and see this shell script &lt;a href="http://filezone.orapub.com/Research/lcVisualization01/getLCinfo.txt"&gt;&lt;b&gt;here&lt;/b&gt;&lt;/a&gt;. Here is an example of what you could see just after an 11g instance recycle.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;$ getLCinfo.sh prod18_ora_21939.trc&lt;br /&gt;&lt;br /&gt;Library Cache Dump Summary for Oracle 11g&lt;br /&gt;&lt;br /&gt;--------------------------------------------&lt;br /&gt;Procs &amp;amp; Funcs                      : 60&lt;br /&gt;Parent Cursors                     : 23802&lt;br /&gt;       w/ child cursors            : 13870&lt;br /&gt;       wo/child cursors            : 9932&lt;br /&gt;Child Cursors                      : 23579&lt;br /&gt;              per parent (w/csr)   : 1.70&lt;br /&gt;Unique Tables &amp;amp; Views              : 2970&lt;br /&gt;--------------------------------------------&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;b&gt;Enter the data into the tool&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The first thing you should do is set the &lt;b&gt;Preset&lt;/b&gt; to &lt;i&gt;Custom&lt;/i&gt;, then set the &lt;b&gt;Scaling %&lt;/b&gt; to &lt;i&gt;0.01&lt;/i&gt;, and finally set the &lt;b&gt;Visualization&lt;/b&gt; to &lt;i&gt;Points&lt;/i&gt;. A large 3D visualization could take minutes to render. 
Do yourself a favor and start with a very scaled down and simple visualization and then start making changes. Now you can carefully enter the numeric values.&lt;br /&gt;&lt;br /&gt;The only value we don't have a production value for is the &lt;b&gt;Table/View Share %&lt;/b&gt;. Without a more complex gathering script, you'll need to make an educated guess. A good general guess would be around 25%. &amp;nbsp;A large system with only a few tables will approach 100%. A system with thousands and thousands of tables and views will approach 10%. These are simply my best guesses. It's very interesting to increase the shared percentage and see how dramatically the complexity increases!&lt;br /&gt;&lt;br /&gt;We also need a value for the number of LC hash chains. You can determine this by looking near the top of the LC dump trace file. Oracle tries to default the number of LC hash chains so that the average chain length is between zero and one. Even on a very small system, the default number of LC hash chains can be over 100000. To reduce visualization clutter, I usually set the number of LC hash chains to the sum of child cursors and tables/views. This will ensure the default chain length is one. If this produces too much clutter, just reduce the value, keeping in mind that on a real system there are many, many chains with a length of zero. For this example, I entered 17000 LC hash chains.&lt;br /&gt;&lt;br /&gt;The image below shows the above data carefully entered along with the recommended initial settings. You could have also selected a &lt;b&gt;Preset&lt;/b&gt; of &lt;i&gt;Prod 1&lt;/i&gt;. It's not too exciting, but our goal is to get something visual and get it quickly. 
Now we can alter the &lt;i&gt;visually&lt;/i&gt; related parameters to get a more useful and better looking visual.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9GiNuR0eI/AAAAAAAAAKo/BYabgbs-MBI/s1600/Prod+1+Default.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="305" src="http://4.bp.blogspot.com/_FEKH6HhYAEI/TT9GiNuR0eI/AAAAAAAAAKo/BYabgbs-MBI/s320/Prod+1+Default.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;One of the changes I like to make is to slowly increase the &lt;b&gt;Scale %&lt;/b&gt; and watch the complexity increase. One way to do this is to click on the big "+" sign in the &lt;b&gt;Scale %&lt;/b&gt; control box. On my system with this visualization, clicking on the &lt;i&gt;play&lt;/i&gt; icon essentially locks up &lt;i&gt;Mathematica&lt;/i&gt;...so be careful. To get the below images, I repeatedly clicked on the "+"&amp;nbsp;icon&amp;nbsp;and then, to decrease the scale, I clicked on the "-" icon. 
Below is the series of images, each with an increasing scale percentage.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_FEKH6HhYAEI/TT9KOcxHBmI/AAAAAAAAAKs/wnAO54PvwGg/s1600/Prod+1+Series.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="124" src="http://3.bp.blogspot.com/_FEKH6HhYAEI/TT9KOcxHBmI/AAAAAAAAAKs/wnAO54PvwGg/s320/Prod+1+Series.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Using the Tool - Myself&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_FEKH6HhYAEI/TT9QM_egtOI/AAAAAAAAAK8/-HWXu85opTA/s1600/Prod+2+Cool+1.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" height="200" src="http://1.bp.blogspot.com/_FEKH6HhYAEI/TT9QM_egtOI/AAAAAAAAAK8/-HWXu85opTA/s200/Prod+2+Cool+1.png" width="166" /&gt;&lt;/a&gt;&lt;/div&gt;For myself, I use the tool mainly while &lt;a href="http://training.orapub.com/"&gt;teaching&lt;/a&gt; and when working with my customers (that is, &lt;a href="http://resources.orapub.com/articles.asp?id=146"&gt;my consulting work&lt;/a&gt;). I find that creating a visual model of a real system or even a simple example&amp;nbsp;quickly builds a conceptual framework of the LC architecture. Then I can focus on the particular aspect I'm interested in: LC latch/mutex contention, parsing, shared pool memory management, hashing, etc.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Using the Tool - Now it's Your Turn&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;You have access to a powerful library cache visualization tool. Now it's&amp;nbsp;your turn to begin exploring. My recommendation is to first use the provided presets and then customize the control settings. 
Then gather real production data from your system and begin using the tool to gain insights and increase your communication prowess...after all, like I mentioned at the start of this blog entry,&amp;nbsp;I think what separates Oracle DBAs is their ability to communicate.&lt;br /&gt;&lt;br /&gt;Thanks for reading and I hope you find this tool immensely gratifying!&lt;br /&gt;&lt;br /&gt;Craig.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;P.S. If you want me to respond to a comment or have a question, please feel free to email me directly at craig@orapub.com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email.</content><link rel='replies' type='application/atom+xml' href='http://shallahamer-orapub.blogspot.com/feeds/4319442933663946302/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/01/new-library-cache-visualization-tool.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/4319442933663946302'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4169710303065679169/posts/default/4319442933663946302'/><link rel='alternate' type='text/html' href='http://shallahamer-orapub.blogspot.com/2011/01/new-library-cache-visualization-tool.html' title='Library Cache Visualization Tool: How To'/><author><name>Craig Shallahamer Founder and President of OraPub, Inc.</name><uri>http://www.blogger.com/profile/04109635337570098781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='30' height='32' 
src='http://1.bp.blogspot.com/_FEKH6HhYAEI/S01rIsurXVI/AAAAAAAAABE/_nyQoflU8Vo/S220/craig_promo_bw.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_FEKH6HhYAEI/TT9NCmCaZqI/AAAAAAAAAKw/xrGyTYHQ0-Q/s72-c/Prod+1+Cool+3.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4169710303065679169.post-7701795700355953448</id><published>2011-01-11T09:45:00.000-08:00</published><updated>2011-01-12T10:33:58.975-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='v$sqlstats'/><category scheme='http://www.blogger.com/atom/ns#' term='experimental design'/><title type='text'>When Is V$SQLSTATS Refreshed?</title><content type='html'>Collecting performance data can sometimes be very simple. But at other times it can be a nightmare. Not only is there the classic possibility of the collection impacting the system you're monitoring, but multiple sessions can be involved and the differences in timing can be a matter of microseconds. And the closer you get to the database kernel, the trickier it gets. 
This past week I came across what I initially thought would be a very simple collection question, "When is an entry made into&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt;?"&amp;nbsp;It turns out it's more complicated than I anticipated, and I thought you might be interested in the answers plus the scripts I used to perform the experiments.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;What's So Complicated?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Ask yourself these questions regarding &lt;i&gt;&lt;u&gt;when&lt;/u&gt;&lt;/i&gt; an entry is made into &lt;b&gt;v$sqlstats&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;When a statement has started running?&lt;br /&gt;When a statement completes?&lt;br /&gt;As a statement is running?&lt;br /&gt;Are all statistics updated at the same time?&lt;br /&gt;Is it the same for procedures, functions, anonymous PL/SQL blocks, standard SQL entered in SQL*Plus?&lt;br /&gt;&lt;br /&gt;What initially seemed to be a very simple question turned out to be much more complicated. The only way to answer these questions was to set up an experiment... so that's what I did.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Experimental Design&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The general data collection strategy I used requires two users: monitoring and&amp;nbsp;application. There is actually a third, the data repository user,&amp;nbsp;for which I used &lt;b&gt;system&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;I created a single text file that contains a short introduction, all the code, explanations of how to run the experiment yourself, and the results I will detail below. 
This single file can be downloaded by&amp;nbsp;&lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/vsqlstat_change/vsqlstat_collection.txt"&gt;clicking here&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The monitoring user&lt;/b&gt; continuously checks the&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt; view&amp;nbsp;for a change in the specific SQL we're looking for. By default the change check is in a very tight loop with no delay. Through the variable &lt;b&gt;troll_delay_in&lt;/b&gt; you can insert a delay.&amp;nbsp;Actually, I query from&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt;'s underlying fixed view &lt;b&gt;x$kkssqlstat&lt;/b&gt;&amp;nbsp;because this has lower overhead and I was already using&amp;nbsp;the &lt;b&gt;x$&lt;/b&gt; for some other experiments, so it ended up this way. In both my experiments and in this blog, when I reference &lt;b&gt;v$sqlstats&lt;/b&gt; I am also referring to its underlying &lt;b&gt;x$&lt;/b&gt; fixed view. The experimental results are inserted into the &lt;b&gt;op_results_raw&lt;/b&gt; table, which I created&amp;nbsp;within the&amp;nbsp;&lt;b&gt;system&lt;/b&gt;&amp;nbsp;schema. (I feel uncomfortable about creating objects&amp;nbsp;in the&amp;nbsp;&lt;b&gt;sys&lt;/b&gt;&amp;nbsp;schema.) An important aspect of the experimental design is that all result entries are timestamped. It is the timestamped entries that enable us to see what occurred and in what order. As you'll see, this strategy works very nicely.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The application user&lt;/b&gt; runs the SQL that we closely monitor. I have four&amp;nbsp;tests: a basic select, a procedure with the same select, a procedure&amp;nbsp;that runs the select twice, and a procedure that calls another procedure. 
As with the monitoring user, the&amp;nbsp;application user inserts the results into the &lt;b&gt;op_results_raw&lt;/b&gt;&amp;nbsp;table owned by the user &lt;b&gt;system&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The SYSTEM user&lt;/b&gt; owns the results table, &lt;b&gt;op_results_raw&lt;/b&gt;. It probably&amp;nbsp;would have been cleaner to have the application user be the owner, but for now&amp;nbsp;it is what it is.&lt;br /&gt;&lt;br /&gt;One of the keys to making this work is clearly identifying the application&amp;nbsp;SQL. To do this we need its &lt;b&gt;sql_id&lt;/b&gt; and &lt;b&gt;plan_hash_value&lt;/b&gt; columns. While I'm not displaying all the actual experimental code (but I provided the download link above), I think this is important and versatile enough to post. So here it is:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;def the_test_sql=" * from dual"&lt;br /&gt;&lt;br /&gt;select &amp;amp;the_test_sql;&lt;br /&gt;select sql_id,plan_hash_value,substr(sql_text,1,60) the_sql&lt;br /&gt;from v$sqlstats&lt;br /&gt;where sql_id = (select prev_sql_id from v$session where sid=(select sid from v$mystat where rownum=1));&lt;/code&gt;&lt;/pre&gt;Notice that&amp;nbsp;immediately after I run the above application SQL, I query from&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt; and &lt;b&gt;v$session&lt;/b&gt; looking for the "just run" &lt;b&gt;sql_id&lt;/b&gt; and &lt;b&gt;plan_hash_value&lt;/b&gt;. 
This works&amp;nbsp;wonderfully!&amp;nbsp;Here is the output on my test environment.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;SQL_ID &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;PLAN_HASH_VALUE THE_SQL&lt;br /&gt;------------- --------------- ----------------------------------------&lt;br /&gt;3fbqsb5wsumj4 &amp;nbsp; &amp;nbsp; &amp;nbsp; 272002086 select &amp;nbsp;* from dual&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;Make sure you have serveroutput set to &lt;b&gt;off&lt;/b&gt; when you run this or it will not do what we want.&lt;/div&gt;&lt;br /&gt;My test server was running a single CPU with four cores on Linux with Oracle Release 11.2.0.1. It is a small database but large enough for this experiment.&lt;br /&gt;&lt;br /&gt;As mentioned above, the experimental results from both the monitoring and the application user are inserted into the &lt;b&gt;op_results_raw&lt;/b&gt; table. Here is an application user code snippet. For readability, I took out a bunch of lines from the middle. Obviously, before the below code is run the variable&amp;nbsp;&lt;b&gt;the_test_sql&lt;/b&gt; must be set.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;def results_owner_in=system&lt;br /&gt;insert into &amp;amp;results_owner_in..op_results_raw values (systimestamp,'Started with SQL execution test.');&lt;br /&gt;insert into &amp;amp;results_owner_in..op_results_raw values (systimestamp,'SQL start...');&lt;br /&gt;select &amp;amp;the_test_sql;&lt;br /&gt;insert into &amp;amp;results_owner_in..op_results_raw values (systimestamp,'SQL end...');&lt;br /&gt;exec dbms_lock.sleep(7);&lt;br /&gt;insert into &amp;amp;results_owner_in..op_results_raw values (systimestamp,'SQL start...');&lt;br /&gt;select &amp;amp;the_test_sql;&lt;br /&gt;insert into &amp;amp;results_owner_in..op_results_raw values (systimestamp,'SQL end...');&lt;br /&gt;insert into &amp;amp;results_owner_in..op_results_raw values (systimestamp,'Finished with SQL execution test.');&lt;br /&gt;commit;&lt;/code&gt;&lt;/pre&gt;To help me interpret the 
results,&amp;nbsp;I insert various timing markers, such as &lt;i&gt;Started with SQL execution test&lt;/i&gt;. I also wanted to know, for the monitoring and application user, when their duties started and ended. I wanted to ensure my monitoring started before and ended after the application testing. I also inserted a relatively long timing gap between the repeated SQL executions. In the example above, I used seven seconds. This makes understanding and interpreting the results much easier because the common events are clustered by time.&lt;br /&gt;&lt;br /&gt;I will detail the actual experimental results below, but so you can see how I reported them, here is a snippet. &lt;i&gt;Note: For readability I replaced the actual &lt;b&gt;sql_id&lt;/b&gt; and &lt;b&gt;plan_hash_value&lt;/b&gt; with 7,7. I did this kind of thing throughout this blog posting.&lt;/i&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;SQL&amp;gt; l&lt;br /&gt;&amp;nbsp;&amp;nbsp;1 &amp;nbsp;select to_char(mark_time,'HH24:MI:SS.FF4') Mark,&lt;br /&gt;&amp;nbsp;&amp;nbsp;2 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; happening&lt;br /&gt;&amp;nbsp;&amp;nbsp;3 &amp;nbsp;from &amp;nbsp; op_results_raw&lt;br /&gt;&amp;nbsp;&amp;nbsp;4* order by mark_time&lt;br /&gt;SQL&amp;gt; /&lt;br /&gt;&lt;br /&gt;MARK &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; HAPPENING&lt;br /&gt;-------------- -----------------------------------------------------------------&lt;br /&gt;&lt;div&gt;&lt;div&gt;13:59:20.9452 &amp;nbsp;Started v$sqlstats collection&lt;/div&gt;&lt;div&gt;13:59:21.0485 &amp;nbsp;v$sqlstats change for 7,7 exec=20 dskrds=6736027 lio=922219&lt;/div&gt;&lt;div&gt;13:59:26.4489 &amp;nbsp;Started with SQL execution test.&lt;/div&gt;&lt;div&gt;13:59:26.4513 &amp;nbsp;SQL start...&lt;/div&gt;&lt;div&gt;13:59:26.4519 &amp;nbsp;v$sqlstats change for 7,7 exec=20 dskrds=6736027 lio=922219&lt;/div&gt;&lt;div&gt;13:59:26.4522 &amp;nbsp;v$sqlstats change for 7,7 exec=21 dskrds=6736027 
lio=922219&lt;/div&gt;&lt;/div&gt;...&lt;/code&gt;&lt;/pre&gt;Remember, all the code, the results, and a more detailed explanation are available in a single text file by &lt;b&gt;&lt;a href="http://filezone.orapub.com/Research/vsqlstat_change/vsqlstat_collection.txt"&gt;clicking here&lt;/a&gt;&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Results: Simple SQL Statement&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;The &lt;i&gt;simple&lt;/i&gt; SQL statement is:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;select /* orapub 3 */ avg(t0.object_id+t1.object_id+t2.object_id) from test1 t0, test1 t1, test1 t2 where t0.object_id=t1.object_id and t1.object_id=t2.object_id&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;On my system its &lt;b&gt;sql_id&lt;/b&gt; is&amp;nbsp;7un4pb76n3sqc and &lt;b&gt;plan_hash_value&lt;/b&gt; is&amp;nbsp;772987886, which for readability are replaced below with 7,7. Here is the beginning of the output:&lt;/div&gt;&lt;pre&gt;&lt;code&gt;MARK &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; HAPPENING&lt;div&gt;&lt;div&gt;-------------- ----------------------------------------------------------------&lt;/div&gt;&lt;div&gt;13:59:20.9452 &amp;nbsp;Started v$sqlstats collection&lt;/div&gt;&lt;div&gt;13:59:21.0485 &amp;nbsp;v$sqlstats change for 7,7 exec=20 dskrds=6736027 lio=922219&lt;/div&gt;&lt;div&gt;13:59:26.4489 &amp;nbsp;Started with SQL execution test.&lt;/div&gt;&lt;div&gt;13:59:26.4513 &amp;nbsp;SQL start...&lt;/div&gt;&lt;div&gt;13:59:26.4519 &amp;nbsp;v$sqlstats change for 7,7 exec=20 dskrds=6736027 lio=922219&lt;/div&gt;&lt;div&gt;13:59:26.4522 &amp;nbsp;v$sqlstats change for 7,7 exec=21 dskrds=6736027 lio=922219&lt;/div&gt;&lt;div&gt;13:59:28.4311 &amp;nbsp;v$sqlstats change for 7,7 exec=21 dskrds=6759000 lio=964528&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;13:59:30.4383 &amp;nbsp;v$sqlstats change for 7,7 exec=21 dskrds=6768115 
lio=968314&lt;/div&gt;&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;&lt;b&gt;20.9452&lt;/b&gt;: The statistics collection was enabled before anything else started.&lt;/div&gt;&lt;div&gt;&lt;b&gt;21.0485&lt;/b&gt;: The collection has never seen the statement before, so it thinks the &lt;b&gt;v$sqlstats&lt;/b&gt; entry has just been inserted or refreshed. This is incorrect, so disregard this entry.&lt;/div&gt;&lt;div&gt;&lt;b&gt;26.4489&lt;/b&gt;: There was a clear ~5 second gap. This is how long it took me to copy and paste the application user code into its sqlplus session...without making a mistake. In the Experimental Design section above, you can see the actual SQL that caused this entry.&lt;/div&gt;&lt;div&gt;&lt;b&gt;26.4513&lt;/b&gt;: This application user entry was made just before the simple SQL was run.&lt;/div&gt;&lt;div&gt;&lt;b&gt;26.4519&lt;/b&gt;: The SQL statement started so an entry is made into &lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;(actually the row is refreshed, that is, updated) and our monitoring user detects this, pulls from &lt;b&gt;v$sqlstats&lt;/b&gt;, and records some basic information that we see above.&amp;nbsp;Notice that compared to the 21.0485 entry,&amp;nbsp;there is no change shown to trigger this entry. However, the monitoring user did in fact detect a change by comparing the previous and current&amp;nbsp;&lt;b&gt;last_update_change&lt;/b&gt; column. &amp;nbsp;That column must have changed but the execution column had not yet been changed, or at least we don't see the change in the &lt;b&gt;v$&lt;/b&gt; fixed view. There could be many reasons for this, such as the lack of read consistency with performance data. But notice the next entry.&lt;/div&gt;&lt;div&gt;&lt;b&gt;26.4522&lt;/b&gt;: In less than two thousandths of a second we detect another change and we can see the number of executions has increased. 
I suspect we caught the &lt;b&gt;v$&lt;/b&gt; fixed view in the middle of an update...remember there is no read consistency in performance views! So it appears the execution column is updated when the statement begins.&lt;/div&gt;&lt;div&gt;&lt;b&gt;28.4311&lt;/b&gt;: It has been about two seconds since the last &lt;b&gt;v$&lt;/b&gt; entry so another row is inserted into the &lt;b&gt;v$&lt;/b&gt; table and we see the results here. In this case both the disk reads and logical buffer reads have increased, but the number of executions has not changed... because the number of executions changed when the statement started.&lt;/div&gt;&lt;div&gt;&lt;b&gt;30.4383&lt;/b&gt;: It's been about two seconds...so we get another entry into the &lt;b&gt;v$&lt;/b&gt; table. This will go on and on until the statement completes.&lt;/div&gt;&lt;pre&gt;&lt;code&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;&lt;div&gt;13:59:56.5887  v$sqlstats change for 7,7 exec=21 dskrds=7037188 lio=968314&lt;br /&gt;13:59:58.5933  v$sqlstats change for 7,7 exec=21 dskrds=7059028 lio=968314&lt;br /&gt;14:00:00.5970 &amp;nbsp;v$sqlstats change for 7,7 exec=21 dskrds=7083143 lio=968314&lt;/div&gt;&lt;div&gt;14:00:00.8328 &amp;nbsp;SQL end...&lt;/div&gt;&lt;div&gt;14:00:07.8464 &amp;nbsp;SQL start...&lt;/div&gt;&lt;div&gt;14:00:07.8472 &amp;nbsp;v$sqlstats change for 7,7 exec=22 dskrds=7085215 lio=968314&lt;/div&gt;&lt;div&gt;14:00:09.6418 &amp;nbsp;v$sqlstats change for 7,7 exec=22 dskrds=7111639 lio=1014409&lt;/div&gt;&lt;div&gt;14:00:11.6577 &amp;nbsp;v$sqlstats change for 7,7 exec=22 dskrds=7119745 lio=1014409&lt;/div&gt;&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;I am not showing all the entries and have jumped to the last few before the statement finishes. Directly above are the final entries for the above execution (SQL end...) and then you will notice the next execution begins (SQL start...). 
Here is an explanation:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;56.5887&lt;/b&gt;: The statement is still running and the normal two second &lt;b&gt;insert&lt;/b&gt; into &lt;b&gt;v$sqlstats&lt;/b&gt; occurred, was detected by our monitoring user, and was recorded in our results table.&lt;br /&gt;&lt;b&gt;58.5933&lt;/b&gt;: Same as with 56.5887.&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;b&gt;00.5970&lt;/b&gt;: The statement completes and immediately the&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;row is refreshed. Notice the&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;row did not refresh in the usual two second cycle. As soon as the statement completed, the row was refreshed, detected by our monitoring user, and recorded in our results table.&lt;/div&gt;&lt;div&gt;&lt;b&gt;00.8328&lt;/b&gt;: The application user inserts a row telling us our simple SQL statement has indeed finished. Notice the monitoring user has no idea this has occurred, hence there is no &lt;b&gt;v$sqlstats&lt;/b&gt; entry.&lt;/div&gt;&lt;div&gt;&lt;b&gt;07.8464&lt;/b&gt;: About seven seconds later(!!) the simple SQL statement is about to be run once again.&lt;/div&gt;&lt;div&gt;&lt;b&gt;07.8472&lt;/b&gt;: The statement begins and the monitoring user detects a refreshed row in &lt;b&gt;v$sqlstats&lt;/b&gt;. Notice the execution count has been incremented to 22. Once again, for this SQL statement, the execution count is incremented at the start of the SQL statement. Also, the &lt;b&gt;v$sqlstats&lt;/b&gt; entry does not match the two second pattern. Again, it appears when a statement starts a row is immediately refreshed in &lt;b&gt;v$sqlstats&lt;/b&gt;. We can expect the next &lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;refresh to occur around the &lt;b&gt;09.8472&lt;/b&gt; time.&lt;/div&gt;&lt;div&gt;&lt;b&gt;09.6418&lt;/b&gt;: As expected about two seconds later, the next &lt;b&gt;v$sqlstats&lt;/b&gt; refresh is made because the SQL statement is active. 
We can see that some of the activity is related to the increase in disk reads.&lt;/div&gt;&lt;div&gt;&lt;b&gt;11.6577&lt;/b&gt;: ...same as the above entry...&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;From this experiment I learned:&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Upon a&amp;nbsp;&lt;b&gt;select&lt;/b&gt; statement start, the &lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;row is either inserted or refreshed (if it already exists).&lt;/li&gt;&lt;li&gt;The execution count is incremented when a &lt;b&gt;select&lt;/b&gt; begins.&lt;/li&gt;&lt;li&gt;Statistics are refreshed in&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;about every two seconds, except when the statement begins and ends, in which case a change immediately occurs.&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;A word of caution. This test was specifically for a &lt;b&gt;select&lt;/b&gt; statement. While I did additional experiments related to procedures, I did not experiment with &lt;b&gt;inserts&lt;/b&gt;, &lt;b&gt;updates&lt;/b&gt;, &lt;b&gt;deletes&lt;/b&gt;, or DDL. While I suspect DML will follow this pattern, I have no experimental evidence to back this up... there are other things I need to attend to...&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Results: Procedure With Simple SQL Statement&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For the next experiment, I created a simple procedure containing the &lt;i&gt;simple&lt;/i&gt; SQL statement. 
Here it is:&lt;/div&gt;&lt;pre&gt;&lt;code&gt;create or replace procedure op_test4&lt;br /&gt;as&lt;br /&gt;begin&lt;br /&gt;declare&lt;br /&gt;   nothingness number;&lt;br /&gt;begin&lt;br /&gt;   select /* orapub 4 */ avg(t0.object_id+t1.object_id+t2.object_id)&lt;br /&gt;   into   nothingness&lt;br /&gt;   from   test1 t0, test1 t1, test1 t2&lt;br /&gt;   where  t0.object_id=t1.object_id and t1.object_id=t2.object_id;&lt;br /&gt;end;&lt;br /&gt;end;&lt;br /&gt;/&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;Here is the application user script, with the middle parts removed, that I copied and pasted into the application user's sqlplus session. The &lt;b&gt;prc_name&lt;/b&gt; variable was set to &lt;b&gt;op_test4&lt;/b&gt;.&lt;br /&gt;&lt;pre&gt;&lt;code&gt;def results_owner_in=system&lt;br /&gt;insert into &amp;amp;results_owner_in..op_results_raw values (systimestamp,'Started with PRC execution test.');&lt;br /&gt;insert into &amp;amp;results_owner_in..op_results_raw values (systimestamp,'PRC start...');&lt;br /&gt;exec &amp;amp;prc_name;&lt;br /&gt;insert into &amp;amp;results_owner_in..op_results_raw values (systimestamp,'PRC end...');&lt;br /&gt;exec dbms_lock.sleep(7);&lt;br /&gt;...&lt;br /&gt;insert into &amp;amp;results_owner_in..op_results_raw values (systimestamp,'PRC start...');&lt;br /&gt;exec &amp;amp;prc_name;&lt;br /&gt;insert into &amp;amp;results_owner_in..op_results_raw values (systimestamp,'PRC end...');&lt;br /&gt;insert into &amp;amp;results_owner_in..op_results_raw values (systimestamp,'Finished with PRC execution test.');&lt;br /&gt;commit;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div&gt;For readability, I replaced the &lt;b&gt;sql_id&lt;/b&gt; with &lt;b&gt;abc&lt;/b&gt;. I did not change the &lt;b&gt;plan_hash_value&lt;/b&gt;. It was indeed 0 and was 0 for all procedures I have investigated. 
I'm going to comment briefly on the results.&lt;/div&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;&lt;div&gt;&lt;div&gt;MARK &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; HAPPENING&lt;/div&gt;&lt;div&gt;-------------- ---------------------------------------------------------------&lt;/div&gt;&lt;div&gt;17:20:13.7554 &amp;nbsp;Started v$sqlstats collection&lt;/div&gt;&lt;div&gt;17:20:13.7887 &amp;nbsp;v$sqlstats change for abc,0 exec=3 dskrds=760213 lio=124219&lt;/div&gt;&lt;div&gt;17:20:17.6207 &amp;nbsp;Started with PRC execution test.&lt;/div&gt;&lt;div&gt;17:20:17.8577 &amp;nbsp;PRC start...&lt;/div&gt;&lt;div&gt;17:20:17.8584 &amp;nbsp;v$sqlstats change for abc,0 exec=3 dskrds=760213 lio=124219&lt;/div&gt;&lt;div&gt;17:20:19.4002 &amp;nbsp;v$sqlstats change for abc,0 exec=3 dskrds=782882 lio=146913&lt;/div&gt;&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;&lt;b&gt;13.7554&lt;/b&gt;: The monitoring user has begun looking for changes in &lt;b&gt;v$sqlstats&lt;/b&gt;.&lt;/div&gt;&lt;div&gt;&lt;b&gt;13.7887&lt;/b&gt;: Since we just started monitoring and have never seen an entry in &lt;b&gt;v$sqlstats&lt;/b&gt; for this &lt;b&gt;sql_id&lt;/b&gt; and &lt;b&gt;plan_hash_value&lt;/b&gt;, our monitoring user detects this "change" and records the event. Again, disregard this entry.&lt;/div&gt;&lt;div&gt;&lt;b&gt;17.6207&lt;/b&gt;: About four seconds after monitoring begins, the procedure test is about to begin. Again, this time lag is due to me copying and pasting the application code (shown in part above) into the application user's session.&lt;/div&gt;&lt;div&gt;&lt;b&gt;17.8577&lt;/b&gt;: The procedure is about to begin.&lt;/div&gt;&lt;div&gt;&lt;b&gt;17.8584&lt;/b&gt;: Immediately an entry is made into &lt;b&gt;v$sqlstats&lt;/b&gt; and the monitoring user detects this and records a row in our results table. Notice there has been no change in the execution count. 
It was 3 before we started the procedure and it is still 3!&lt;/div&gt;&lt;div&gt;&lt;b&gt;19.4002&lt;/b&gt;: Two seconds later, as expected since the procedure is running, the&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;row is refreshed and the monitoring user detects the change. In this case we can see there is an increase in both physical disk reads and logical IO. Notice the execution count has still not changed!&lt;/div&gt;&lt;pre&gt;&lt;code&gt;&lt;div&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;17:20:47.5802 &amp;nbsp;v$sqlstats change for abc,0 exec=3 dskrds=1071028 lio=149206&lt;/div&gt;&lt;div&gt;17:20:49.5903 &amp;nbsp;v$sqlstats change for abc,0 exec=3 dskrds=1100540 lio=149206&lt;/div&gt;&lt;div&gt;17:20:50.0857 &amp;nbsp;v$sqlstats change for abc,0 exec=4 dskrds=1109395 lio=149206&lt;/div&gt;&lt;div&gt;17:20:50.1243 &amp;nbsp;PRC end...&lt;/div&gt;&lt;div&gt;17:20:57.1721 &amp;nbsp;PRC start...&lt;/div&gt;&lt;div&gt;17:20:57.1727 &amp;nbsp;v$sqlstats change for abc,0 exec=4 dskrds=1109395 lio=149206&lt;/div&gt;&lt;div&gt;17:20:58.6511 &amp;nbsp;v$sqlstats change for abc,0 exec=4 dskrds=1131141 lio=170985&lt;/div&gt;&lt;div&gt;17:21:00.6632 &amp;nbsp;v$sqlstats change for abc,0 exec=4 dskrds=1143823 lio=174193&lt;/div&gt;&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;&lt;b&gt;47.5802&lt;/b&gt;: The procedure is still running and still consuming computing resources.&lt;/div&gt;&lt;div&gt;&lt;b&gt;49.5903&lt;/b&gt;:&amp;nbsp;Two seconds later, as expected since the procedure is running, the &lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;&amp;nbsp;row is refreshed, the monitoring user detects the change, and inserts a row into the results table.&lt;/div&gt;&lt;div&gt;&lt;b&gt;50.0857&lt;/b&gt;: Before the next two second cycle occurs, the procedure finishes, &lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;immediately refreshes the row, and our monitoring user detects the change and stores the results. 
Notice the execution count is finally incremented! So in this test, the execution count was incremented when the procedure completed, not when the procedure started.&lt;/div&gt;&lt;div&gt;&lt;b&gt;50.1243&lt;/b&gt;: The application user posts that the procedure has ended.&lt;/div&gt;&lt;div&gt;&lt;b&gt;57.1721&lt;/b&gt;: After our specified seven second delay another identical procedure execution is about to begin.&lt;/div&gt;&lt;div&gt;&lt;b&gt;57.1727&lt;/b&gt;: It does begin, the&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;row is refreshed, and the monitoring user detects and records the change. Notice the execution count did not increment.&lt;/div&gt;&lt;div&gt;&lt;b&gt;58.6511&lt;/b&gt;: Nearly two seconds later, but strangely closer to 1.5 seconds later, the&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;row is refreshed.&lt;/div&gt;&lt;div&gt;&lt;b&gt;00.6632&lt;/b&gt;: Two seconds later, the&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;row is again refreshed.&lt;/div&gt;&lt;pre&gt;&lt;code&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;&lt;div&gt;17:21:28.8233 &amp;nbsp;v$sqlstats change for&amp;nbsp;abc,0 exec=4 dskrds=1433791 lio=174193&lt;/div&gt;&lt;div&gt;17:21:30.8343 &amp;nbsp;v$sqlstats change for abc,0 exec=4 dskrds=1453958 lio=174193&lt;/div&gt;&lt;div&gt;17:21:31.1759 &amp;nbsp;v$sqlstats change for abc,0 exec=5 dskrds=1458564 lio=174193&lt;/div&gt;&lt;div&gt;17:21:31.1773 &amp;nbsp;PRC end...&lt;/div&gt;&lt;div&gt;17:21:38.1846 &amp;nbsp;PRC start...&lt;/div&gt;&lt;div&gt;17:21:38.1851 &amp;nbsp;v$sqlstats change for abc,0 exec=5 dskrds=1458564 lio=174193&lt;/div&gt;&lt;div&gt;17:21:39.8759 &amp;nbsp;v$sqlstats change for abc,0 exec=5 dskrds=1482325 lio=197985&lt;/div&gt;&lt;div&gt;17:21:41.8949 &amp;nbsp;v$sqlstats change for abc,0 exec=5 dskrds=1490264 lio=199180&lt;/div&gt;&lt;/div&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;&lt;b&gt;31.1759&lt;/b&gt;: Breaking the two second cycle because the procedure finished, 
the&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;row is refreshed. Notice the execution count is incremented to five.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;From this experiment I learned, more like confirmed:&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Upon a&amp;nbsp;&lt;b&gt;select&lt;/b&gt;&amp;nbsp;statement start, the&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;row is either inserted or refreshed (if it already exists).&lt;/li&gt;&lt;li&gt;The execution count is incremented when a&amp;nbsp;&lt;b&gt;select&lt;/b&gt;&amp;nbsp;begins.&lt;/li&gt;&lt;li&gt;Statistics are refreshed in&amp;nbsp;&lt;b&gt;v$sqlstats&lt;/b&gt;&amp;nbsp;about every two seconds, except when the statement begins and ends, in which case a change immediately occurs.&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Results: Procedure With Two Simple SQL Statements&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For the next experiment, I created a simple procedure that runs the &lt;i&gt;simple&lt;/i&gt; SQL statement twice in a loop. Here it is:&lt;/div&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;create or replace procedure op_test5&lt;br /&gt;as&lt;br /&gt;begin&lt;br /&gt;declare&lt;br /&gt; nothingness number;&lt;br /&gt; i number;&lt;br /&gt;begin&lt;br /&gt;    for i in 1..2&lt;br /&gt;    loop&lt;br /&gt;       select /* orapub 5 */ avg(t0.object_id+t1.object_id+t2.object_id)&lt;br /&gt;       into   nothingness&lt;br /&gt;       from   test1 t0, test1 t1, test1 t2&lt;br /&gt;       where  t0.object_id=t1.object_id and t1.object_id=t2.object_id;&lt;br /&gt; end loop;&lt;br /&gt;end;&lt;br /&gt;end;&lt;br /&gt;/&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;The application user script is exactly the same as the previous procedure test. The only difference is the variable &lt;b&gt;prc_name&lt;/b&gt; is set to &lt;b&gt;op_test5&lt;/b&gt;. 
For readability, I replaced the &lt;b&gt;sql_id&lt;/b&gt; with &lt;b&gt;xyz&lt;/b&gt; but did not change the &lt;b&gt;plan_hash_value&lt;/b&gt; of zero. Here are the results.&lt;/div&gt;&lt;pre&gt;&lt;code&gt;&lt;div&gt;&lt;div&gt;MARK &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; HAPPENING&lt;/div&gt;&lt;div&gt;-------------- --------------------------------------------------------------&lt;/div&gt;&lt;div&gt;18:50:26.0991 &amp;nbsp;Started v$sqlstats collection&lt;/div&gt;&lt;div&gt;18:50:26.1003 &amp;nbsp;v$sqlstats change for xyz,0 exec=20 dskrds=13208886 lio=1041322&lt;/div&gt;&lt;div&gt;18:50:31.8619 &amp;nbsp;Started with PRC execution test.&lt;/div&gt;&lt;div&gt;18:50:31.8632 &amp;nbsp;PRC start...&lt;/div&gt;&lt;div&gt;18:50:31.8638 &amp;nbsp;v$sqlstats change for xyz,0 exec=20 dskrds=13208886 lio=1041322&lt;/div&gt;&lt;div&gt;18:50:33.1397 &amp;nbsp;v$sqlstats change for xyz,0 exec=20 dskrds=13213295 lio=1045734&lt;/div&gt;&lt;div&gt;18:50:35.1491 &amp;nbsp;v$sqlstats change for xyz,0 exec=20 dskrds=13235541 lio=1066291&lt;/div&gt;&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;&lt;div&gt;18:51:35.4911 &amp;nbsp;v$sqlstats change for xyz,0 exec=20 dskrds=13855325 lio=1091260&lt;/div&gt;&lt;div&gt;18:51:37.5069 &amp;nbsp;v$sqlstats change for xyz,0 exec=20 dskrds=13884991 lio=1091260&lt;/div&gt;&lt;div&gt;18:51:38.9183 &amp;nbsp;v$sqlstats change for xyz,0 exec=21 dskrds=13907265 lio=1091260&lt;/div&gt;&lt;div&gt;18:51:38.9194 &amp;nbsp;PRC end...&lt;/div&gt;&lt;div&gt;18:51:46.2494 &amp;nbsp;PRC start...&lt;/div&gt;&lt;div&gt;18:51:46.2513 &amp;nbsp;v$sqlstats change for xyz,0 exec=21 dskrds=13907265 lio=1091260&lt;/div&gt;&lt;div&gt;18:51:47.5500 &amp;nbsp;v$sqlstats change for xyz,0 exec=21 dskrds=13917643 lio=1101648&lt;/div&gt;&lt;div&gt;18:51:49.5649 &amp;nbsp;v$sqlstats change for xyz,0 exec=21 dskrds=13936621 lio=1116247&lt;/div&gt;&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;&lt;div&gt;18:52:51.8954 &amp;nbsp;v$sqlstats change for 
xyz,0 exec=21 dskrds=14586303 lio=1141216&lt;/div&gt;&lt;div&gt;18:52:53.1498 &amp;nbsp;v$sqlstats change for xyz,0 exec=22 dskrds=14605637 lio=1141216&lt;/div&gt;&lt;div&gt;18:52:53.1511 &amp;nbsp;PRC end...&lt;/div&gt;&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;&lt;div&gt;18:57:34.2478 &amp;nbsp;v$sqlstats change for xyz,0 exec=25 dskrds=17242358 lio=1341040&lt;/div&gt;&lt;div&gt;18:57:36.2597 &amp;nbsp;v$sqlstats change for xyz,0 exec=25 dskrds=17262000 lio=1341040&lt;/div&gt;&lt;div&gt;18:57:37.8937 &amp;nbsp;v$sqlstats change for xyz,0 exec=26 dskrds=17280340 lio=1341040&lt;/div&gt;&lt;div&gt;18:57:37.8945 &amp;nbsp;PRC end...&lt;/div&gt;&lt;div&gt;18:57:37.8957 &amp;nbsp;Finished with PRC execution test.&lt;/div&gt;&lt;div&gt;18:57:56.1001 &amp;nbsp;Finished v$sqlstats collection&lt;/div&gt;&lt;/div&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;If you have followed the previous two experiments, you should have no problem understanding the above. The same pattern is followed as with the above procedure. While I learned nothing new, it reinforced what I previously learned.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Results: Procedure With Simple SQL and Procedure Call&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For the next experiment, I created a simple procedure containing the&amp;nbsp;&lt;i&gt;simple&lt;/i&gt;&amp;nbsp;SQL statement and I included a call to another procedure, &lt;b&gt;dbms_random&lt;/b&gt;. 
Here it is:&lt;/div&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;create or replace procedure op_test6&lt;br /&gt;as&lt;br /&gt;begin&lt;br /&gt;declare&lt;br /&gt;   nothingness number;&lt;br /&gt;begin&lt;br /&gt;   select /* orapub 6a */ avg(t0.object_id+t1.object_id+t2.object_id)&lt;br /&gt;   into   nothingness&lt;br /&gt;   from   test1 t0, test1 t1, test1 t2&lt;br /&gt;   where  t0.object_id=t1.object_id and t1.object_id=t2.object_id;&lt;br /&gt;&lt;br /&gt;   select /* orapub 6b */ dbms_random.random into nothingness from dual;&lt;br /&gt;end;&lt;br /&gt;end;&lt;br /&gt;/&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;The application user script is exactly the same as the previous procedure test. The only difference is the variable &lt;b&gt;prc_name&lt;/b&gt; is set to &lt;b&gt;op_test6&lt;/b&gt;. For readability, I replaced the &lt;b&gt;sql_id&lt;/b&gt; with &lt;b&gt;akb&lt;/b&gt;. The results are exactly as with the previous two procedure experiments!&lt;/div&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;&lt;div&gt;&lt;div&gt;MARK &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; HAPPENING&lt;/div&gt;&lt;div&gt;-------------- ---------------------------------------------------------------&lt;/div&gt;&lt;div&gt;16:21:57.9886 &amp;nbsp;Started v$sqlstats collection&lt;/div&gt;&lt;div&gt;16:21:57.9894 &amp;nbsp;v$sqlstats change for akb,0 exec=25 dskrds=937907 lio=624657&lt;/div&gt;&lt;div&gt;16:22:05.7840 &amp;nbsp;Started with PRC execution test.&lt;/div&gt;&lt;div&gt;16:22:05.7847 &amp;nbsp;PRC start...&lt;/div&gt;&lt;div&gt;16:22:05.7849 &amp;nbsp;v$sqlstats change for akb,0 exec=25 dskrds=937907 lio=624657&lt;/div&gt;&lt;div&gt;16:22:07.5003 &amp;nbsp;v$sqlstats change for akb,0 exec=25 dskrds=963609 lio=649644&lt;/div&gt;&lt;div&gt;16:22:09.5084 &amp;nbsp;v$sqlstats change for akb,0 exec=25 dskrds=965463 lio=649644&lt;/div&gt;&lt;div&gt;16:22:11.5073 &amp;nbsp;v$sqlstats change for akb,0 exec=25 dskrds=967173 lio=649644&lt;/div&gt;&lt;div&gt;16:22:13.5417 
&amp;nbsp;v$sqlstats change for akb,0 exec=25 dskrds=969048 lio=649644&lt;/div&gt;&lt;div&gt;16:22:15.5542 &amp;nbsp;v$sqlstats change for akb,0 exec=25 dskrds=970743 lio=649644&lt;/div&gt;&lt;div&gt;16:22:17.5569 &amp;nbsp;v$sqlstats change for akb,0 exec=25 dskrds=972408 lio=649644&lt;/div&gt;&lt;div&gt;16:22:19.5811 &amp;nbsp;v$sqlstats change for akb,0 exec=25 dskrds=974238 lio=649644&lt;/div&gt;&lt;div&gt;16:22:21.0560 &amp;nbsp;v$sqlstats change for akb,0 exec=26 dskrds=975423 lio=649644&lt;/div&gt;&lt;div&gt;16:22:21.0572 &amp;nbsp;PRC end...&lt;/div&gt;&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;&lt;div&gt;16:24:12.4322 &amp;nbsp;Finished with PRC execution test.&lt;/div&gt;&lt;div&gt;16:24:17.9893 &amp;nbsp;Finished v$sqlstats collection&lt;/div&gt;&lt;/div&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;While I learned nothing new, it&amp;nbsp;reinforced&amp;nbsp;what I previously learned...again.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Results: Procedure With Simple SQL and dbms_output&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For the next experiment, I created a simple procedure containing the&amp;nbsp;&lt;i&gt;simple&lt;/i&gt;&amp;nbsp;SQL statement and I included a call to another procedure, &lt;b&gt;dbms_output&lt;/b&gt;. 
Here it is:&lt;/div&gt;&lt;/div&gt;&lt;pre&gt;&lt;code&gt;create or replace procedure op_test7&lt;br /&gt;as&lt;br /&gt;begin&lt;br /&gt;declare&lt;br /&gt;   nothingness number;&lt;br /&gt;begin&lt;br /&gt;   select /* orapub 7 */ avg(t0.object_id+t1.object_id+t2.object_id)&lt;br /&gt;   into   nothingness&lt;br /&gt;   from   test1 t0, test1 t1, test1 t2&lt;br /&gt;   where  t0.object_id=t1.object_id and t1.object_id=t2.object_id;&lt;br /&gt;&lt;br /&gt;   dbms_output.put_line('orapub 7 put_line');&lt;br /&gt;end;&lt;br /&gt;end;&lt;br /&gt;/&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;The application user script is exactly the same as the previous three procedure tests. The only difference is the variable &lt;b&gt;prc_name&lt;/b&gt; is set to &lt;b&gt;op_test7&lt;/b&gt;. I ran the experiment two ways: first with serveroutput set to &lt;b&gt;off&lt;/b&gt; and then with it set to &lt;b&gt;on&lt;/b&gt;. The serveroutput setting had no effect on the execution count and the results were exactly the same as the previous procedure experiments.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;While I learned nothing new, it strengthened what I previously learned...again.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Conclusions&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Sometimes it is very difficult to design an experiment peering into the unknown. In this particular experiment I needed to create a way to understand &lt;i&gt;when&lt;/i&gt; and &lt;i&gt;what&lt;/i&gt; changed in a &lt;b&gt;v$&lt;/b&gt; view. My solution was to create a tight loop looking for a very specific change in the underlying &lt;b&gt;x$&lt;/b&gt; fixed view and when a change was detected it was timestamped and recorded in the results table. 
I also had specific application code performing specific actions while recording into the same results table when the application started and stopped its activity. A simple results table query enabled me to study the timing of events, which easily brought to light, in this specific set of experiments, &lt;i&gt;when&lt;/i&gt; &lt;b&gt;v$sqlstats&lt;/b&gt; is changed and &lt;i&gt;what&lt;/i&gt; is changed. As you will see in subsequent blog posts, it was very important for me to understand this.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What I learned specifically about &lt;b&gt;v$sqlstats&lt;/b&gt; timing:&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Upon the &lt;i&gt;starting&lt;/i&gt; and &lt;i&gt;ending&lt;/i&gt; of an anonymous SQL statement or a procedure, the &lt;b&gt;v$sqlstats&lt;/b&gt; row is immediately refreshed if it exists&amp;nbsp;or inserted if it does not exist.&lt;/li&gt;&lt;li&gt;Besides the beginning and ending ref
