Wednesday, January 26, 2011

Library Cache Visualization Tool: How To

Some background...I think what separates Oracle DBAs is their ability to communicate. If you take the time to study (my courses and books can help), every DBA will eventually become an Oracle technology expert. But from this point is where our uniqueness become more interesting as we each set out on a different career path.

To improve my teaching I spend countless hours developing stories, role-plays , and various entertaining ways to transfer very complicated topics to DBAs. But what has been frustrating for me was the lack of visualization. Pictures, white boards, and flip charts are OK, but I wanted to do something that's visually amazing, flexible, and free. Finally, I discovered that Mathematica could help me in this quest. So last year I embarked on a journey to visualize aspects of Oracle's technology.

I started with Oracle's buffer cache. Thousands of you downloaded the tool and the read the associated blog entry (its documentation). A few months ago I created an initial visualization tool for Oracle's library cache. The visualizations where jaw dropping but it was too highly abstracted for my liking, didn't relate enough with production systems, and had no associated documentation. Finally, this has changed...

Oracle's library cache (LC) is amazing. And viewing it on your PC is even more amazing and very enlightening. I just completed a significant update (still free) that is much more realistic, based on an 11g library cache dump and allows you to enter details from your real production Oracle system! Yes, it's very cool! You can download OraPub's Library Cache Visualization Tool here for free.

The purpose of this blog entry is to introduce you to the new version (2g) of OraPub's Library Cache Visualization Tool and also to introduce you (if you desire) to Oracle's library cache. I hope you enjoy this blog entry and the visualization tool!

Library Cache Introduction

Within Oracle's shared pool exists the library cache (LC). The library cache can be abstracted and viewed like a classic library's card catalog system. The cards are located by hashing to the correct catalog and then sequentially searching through the catalog looking for the desired card. Each card references a book's location and a few other details.

It's Searched

Oracle's LC is searched using a hashing algorithm. Suppose our server process is parsing the SQL statement, select * from dual. The server process attempts to minimize the time and resources required to parse the statement. One strategy employed is to check and see if the SQL statement has already been parsed and its cursor available in the library cache. But the server process must first locate the specific cursor reference, just like we must first locate a printed book's reference card.

The server process will pass the SQL statement text to a hash value generation function that will transform the text (e.g., select * from dual) into a hash value (e.g., rT4rif87ujc). The hash value is then passed to hash function which, will output a number within a specified range (e.g., 0,...,350000).  The output number (e.g., 3216) is a reference to what is called a hash bucket. If our SQL statement's cursor exists in the library cache it will have a reference that is associated with this hash bucket (e.g., 3216).

Each hash bucket can have an associated chain. The chain length could be zero (which is actually the most likely on real Oracle systems) or it could be perhaps two or even three references long. The references are more properly called handles, because they reference a chunk of memory within Oracle's shared pool. Continuing my example, the server process jumps to the hash bucket (e.g., 3216) and begins sequentially scanning its associated hash chain searching for my SQL statement's hash value (e.g., rT4rif87ujc). If it finds the hash value (the cursor reference) that's a soft parse, if not, then it must create the cursor plus a bunch of other stuff...this is known as a hard parse.

The image below is a visual example of unrealistically simplified LC hashing structure...but it serves our purpose here wonderfully.
The LC hashing structure modeled above has 16 hash chains. Most chains contain two references, that is, handles. Suppose in my quest to locate my SQL statements cursor (hash value, CSR 5), I hash to bucket, LC 3. The B is for beginning and the E represents the chain's end. My server process will now begin to sequentially search the LC hash chain LC 3 looking for my cursor, represented as CSR 5. It will quickly find it! Now my server process has access to all sorts of details related to the SQL statement and does not have to build the cursor, which is relatively expensive and part of the hard parsing process.

References and Relationships

The LC is responsible for maintaining the relationships between objects in the LC...and it's complex. This is why the visualization tool is so cool because we get a glimpse of this complexity. As a quick example, the LC will maintain the relationships between tables and SQL statements (more specifically child cursors). This way if the table is altered, Oracle knows which child cursors and cursors to invalidate. This invalidation process will propagate to associated views, procedures, functions, triggers, packages, etc. Oracle tries very hard to limit this propagation as it has some nasty repercussions (that I will not discuss here).

The LC handles point to memory that contains information about procedures, functions, tables, views, synonyms, cursors, child cursors, and I'm sure there are others. Not only are there references to these nodes, but references to other nodes. For example a parent cursor (select * from customers) can reference a child cursor (which has a specific execution plan) and a table (customers). When you add in the connections with procedures and functions it's gets pretty crazy. Or how about tables that are referenced in multiple statements or when a cursor is shared by multiple procedures, functions, or packages...

The image below is an example of the relationship between 2 cursors (parent cursors), their associated 4 child cursors, and their associated 5 tables. As you can see, with only a few objects and especially when sharing occurs, the relationships can become quite complex.
If I create a table, a couple of SQL statements, a procedure, and I force a couple of child cursors by changing the optimization mode, and finally dump the library cache to a trace file, I am able to diagram their relationships. But it takes awhile. Mapping more than a couple of objects would quickly turn into a nightmare!

The Core Objects

As I mentioned, the LC references and contains meta data about Oracle objects. Not all objects defined in Oracle's data dictionary, but only those that have been recently referenced (there are exceptions) and have some cached information. For example, you can not get a good idea about the number of tables referenced in the LC by issuing a select count(*) from all_tables.

You can get a list of all the types or namespaces currently residing in Oracle's LC by issuing this simple command:
select namespace from v$librarycache order by 1;
On my 11g system there was 17 namespaces. The parent cursors and child cursors (more below) are contained within the SQL AREA namespace. Personally, I find the LC dictionary views inadequate. For me, if I want to get an idea of what is going on in the LC, I dump it to a trace file. Here's how to do that:
alter session set max_dump_file_size=unlimited;
alter session set events 'immediate trace name library_cache level 10';
Of course the next question is, "Where the heck is my trace file?" In 10g and earlier in SQL*Plus issue a show parameter user_dump_dest. In 11g, here's one way to get the full path to the trace file.
select tracefile from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));
If you look closely in the trace file you will eventually see the word FullHashValue followed by a bunch of text. Look to the right and you will see Type=. Below is an example from a recent trace file with the hash value and identifier dramatically shortened.
FullHashValue=9ba Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=395 OwnerIdn=5
This object is, obviously, a cursor. If I skip down a few lines I see this:
Parent Cursor:  sql_id=8zu55nbpz8fdu parent=0x3030fe60 maxchild=2 plk=n ppn=n
Notice the maxchild=2 entry. This means this cursor (i.e., parent cursor) has two child cursors. If was to jump back up a couple lines in the trace file, I would find.
 Child:  id='0' Table=30310d58 Reference=303103e4 Handle=3030fb9c
 Child:  id='1' Table=30310d58 Reference=303105b4 Handle=30303c24
These are the references to the two child cursors. Interestingly, if I search for the findme table reference in the trace file, it will have a reference section and I will see it has references to handles 3030fb9c and 30303c24. So you can see there is a link from the table to the associated child cursors.

A child cursor is required for different execution plans for the syntacally identical cursor. For example, if I run the below code, one parent cursor and two child cursors will result.
alter session set optimizer_mode=all_rows;
select * from findme;
alter session set optimizer_mode=first_rows;
select * from findme;
This is one reason why if you query from v$sql you can get more than one row returned. The situation quickly becomes complex when adding even a single procedure that references the findme table. Each procedure has a reference to the child cursor of any SQL or other procedures and functions contained within it. And all the table's referenced by the procedure's child cursors also point to the procedure itself. It can become confusing very quickly.

The Relationship Mapping

With just the simple situation outlined above, plus some details I conveniently left out, you can see:
  • Every child cursor references a parent cursor.
  • A parent cursor may reference a child cursor, but not always.
  • Every procedure references the child cursor of SQL, procedures, etc. it calls.
  • Table references point to all procedures and child cursors that reference it.
I think you can see that with just a few objects, the connections will soon look like sub-atomic particles or astronomical objects! But that's what makes this all so interesting...

For my LC visualization tool I made the decision to not include packages, procedures, functions, and triggers (and other objects). Their inclusion would have signifiantly increased the complexity of the visualization beyond being useful. If you keep in mind that what you are seeing is probably the least complex situation given the inputs then you'll be fine. Even the introduction of a few procedures will significantly increase the node connections.

Tool Introduction and Control

You can download OraPub's Library Cache Visualization Tool for free from OraPub's web-site here. As the tool's page indicates and links to, you will need to download and install Mathematica's free player. It works just like the Acrobat Reader in that the reader is free, but to create the document you must license Acrobat, which in this case means Mathematica. (Note: OraPub has a special relationship with Mathematica's company, Wolfram and can arrange for a discounted license. Just email OraPub for details.)

This posting is based on version 2g of OraPub's LC Visualization Tool. Other versions will operate similarly, but the images shown below could be somewhat different.

Note: You can always reset the visualization settings by clicking on the upper right hand plus sign and then selecting Initial Settings.

When you first run OraPub's LC Visualization Tool, it will look like this:
If you just can't wait, click on Preset and select Intro 2. Click on the image and drag your mouse to rotate the image...more on this below.

When you first run the tool, as shown above, a very simple, very unrealistic, highly abstracted, yet very useful and cool LC visualization appears along with all your control options. Here's a short description of the tool's control mechanisms:
  • Cursors w/Child Cursors is the number of parent cursors that have one or more child cursors.
  • Cursors wo/Child Cursors is the number of parent cursors that do not have a child cursor. While we typically think of parent cursors always having at least one child cursor, if you examine a library cache dump you will quickly notice this is not true. For demonstrations, I usually set this to zero.
  • Child Cursors is the total number of child cursors in the library cache. If you were to execute a SQL statement, dump the LC and examine the trace file, you will find your SQL statement's parent cursor and you will notice it has one child cursor. The number of child cursors be set to at least the number of Cursors w/Child Cursors. If not, the tool should automatically reset the number of child cursors equal to the number of parent cursors with children. It is common for a parent cursor to have multiple child cursors.
  • Unique Table/Views is the total number of unique tables and views. While there are other objects such as synonyms and sequences in the LC, for simplicity they are not represented. A real child cursor will reference at least one table, view, synonym, etc. In this tool, a child cursor will reference one or more unique table/view.
  • Table/View Share % is the percentage of unique tables that each child cursor will be associated with. This injects the messy pointer reality that a table will likely be associated with many child cursors. You will notice that when we increase this parameter, the number of memory references dramatically increases. For example, a customer table will likely be associated with literally hundreds of SQL statement child cursors. This tool will evenly (i.e., uniformly) distribute the tables to the child cursors. In a real Oracle system, obviously this will not occur as some tables will be referenced by more child cursors than other tables.
  • LC Hash Chains is the number of library cache hash chains. Each LC bucket will have a hash chain, though in production systems most chains will contain no references. At the top of an Oracle 11g LC dump, you will notice a very nice table that shows the chain length by the count of chains. You may find that most of your chains have a length of zero. For fast searches we want a chain length of only one. Hashing is a fascinating topic and is very important for Oracle memory management. This is why I gave quite a bit of space on this topic in both the buffer cache and library cache chapters in my book, Oracle Performance Firefighting. For simple demonstration purposes I usually set the number of LC hash chains to around the number of tables and child cursors. In real Oracle systems, there can easily be over 100,000 chains!
  • To Plot allows us to focus on only the LC chains, the other objects in the LC cache (cursors, child cursors, and tables/views), or all the objects. This is a great way to study each topic without the complexity of additional memory links. But as you'll see, when objects are fully shown the complexity can be dazzling! Parent cursors without a child cursor are only displayed with the Full option.
  • Scale % allows you to enter production size values for the parent cursors, etc. but only visualize a percentage of them. This is important because a meaningful visualization of a production system must be scaled down or it will take quite a long time to create a meaningless visualization. Before you enter production size values, set the Scale % to 0.1. Then increase one tenth of a percentage point at a time until the desired visualization appears. Click on the control's plus sign and then click on the just displayed plus sign to jump 1/10 percentage point. This will also convey just how massive and complex Oracle's LC is. The scaling algorithm is very crude: the result is simply the the given percentage multiplied by the given number of objects. For example, if the number of unique tables is 20000 and the scale percentage is 10%, 2000 tables will be visualized...much too many by the way.
  • Preset was implemented so we can quickly display some basic pre-defined visualizations. This is important: If you want to modify a preset, you must first set the Preset to Custom. If you forget to do this, your changes will immediately be overwritten by the displayed preset values.
  • Circle Size allows the two-demonentional (2D) visualization circle sizes to be adjusted. Based on the number of objects displayed and the viewing area, for a visually pleasing image you sometimes will want to adjust the colored circle size.
  • Visualization options provide you with a number of ways to view the same exact information. The Point option simply shows a single point for every node. This allows lots of information to be displayed two-dimensionally. You can mouse-over each point to see what it is. The 2D option displays each node as a circle with the node abbreviation text visible (usually visible). This is a fantastic way to learn about the LC: With all the memory object sharing and list connections, as the number of displayed items increases, the value of the two dimensional view quickly diminishes. The 3D option shows each node as a small sphere and allows you to view lots of information in a more realistic way. The spheres are color coded based on the selected Color Scheme (next item). The 3D view opens up a whole new way to study the LC. But eventually the 3D option can become cluttered and so the 3D Wire option changes the spheres to points and allows the object name to be displayed on a mouse-over (you have to be zoomed in pretty close though).
  • Color Scheme allows you to pick the visualization color scheme. This allow you to pick a more personally pleasing visualization. But even more important, I find that every projector/beamer projects the same color differently. By trying the various color schemes, you should be able to find one that works well. Using the default color scheme of Multi, parent cursors are blue-ish, child cursors are brown-ish, tables/views are yellow-ish, and the beginning and ending of LC hash chains are dark green.
Using the Tool - First Time

This is what you will see when you first run the tool. (It's the same image as the one shown at the very top.)  If you have messed with the settings, just click on the Preset Intro 1.
Notice that each object type has a distinct color and abbreviation. By default, parent cursors are brown-ish and labeled as "CSR". Child cursors are brown-ish and labeled as "CCSR". Tables are yellow-ish and labeled as "TBL". Also notice the all the cursor and table related settings match what is being displayed.

There are two key items I want to highlight.

First is the Table/View Share % is set to zero. Notice in the above visualization no table is shared with another child cursor. Below is the result with sharing set to 100%. You can do this yourself; set the Preset to Custom and then click on the Table/View Share % 100 option. To get a good looking image, on my screen I also had to set the Circle Size to 0.15.
In the above image the tables are 100% shared. Because there are three child cursors (CCSR1,...CCSR3), each table node has three links.

The second item I would like to highlight is the LC hash chains are not shown in the above image. This is because the To Plot option is set to Objects. If I click on LC Chains, as the image below shows, only the LC chains and their contents is displayed. On my screen, I also increased the Circle Size back to 0.3.
Interestingly, as a library cache dump indicates and visualized above, the child cursors are not directly on a hash chain. However, tables referenced in a parent cursor and and the parent cursors are directly associated with a hash chain. Referencing the image above, if a change was made to table TBL 2, Oracle would end up hashing to LC chain 2 (LC 2B, the "B" is for beginning and the "E" is for ending) and start sequentially scanning looking for the handle. TBL 2 is the first entry on LC chain 2. Oracle would then invalidate the table, follow the reference from the Table 2 entry to its associated child cursors (CCSR1, CCSR2, CCSR3) and then also to the cursors (CSR 1, CSR 2, CSR3)...invalidating them all!

In a real production Oracle system, most LC chains are only one object in length. So the above image would never, we hope, actually occur. While not shown, if you increase the number of LC hash chains to a few more than the number of objects you will get a truer looking abstraction.

Displaying the LC chains and the other objects separately is a wonderful way to learn about LC structures. But we miss out on understanding the complexity. The image blow has the To Plot set to Full, thereby displaying all the objects with their associated links. For my screen also changing the Circle Size back to 0.15 looked the best.
Because there are only a few objects in this visualization we can pretty much follow all the links. But even now, it's tricky. Plus they can easily be overlapping links that are not visible from a two dimensional perspective. Changing the Orientation (located at bottom of the tool) can sometimes help when there are overlapping lines. The image below has the Visualization set to 3D plus I rotated the image and zoomed in so get a very nice image.
The objects are color coded, so with a limited number of objects we can see exactly what's going on. The green sphere's are the beginning and ending points of the LC chains. Using the default Color Scheme, Multi cursors are white-ish, child cursors are brown-ish, and tables/views are yellow-ish. For three dimensional images you may want to change the default Color Scheme.

You will notice even with thousands of nodes, Mathematica typically plots the LC chain begin and end points relatively far away from the mass of nodes. And when the number of nodes increases as well as the number of LC chains (remember most LC chains have a length of only one) the image looks like an exploding star.

Using the Tool - More Complexity

For this visualization select the PresetIntro 2. This is a more realistic, though still highly abstracted, LC visualization. Immediately your eyes will pick up on a number of things.
With more LC chains their beginning and end points look like pins in a pin cushion! Having many short LC chains enable very fast searches. The dense center is due to intense table/view sharing and to a lessor extent because there are still fewer LC chains than tables and child cursors, so the chains will be longer than zero or one.

Now change the Preset to Custom, (Don't forget to do this or your changes will be undone.) and change the Table/View Share % from 100 to 25, then to 50, then 75, and back to 100. Notice that as sharing increases so is the linking singular intensity. While sharing resources can save memory, the relationship establishment and maintenance dramatically increases (think: CPU consumption and serialization control). Also, the likelihood of a hot spots affecting overall performance increases, whereas when the sharing is minimal, widespread hotspot impact is less likely. Below is the image with Table/View Share % set to 50 and I also zoomed into the image and rotated it until I was satisfied with the image.
The image below I simply zoomed into the center looking over the shoulder of one of the clusters. ...and we wonder why its common to experience LC cache latch or mutex contention?...
The image below shows plotting (set To Plot) only the LC chains with the Visualization set to Points. While the 2D visualization is much more interesting, I'm now trying to convey that the hash chain length is still on average more than one (actually more like 2+) and I'm trying to get all this in a small area, hence the use of the Points visualization.
If you where to click on the Objects or Full To Plot option, the image will be very crowded and pretty much worthless. When you see this happen, it's time to think about a 3D view or reduce the scale. Below is the visualization with the Table/View Share % of 50, a To Plot of Full and the Visualization set to Points. Not real useful but the symmetry is beautiful!
To made the visualization more meaningful, I set the Visualization to 2D, reduced the Scale % to 10 and set the Circle Size to 0.15. This gives me a scaled down version of the LC. Notice that you can see the beginning of the two clusters that were very apparent in the non-scaled 3D visualization. Pretty cool, eh?
I think you get the idea! But what about real production system?

Using the Tool - Production Systems

Now that you are familiar with the tool, the next step everyone wants is to plug their real data into the model! Please remember the tool only shows some of the objects in the library cache and abstractions are made...but still we can learn a lot plus gain a much higher respect for Oracle's shared pool and library cache...not to mention the kernel developers and architects.

Get some real data

The first thing we need to do is extract some key information about the current state of the library cache. As an example, after I dumped the LC as demonstrated above (way above), I then run a simple shell script which parses through the file and summarizes the elements we need. You can download and see this shell script here. Here is an example of what you could see just after an 11g instance recycle.
$ getLCinfo.sh prod18_ora_21939.trc

Library Cache Dump Summary for Oracle 11g

--------------------------------------------
Procs & Funcs                      : 60
Parent Cursors                     : 23802
       w/ child cursors            : 13870
       wo/child cursors            : 9932
Child Cursors                      : 23579
              per parent (w/csr)   : 1.70
Unique Tables & Views              : 2970
--------------------------------------------

Enter the data into the tool

The first thing you should do is set the Preset to Custom, then set the Scaling % to 0.01, and finally set the Visualization to Points. A large 3D visualization could takes minutes to render. Do yourself a favor and start with a very scaled down and simple visualization and then start making changes. Now you can carefully enter the numeric values.

The only value we don't have a production value for is the Table/View Share %. Without a more complex gathering script, you'll need to make an educated guess. A good general guess would be around 25%.  A large system with only a few tables will approach 100%. A system with thousands and thousands of tables and views will approach 10%. These are simply my best guesses. It's very interesting to increase the shared percentage and see how dramatically the complexity increases!

We also need a value for the number of LC hash chains. You can determine this by looking near the top of the LC dump trace file. If Oracle tries to default the number of LC hash chains to have an average length between zero and one. Even on a very small system, the default number of LC hash can be over 100000. To reduce visualization clutter, I usually set the number of LC hash chains to the sum of child cursors and tables/views. This will ensure the default chain length is one. If this produces too much clutter, just reduce the value while knowing on a real system there are many, many chains with a length of zero. For this example, I entered 17000 LC hash chains.

The image below is the above data carefully entered along with the recommended initial settings. You could have also selected a Preset of Prod 1. It's not too exciting, but our goal is to get something visual and get it quickly. Now we can alter the visually related parameters to get a more useful and better looking visual.
One of the changes I like to make is slowly increase the Scale % and watch the complexity increase. One way to do this is to click on the big "+" sign in the Scale % control box. On my system with this visualization and clicking on the play icon, the visualization essentially locks up Mathematica...so be careful. To get the below images, I repeatedly clicked on the "+" icon and then to decreasing the scale I clicked on the "-" icon. Below is the series of images, each with an increasing scale percentage.
Using The Tool - Myself

For myself, I use the tool mainly while teaching and when working with my customers (that is, my consulting work). I find that creating a visual model of a real system or even a simple example quickly builds a conceptual framework of the LC architecture. Then I can focus on the particular aspect I'm interested in; LC latch/mutex contention, parsing, share pool memory management, hashing, etc.

Using the Tool - Now it's Your Turn

You have access to a powerful library cache visualization tool. Now it's your turn to begin exploring. My recommendation is to first use the provided presets and then customize the control settings. Then gather real production data from your system and begin using the tool to gain insights and increase your communication prowess...after all, like I mentioned at the start of this blog entry, I think what separates Oracle DBAs is their ability to communicate.

Thanks for reading and I hope you find this tool immensely gratifying!

Craig.



P.S. If you want me to respond to a comment or have a question, please feel free to email me directly at craig@orapub.com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email.

Tuesday, January 11, 2011

When Is V$SQLSTATS Refreshed?

Collecting performance data can sometimes be very simple. But at other times it can be a nightmare. Not only is there the classic possibility of the collection impacting the system your monitoring, but multiple sessions can be involved plus the differences in timing can be microseconds. And the closer you get to the database kernel, the trickier it gets. This past week I came across what I initially thought would be a very simple collection question, "When is an entry made into v$sqlstats?" It turns out, it's more complicated then I anticipated and I thought you might be interested in the answers plus the scripts I used to perform the experiments.

What's So Complicated?

Ask yourself these questions regarding when an entry is made into v$sqlstats.

When a statement has started running?
When a statement completes?
As a statement is running?
Are all statistics updated at the same time?
Is it the same for procedures, functions, anonymous PL/SQL blocks, standard SQL entered in SQL*Plus?

What initially seemed to be a very simple question, turned out to be much more complicated. The only way to answer these questions was to set up an experiment... so that's what I did.

The Experimental Design

The general data collection strategy I used requires two users; monitoring and application. There is actually a third, the data repository user which, I used system.

I created a single text file that contains an short introduction, all the code, explanations how to run the experiment yourself, and the results I will detail below. This single file can be downloaded by clicking here.

The monitoring user continuously checks the v$sqlstats view for a change in the specific SQL we're looking for. By default the change check is in a very tight loop with no delay. Through the variable troll_delay_in you can insert a delay. Actually, I query from v$sqlstats's underlying fixed view x$kkssqlstat because this is lower overhead and I was already using the x$ for some other experiments, so it ended up this way. In both my experiments and in this blog, when I reference v$sqlstats I am also referring to its underlying x$ fixed view. The experimental results are inserted into the op_results_raw table, which I created within the system schema. (I feel uncomfortable about creating objects in the sys schema.) An important aspect of the experimental design is all result entries are timestamped. It is the timestamped entries that enable us to see what occurred and in what order. As you'll see, this strategy works very nicely.

The application user runs the SQL that we closely monitor. I have four tests; basic select, a procedure with the same select, a procedure that runs the select twice, and a procedure that calls another procedure. As with the monitored user, the application user inserts the results into the op_results_raw table owned by the user system.

The SYSTEM user owns the results table, op_results_raw. It probably would have been cleaner to have the application user be the owner, but for now it is what it is.

One of the keys to make this work is clearly identifying the application SQL. To do this we need its sql_id and plan_hash_value columns. While I'm not displaying all the actual experimental code (but I provided the download link above), I think this is important and versatile enough to post. So here it is:
def the_test_sql=" * from dual"

select &the_test_sql;
select sql_id,plan_hash_value,substr(sql_text,1,60) the_sql
from v$sqlstats
where sql_id = (select prev_sql_id from v$session where sid=(select sid from v$mystat where rownum=1));
Notice that immediately after I run the above application SQL, I query from the v$sqlstats and v$session looking for the "just run" sql_id and plan_hash_value. This works wonderfully! Here is the output on my test environment.
SQL_ID        PLAN_HASH_VALUE THE_SQL
------------- --------------- ----------------------------------------
3fbqsb5wsumj4       272002086 select  * from dual
Make sure you have serveroutput set to off when you run this or it will not do what we want.

My test server was running a single CPU with four cores on Linux with Oracle Release 11.2.0.1. It is a small database but large enough for this experiment.

As mentioned above, the experimental results from both the monitoring and the application user are inserted into the op_results_raw table. Here is an application user code snippet. For readability, I took out a bunch of lines from the middle. Obviously, before the below code is run the variable the_test_sql must be set.
def results_owner_in=system
insert into &results_owner_in..op_results_raw values (systimestamp,'Started with SQL execution test.');
insert into &results_owner_in..op_results_raw values (systimestamp,'SQL start...');
select &the_test_sql;
insert into &results_owner_in..op_results_raw values (systimestamp,'SQL end...');
exec dbms_lock.sleep(7);
insert into &results_owner_in..op_results_raw values (systimestamp,'SQL start...');
select &the_test_sql;
insert into &results_owner_in..op_results_raw values (systimestamp,'SQL end...');
insert into &results_owner_in..op_results_raw values (systimestamp,'Finished with SQL execution test.');
commit;
To help me interpret the results, I insert various timing markers, such as Started the SQL execution test. I also want to know, for the monitoring and application user, when their duties started and ended. I wanted to ensure my monitoring started before and ended after the application testing. I also inserted a relatively long timing gap between the repeated SQL executions. In the example above, I used seven seconds. This makes understanding and interpreting the results much easier because the common events are clustered by time.

While I will detail the actual experimental results, so you can see how I reported the results here is a snippet. Note: For readability I replace the actual sql_id and plan_hash_value with 7,7. I did this kind of thing throughout this blog posting.
SQL> l
  1  select to_char(mark_time,'HH24:MI:SS.FF4') Mark,
  2         happening
  3  from   op_results_raw
  4* order by mark_time
SQL> /

MARK           HAPPENING
-------------- -----------------------------------------------------------------
13:59:20.9452  Started v$sqlstats collection
13:59:21.0485  v$sqlstats change for 7,7 exec=20 dskrds=6736027 lio=922219
13:59:26.4489  Started with SQL execution test.
13:59:26.4513  SQL start...
13:59:26.4519  v$sqlstats change for 7,7 exec=20 dskrds=6736027 lio=922219
13:59:26.4522  v$sqlstats change for 7,7 exec=21 dskrds=6736027 lio=922219
...
Remember, all the code, results, a more detailed explanation is available in a single text file by clicking here.

Results: Simple SQL Statement

The simple SQL statement is:
select /* orapub 3 */ avg(t0.object_id+t1.object_id+t2.object_id) from test1 t0, test1 t1, test1 t2 where t0.object_id=t1.object_id and t1.object_id=t2.object_id
On my system its sql_id is 7un4pb76n3sqc and plan_hash_value is 772987886 which, for readability were replaced below with 7,7. Here is the beginning of the output:
MARK           HAPPENING
-------------- ----------------------------------------------------------------
13:59:20.9452  Started v$sqlstats collection
13:59:21.0485  v$sqlstats change for 7,7 exec=20 dskrds=6736027 lio=922219
13:59:26.4489  Started with SQL execution test.
13:59:26.4513  SQL start...
13:59:26.4519  v$sqlstats change for 7,7 exec=20 dskrds=6736027 lio=922219
13:59:26.4522  v$sqlstats change for 7,7 exec=21 dskrds=6736027 lio=922219
13:59:28.4311  v$sqlstats change for 7,7 exec=21 dskrds=6759000 lio=964528
13:59:30.4383  v$sqlstats change for 7,7 exec=21 dskrds=6768115 lio=968314
...
20.9452: The statistics collection was enabled before anything else started.
21.0485: The collection has never seen the statement before, so it thinks the v$sqlstats entry has just been inserted or refreshed. This is incorrect, so disregard this entry.
26.4489: There was a clear ~5 second gap. This is how long it took me to copy and paste the application use code into its sqlplus session...without making a mistake. In the Experimental Design section above, you can see the actual SQL that caused this entry.
26.4513: This application user entry was made just before the simple SQL was run.
26.4519: The SQL statement started so an entry is made into v$sqlstats (actually the row refreshed, that is, updated) and our monitoring user detects this, pulls from v$sqlstats, and records some basic information that we see above. Notice that compared to the 21.0485 entry, there is no change shown to trigger this entry. However, the monitoring user did in fact detect a change by comparing the previous and current last_update_change column.  That column must have changed but the execution column had not yet been changed, or at least we don't see the change in the v$ fixed view. There could be many reasons for this, such as there is no read consistency with performance data. But notice the next entry.
26.4552: In less then 2 1000ths of a second we detect another change and we can see the number of executions has increased. I suspect we caught the v$ fixed view in the middle of an update...remember there is no read consistency in performance views! So it appears the execution column is updated when the statement begins.
28.4311: It has been about two seconds since the last v$ entry so another row is inserted into the v$ table and we see the results here. In this case both the disk reads and logical buffers reads have increased, but the number of executions has not changed... because the number of executions changed when the statement started.
30.4383: It's been about two seconds...so we get another entry into the v$ table. This will go one and on until the statement completes.
...
13:59:56.5887 v$sqlstats change for 7,7 exec=21 dskrds=7037188 lio=968314 13:59:58.5933 v$sqlstats change for 7,7 exec=21 dskrds=7059028 lio=968314 14:00:00.5970  v$sqlstats change for 7,7 exec=21 dskrds=7083143 lio=968314
14:00:00.8328  SQL end...
14:00:07.8464  SQL start...
14:00:07.8472  v$sqlstats change for 7,7 exec=22 dskrds=7085215 lio=968314
14:00:09.6418  v$sqlstats change for 7,7 exec=22 dskrds=7111639 lio=1014409
14:00:11.6577  v$sqlstats change for 7,7 exec=22 dskrds=7119745 lio=1014409
...
I am not showing all the entries and have jumped to the last few before the statement finishes. Directly above are the final entries for the above execution (SQL end...) and then you will notice the next execution will begin (SQL start...). Here is an explanation:

56.5887: The statement is still running and the normal two second insert into v$sqlstats occurred, detected by our monitoring user, and recorded in our results table.
58.5933: Same as with 56.887.
00.5970: The statement completes and immediately the v$sqlstats row is refreshed. Notice the v$sqlstats row did not refresh in the usual two second cycle. As soon as the statement completed, the row was refreshed, detected by our monitoring user, and recorded in our results table.
00.8328: The application user inserts a row telling us our simple SQL statement has indeed finished. Notice the monitoring user has no idea this has occurred, hence there is v$sqlstats entry.
07.8464. About seven seconds later(!!) the simple SQL statement is about to be run once again.
07.8472. The statement begins and the monitoring user detects a refreshed row in v$sqlstats. Notice the execution count has been incremented to 22. Once again, for this SQL statement, the execution is incremented at the start of the SQL statement. Also, the v$sqlstats entry does not match the two second pattern. Again, it appears when a statement starts a row is immediately refreshed in v$sqlstats. We can expect the next v$sqlstats refresh to occur around the 09.8472 time.
09.6418. As expected about two seconds later, the next v$sqlstats refresh is made because the SQL statement is active. We can see that some of activity is related to the increase in disk reads.
011.6577. ...same as the above entry...
...

From this experiment I learned:
  1. Upon a select statement start, the v$sqlstats row is either inserted or refreshed (if it already exists).
  2. The execution count is incremented when a select begins.
  3. Statistics are refreshed in v$sqlstats about every two seconds, except when the statement begins and ends, in which a change immediately occurs.
A word of caution. This test was specifically for a select statement. While I do additional experiments related to procedures, I did not do experiment with inserts, updates, delete, or DDL. While I suspect DML will follow this pattern, I have no experimental evidence to back this up... there are other things I need to attend to...

Results: Procedure With Simple SQL Statement

For the next experiment, I created a simple procedure containing the simple SQL statement. Here it is:
create or replace procedure op_test4
as
begin
declare
   nothingness number;
begin
   select /* orapub 4 */ avg(t0.object_id+t1.object_id+t2.object_id)
   into   nothingness
   from   test1 t0, test1 t1, test1 t2
   where  t0.object_id=t1.object_id and t1.object_id=t2.object_id;
end;
end;
/
Here is the application user script, with the middle parts removed, I copy and pasted into the application user's sqlplus session. The prc_name variable was set to op_test4.
def results_owner_in=system
insert into &results_owner_in..op_results_raw values (systimestamp,'Started with PRC execution test.');
insert into &results_owner_in..op_results_raw values (systimestamp,'PRC start...');
exec &prc_name;
insert into &results_owner_in..op_results_raw values (systimestamp,'PRC end...');
exec dbms_lock.sleep(7);
...
insert into &results_owner_in..op_results_raw values (systimestamp,'PRC start...');
exec &prc_name;
insert into &results_owner_in..op_results_raw values (systimestamp,'PRC end...');
insert into &results_owner_in..op_results_raw values (systimestamp,'Finished with PRC execution test.');
commit;
For readability, I replaced the sql_id with abc. I did not change the plan_hash_value. It was indeed 0 and was 0 for all procedures I have investigated. I'm going to comment on briefly on the results.
MARK           HAPPENING
-------------- ---------------------------------------------------------------
17:20:13.7554  Started v$sqlstats collection
17:20:13.7887  v$sqlstats change for abc,0 exec=3 dskrds=760213 lio=124219
17:20:17.6207  Started with PRC execution test.
17:20:17.8577  PRC start...
17:20:17.8584  v$sqlstats change for abc,0 exec=3 dskrds=760213 lio=124219
17:20:19.4002  v$sqlstats change for abc,0 exec=3 dskrds=782882 lio=146913
...
13.7554: The monitoring user has begun looking for change in v$sqlstats.
13.7887: Since we just started monitoring and have never seen an entry in v$sqlstats for this sql_id and plan_hash_value, our monitoring user detects this "change" and records the event. Again, disregard this entry.
17.6207: About four seconds after monitoring begins the procedure test is about to begin. Again, this time lag is due to me copying and pasting the application code (shown in part above) into the application user's session.
17.8577: The procedure is about to begin.
17.8584: Immediately an entry is made into v$sqlstats and the monitoring user detects this and records a row in our results table. Notice there has been no change in the execution count. It was 3 before we started the procedure and it is still three!
19.4002: Two seconds later, as expected since the procedure is running, the v$sqlstats row is refreshed and the monitoring user detects the change. In this case we can see there is an increase in both physical disk reads and logical IO. Notice the execution count has still not changed!
...
17:20:47.5802  v$sqlstats change for abc,0 exec=3 dskrds=1071028 lio=149206
17:20:49.5903  v$sqlstats change for abc,0 exec=3 dskrds=1100540 lio=149206
17:20:50.0857  v$sqlstats change for abc,0 exec=4 dskrds=1109395 lio=149206
17:20:50.1243  PRC end...
17:20:57.1721  PRC start...
17:20:57.1727  v$sqlstats change for abc,0 exec=4 dskrds=1109395 lio=149206
17:20:58.6511  v$sqlstats change for abc,0 exec=4 dskrds=1131141 lio=170985
17:21:00.6632  v$sqlstats change for abc,0 exec=4 dskrds=1143823 lio=174193
...
47.5802: The procedure is still running and the still consuming computing resources.
49.5903: Two seconds later, as expected since the procedure is running, the v$sqlstats  row is refreshed, the monitoring user detects the change, and inserts a row into the results table.
50.0857: Before the next two second cycle occurs, the procedure finishes, v$sqlstats immediately refreshes the row, and our monitoring user detects the change and stores the results. Notice the execution count is finally incremented! So in this test, the execution count was incremented when the procedure completed not when the procedure started.
50.1243: The application user posts that the procedure has ended.
57.1721: After our specified seven second delay another identical procedure execution is about to begin.
57.1727: It does begin, the v$sqlstats row is refreshed, the monitoring user detects and records the change. Notice the execution count did not increment.
58.6511: Nearly two seconds later, but strangely closer to 1.5 seconds later, the v$sqlstats row is refreshed.
00.6632: Two seconds later, the v$sqlstats row is again refreshed.
...
17:21:28.8233  v$sqlstats change for abc,0 exec=4 dskrds=1433791 lio=174193
17:21:30.8343  v$sqlstats change for abc,0 exec=4 dskrds=1453958 lio=174193
17:21:31.1759  v$sqlstats change for abc,0 exec=5 dskrds=1458564 lio=174193
17:21:31.1773  PRC end...
17:21:38.1846  PRC start...
17:21:38.1851  v$sqlstats change for abc,0 exec=5 dskrds=1458564 lio=174193
17:21:39.8759  v$sqlstats change for abc,0 exec=5 dskrds=1482325 lio=197985
17:21:41.8949  v$sqlstats change for abc,0 exec=5 dskrds=1490264 lio=199180
31.1759: Breaking the second cycle because the procedure finished, the v$sqlstats row is refreshed. Notice the execution count is incremented to five.

From this experiment I learned, more like confirmed:
  1. Upon a select statement start, the v$sqlstats row is either inserted or refreshed (if it already exists).
  2. The execution count is incremented when a select begins.
  3. Statistics are refreshed in v$sqlstats about every two seconds, except when the statement begins and ends, in which a change immediately occurs.
Results: Procedure With Two Simple SQL Statements

For the next experiment, I created a simple procedure containing two of the simple SQL statement once. Here it is:
create or replace procedure op_test5
as
begin
declare
 nothingness number;
 i number;
begin
    for i in 1..2
    loop
       select /* orapub 5 */ avg(t0.object_id+t1.object_id+t2.object_id)
       into   nothingness
       from   test1 t0, test1 t1, test1 t2
       where  t0.object_id=t1.object_id and t1.object_id=t2.object_id;
 end loop;
end;
end;
/
The application user script is exactly the same as the previous procedure test. The only difference is the variable prc_name is set to op_test5. For readability, I replaced the sql_id with xyz but did not change the plan_hash_value of zero. Here are the results.
MARK           HAPPENING
-------------- --------------------------------------------------------------
18:50:26.0991  Started v$sqlstats collection
18:50:26.1003  v$sqlstats change for xyz,0 exec=20 dskrds=13208886 lio=1041322
18:50:31.8619  Started with PRC execution test.
18:50:31.8632  PRC start...
18:50:31.8638  v$sqlstats change for xyz,0 exec=20 dskrds=13208886 lio=1041322
18:50:33.1397  v$sqlstats change for xyz,0 exec=20 dskrds=13213295 lio=1045734
18:50:35.1491  v$sqlstats change for xyz,0 exec=20 dskrds=13235541 lio=1066291
...
18:51:35.4911  v$sqlstats change for xyz,0 exec=20 dskrds=13855325 lio=1091260
18:51:37.5069  v$sqlstats change for xyz,0 exec=20 dskrds=13884991 lio=1091260
18:51:38.9183  v$sqlstats change for xyz,0 exec=21 dskrds=13907265 lio=1091260
18:51:38.9194  PRC end...
18:51:46.2494  PRC start...
18:51:46.2513  v$sqlstats change for xyz,0 exec=21 dskrds=13907265 lio=1091260
18:51:47.5500  v$sqlstats change for xyz,0 exec=21 dskrds=13917643 lio=1101648
18:51:49.5649  v$sqlstats change for xyz,0 exec=21 dskrds=13936621 lio=1116247
...
18:52:51.8954  v$sqlstats change for xyz,0 exec=21 dskrds=14586303 lio=1141216
18:52:53.1498  v$sqlstats change for xyz,0 exec=22 dskrds=14605637 lio=1141216
18:52:53.1511  PRC end...
...
18:57:34.2478  v$sqlstats change for xyz,0 exec=25 dskrds=17242358 lio=1341040
18:57:36.2597  v$sqlstats change for xyz,0 exec=25 dskrds=17262000 lio=1341040
18:57:37.8937  v$sqlstats change for xyz,0 exec=26 dskrds=17280340 lio=1341040
18:57:37.8945  PRC end...
18:57:37.8957  Finished with PRC execution test.
18:57:56.1001  Finished v$sqlstats collection
If you have followed the previous two experiments, you should have no problem understanding the above. The same pattern is followed as with the above procedure. While I learned nothing new, it reinforced what I previous learned.

Results: Procedure With Simple SQL and Procedure Call

For the next experiment, I created a simple procedure containing the simple SQL statement and I included a call to another procedure, dbms_random. Here it is:
create or replace procedure op_test6
as
begin
declare
   nothingness number;
begin
   select /* orapub 6a */ avg(t0.object_id+t1.object_id+t2.object_id)
   into   nothingness
   from   test1 t0, test1 t1, test1 t2
   where  t0.object_id=t1.object_id and t1.object_id=t2.object_id;

   select /* orapub 6b */ dbms_random.random into nothingness from dual;
end;
end;
/
The application user script is exactly the same as the previous procedure test. The only difference is the variable prc_name is set to op_test6. For readability, I replaced the sql_id with akb. The results are exactly was with the previous two procedure experiments!
MARK           HAPPENING
-------------- ---------------------------------------------------------------
16:21:57.9886  Started v$sqlstats collection
16:21:57.9894  v$sqlstats change for akb,0 exec=25 dskrds=937907 lio=624657
16:22:05.7840  Started with PRC execution test.
16:22:05.7847  PRC start...
16:22:05.7849  v$sqlstats change for akb,0 exec=25 dskrds=937907 lio=624657
16:22:07.5003  v$sqlstats change for akb,0 exec=25 dskrds=963609 lio=649644
16:22:09.5084  v$sqlstats change for akb,0 exec=25 dskrds=965463 lio=649644
16:22:11.5073  v$sqlstats change for akb,0 exec=25 dskrds=967173 lio=649644
16:22:13.5417  v$sqlstats change for akb,0 exec=25 dskrds=969048 lio=649644
16:22:15.5542  v$sqlstats change for akb,0 exec=25 dskrds=970743 lio=649644
16:22:17.5569  v$sqlstats change for akb,0 exec=25 dskrds=972408 lio=649644
16:22:19.5811  v$sqlstats change for akb,0 exec=25 dskrds=974238 lio=649644
16:22:21.0560  v$sqlstats change for akb,0 exec=26 dskrds=975423 lio=649644
16:22:21.0572  PRC end...
...
16:24:12.4322  Finished with PRC execution test.
16:24:17.9893  Finished v$sqlstats collection
While I learned nothing new, it reinforced what I previous learned...again.

Results: Procedure With Simple SQL and dbms_output

For the next experiment, I created a simple procedure containing the simple SQL statement and I included a call to another procedure, dbms_output. Here it is:
create or replace procedure op_test7
as
begin
declare
   nothingness number;
begin
   select /* orapub 7 */ avg(t0.object_id+t1.object_id+t2.object_id)
   into   nothingness
   from   test1 t0, test1 t1, test1 t2
   where  t0.object_id=t1.object_id and t1.object_id=t2.object_id;

   dbms_output.put_line('orapub 7 put_line');
end;
end;
/
The application user script is exactly the same as the previous three procedure tests. The only difference is the variable prc_name is set to op_test7. The experiment ran in two different versions; the first was with serveroutput off and the other it was set to on. The serveroutput setting had no affect on the execution count and results were exactly the same as the previous procedure experiments.

While I learned nothing new, it strengthened what I previous learned...again.

Conclusions

Sometimes it is very difficult to design an experiment peering into the unknown. In this particular experiment I needed to create a way to understand when and what changed in a v$ view. My solution was to create a tight loop looking for a very specific change in the underlying x$ fixed view and when a change was detected it was timestamped and recorded in the results table. I also had specific application code performing specific actions while recording into the same results table when the application started and stopped its activity. A simple results table query enabled me to study the timing of events, which easily brought to light, in this specific set of experiments, when v$sqlstats is changed and what is changed. As you will see in subsequent blog posts, it was very important for me understand this.

What I learned specifically about v$sqlstats timing:
  1. Upon the starting and ending of an anonymous SQL statement or a procedure, the v$sqlstats row is immediately refreshed if it exists or inserted if it does not exist.
  2. Besides the beginning and ending refresh/insertion, about every two seconds the v$sqlstats row is refreshed with the latest statistics.
  3. For a select statement the execution count is incremented when the select begins.
  4. For a procedure, even when calling other procedures, the execution count is incremented when the main procedure ends.
Thanks for reading!

Craig.


P.S. If you want me to respond to a comment or have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email. 

Tuesday, January 4, 2011

IO Read Wait Occurrence Mismatch - Part 2

In Part 1 of this series, it was clearly demonstrated the number of IO read requests is not likely to equal the number of wait event IO read occurrences. This concerned me.

My primary concern centered with my systematic quantitative performance analysis methods. I always quantify my analysis so I have numbers to back up my recommendations. This also allows me to anticipate and compare the impact of various performance solutions. If the system under examination is heavily read IO centric, the IO read wait time is a key input parameter in my analysis and therefore, I need to have confidence in the read IO wait time. This "mismatch" concern lead to me to research the various IO read related instance statistics (which I have not blogged about) and the relationship between IO block reads, IO requests, and read wait occurrences.

What I discovered is that it is not a simple task to determine the number of physical IO read requests for each classification (e.g., direct path read requests associated with temporary segments). What began as a simple quest to predict IO read wait time without depending on read related wait occurrences ended up being a major research project that is still not completed...but I digress...

Simply put, Oracle blocks are physically read by IO requests which, may result in an IO read wait occurrence.

If you don't have the time (or desire) to read this entire blog entry, here is a summary of this blog entry:
  1. An IO read wait occurrence is different than a read request. A request is truly a request from Oracle to the IO subsystem. A read wait occurrence should only occur when the process must wait for blocks so it can continue processing.
  2. The request to occurrence mismatch is to be expected when asynchronous IO (async IO) or something similar is active. Greater async IO activity is likely to increase the difference between read wait occurrences and read requests.
  3. It appears total IO read wait time sourced entirely from the wait interface is fine, but don't count on it being perfect.

Requests vs Occurrence

In Part 1 of this series I demonstrated the number of Oracle IO read requests is very unlikely to equal the number of IO read wait occurances. I believe there are two core possibilities and I wouldn't be surprised if there are others. First, it is because the existance of asyncrounous IO. Second, Oracle does not guarentee all IO waits are instrumented. Let's take a look at each.

Asynchronous IO.

Asyncronous IO allows a process/thread to submit a single multiblock request and start processing the results as soon as some of the results become available. In contrast, when a process issues a syncrounous IO request it cannot process the results until all the requested blocks have been returned. As reported in my previous blog entry (Part 1), whether a synchronous or an asynchronous request, it appears that Oracle tracks each IO request correctly and makes the number of requests available through the physical read total IO requests instance statistic.

It can get more complicated than this. I'm not an IO operating system expert, but I can see that even without async IO technically in use, a process with multiple threads each making synchronous IO read calls will effectively be doing asynchronous IO. I like to keep in mind that Oracle kernel developers and the porting groups are always looking for ways to increase IO performance and will use whatever technique they can find to reach their goals. I have also noticed that in more recent Oracle versions, there are more IO related system calls.

A read wait occurrence is different than an IO read request. Oracle should not register a wait occurrence (along with the associated time) unless the process is truly waiting. For a synchronous IO this is easy to understand; the process asks the time, makes the synchronous IO request, receives all the blocks, and asks for the time gain. In this synchronous scenario, we can see the process retrieved multiple blocks, and is able to both count the wait occurrence and to time the IO request, that is, the wait time.

For an asynchronous IO call things become more complicated. (Remember that an Oracle process is either consuming CPU or waiting for something.) After the async IO multiblock call, as long as Oracle is busy processing the results, it does not post a wait event. Why? Because it is not waiting on IO. It is consuming CPU processing the IO is has already received. However, at anytime before all the blocks have been returned and the Oracle process has completed processing all the IO it has available, then the process will post an IO read wait occurrence and record the time.

The following statement should make perfect sense: In a synchronous world the number of IO read waits equals the number of IO read requests. But in a mixed sync and async environment, the number of IO read requests can exceed the number of IO read wait occurrences.

So is there truly a wait occurrence mismatch? Yes and in most situations it should be this way. The word request is purely an IO read request. The word occurrence refers to a wait occurrence. A more correct blog title would have been, "IO Read Request and Read Wait Occurrence Mismatch."

Oracle's Non-Instrumentation.

Oracle does not guarantee all waits are instrumented. Not that Oracle wants it that they, but perhaps a kernel developer made a mistake or for some reason choose to not instrument the call. Instrumentation consumes resources, so ironically performance may be a motivating factor for not instrumenting. Interestingly, in 11g Oracle added a wait event named, "unspecified wait event" and you may see some time associated with it...not a great way to build confidence in the the wait interface! So it is reasonable to assume that not all IO read wait time is accounted for.

Direct Reads vs Cached Reads

Most DBAs will tell you async IO is more active when direct reads are heavily used. I have not tested this but have no information to contradict the assertion. [Background: Direct reads do not place the block(s) into the buffer cache before they are processed. This can enable faster per process IO throughput but the blocks are not shared (no cache IO) with other processes which can force repeat IO calls (physical IO) thereby slowing block throughput (physical IO + cache IO). So Oracle must choose the access strategy wisely.]

The table below contains actual Oracle performance statistics from four different systems. Here is a quick explanation of each column.

  1. Sample. Simply my internal identifier for the data set.
  2. WI Direct Read Occurs. The sum of v$system_event.total_waits for all IO direct read events.
  3. WI Cache Read Occurs. The sum of v$system_event.total_waits for all IO non-direct read events.
  4. Total Occurs = WI Direct (2) + WI Cache (3) Read Occurs
  5. Percent Direct Occurs = WI Direct (2)  /  Total Occurs (4)
  6. Instance Statistic, Total Read IO Requests. The value of v$sysstat, physical read total IO requests.
  7. Occurs/Requests = Total Occurs (4)  /  Total Read IO Requests (6)
All but the final sample are based on internal of activity from either a Statspack or AWR report. The final sample (Interactive) contains data since the instance had started.

When there is little direct IO (col 2 / col 3 = col 5), we might expect to see the number of read wait occurrences (col 4) close to the number of IO requests (col 6).

The most startling example is the SIG sample. In this production system, there is virtually no direct reads (col 2) yet there is a significant difference between the total read wait event occurrences (col 4) compared to the total read requests (col 6). In fact, the read wait occurrences (column 4) only account for 67% (col 7) of all IO read requests (col 6).

These samples demonstrate that async IO can be active in both direct read and cache read environments.

Total IO Read Wait Time

It appears the most accurate way to gather the total IO read wait time is through the wait interface (session, group of sessions, instance, and system) and of course, by tracing a process. I am very pleased about this as it means I do not need to create a predictive algorithm based on input from both the wait interface and the instance statistics. (As I mentioned at the top, this seemingly simple exercise is extremely completed.) Instead, I simply query from the wait interface as I have always been doing.

Remember there are multiple read wait events. Here is a list that, at this time, I believe contains all read IO wait events: db file sequential read, db file scattered read, direct path read, direct path read temp, db file parallel read, control file sequential read, and log file sequential read.

Concluding Thoughts

Like I wrote in Part One of this series, Oracle has done a great job instrumenting its kernel code. Such a great job in fact, that most performance analysts rely heavily on the data it provides. But that doesn't mean we shouldn't challenge what it is telling us if the supporting evidence shows otherwise. In my case, I got sloppy with the words request and occurrence and it lead me down a very interesting path that ended up bringing me back to the wait interface. I hope I have clearly shown that an IO read request does not always result in an occurrence of a read wait event. For me personally, my confidence in Oracle's wait interface is back up to where it once was; great but not perfect.

As I mentioned in my "P.S." below, I don't check for comments on my blog, so when someone emails me about an entry enabling wonderful dialog, like Jonathan Lewis did regarding Part 1 of this series, I appreciate it very much.

Thanks for reading!

Craig.


P.S. If you want me to respond to a comment or have a question, please feel free to email me directly at craig@orapub .com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email.