Monday, September 17, 2012

How to create LOTS of child cursors.

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

The Back Story

With all the blog postings and forum mishegas about mutex issues, bind mismatch, and child cursors, you would think it would be a simple thing to create hundreds if not thousands of child cursors. Well... it wasn't.

My original intention for this posting was to gather experimental evidence about the performance impact of having hundreds or thousands of child cursors. While it was easy to create a few child cursors (details below), it was difficult to create hundreds of child cursors. I'm also noticing what many DBAs think what is a "bind mismatch" is really not a bind mismatch. So I figured it was worth a separate posting about how to create child cursors using different methods and also how to easily create lots of child cursors.

Why So Difficult?

It was actually very frustrating. Sure it's easy to create a few child cursors. But it was not easy to create hundreds or to cause the infamous "bind mismatch." Why?

I suspect it's a combination of using current versions of Perl's DBI, PLSQL, adaptive cursor sharing, and more current Oracle releases. Oracle knows having hundreds or thousands of child cursors can cause severe performance issues, so I suspect (perhaps hope) they have been and are still working on ways to reduce the number of child cursors.

Plus I'm not using Java in my experiments. When I have talked with DBAs who have personally experienced this mutex contention related to hundreds of child cursors, sloppy Java coding always seems to be at the center.

Seeing What's Happening

This entry is centered on the shared pool's library cache, cursors, and child cursors. For a good introduction including jaw-dropping visuals, check out this blog posting. Every one of our production SQL statements has a related cursor consuming memory in the library cache along with at least one child cursor. Here is a simple example of how you can check this out.
SQL> ALTER SYSTEM FLUSH SHARED_POOL;

System altered.

SQL> select count(*) from dba_objects where object_id=123;

  COUNT(*)
----------
         1

SQL> select sql_id, plan_hash_value, child_number, executions
  2  from   v$sql
  3  where  sql_text like 'select count%%object_id=123';

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER EXECUTIONS
------------- --------------- ------------ ----------
bx8xqbbk0q4hh      3299497030            0          1
Let's create another child cursor... it's easy: change the optimizer mode and then re-run the exact same SQL statement.
SQL> alter session set optimizer_mode=first_rows_1;

Session altered.

SQL> select count(*) from dba_objects where object_id=123;

  COUNT(*)
----------
         1

SQL> select sql_id, plan_hash_value, child_number, executions
  2  from   v$sql
  3  where  sql_text like 'select count%%object_id=123';

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER EXECUTIONS
------------- --------------- ------------ ----------
bx8xqbbk0q4hh      3299497030            0          1
bx8xqbbk0q4hh      3299497030            1          1
But why was another child cursor created? Let's ask Oracle. Each SQL ID child cursor has a row in v$sql_shared_cursor. For each cursor other than the initial (child number 0), one of the 63 possible flags will be set to a Y.

For example, let's find out, by the way of v$sql_shared_cursor, why the above example SQL statement has two cursors.
SQL> select *
  2  from   v$sql_shared_cursor
  3  where  sql_id='bx8xqbbk0q4hh';

SQL_ID        ADDRESS          CHILD_ADDRESS    CHILD_NUMBER U S O O S L F E B P I S T A B D L T B I I R L I O E M U T N F
------------- ---------------- ---------------- ------------ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
A I T D L D B P C S C P T M B M R O P M F L P L A F L R L H P B
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
bx8xqbbk0q4hh 00000002B3A6BE50 00000002B3AAF320            0 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N
N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N

bx8xqbbk0q4hh 00000002B3A6BE50 00000002B393A718            1 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N
N N N N N N N N N N N N N N N N N Y N N N N N N N N N N N N N N
Notice the first child cursor (0) is all N's. That's because you will always have a child cursor for our SQL statements. (I wrote "our" because it is common for LC objects to not have a child cursor. If you do a library cache dump, you'll notice this surprising phenomenon.)

If you look closely at child cursor 1, you will notice the "Y" is at the 49th position. What is the 49th position mean? Look at the 49th reason. I added some numbers to the columns to make finding the reason (column name) easier:
SQL> desc v$sql_shared_cursor
 Name                       Type
 -------------------------- -------------
 SQL_ID                     VARCHAR2(13)
 ADDRESS                    RAW(8)
 CHILD_ADDRESS              RAW(8)
 CHILD_NUMBER               NUMBER
1 UNBOUND_CURSOR            VARCHAR2(1)
 SQL_TYPE_MISMATCH          VARCHAR2(1)
 OPTIMIZER_MISMATCH         VARCHAR2(1)
 OUTLINE_MISMATCH           VARCHAR2(1)
 STATS_ROW_MISMATCH         VARCHAR2(1)
 LITERAL_MISMATCH           VARCHAR2(1)
 FORCE_HARD_PARSE           VARCHAR2(1)
 EXPLAIN_PLAN_CURSOR        VARCHAR2(1)
 BUFFERED_DML_MISMATCH      VARCHAR2(1)
10 PDML_ENV_MISMATCH        VARCHAR2(1)
 INST_DRTLD_MISMATCH        VARCHAR2(1)
 SLAVE_QC_MISMATCH          VARCHAR2(1)
 TYPECHECK_MISMATCH         VARCHAR2(1)
 AUTH_CHECK_MISMATCH        VARCHAR2(1)
 BIND_MISMATCH              VARCHAR2(1)
 DESCRIBE_MISMATCH          VARCHAR2(1)
 LANGUAGE_MISMATCH          VARCHAR2(1)
 TRANSLATION_MISMATCH       VARCHAR2(1)
 BIND_EQUIV_FAILURE         VARCHAR2(1)
20 INSUFF_PRIVS             VARCHAR2(1)
 INSUFF_PRIVS_REM           VARCHAR2(1)
 REMOTE_TRANS_MISMATCH      VARCHAR2(1)
 LOGMINER_SESSION_MISMATCH  VARCHAR2(1)
 INCOMP_LTRL_MISMATCH       VARCHAR2(1)
 OVERLAP_TIME_MISMATCH      VARCHAR2(1)
 EDITION_MISMATCH           VARCHAR2(1)
 MV_QUERY_GEN_MISMATCH      VARCHAR2(1)
 USER_BIND_PEEK_MISMATCH    VARCHAR2(1)
 TYPCHK_DEP_MISMATCH        VARCHAR2(1)
30 NO_TRIGGER_MISMATCH      VARCHAR2(1)
 FLASHBACK_CURSOR           VARCHAR2(1)
 ANYDATA_TRANSFORMATION     VARCHAR2(1)
 INCOMPLETE_CURSOR          VARCHAR2(1)
 TOP_LEVEL_RPI_CURSOR       VARCHAR2(1)
 DIFFERENT_LONG_LENGTH      VARCHAR2(1)
 LOGICAL_STANDBY_APPLY      VARCHAR2(1)
 DIFF_CALL_DURN             VARCHAR2(1)
 BIND_UACS_DIFF             VARCHAR2(1)
 PLSQL_CMP_SWITCHS_DIFF     VARCHAR2(1)
40 CURSOR_PARTS_MISMATCH    VARCHAR2(1)
 STB_OBJECT_MISMATCH        VARCHAR2(1)
 CROSSEDITION_TRIGGER_MISMATCH VARCHAR2(1)
 PQ_SLAVE_MISMATCH          VARCHAR2(1)
 TOP_LEVEL_DDL_MISMATCH     VARCHAR2(1)
 MULTI_PX_MISMATCH          VARCHAR2(1)
 BIND_PEEKED_PQ_MISMATCH    VARCHAR2(1)
 MV_REWRITE_MISMATCH        VARCHAR2(1)
 ROLL_INVALID_MISMATCH      VARCHAR2(1)
 OPTIMIZER_MODE_MISMATCH    VARCHAR2(1)
50 PX_MISMATCH              VARCHAR2(1)
 MV_STALEOBJ_MISMATCH       VARCHAR2(1)
 FLASHBACK_TABLE_MISMATCH   VARCHAR2(1)
 LITREP_COMP_MISMATCH       VARCHAR2(1)
 PLSQL_DEBUG                VARCHAR2(1)
 LOAD_OPTIMIZER_STATS       VARCHAR2(1)
 ACL_MISMATCH               VARCHAR2(1)
 FLASHBACK_ARCHIVE_MISMATCH VARCHAR2(1)
 LOCK_USER_SCHEMA_FAILED    VARCHAR2(1)
 REMOTE_MAPPING_MISMATCH    VARCHAR2(1)
60 LOAD_RUNTIME_HEAP_FAILED VARCHAR2(1)
 HASH_MATCH_FAILED          VARCHAR2(1)
 PURGED_CURSOR              VARCHAR2(1)
 BIND_LENGTH_UPGRADEABLE    VARCHAR2(1)
The 49th column is, "OPTIMIZER_MODE_MISMATCH". So the child cursor was created because we set the optimizer mode to something different than when the initial child cursor was created.

For the more hard core DBAs, a library cache dump is even more satisfying!
SQL> alter session set max_dump_file_size=unlimited;

Session altered.

SQL> alter session set events 'immediate trace name library_cache level 10';

Session altered.

SQL> select tracefile from v$process where 
  2  addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));

TRACEFILE
-----------------------------------------------------------------------------
/home/oracle/base/diag/rdbms/prod23/prod23/trace/prod23_ora_23631.trc
Looking in trace file and searching for "object_id=123", I found this (and removed most of the contents):
Bucket: #=70160 Mutex=a8958e00(0, 20, 0, 6)
  LibraryHandle:  Address=b3a6be50 Hash=e40b1210 LockMode=0 PinMode=0 LoadLockMode=0 Status=VALD
    ObjectName:  Name=select count(*) from dba_objects where object_id=123

      FullHashValue=158e40ec5d962fc9bea3b65ae40b1210 Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=3825930768 OwnerIdn=5
    LibraryObject:  Address=9fea1c60 HeapMask=0000-0001-0001 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
      DataBlocks:
        Block:  #='0' name=PCUR^e40b1210 pins=0 Change=NONE
          Heap=b3ae3c38 Pointer=9fea1d48 Extent=9fea1c30 Flags=I/-/P/A/-/-
          FreedLocation=0 Alloc=1.515625 Size=4.000000 LoadTime=8700560730
      ChildTable:  size='16'
        Child:  id='0' Table=9fea2b60 Reference=9fea25b0 Handle=b3aaf320
        Child:  id='1' Table=9fea2b60 Reference=9fea28d0 Handle=b393a718
Do you see the above two child cursor entries? I find seeing this in the LC dump very satisfying. Much more so than looking at a v$ view.

While the "optimizer mode mismatch" is an interesting and very easy way to learn about child cursors, it's not likely to be a performance issue.

The BIND_EQUIV_FAILURE situation

While trying to cause the infamous bind mismatch reason for creating child cursors, I stumbled across the bind equivalent failure situation. This occurs, in part, when a child cursor cursor was created with a column value selectivity of x and now a new column value is supplied with a very different selectivity. For example, suppose there are 99 rows where column c1 equals "F" and 1 row where column C1 equals "M". If the initial query is something like "...where c1=:b1" where :b1 is "F" and then another execution has the bind variable :b1 set to "M" a bind_equiv_failure can result... but not always based on my testing. (Hope that made sense.)

Here's how I caused this:
SQL> set tab off
SQL> set linesize 180
SQL> drop table bogus;
SQL> create table bogus as select * from dba_objects;
SQL> exec DBMS_STATS.GATHER_TABLE_STATS('SYSTEM', 'BOGUS', METHOD_OPT => 'FOR COLUMNS SIZE 3 OBJECT_TYPE');
SQL> select object_type,count(*) from bogus group by object_type order by 2 desc;
OBJECT_TYPE           COUNT(*)
------------------- ----------
SYNONYM                  27800
JAVA CLASS               22920
VIEW                      5074
INDEX                     4156
TABLE                     3084
TYPE                      2783
PACKAGE                   1313
PACKAGE BODY              1253
LOB                        900
JAVA RESOURCE              834
TRIGGER                    613
JAVA DATA                  328
FUNCTION                   303
INDEX PARTITION            300
TYPE BODY                  237
SEQUENCE                   232
LIBRARY                    183
PROCEDURE                  160
TABLE PARTITION            144
OPERATOR                    55
XML SCHEMA                  51
QUEUE                       39
CONSUMER GROUP              25
RULE SET                    23
PROGRAM                     19
JOB                         14
EVALUATION CONTEXT          14
JOB CLASS                   13
CLUSTER                     10
RESOURCE PLAN               10
UNDEFINED                    9
DIRECTORY                    9
INDEXTYPE                    9
WINDOW                       9
CONTEXT                      7
DIMENSION                    5
SCHEDULER GROUP              4
SCHEDULE                     3
MATERIALIZED VIEW            3
JAVA SOURCE                  2
DESTINATION                  2
LOB PARTITION                1
EDITION                      1
RULE                         1

44 rows selected.

SQL> ALTER SYSTEM FLUSH SHARED_POOL;

System altered.

SQL> variable obj_type_var varchar2(100);
SQL> exec :obj_type_var := 'SYNONYM';

PL/SQL procedure successfully completed.

SQL> select count(*), max(object_id) from bogus where object_type = :obj_type_var ;

  COUNT(*) MAX(OBJECT_ID)
---------- --------------
     27800          74718

SQL> select sql_id, plan_hash_value, child_number, executions
  2  from   v$sql
where  sql_text like 'select count%bogus%';

  3  
SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER EXECUTIONS
------------- --------------- ------------ ----------
b4bbq6byn8dsd      3200977001            0          1
There are 27800 rows where the object_type is "SYNONYM". Now let's try and force the creation of a second child cursor by setting the bind variable :obj_type_var to something where there are no rows! Like this:
SQL> exec :obj_type_var := 1;

PL/SQL procedure successfully completed.

SQL> select count(*), max(object_id) from bogus where object_type = :obj_type_var ;

  COUNT(*) MAX(OBJECT_ID)
---------- --------------
         0

SQL> select sql_id, plan_hash_value, child_number, executions
  2  from   v$sql
  3  where  sql_text like 'select count%bogus%';

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER EXECUTIONS
------------- --------------- ------------ ----------
b4bbq6byn8dsd      3200977001            0          2
Nope... Didn't happen. Let's try again with the bind variable value of "JOB" with which there are 14 rows.
SQL> exec :obj_type_var := 'JOB';

PL/SQL procedure successfully completed.

SQL> select count(*), max(object_id) from bogus where object_type = :obj_type_var ;

  COUNT(*) MAX(OBJECT_ID)
---------- --------------
        14          57372

SQL> select sql_id, plan_hash_value, child_number, executions
  2  from   v$sql
  3  where  sql_text like 'select count%bogus%';

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER EXECUTIONS
------------- --------------- ------------ ----------
b4bbq6byn8dsd      3200977001            0          2
b4bbq6byn8dsd      3200977001            1          1
Great! Another child cursor was created. (Interestingly, if I switch the order of the bind variable settings (JOB vs 1), the exact same thing will occur.) Why did Oracle decide to create another the child cursor? Let's check using v$sql_shared_cursor. If you count starting at the N's you'll notice the Y occurs in child cursor 1 at position 19.
SQL> select * from v$sql_shared_cursor where sql_id='b4bbq6byn8dsd';

SQL_ID        ADDRESS          CHILD_ADDRESS    CHILD_NUMBER U S O O S L F E B P I S T A B D L T B I I R L I O E M U T N F A I T D
------------- ---------------- ---------------- ------------ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
L D B P C S C P T M B M R O P M F L P L A F L R L H P B
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
b4bbq6byn8dsd 00000002B7D2F810 00000002B7B98410            0 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N
N N N N N N N N N N N N N N N N N N N Y N N N N N N N N

b4bbq6byn8dsd 00000002B7D2F810 00000002B7C4C210            1 N N N N N N N N N N N N N N N N N N Y N N N N N N N N N N N N N N N N
N N N N N N N N N N N N N N N N N N N N N N N N N N N N
Looking at the describe of v$sql_shared_cursor above, position 19 is "BIND_EQUIV_FAILURE". So, this is yet another way to create child cursors. But still I suspect this is unlikely to cause crazy poor performance.

Since counting N's is actually pretty irritating, here is another simpler query.
SQL> select sql_id, child_number, bind_equiv_failure
  2  from   v$sql_shared_cursor 
  3  where  sql_id='b4bbq6byn8dsd';

SQL_ID       CHILD_NUMBER B
------------- ------------ -
b4bbq6byn8dsd            0 N
b4bbq6byn8dsd            1 Y

The BIND_LENGTH_UPGRADEABLE situation

When trying to force a bind mismatch, I also stumbled across what's called a bind length upgradeable. There is a lot on the internet about this, so I won't get into the details. But I will say this: When a child cursor is created Oracle appears to pre-allocate some internal bind variable value space based on the variable definition. If the existing bind variable (variable b1 varchar2(5)) is redefined (variable b1 varchar2(33)) and the space is significantly larger (and that's what the other blog entries get into), a new child cursor will be created. Interestingly, the trigger lengths are around 1, 33, 129, and 2001. Here is a simple example of this:
SQL> alter system flush shared_pool;

System altered.

SQL> variable b1 varchar2(5);
SQL> exec :b1 := 'abc';

PL/SQL procedure successfully completed.

SQL> update bogus set owner='zzz' where object_type = :b1;

0 rows updated.

SQL> select sql_id, plan_hash_value, child_number, executions
  2  from   v$sql
  3  where  sql_text like 'update bogus set%';

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER EXECUTIONS
------------- --------------- ------------ ----------
bcbtftw6dur97      3773790677            0          1
Now I will redefine bind variable :b1 but notice the contents is exactly the same ("abc").
SQL> variable b1 varchar2(33);
SQL> exec :b1 := 'abc';

PL/SQL procedure successfully completed.

SQL> update bogus set owner='zzz' where object_type = :b1;

0 rows updated.

SQL> select sql_id, plan_hash_value, child_number, executions
  2  from   v$sql
  3  where  sql_text like 'update bogus set%';

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER EXECUTIONS
------------- --------------- ------------ ----------
bcbtftw6dur97      3773790677            0          1
bcbtftw6dur97      3773790677            1          1

SQL> select sql_id, child_number, BIND_LENGTH_UPGRADEABLE
  2  from   v$sql_shared_cursor 
  3  where  sql_id='bcbtftw6dur97';

SQL_ID       CHILD_NUMBER B
------------- ------------ -
bcbtftw6dur97            0 N
bcbtftw6dur97            1 Y
I can see this situation perhaps could cause a problem, if you had many columns and different programs with the same exact SQL but defining the variable many different lengths. While this may seem like a potential problem, it's not as bad as you might think. Once a variable size has been increased, resetting it smaller does not appear to shrink it back down. This will eliminate the exponential situation that we'll see in the bind variable mismatch situation below. To prove this non-exponential child cursor growth point, consider the code below:
outfile=doit.sql
rm $outfile

echo "set tab off" >> $outfile
echo "set linesize 130" >> $outfile
echo "ALTER SYSTEM FLUSH SHARED_POOL;" >> $outfile

for b1 in 1 33 129 2001
do
  for b2 in 1 33 129 2001
  do
        echo "variable b1 varchar2($b1);" >> $outfile
        echo "variable b2 varchar2($b2);" >> $outfile
        echo "exec :b1 := 'abc';" >> $outfile
        echo "exec :b2 := 'abc';" >> $outfile
        echo "update bogus set owner='zzz' where object_type = :b1 or object_type = :b2;" >> $outfile
  done
done
echo ""
echo "select sql_id, plan_hash_value, child_number, executions from v\$sql
where sql_text like 'update bogus set%';" >> $outfile
echo "exit;" >> $outfile
echo "------------------------------------------------------"
sqlplus system/manager @$outfile
If you run this, this is what you'll see:
SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER EXECUTIONS
------------- --------------- ------------ ----------
9kxv8qtnq8mmw      3773790677            0          1
9kxv8qtnq8mmw      3773790677            1          1
9kxv8qtnq8mmw      3773790677            2          1
9kxv8qtnq8mmw      3773790677            3          1
9kxv8qtnq8mmw      3773790677            4          4
9kxv8qtnq8mmw      3773790677            5          4
9kxv8qtnq8mmw      3773790677            6          4

7 rows selected.
Most people would incorrectly assume (as I did) there would be 2^4 or 4^2 child cursors. So it turns out that the bind length upgradable situation is not likely to cause a severe performance problem.

The Bind Mismatch Situation

Over the past few years, there has been a lot of talk about a SQL statement cursor having hundreds or even thousands of child cursors. Why? Because performance comes to a standstill and sometimes the instance has to be restarted. Yes, it's that bad.

Simply stated, a bind variable mismatch occurs when a bind variable is defined one way and then another, but the "another" is not different enough to cause an error yet it is different enough to cause Oracle to create an additional child cursor.

This may seen like an easy situation to cause, but it is not. PLSQL and Perl are very good at changing or adapting a bind variable type to the currently existing definition. What I'm told is that with Java, it's actually very easy to cause the problem. In fact, when I talk with DBAs who have experienced mutex contention because of hundreds of child cursors, sloppy Java coding always seems to be somehow involved.

But why a performance problem?

To understand why a mismatch can cause severe performance problem, we need to understand what the server process is doing. To parse a SQL statement the server process must find, and if not found then create, a suitable child cursor. But here is the issue: When a SQL statement is hashed, it is hashed to the parent cursor (mutex in shared mode required) and then it appears each existing child cursor is sequentially accessed trying to find a suitable match (mutex in shared mode required). [Note: my next blog entry will address this "it appears" issue specifically.] If a suitable child cursor is not found, then a new child cursor is created, which requires parent and child mutex access in exclusive mode. So if there are hundreds or thousands of child cursors involved you can see that even with very fast CPU cores a popular SQL statement could result in a nightmarish performance situation.

Here is how to quickly demonstrate a bind variable mismatch. Save the below code snippet to t1.sql:
select count(*), max(object_id) 
from   bogus 
where  object_id = :obj_id1_var or object_id = :obj_id2_var ;
Now in SQL*Plus do this:
ALTER SYSTEM FLUSH SHARED_POOL;
variable obj_id1_var nchar(5);
exec :obj_id1_var := '1';
variable obj_id2_var nchar(5);
exec :obj_id2_var := 1;
@t1.sql
Now run this:
SQL> select sql_id,plan_hash_value,child_number, executions "execs"
  2  from   v$sql
  3  where  sql_text like 'select count%max%bogus%';

SQL_ID       PLAN_HASH_VALUE CHILD_NUMBER execs
------------- --------------- ------------ ----------
f5qpt4fmb4hkk      3200977001            0          1
Just what we expected; parent cursor with a single child cursor. Now lets redefine bind variable :obj_id2_var, run the select statement, and check the child cursor situation like this:
variable obj_id2_var number;
exec :obj_id2_var := 1;
@t1.sql

SQL> select sql_id,plan_hash_value,child_number, executions "execs"
  2  from   v$sql
  3  where  sql_text like 'select count%max%bogus%';

SQL_ID       PLAN_HASH_VALUE CHILD_NUMBER execs
------------- --------------- ------------ ----------
f5qpt4fmb4hkk      3200977001            0          1
f5qpt4fmb4hkk      3200977001            1          1
Why was the new child cursor created?
SQL>select * from v$sql_shared_cursor where sql_id='f5qpt4fmb4hkk';
 
SQL_ID       ADDRESS        CHILD_ADDRESS CHILD_NUMBER U S O O S L F E B P I S T A B D L T B I I R L I O E M U T N F
------------- ---------------- ---------------- ------------ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
A I T D L D B P C S C P T M B M R O P M F L P L A F L R L H P B
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
f5qpt4fmb4hkk 00000002B3B30280 00000002B3A93F18     0 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N
N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N

f5qpt4fmb4hkk 00000002B3B30280 00000002B3AA80D8     1 N N N N N N N N N N N N N N Y N N N N N N N N N N N N N N N N
N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N
Notice the "Y" is at position 15 which, referring back near the top of the blog, is the "BIND_MISMATCH" reason for the child cursor creation!

Here is a simpler query:
SQL> select sql_id, child_number, BIND_MISMATCH
  2  from   v$sql_shared_cursor 
  3  where  sql_id='f5qpt4fmb4hkk';

SQL_ID       CHILD_NUMBER B
------------- ------------ -
f5qpt4fmb4hkk            0 N
f5qpt4fmb4hkk            1 Y
This clearly does NOT demonstrate a problem. It simply demonstrates that in this closely orchestrated situation I was able to cause two child cursors by changing the bind variable definition enough (not it's length as with BIND_LENGTH_UPGRADEABLE situation) to cause another child cursor creation but not changed enough to cause a SQL statement failure.

The real question is, could this situation become an exponential situation? For example, if there were 3 bind variable could the situation have the potential of causing a 2^3 child cursors. The only way to really know this is create a very simple and very repeatable experiment that anyone could run for themselves... and that's what I've done below.

Here is what I did and what you can do. I basically automated the steps I used above to create a bind mismatch resulting in two child cursors and scaled it up to 5 bind variables, not just two.

Here what I did: As the Oracle Linux user, I copy and pasted the below text on the Linux command line:
outfile=doit.sql
rm $outfile

echo "set tab off" >> $outfile
echo "set linesize 130" >> $outfile
echo "ALTER SYSTEM FLUSH SHARED_POOL;" >> $outfile

for b1 in number nchar\(5\)
do
  for b2 in number nchar\(5\)
  do
    for b3 in number nchar\(5\)
    do
      for b4 in number nchar\(5\)
      do
        for b5 in number nchar\(5\)
        do
          echo "variable b1 $b1;" >> $outfile
          echo "exec :b1 := '1';" >> $outfile
          echo "variable b2 $b2;" >> $outfile
          echo "exec :b2 := '1';" >> $outfile
          echo "variable b3 $b3;" >> $outfile
          echo "exec :b3 := '1';" >> $outfile
          echo "variable b4 $b4;" >> $outfile
          echo "exec :b4 := '1';" >> $outfile
          echo "variable b5 $b5;" >> $outfile
          echo "exec :b5 := '1';" >> $outfile
echo "
select count(*), max(object_id) 
from   bogus 
where  object_id = :b1 or object_id = :b2 or object_id = :b3 or object_id = :b4 or object_id = :b5;
" >> $outfile
        done
      done
    done
  done
done

echo ""
echo "select sql_id,plan_hash_value,child_number, executions from v\$sql where sql_text like 'select count%bogus%';" >> $outfile
echo "exit;" >> $outfile
sqlplus system/manager @$outfile
The final select statement will look something like this:
SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER EXECUTIONS
------------- --------------- ------------ ----------
acxkxa5d11ffz      3200977001            0          1
acxkxa5d11ffz      3200977001            1          1
. . . 
acxkxa5d11ffz      3200977001           30          1
acxkxa5d11ffz      3200977001           31          1

32 rows selected.
Is that cool or what! But it gets better because look at why Oracle created the 32 child cursors.
SQL> select sql_id,address,child_number,child_address,bind_mismatch
  2  from   v$sql_shared_cursor
  3  where  sql_id='acxkxa5d11ffz';

SQL_ID                 ADDRESS CHILD_NUMBER CHILD_ADDRESS    B
------------- ---------------- ------------ ---------------- -
acxkxa5d11ffz 00000002B7C57E60            0 00000002B75F7448 N
acxkxa5d11ffz 00000002B7C57E60            1 00000002B7B39998 Y
. . .
acxkxa5d11ffz 00000002B7C57E60           30 00000002B7C400C0 Y
acxkxa5d11ffz 00000002B7C57E60           31 00000002B7C498E0 Y

32 rows selected.
The potentially performance killer point here is with only 5 bind variables, 32 child cursors were created. Let's put a formula to this.

child cursors = bind variable definitions ^ number of bind variables

Which in our case was 2^5 which equals 32. If you're getting nervous, I understand. Anything to the power of something should catch our attention.

Scaling up to production...

There are three points I want to highlight:

1. While 32 child cursors is not likely to cause a performance problem, it could. In very high concurrency situations the shared mutex acquisition and child cursor scanning could result in CPU subsystem saturation and mutex contention.

2. Based on my experiments and the difficulty in creating bind mismatch using PLSQL or Perl, I will always ask if there is Java code involved in the mutex constrained application.

3. A more likely and very serious performance situation is when there is a SQL statement with perhaps 10, 20, 30, or perhaps 40 bind variables written in Java where the SQL is the same but the bind variable definition is inconsistent resulting in only 2 variations. This means where could be up to 2^10 , 2^20 , 2^30 or 2^40 that is, 1024, 1048576, 1073741824, or 1099511627776 child cursors! Unless Oracle limits the number of child cursors (which I'm told it does in certain Oracle 11g versions to 1024 child cursors), you're in serious trouble.

What does this mean for us?

There are many ways multiple child cursors are created. If you have mutex related contention, check the version count of your active SQL statements (i.e., being executed) AND determine why Oracle is creating the child cursors. Don't assume it's a bind mismatch, check in v$sql_shared_cursor.

This begs the questions: How many child cursors are OK? And are there other factors, like concurrency? And then what are our options? These questions are what my next posting will focus on.

But before I could study this I needed a simple, repeatable, and verifiable way to create lots and lots of child cursors. As usual I learned more than I expected and I hope enjoyed this posting!

Thanks for reading!

Craig.