Thursday, February 25, 2010

How mutexes sleep

As with Oracle locks and latches, mutexes are used to ensure serial access to Oracle structures. They both follow a shockingly simple yet efficient algorithm of spinning to acquire the control structure (thereby consuming CPU) and sleeping when spinning does not quickly acquire the control structure (posting a wait event).

But there are significant differences. Locks are defined to work on relational structures, like tables and rows, are orderly (no randomness and very boring), and consume more resources (per acquisition) than latches and mutexes. Whereas latches and mutexes were specifically designed to work with memory structures. When there are locking issues, enqueue related wait events become significant but when memory serialization issues become an issue latch and mutex related wait events become significant. So there are differences not only in name, but also in usage and resource requirements.

Mutexes were original created for high concurrency thread control and are part of the standard C programming library of functions. You can even do a man -k mutex on U/Linux systems to get a glimpse of the various mutex related functions. Try that with a latch and nothing is returned. Oracle uses mutexes to control serialization access to certain internal memory structures. As of 11g, only library cache related mutexes exist, but this will undoubtedly change.

Mutexes offer the kernel developer flexibility and offer significant performance benefits over latches. While I have much more detail in my Oracle Performance Firefighting book's chapter on Serialization Control, I'll quickly summarize mutex benefits. First mutexes are a library function available to Oracle kernel developers, so as you would expect they will be more efficient. My Oracle contacts tell me that on average, acquiring a mutex requires 30 to 35 instructions whereas latches require 150-200. This translates into reduced CPU consumption and therefore faster acquisition times or the the ability to process more work (more activity per second). The memory footprint is also supposed to be smaller by around 100 bytes, but honestly I don't think Oracle or anyone really cares about hundreds, thousands, or even hundreds of thousands of bytes of memory. Mutexes can pin cursors (so they are not deallocated) much quicker than latches. One of my experiments showed mutexes pinned and unpinned cursors 11% faster than traditional latches. But the big benefit is they provide more granular memory structure control, which reduces what's called false contention.

False contention results when say, a latch is used protect a complex memory structure where two processes need to access different parts of the memory structures yet because there is only a single control structure, one process will get access and the other will not. Even though they will not access the same piece of memory! Because mutexes can be more flexibly defined to be associated with a memory structure, Oracle can more naturally associate a mutex with small pieces of memory. For example, within the library cache, each cursor, child cursor, and table has an associated mutex. (I have dumped the library cache and demonstrated this.) When latches are used entire library cache chains could be protected by a single latch (I have not dumped a library cache in quite a while using only latches to 110% confirm this). With increased granularity comes increased concurrency, which means decreased CPU consumption and Oracle wait time to get the same amount of work done.

When a server or background process attempts to acquire a latch or a mutex in willing-to-wait mode they repeatedly ask for the control structure until either they acquire the control structure or they reach their spin limit. Latches spin up to the instance parameter _spin_count (unless latch classes are used) and for mutexes Oracle has a hard limit of 255. If the control structure is not acquired through spinning, the process sleeps.

This is where it gets really interesting! For latches, the system call select is used to pause the spinning for a defined period of time. For mutexes, Oracle has options, but has chosen to have the sleeping process yield the CPU. Essentially, the sleeping process put itself at the end of the run queue. This is in a way a self adjusting sleep. If the CPU subsystem is not all that busy, then the sleep time will be near zero and the server or background process will quickly resume spinning. But if the CPU subsystem is very busy, the process will patiently wait for its turn to consume CPU once again. (more about this below)

OK... let's take a step back and I'll show you how we can see this take place. Here's what an Oracle system can look like when severe mutex contention exists. The cursor: pin S is the result of Oracle processes sleeping (and the relatively high CPU consumption) is due in large part to spinning on the mutex trying to acquire it. But CPU is also used (and lots of it) to actually manage the library cache.


SQL: @rtpctx.sql
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER about nine times.

Database: prod18                                             21-JAN-10 12:57pm
Report:   rtpctx.sql           OSM by OraPub, Inc.           Page        1
                     Response Time Activity (61 sec interval)

                                       Time       Avg     Time        Wait
RT Component                           (sec)      % RT    Waited (ms) Count(k)
-------------------------------------- ---------- ------- ----------- --------
CPU consumption: Oracle SP + BG procs     248.585   97.92       0.000        0
cursor: pin S                               5.100    2.01      42.500        0
os thread startup                           0.030    0.01      15.000        0
control file parallel write                 0.020    0.01       1.000        0
db file async I/O submit                    0.010    0.00       0.769        0


If you're wondering how I forced the cursor: pin S wait event, just have a few processes do this:

declare
  i number;
  j number;
  cursor abc is select count(c1) into j from test_table where rownum=1 ;
begin
  loop
    for i in 1..1000000
    loop
        open  abc;
        close abc;
    end loop;
    sys.dbms_lock.sleep(1);
  end loop;
end;
/

Opening a cursor requires it be pinned in shared mode and part of that process is acquiring the cursor's mutex in shared mode, hence the wait cursor: pin S.

The way to actually see the difference with how a mutex sleeps compared to a latch is to operating system trace a process that is spending lots of time acquiring mutexes. While the above script was being run by a few processes (only takes a few), I traced it using the Linux command strace. Here are the results for a 60 second sample and then followed by a 180 second sample.

[oracle@fourcore mutex_calls]$ ps -eaf|grep oracleprod18
oracle   12647 12646 59 14:32 ?        00:09:11 oracleprod18

[oracle@fourcore mutex_calls]$ strace -rcp 12647
Process 12647 attached - interrupt to quit
Process 12647 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 47.40    0.015586           0   1538041           getrusage
 33.49    0.011013           0   1230577           gettimeofday
 19.11    0.006284           0    615192           times
  0.00    0.000000           0        57           sched_yield
  0.00    0.000000           0         1         1 semtimedop
------ ----------- ----------- --------- --------- ----------------
100.00    0.032883               3383868         1 total

[oracle@fourcore mutex_calls]$ strace -rcp 12647
Process 12647 attached - interrupt to quit
Process 12647 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 46.09    0.039050           0   4194768           getrusage
 35.27    0.029884           0   3356216           gettimeofday
 18.63    0.015787           0   1677838           times
  0.01    0.000007           0       157           sched_yield
  0.00    0.000000           0         1         1 semtimedop
------ ----------- ----------- --------- --------- ----------------
100.00    0.084728               9228980         1 total

On average this struggling server process yielded CPU control around once every 0.9 seconds. The CPU run queue was not that long however because the process was able to "wake up" and start spinning once again in nearly no time at all (see the seconds column).

The sched_yield call simply relinquishes the processor voluntarily without blocking. The process will then be moved to the end of the queue for its static priority and a new process gets to run. What is so cool is if the relinquishing process is the only process in the highest priority list at that time, this process  will continue to run after a call to sched_yield().

This is significant because instead of a process waiting a defined period of time, say 10ms, it will simply move itself to the end of the queue. This avoids the unfortunate situation where there is idle CPU resources while the process is sleeping! It's kind of like the lead goose in a large geese V formation. The lead goose is taking all the heat up front. After a while of this, it backs off and let's another goose take some heat. Eventually the goose will work its way back up to the front.

While this post is about how mutexes sleep, a little more practically speaking (but not much), once you determine there is significant mutex contention, you need to understand specifically what the particular mutex is telling you and what the application is trying to do that is stressing Oracle so much. So far every time I have seen mutex contention and talked with DBAs with systems experiencing mutex contention, we have been able to understand how and why their application is stressing the mutexes so much. In each case, by creatively altering the application, the significant mutex contention was eliminated.

Thanks for reading and I look forward to meeting in you person one day.

Craig.

3 comments:

  1. well im not sure of how many mutexes sleep but i got sllimping problems cause i got sleep sex disorder.

    ReplyDelete
  2. Craig,

    Thanks for writing this blog. I seem to facing some mutex lock issues. However, I also notice that many processes are hanging on futex system calls. For example:

    $ strace -rcp 15788
    Process 15788 attached - interrupt to quit
    [ Process PID=15788 runs in 32 bit mode. ]
    Process 15788 detached

    I have seen this on both RHEL 4 and RHEL 5 versions and 11.1.0.7 instances. Have you see this issue before and is it normal?

    -Ravi

    ReplyDelete
  3. Oops, sorry I messed up. The above lines don't show futex calls:

    $ strace -p 15788
    Process 15788 attached - interrupt to quit
    [ Process PID=15788 runs in 32 bit mode. ]
    futex(0xf7e5ebe8, FUTEX_WAIT, 15789, NULL
    Process 15788 detached
    applmgr@vq2lr105:>strace -rcp 15788
    Process 15788 attached - interrupt to quit

    ReplyDelete