Wednesday, June 16, 2010

Altering insert commit batch size - Part 4

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

If you have encountered the log file sync wait event before, you know it can be very difficult to resolve. And even when a solution technically makes sense it can be very challenging to anticipate the performance improvement. In the three previous blog postings in this series I explored the wait event and its common cause (rapid commits), and setup an experiment to demonstrate that by increasing the inserts per commit (batching the commits) Oracle was able to insert more rows per second. We explored the core response time components (service time, queue time, arrival rate, number of "servers") and finally created a response time curve representing the current performance situation!

One important value of the response time curve is helping others to visually see that the performance is precarious (or at least non-optimal). I've never met a manager who didn't quickly grasp that being in the "elbow of the curve" was a bad thing. Most people inherently feel in their gut that it's not the place to be. So then the logical question is, "How do we get out of the elbow of the curve?" And that, is when you list off your fantastic performance solutions!

In this blog entry I want to demonstrate how you can take your performance solutions and inject them into the response time plot. This allows you to visually show others how you will "get out of the elbow of the curve." Plus, if/when you are challenged or you'd like to provide more detail, you have the numerics to back up your analysis. So let's get started!

Thinking Consistently

All performance solutions will in some way alter the core response time variables; service time, queue time, arrival rate, or M. Having a solid and consistent definition of the variables are key to understanding how a performance solution will alter one or more these variables. So let's quickly review them from an Oracle system perspective.

Service time is the CPU consumed per unit of work. For example, 0.0914 ms/insert or perhaps 5.01 ms/pio, where "pio" is Oracle physical block reads, which I commonly call, physical IO.

Queue time is the non-idle Oracle wait time per unit of work. For example, 0.1356 ms/insert or perhaps 115.35 ms/pio.

Arrival rate is the work processed (i.e., the workload) per unit of time. For example, 10.8445 inserts/ms or perhaps 0.0516 pio/ms. (This is not a 100% correct definition, but it's close enough for this analysis.)

M is the number of effective transaction servers, such as a CPU core or an active IO "device." For example, 1.4235 devices or perhaps 0.2695 devices. (Defining an IO "device" is out of scope for this blog entry, but you can scan my Firefighting book for the word, here.) M can be difficult to solve, so this free msolver web application can help.

How Solutions Change the Parameters

As I mentioned above we can boil down any Oracle performance solution to primarily changing one of the above parameters, and probably only either the service time, arrival rate, or the number of servers (M). Let me give you a few examples:
  • If, in your SQL tuning prowess, you reduce the number of physical IOs and PIO is the unit of work, then you are reducing the arrival rate, that is, the number of PIOs that occur over the sample (e.g., AWR report) interval of time.
  • If, in your Oracle internals mastery, you increased the number of cache buffer chain latches to reduce cache buffer chain latch contention, you are reducing the CPU consumed/required to process a single logical IO, which means the service time will be reduced.
  • If, in your financial wizardry you beg, buy, or steal (By stealing, I'm referring to virtual machine power of course.) CPU cores, then you are increasing the number of effective servers, M.
  • If, in your gut you know that, if you increase the Oracle buffer cache less physical IOs will be required, then you are reducing the arrival rate, that is, the number of PIOs that occur over an interval of time.
In general, here's how to work through understanding how your change will affect either the service time, arrival rate, or the number of servers (M). Please read this slowly...

If the service time is going to decrease, then your performance solution will, on average over an interval of time, reduce the amount of CPU Oracle consumes to process a single unit of work. This typically requires some kind of efficiency improvement, like tuning Oracle or using faster CPUs.

If the arrival rate is going to decrease, then your solution will, on average over an interval of time, reduce the amount of work to be processed per unit of time. You are essentially reducing the workload, that is, requiring the system to process less units of work. Workload balancing and SQL tuning are good examples of reducing the arrival rate.

If the number of servers is going to change (increase or decrease), then there will need to be change in the number of effective CPU cores or IO devices. This typically requires a physical configuration change and therefore is not very likely in most Oracle systems. However, a virtual machine can "easily" alter the number of CPU cores, thereby directly affecting the number of servers.

Focus on the inputs, not the outputs. I also find it's helpful to focus on the single most pronounced change. For example, if the service time will decrease, then the system will be able to process more work per unit of time. This means the arrival rate will most likely increase... but this arrival rate increase is the result, that is, the output of our solution, not the input. Focus on what you will change, enter that into the response time formula (as I'll show below), and let the graph appear before you eyes. Then you can see, for example, that the arrival could possibly increase before the response time started to dramatically increase. So focus on the inputs and let the math take care of the output.

How this works in practice is that for each one of your proposed performance solutions, you ask yourself how the solution will impact the service time, arrival rate, or the number of servers. You can create a simple matrix to help organize your thoughts.

I'm not implying this is a simple thought process. Personally I find this distillation process very difficult. It forces me to bring together all my Oracle internals, application SQL tuning, operating systems, and queuing theory knowledge and experience while at the same time keeping the above core definitions consistent. It's just plain hard. However, the process purifies my thinking and forces me to really think through just why a particular performance solution will make a difference. And if I can do that, then I'm able to explain the performance situation and solution much better others... not to mention the confidence I gain in anticipating the impact of my proposed solution. (Or arguing with a vendor about why their recommendation makes absolutely no sense.)

So if you want to get good at this, it will take some practice. (And if you want to, email me your thought process. I will email you back.) It takes an entire day of working with my students in my Advanced Oracle Performance Analysis course before I'm able to open up the discussion of how nearly a dozen performance solutions will effect the service time, arrival rate, or the number of servers. As a teacher, it's one of my life's highlights because I know everyone in the class is about to have at least one "ah-ha moment." Guaranteed!

Putting This Into Practice

Enough talk, let's apply this to reality. I'm going to continue using our commit batch size experiment because we have actual real numerical results when the batch size was increased. The full experimental results can be found in Part 2 of this blog series and the top few can be found in Part 3. I have duplicated the just the top few lines below:

Batch Arrival Rate       % Service Time       %   Queue Time       % Respone Time       %
  Size (inserts/ms)  Change (ms/inserts)  Change (ms/inserts)  Change (ms/inserts)  Change
------ ------------ ------- ------------ ------- ------------ ------- ------------ -------
     1      10.8445    0.00       0.0914    0.00       0.1356   -0.00       0.2270    0.00
     2      13.8570   27.78       0.0722  -21.01       0.1071  -20.96       0.1794  -20.98
     4      16.2470   49.82       0.0613  -32.95       0.0914  -32.61       0.1527  -32.75
     8      17.7815   63.97       0.0564  -38.34       0.0855  -36.91       0.1419  -37.49

Based on the data we pulled from the actual experimental results, with a batch size of one we observed:

Arrival Rate (L): 10.8445 inserts/ms
Service Time (S): 0.0914 ms/insert
Queue Time (Q): 0.1356 ms/insert
Number of Servers (M): 1.4235 (derived using the msolver web application)

In the previous blog (Part 3) I showed a graph plotting the above arrival rate and response time (service time + queue time). Then applying the CPU-based response time formula while varying only the arrival rate, the associated response time curve appeared and ran directly through our datapoint!

Now let's take a common log file sync solution and think through how that specific solution will affect either the service time, arrival rate, or the number of servers. It's paramount to remember the unit of work is a single insert and the unit of time is constant, in this case a single millisecond. Supposing our performance improving solution is to increase the batch size from one to four, let's think about how that will affect the service time, arrival rate, and number of servers.

Service time will decrease because Oracle has become more CPU efficient per insert. It should take less CPU time to insert each row because by batching our inserts into a single commit, Oracle distributes the commit CPU consumption overhead across more inserts. This decreases the CPU overhead, that is, consumption per insert. So we expect the CPU consumed per insert, that is the service time, to decrease.

As an input parameter, the arrival rate will remain the same because I am not pushing/forcing more work through the system. I am not starting more insert processes, asking employees to work faster, or bringing more employees on-line. However, I do expect the system to be able to process more work per time, but I am not forcing this change. Remember, I am looking for input changes, not output results.

The number of servers (M) will not change. We are modeling this situation as a CPU bound system and since we are not altering the number of CPU cores the number of servers will not change.

Now the question becomes, "Just how much will the service time decrease?" This is where your experience, gut feeling, the risk of being wrong and the resulting impact, and testing all come together. I highly recommend you not do this in isolation. If you are having a good day, you're likely to be optimistic. And of course, if you're having a bad day, you'll be pessimistic. So discuss this with at least one other person.

Fortunately I did run a test! In this situation, the test is the experimental results shown in the table above. The only input change is the service time while keeping constant the arrival rate and the number of servers. The experimental result table above shows that with a batch size of four, the actual service time decreased by 32.95%, down to 0.0613 ms/insert. Therefore:

Arrival Rate (L): 10.8445 inserts/ms (no input change, but hope the result will increase)
Service Time (S): 0.0613 ms/insert (we decreased to match our experimental results)
Number of Servers (M): 1.4235 (no change because the CPU configuration is the same)

Placing these parameters into the CPU based response time formula, we have:

R(cpu) = S / ( 1 - (SL/M)^M ) = 0.0613 / (1 - (0.0613*10.8445/1.4235)^1.4235) =  0.926363

The experimental result table above show that with a batch size of four, the actual service time decreased by 32.95% down to 0.0613 ms/insert and the actual response time was 0.1527 ms/insert. However, we anticipated (never use the word predict or forecast using this analysis method...more below about this) the response time would drop to 0.926363 ms/insert. So in this situation it appears we anticipated a significantly greater benefit then actually occurred? Nope, read on...

This is important to understand: The experimental results shown in the table above are based on a real Oracle system were we allowed the arrival rate to increase to a steady-state and is not based on mathematics! In contrast, our calculated response time is based on a frozen arrival rate, a frozen number of servers, and we only changed the service time. Because our response time figure is based on a lower arrival rate (10.8445 vs 16.2470), we would expect our response time calculation to be less.

Now I am going to substitute the actual experimentally observed arrival rate (16.2470 inserts/ms) into the CPU based response time formula. The inputs and the calculated response time are now:

Arrival Rate (L): 16.2470 inserts/ms
Service Time (S): 0.0613 ms/insert
Number of Servers (M): 1.4235

R(cpu) = S / ( 1 - (SL/M)^M ) = 0.0613 / (1 - (0.0613*16.2470/1.4235)^1.4235) = 0.153798

Wow! What this means is if we entered the observed increased workload (that is, the arrival rate) into our formula, our calculations would have been within a celebratory 0.7%. This is amazing and should provide some confidence when using this analysis method. Don't ever let someone get away with saying Oracle systems don't behave in a query theory like way! Can I be so bold as to say, they have not freak'n idea what they are talking about!?

For those of us who understand better with pictures, let's look at the before and after graphs. The below graph was created using my free Response Time Comparison MS-Excel based spreadsheet.


Our baseline situation is the blue point on the blue response time curve. This represents the system with one insert per commit. Based on our experiment, when we increased the batch size from one to four, we observed the service time decreased (0.0613 ms/insert) and we recalculated the response time (0.9264 ms/insert), which is represented by the red point on the orange curve. Notice that along with the service time decrease the entire response time curve shifted down and to the right resulting in the orange response time curve. (I will blog about this response time curve shift another time.) We then entered the observed four inserts per commit arrival rate (16.2470 inserts/ms) into our response time calculation and the the orange point resulted showing that at an arrival rate of 16.2470 inserts/ms the response time is 0.154 ms/inserts, which is nearly perfect to what what actually occurred! So in this case our model did a great job at anticipating what was to occur. (This is also why it is important to test our proposed solutions whenever possible. The test provided us with a reliable service time.)

Very Cool Yes, But Caution Please

If you're like many DBAs, this type of analysis is very exciting because it opens up an entirely new method to effectively communicate a complex performance situation, brings about a deeper level of performance understanding, and allows us to anticipate change.

But just how good is this "anticipation?" While the example I used worked nearly perfect (and was not selected among many examples), it's not always this good and more importantly, the performance model developed is not robust.

Let me contrast the anticipatory work I just did to a true predictive analysis forecast. What we did was quickly develop a performance model based on a single data point. Yes, the data point is based on a real system, contains actual Oracle activity over an interval of time, and standard performance mathematics was used. That's the good news and why this works fantastic for quick work in an existing firefighting situation. However, there is plenty of bad news! To name just a few weaknesses: Our model is based on a single data point, performance can dramatically fluctuate during intense times, and what scares me the most is, our analysis contains absolutely no statistical work. This means we have absolutely no mathematics behind the "plus and minus" of our "forecast."

Does this mean we throw out all the work we have just done? Absolutely not! It's much, much better than using our gut, our experience, group think, marketing hype, vendor promises, or the latest performance tuning fad. At a minimum it will force you to think through why a performance solution should work, why for some strange reason you want to aggressively implement a particular solution, allows you to communicate more effectively, and provides a reference point for your work.

My point is, you must develop a proper and robust predictive model to answer questions like, "What will happen if we we insert this new SQL statement?" Or, "Will the system be able to handle the load in nine months?" Or, "Should be purchase these new CPUs?" These questions are based on you not being in a firefighting situation. This means you have time to gather lots of data and perform a solid predictive performance analysis.

Concluding Thoughts

Over the past few blog entries we have explored why increasing the commit batch size can make a dramatic performance impact when presented with the log file sync wait event. If you have followed this entire series, you know we've covered a lot of ground:

1. Understanding what the log file sync wait event means.
2. Presented one of the most common solutions, that is, increasing commit batch size.
3. Modeled the system in a traditional performance mathematics way, that is, response time.
4. Created a graph to visually demonstrate the situation.
5. Discussed how and why we expect our batching solution to impact the key performance variables.
6. Altered the service time to reflect increasing the batch size from one to four.
7. Anticipated the response time change both numerically and graphically.
8. Demonstrated our model anticipated response time nearly perfect!
9. Discussed why this analysis is only appropriate for performance firefighting work.

Wow!

Sales Pitch: Creating robust forecasting models is what my Oracle Forecasting & Predictive Analysis course is all about. What I've showed you in this blog series is part of what I teach in my Advanced Oracle Performance Analysis course and is introduced in the last chapter of my Oracle Performance Firefighting book. If you enjoyed what you have read, then you'll thoroughly enjoy these courses.

This brings me to the end of this blog series. Now it's time to go back and finish the parallelism blog series. Plus I've been working on ways to visualize Oracle internal structures... so there is some good stuff to be posted!

Thanks for reading!

Craig.



P.S. If you want me to respond to a comment or have a question, please feel free to email me directly at craig@orapub.com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email.

24 comments:

  1. Great post, but R(cpu) should be 0.092 not 0.92. Caused a lot of head scratching until I recalculated it. Please update.

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
  3. Thank you for such a nice article keep posting, I am a Regular
    Visitor of your website
    android
    apkzm
    apps apkzm

    ReplyDelete
  4. Automated Forex Trading :roboforex login Is An Automated Forex Investing Software. It Is An Algorithmic Trading Software That Provides Automated Forex Trading Signals.

    ReplyDelete
  5. Forex Website Seo are a quick and easy way to increase the visibility of your website on the internet. There are many types of backlinks that you can purchase, including forum posts, blog posts, articles, press releases, social media posts, and more. We have the experience and expertise to create the perfect Forex Website Seo for your needs.

    ReplyDelete
  6. Tradingzy Is The Best And Most Reliable Seo Company And Offers A Wide Variety Of Forex Seo Reseller , Including An Affordable Monthly Subscription Service, One-time Links, And Blog Comments. With Any Of These Services, You Can Be Assured That Your Site Will Rank Higher In The Search Engine Results Page And Receive An Influx Of Targeted Traffic.

    ReplyDelete
  7. You have hundreds of online brokers to choose from, so we are here to help you make an informed decision. Online Stock Broker provides many resources including a comparison system based on the top brokers in the industry, Brokers Review And There Login Details to help you find the one that is best for you.

    ReplyDelete
  8. Get the latest Phoenix Craigs List real-time quote, historical performance, charts, and other financial information to help you make more informed investment decisions. You'll get an overview of a stock's key fundamentals including revenue and earnings as well as valuation measures such as price/earnings ratio. Beyond fundamentals, you'll find technical analysis from experts, news, historical trends, and predictors - all available in Our Servlogin Webpage.

    ReplyDelete
  9. At Market Forex Net , we believe in honesty, transparency and trust. We started Market Forex net to educate traders and help them with their trading techniques. We publish reviews of brokerages, strategies, investment tips, guides and much more.

    ReplyDelete
  10. Check The Acam Stock Overview To Monitor A Stock And View Live Price Changes As They Occur, Including Volume And Share Changes. With Our Live, Real Time Stock Market Overview App, You No Longer Need To Refresh Your Browser Or Watch Multiple Computer Screens To View Real Time Stock Prices.

    ReplyDelete
  11. Never miss a stock price change from Acam Stock with our Live, Real Time Stock Market Overview! Most of the data is updated every 5 minutes to ensure that you are getting all of the latest Acam Stock change alerts as soon as they happen.

    ReplyDelete
  12. How To Start Forex Broker is a comprehensive guide to deciding on the best Forex Brokers, Trading Costs and Fees. Find out which brokers offer the best value for money?

    ReplyDelete
  13. At Ati Testing Login , you will find everything that you need related to your account and your choices as as a customer. Complete information about, News & Events, Credit Cards, Bill Pay, Rewards, Customer Service and much more.

    ReplyDelete
  14. Ati Testing Login will help you to choose the bestAti Testing Login online account for you. We are committed to providing excellent online recommendation for home buyers and sellers of personal information.

    ReplyDelete
  15. Start Forex Brokerage is a comprehensive guide to deciding on the best Forex Brokers, Trading Costs and Fees. Find out which brokers offer the best value for money?

    ReplyDelete
  16. The most comprehensive guide to Forex Brokers and Start Forex Brokerage , showing you what you can save or earn by choosing the best broker for your needs – from Highest to Lowest Charges. Useful for both beginner and seasoned trader.

    ReplyDelete
  17. Ati testing login Overview: Stay ahead of the market with our live and real time stock market overview. Get all of your favorite stocks in one place, and get alerted to live news at the same time!

    ReplyDelete
  18. The Gteh Stocktwits Overview offers real time stock price updates. All you need to do is open the application and you can view Streaming stock prices of your favorite stocks.

    ReplyDelete
  19. The most comprehensive guide to Forex Brokers and Become A Forex Broker , showing you what you can save or earn by choosing the best broker for your needs – from Highest to Lowest Charges. Useful for both beginner and seasoned trader.

    ReplyDelete
  20. Cheapest White Label Forex Broker will enable you to get all the benefits of the MetaTrader 4 automated trading platform and MetaQuotes language with even more exciting functionalities for your own brand

    ReplyDelete
  21. Tradingzy is the best and most reliable Seo company And offers a wide variety "Forex Seo Services. We have a team of professional traders and marketing experts who have extensive experience in the field of digital marketing and forex trading, so we can offer you the best services.

    ReplyDelete
  22. This is our list of the Best MT4 Forex Brokers In Netherland . Each broker has been carefully selected and reviewed before appearing on this list. We take pride that all of the brokers listed pay their affiliates promptly and give you excellent trading education.

    ReplyDelete
  23. Here are our top picks for the Best MT4 Forex Brokers In Netherland . We chose only regulated and reputable brokers who have low fees and offer reliable trading platforms to their traders.

    ReplyDelete