We all know in our gut that increasing the number of batch process parallel streams has its benefit limits. And in fact, we have a hunch that if we get carried away and spawned perhaps 100 parallel streams the batch process could actually take longer. Why this "diminishing returns" effect occurs is what I want to explore.
In Part 2 of this series I introduced a plot showing batch process duration versus the number of parallel streams. This fairyland best-case-scenario is shown as the blue line in the below plot.
The general relationship between streams, workload, and process duration is:
duration = total workload / parallel streams
But the reality can be much different, as shown by the red line in the above plot. What occurs in reality is for each additional parallel stream, the system effectively receives only a fraction of parallel stream benefit. So when two parallel streams exist, the system may effectively receive only 1.75 streams, not 2. And if three parallel streams exist, the system may effectively receive only 2.31 streams, not 3.
Why this loss in effective processing power? This is a vast topic centered around scalability. I want to limit our exploration by focusing only on increasing the number of batch processing parallel streams.
But if you want to dig into scalability and explore the benefits of say, adding additional CPU cores, then I encourage you to read the Scalability chapter in my book entitled, Forecasting Oracle Performance. There is also an interesting spreadsheet anyone can download for free, full of fascinating scalability related math which, also includes the basis for which you can determine the CPU core scalability for your system. But this is not the focus of today.
There are a surprising number of reasons why we can't just keep adding additional batch process parallel streams and expect the duration to continually improve.
First, remember that our best case scenario is the blue line in the above plot. Even in our best case scenario adding more than ten streams is not likely to produce a significant improvement.
Second, when we parallelize, a serial process must be split into multiple parallel executing processes. There is overhead, that is, time is consumed in splitting the serial process. This becomes especially challenging (and extremely frustrating) when the process was initially designed to run serially. I'm sure there are books written on this subject alone.
Third, when we parallelize, the likelihood of concurrency issues increase. The parallel processes may be referencing rows in the same blocks or tables (sure Oracle has row level locking, but buffers can still be busy...), they may be requesting duplicate full table scans (though Oracle, we hope, will determine the optimal execution plan), and there may be control structures like sequence numbers or status codes that must be maintained. My point here is that when the process is run serially none of this is an issue, that is, there is no chance of a concurrency issue. If we start increasing the number of parallel streams we could push the unique combination of Oracle's configuration, the power of the operating system, and design of the application into manifesting various enqueue, buffer busy, read by other session, or latch free waits. When altering a process that was initially designed to run serially, we have to be very careful about introducing concurrency issues.
Fourth, when we parallelize we could introduce processing dependancies that did not exist before. Perhaps steps A, B, and C must complete before step D can begin. In this scenario, the parallel streams involved in processing steps A, B, and C must somehow know to stop and wait (ouch!) before step D begins. A dependency laden batch process will severely limit the potential advantages gained by introducing parallelization. And it also severely limits the number of parallel streams that will make a real difference in the batch duration.
Fifth, it follows that if we split a process into multiple parallel streams there may be the need to join/merge the results together once again. This is highly application dependent, but it could represent a significant amount of time that simply did not exist when running the process serially. If this join/merge operation must occur, it is also highly likely dependency issues must be controlled.
I'm sure there are other performance inhibitors not listed above, but I think you get the idea... that adding additional parallel streams may be both technically challenging (if not impossible) and will likely force additional code to be run.
Let's interject some numbers. Suppose a serial batch process takes 2 hours to complete. Some initial parallelization strategy designs indicate that running six parallel streams will introduce about 5 minutes of process splitting time, 5 minutes of concurrency contention, 10 minutes of process dependency contention, and 10 minutes to merge the final results. So while we are able to technically run the process in six parallel streams, we have introduced 30 minutes of additional time as a direct result of our parallelization efforts. In our six stream parallelization dream-world, a serial 2 hour process will complete in 20 minutes (120 min / 6 streams). Injecting some reality, which is estimated to be 30 minutes, with six parallel stream actual process duration will be closer to 50 minutes (20 min + 30 min). So instead of the serial process taking 120 minutes, when running as six parallel streams we expect it to run in around 50 minutes, which is less than half the time. Perhaps this is a fantastic result and just what is needed.
But the decisions are just beginning. How confident are we in our parallelization estimates? What is the impact if we are wrong? How much time will take the developers and DBAs to parallelize the process? And of course there is the nagging question of, "Will the system be able to handle the increased workload when we let loose six parallel streams?" The increased workload by increasing parallelization will be the topic of exploration in my next blog entry.
So while increasing parallelization can produce amazing results, it does introduce and intensify factors that we may not have had to deal with when the batch process ran serially.
Thanks for walking with me through this parallelism journey,
P.S. If you want me to respond to a comment or have a question, please feel free to email me directly at email@example.com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email.