Everyone in the world knows that if you want a long process to complete sooner, you should consider parallelization. For example, businesses run their systems on multiple divisional computers, DBAs split imports by user, soda drinkers use multiple straws, multiple iPhone application updates are downloaded at the same time, and Oracle has a variety of parallelization schemes such as multiple background processes, parallel query, and even RAC. So parallelization is not something foreign to us. In fact, it seems rather natural.
But everyone also knows in their gut that diminishing returns play a part. That is, the benefit of each additional parallel stream shrinks as parallelization increases. In fact, at some point the task may take longer to complete because we got carried away with parallelization. So the lurking question is: what is the optimal number of parallel streams that will minimize the task duration? This is what I want to ponder over the next few weeks.
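To make that gut feeling a bit more concrete, here is a minimal Python sketch of a toy model. It is only an illustration, not the model this series will develop: I'm assuming the task has a fixed serial portion, a portion that splits evenly across streams, and a coordination cost that grows with each added stream. The function names (`elapsed_time`, `sweet_spot`) and every number in it are made up for the example.

```python
def elapsed_time(streams, serial=10.0, parallel=600.0, overhead=1.5):
    """Toy elapsed time (seconds) for a task run with `streams` parallel streams.

    Assumed, hypothetical model:
      serial   - work that cannot be parallelized
      parallel - work that splits evenly across the streams
      overhead - coordination cost added by each stream beyond the first
    """
    return serial + parallel / streams + overhead * (streams - 1)


def sweet_spot(max_streams=64, **params):
    """Return the stream count in 1..max_streams with the smallest elapsed time."""
    return min(range(1, max_streams + 1),
               key=lambda n: elapsed_time(n, **params))


if __name__ == "__main__":
    # Show how the elapsed time first drops, then climbs again.
    for n in (1, 2, 4, 8, 16, 20, 32, 64):
        print(f"{n:3d} streams: {elapsed_time(n):7.2f} s")
    print("sweet spot:", sweet_spot(), "streams")
```

With these invented numbers, the elapsed time falls quickly at first, flattens out, and then rises again once the coordination cost outweighs the shrinking share of work per stream. The minimum lands at 20 streams. Real systems won't follow such a tidy formula, which is why gathering actual performance statistics matters.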
My goal is for you to successfully select a batch process that is taking too long, consider how you might parallelize it, gather classic performance statistics, and determine the optimal number of parallel streams; that is, determine the parallelization sweet spot.
My plan is to take this step by step, digging deeper and deeper as we go. The topics I'm sure I'll touch on include scalability limitations of various kinds, process and transaction response time and elapsed time, utilization, gathering performance data, developing a mathematical model to help us understand and anticipate the value of our strategies, and combining activity from both the CPU and IO subsystems. That's a lot to cover, but we'll work through it together.
Thanks for joining me in this quest!
P.S. If you want me to respond to a comment or have a question, please feel free to email me directly at email@example.com. I use a challenge-response spam blocker, so you'll need to open the challenge email and click on the link or I will not receive your email.