Years ago profile directed code reorganization (a sub-class of Basic Block Transforms) were implemented to solve what might appear to be an unrelated problem. The problem tackled by these profile directed basic block reorganizations is decreasing the number of last level cache misses in a server. They do this by organizing frequently accessed code segments together and moving rarely executed code segments. The biggest gain is that seldom executed error handling code can be moved away from frequently executed application code. I’ve seen reports of error handling code making up more than 40% of an application. Moving this code away from the commonly executed mainline code allows fewer processor cache lines to support program execution which demands fewer memory faults. Error handling code will execute more slowly but that is seldom an issue. Profile directed basic block transforms need to be trained on “typical” applications workloads and code that typically executes together will be placed together. Unfortunately, “typical” is often an important, industry standard benchmark like TPC-C so sometimes “typical” is replaced by “important” :-). Nonetheless, the tools are effective and greater than 20% improvement is common and we often see much more. All commercial database servers use or misuse profile directed basic block reorganizations.
Once the initial module is downloaded, the application is live and running and the rest of the modules are brought down asynchronously or faulted in as needed. Many applications due these optimizations manually but this is a nice automated approach to the problem