The overall performance of systems is often an afterthought. Developers spend months building a system, often muttering things like “we have plenty of CPU”, “we can add more servers”, “parallel processing is more trouble than it’s worth” or the even more popular “premature optimization is the root of all evil”. After all, Donald Knuth wrote that last one, and he was quoting Sir Tony Hoare, so it must be right!
The quote is correct; it’s just taken out of context.
If you are responsible in any way for the system's production you need to stamp on this attitude quickly, before this complacency turns into the acceptance of mediocrity.
Assuming that you missed your chance to do this, at some stage the system is declared complete because it meets the customer’s functional requirements. It also runs slowly. Pathetically slowly. Maybe users think it has crashed (apparently your system has seven seconds to respond before this occurs), or they are faced with actual crashes, graphs and charts that take minutes to draw, progress bars, loading screens, “(Not responding)” warnings in window title bars or any number of other indications that at least some of your architects and developers got things horribly wrong.
When you ask why the system is slow, you should expect to be told that it’s because “the algorithms are really complex”, “there’s a lot of data to process”, “those graphs/charts take a long time to render” and various other excuses. These statements are almost always, shall we say, inaccurate at best.
So now, in addition to your other problems, you need to make sure the system runs at a reasonable speed. How do you do this?
First, you need to find some developers who are a little obsessed with performance. Not too obsessed, though: with performance, as in any other area of engineering, it’s easy to head into the world of diminishing returns. Spending weeks to get a 99% reduction in overall processing times is probably worth it. Spending the same amount of time to reduce the time it takes to log in from 4 seconds to 3 is a waste. The more experienced the developers are, the more likely they are to be able to gauge what to optimize.
These developers will each have their own areas of specialization and experience; make sure you have a reasonable spread of experience so that the whole system gets covered.
They could dive straight into optimizing the design of the entire system. This has pros and cons: the pros include it being the right thing to do. The cons include it often involving a rewrite of the system. An alternative is to refactor and/or optimize each subsystem in turn, starting with the one that performs the worst. Once it is no longer the bottleneck, move to another subsystem. Repeat this and with luck your system will reach a level where its performance is acceptable.
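The "worst subsystem first" loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three stage functions (`load_data`, `render_charts`, `save_results`) are hypothetical stand-ins for your real subsystems, and wall-clock timing of whole stages is the crudest possible measurement. A real investigation would normally use a proper profiler.

```python
import time
from typing import Callable, Dict


def time_stages(stages: Dict[str, Callable[[], object]], repeats: int = 3) -> Dict[str, float]:
    """Return the best wall-clock time (in seconds) over `repeats` runs of each stage."""
    timings: Dict[str, float] = {}
    for name, run in stages.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


def slowest_stage(timings: Dict[str, float]) -> str:
    """Name of the stage with the largest measured time: the current bottleneck."""
    return max(timings, key=timings.get)


# Hypothetical stand-ins for real subsystems (assumptions, not from any real system).
def load_data() -> list:
    return list(range(1_000))


def render_charts() -> int:
    return sum(i * i for i in range(200_000))


def save_results() -> str:
    return "saved"


if __name__ == "__main__":
    stages = {
        "load_data": load_data,
        "render_charts": render_charts,
        "save_results": save_results,
    }
    timings = time_stages(stages)
    # Print the stages from slowest to fastest.
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:15s} {seconds * 1000:8.3f} ms")
    print("optimize first:", slowest_stage(timings))
```

Once the reported stage has been improved, run the measurement again. The bottleneck will usually have moved to a different subsystem, and that one becomes the next target.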
About now, many experienced architects and developers will think that I’ve lost touch with reality, or that I generally don’t know what I’m talking about. Now seems like a good time to repeat the following:
- The solution I suggested, rewriting or refactoring parts of a system, is far from ideal. Redesigning the whole architecture is strictly the right way to do it. But:
- You already have a system for which your company paid a lot of money. Do you really think you can just dump it and start again?
- The system is at least running. Hopefully the promise of constant improvements (and contractual obligations) will prevent too many customers from dropping your company.
Remember the problem you’re facing is that the “getting it right” ship sailed long ago. You are in the position of having to make the system less dire, and still letting the customers who pay your salary get some value from it.
Think of this approach as a crowd of people (your subsystems) being chased by a bear (your performance problems). If you’re in the crowd your first thought might be that you need to run faster than everyone else. But, when you think about it, that’s wrong. You don’t have to be faster than everyone; you have to be faster than someone. Second slowest is probably fast enough.
Of course, none of this would have happened if you’d just kept a look-out and avoided the bear in the first place.
"Definition: Premature pessimization is when you write code that is slower than it needs to be, usually by asking for unnecessary extra work, when equivalently complex code would be faster and should just naturally flow out of your fingers."