Have you ever written or used a system that runs perfectly when you test it on your machine, keeps running well through test, but then starts to slow down rapidly when it has to process more data in production?
Does adding more customers cause your system to slow down disproportionately?
When the company CEO stands up at meetings and says “We have x new customers!”, do you smile on the outside, because it's a good thing, while inwardly thinking “This is going to cause so many problems!”?
If you find yourself asking “Why has this code started to slow down so much?” you're in the right place.
The reason the system is slowing down dramatically when you try to make it process more data is that it isn't scalable. Actually that's not quite right – being scalable means that a system doesn't slow down disproportionately when you stress it with more data, so not being scalable is just a symptom. The real problem that's actually causing the slowdown is that the code is inefficient, possibly dreadfully so. Still, they say the first step towards solving a problem is realizing that you have one, and now you know you have a problem.
So, the evidence that the code is inefficient is there, but how bad is the problem? How do we work this out? Big O notation, and the analysis that goes with it, is a great way to understand what's going on, and why your code's behaviour is so dismal. Once you understand it you'll have a better idea of how to solve your scalability problems.
What Is Big O Notation?
Big O Notation is a way to describe the performance and efficiency of a process or algorithm. It allows you to compare solutions on the basis of their efficiency alone, with no danger of you being persuaded by nonsense like “CPU/memory are cheap”. Big O Notation is used PURELY to analyze the algorithms; anything else is irrelevant.
When Should I Use It? And When Shouldn't I?
If you want to compare the efficiency of a number of possible solutions to a problem, or if you're experiencing performance problems with your existing solutions, working out the Big O Notation for them is a great first step.
How Do I Do That?
I've heard people ask whether they need a Computer Science degree to analyze algorithms – you don't. What you DO need is the ability to look at algorithms in a slightly abstract way – don't worry if you've never done that before, it just takes practice. What you need to look for is the relationship between the number of data items being passed to the algorithm, and the worst-case number of operations it takes to process them. Operations in this case might be iterations or recursive calls – anything that takes any time for the system to do.
Working through some of the more common types of algorithm should help illustrate how it's done.
O(1) – Constant Time

Constant time is as good a complexity as you can get: it means that, regardless of the number of items involved, the time to carry out the algorithm is constant. The only way to beat that would be an algorithm whose execution time fell as you added items, and that's not going to happen.
This level of performance is what you get when you access a specific item in an array by index (although not when you search within an array), or when you add or remove items at the ends of a stack, queue or linked list. If you want a collection that lets you search, insert and remove items with O(1) complexity, you need to use a hashtable. If you'd rather use a slightly higher-level collection than a hashtable, with the same performance, remember that the dictionary and set classes in many languages use hashtables to handle their keys.
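To make that concrete, here's a small sketch (the data is my own illustration, not from the article) contrasting membership tests on a list, which scans every element, with membership tests on a hashtable-backed set:

```python
# A list and a set holding the same values.
values_list = list(range(1_000_000))
values_set = set(values_list)   # Python sets are hashtable-backed

# 'in' on a list scans the elements one by one: O(n)
print(999_999 in values_list)   # True, but only after a full scan

# 'in' on a set hashes the key and jumps straight to it: O(1) on average
print(999_999 in values_set)    # True, found almost immediately
```

Both lines print True; the difference only shows up in how long each takes as the collection grows.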
O(n) – Linear Time

Say you have a collection of values, and you need to find a particular value within that collection. If the values are in an unsorted list or array, you have to go through the whole collection to find the value. Doubling the size of the collection doubles the number of values you have to search through. So, this algorithm has a complexity of O(n).
You might think that, on average, you only have to check half of the items in the collection; after all there's a 50% chance of the search finding the value in the first half of the collection, so the time should be proportional to:
n / 2
None of that matters though, because Big O ignores constant factors and keeps only the dominant term – all we care about here is that the time grows in proportion to n, so the complexity is still O(n).
Often you can do better than O(n), but the general guideline is that, if you have to access every item in a collection, O(n) is the best you'll be able to do.
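A minimal sketch of this linear search (the function name and data are illustrative, not from the article) looks like this:

```python
def linear_search(items, target):
    """Return the index of target in items, or -1 if absent.

    Worst case we visit every element, so this is O(n).
    """
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1

print(linear_search([4, 8, 15, 16, 23, 42], 23))   # 4
print(linear_search([4, 8, 15, 16, 23, 42], 99))   # -1
```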
O(log n) – Logarithmic Time

Continuing with the idea of a search in an array, it's possible to improve on the O(n) performance of an unsorted array quite a lot. If you sort the collection before you search within it, you can use a binary search. If you've never seen or used a binary search before, a reasonable description – ignoring the need to round positions up or down to the nearest whole number – is:
    Assume n is the number of items in the collection
    Start at position d = n / 2
    REPEAT:
        Is the value at the current position equal to the desired value?
        If yes: the value has been found
        If no:
            Divide d by 2
            If the value at the current position is less than the desired value:
                Move to the value d items AFTER the current item
            Else (the value is greater than the desired value):
                Move to the value d items BEFORE the current item
    UNTIL either the value is found, or d < 1 – in which case the item
    isn't in the collection.
Analyzing this algorithm, you can see that adding one or two items to the collection isn't going to make much difference to its performance. On the other hand, doubling the size of the collection adds just one more step to the worst-case performance. Doubling again adds another step. You've probably noticed by now that this relationship – doubling one value adds one to the other – is logarithmic. In this case, because the collection is divided in two at each step, it's log to the base 2, but logarithms of different bases differ only by a constant factor, so this gets generalized to O(log n).
This is an impressive improvement over O(n); for example, a collection of just over a million items will take, at worst, around 20 iterations to find a value within it. Due to the performance of this algorithm, it's a standard feature of C# (BinarySearch is a member of numerous collection classes) and C++ (std::binary_search is a function in the STL's algorithm header). Perhaps oddly, Python's lists don't offer a binary search method directly, although the standard library's bisect module provides the building blocks. Lists do support the “in” operator, which is just syntactic sugar for the __contains__ method, and for lists that only gives a complexity of O(n).
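As a sketch of what this looks like in Python, here's a binary search built on the standard library's bisect_left (the function name and data are my own illustration):

```python
from bisect import bisect_left

def binary_search(sorted_items, target):
    """Return the index of target in a SORTED list, or -1 if absent.

    bisect_left finds the leftmost insertion point in O(log n);
    we then just check whether the value there is the one we want.
    """
    i = bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

data = sorted([42, 8, 23, 4, 16, 15])   # [4, 8, 15, 16, 23, 42]
print(binary_search(data, 23))          # 4
print(binary_search(data, 7))           # -1
```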
Something that's worth considering in this case is the number of times you'll be searching in the array – if you're only going to do it once or twice, then the time it will take to sort the array may be a significant part of the overall time to carry out the whole sort and search operation. If the array is going to be searched more than a few times, the sort time becomes less significant overall and you probably want to sort it and use a binary search.
O(n log n)

This is a slightly odd complexity, in that you usually only see it with sorts – heapsort, quicksort and treesort, for example. These are quite standard, and the performance of a specific sort is usually included in the documentation for the library that provides it.
It's also the complexity you get if you iterate through a collection, and use a binary search to find matching items in another collection. In this case the n comes from the iterations, and the log n from the binary search.
In the grand scheme of things, it's usually classed as “not bad”, but not fantastic. Doubling the amount of data processed by an algorithm with this complexity more than doubles the time it takes to run, but at least it's not as bad as some. If it's the best you can get, it's worth sticking with it.
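The iterate-plus-binary-search pattern described above can be sketched like this (an illustrative helper, not from the original text), using a sort and bisect_left:

```python
from bisect import bisect_left

def common_sorted(first, second):
    """Return the values of `first` that also appear in `second`.

    Sorting `second` costs O(n log n); then each of the n items in
    `first` is looked up with an O(log n) binary search, giving
    O(n log n) overall instead of the O(n^2) nested scan.
    """
    second_sorted = sorted(second)              # O(n log n)
    result = []
    for f in first:                             # n iterations...
        i = bisect_left(second_sorted, f)       # ...each O(log n)
        if i < len(second_sorted) and second_sorted[i] == f:
            result.append(f)
    return result

print(common_sorted([3, 1, 4, 1, 5], [5, 3, 9]))   # [3, 5]
```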
O(n²), O(n³) and Other Polynomial Complexities

This group of complexities is generally regarded as “not ideal”, but really they range from “bad” to “awful”. Let's start with one that's just “bad”.
Instead of simply searching for a value in a collection, imagine you're searching two collections to try to find values that are common to both. You might write something like this to start with:
    def common_in_two(first, second):
        result = []
        for f in first:
            if f in second:
                result.append(f)
        return result
If you run it with 10 items in each collection, the inner test runs 100 times, which probably appears fast enough during testing. If you have 1,000 items in each, though, there are 1,000,000 iterations, and you're almost certainly going to notice that. This algorithm's complexity is proportional to the product of the number of items in each collection, and that's represented by O(n²). This is also the complexity of a JOIN between two unindexed database tables, for good reason – you're doing the same thing, but the tables are the collections. This is why they invented database indexes.
Adding another collection, or table, to the search above that we used to illustrate O(n²) behaviour increases the exponent by one, so:
    def common_in_three(first, second, third):
        result = []
        for f in first:
            if f in second and f in third:
                result.append(f)
        return result
has a complexity of O(n³).
This is certainly the sort of thing you don't want infecting your production environment if you can possibly help it; anything running at this level is going to give you problems.
In both of the examples here, searching for common items in 2 or 3 collections respectively, the algorithm is about as inefficient as it is possible to get, yet developers produce this sort of thing all the time. Then they wonder why their systems are slow. It's especially frustrating because these algorithms can be rewritten easily to use Python sets, or some other hash-based collection in other languages:
    def common_in_three_fast(first, second, third):
        second_set = set(second)
        third_set = set(third)
        result = []
        for f in first:
            if f in second_set and f in third_set:
                result.append(f)
        return result
The code above is almost identical in structure, yet because two of the collections are copied into sets, each “in” test drops from O(n) to O(1), so the overall complexity becomes:

O(n) x O(1) x O(1) = O(n)

Building each set is itself an O(n) operation, so the whole function remains O(n).
The first collection could be copied into a set as well, but as every item in it gets used, we wouldn't gain anything from doing that.
O(2ⁿ) and Other Exponential Complexities

These complexities are really something you need to try to avoid; they're dreadful, and if they're used anywhere that processes more than a few items of data, they'll be the part of the system that drags everything else down. Thankfully they don't crop up often, but when they do, they make their presence felt – in much the same way as a horrible disease.
I've only seen a couple of examples of this complexity. One is the recursive solution to the Towers of Hanoi problem. I'm sure it's the same solution our school Computer Studies teacher gave as an example of recursion way back when Pascal was the language we used!
The other algorithm with this complexity solves that interview favourite, the Fibonacci series. There are a few ways of tackling this popular problem, but the one you might write if you weren't too experienced follows this logic:
The Fibonacci series is:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34,...
F(0) = 0
F(1) = 1
and thereafter, for n > 1:
F(n) = F(n-1) + F(n-2)
So, you calculate the Fibonacci series by adding together the previous two members of the series. Something like this, perhaps:
    def dreadful_fibonacci(number):
        if number <= 0:
            return 0
        elif number == 1:
            return 1
        return dreadful_fibonacci(number - 2) + dreadful_fibonacci(number - 1)
Looking through the code, each call to the function makes two recursive calls to itself, until it reaches a limiting condition. So, the number of calls made for an argument of n is roughly 2ⁿ, which means the complexity is exponential: O(2ⁿ).
This can be generalized as O(bⁿ), where the base b corresponds to the number of recursive calls made within the function involved in the algorithm.
So, if a method makes three recursive calls, its complexity would be O(3ⁿ). Each additional item multiplies the overall execution time by the base. To see just how bad that is, the Fibonacci calculation above takes over a million (2²⁰) times longer to calculate the 21st item in the series than it did for the first.
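The blow-up comes from recomputing the same values over and over again. As a sketch of how to fix it – using functools.lru_cache, which the original doesn't mention – memoising the function means each value is computed only once, making the number of calls linear:

```python
from functools import lru_cache

calls = 0   # count how many times the function body actually runs

@lru_cache(maxsize=None)
def fast_fibonacci(number):
    """Memoised Fibonacci: each distinct argument is computed once."""
    global calls
    calls += 1
    if number <= 0:
        return 0
    if number == 1:
        return 1
    return fast_fibonacci(number - 2) + fast_fibonacci(number - 1)

print(fast_fibonacci(30))   # 832040
print(calls)                # 31 calls (0..30), instead of millions
```

The recursive structure is unchanged; the cache alone turns O(2ⁿ) into O(n).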
O(n!) – Factorial Time

The Travelling Salesman Problem can be solved by brute force, by working out every permutation of the cities to be visited. The number of permutations of n items, P(n), is n!, or n factorial:
n x (n – 1) x (n – 2) x ... x 4 x 3 x 2 x 1
It makes sense, then, that the algorithm to list all of the permutations has a complexity of O(n!).
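A brute-force sketch, using itertools.permutations and a made-up four-city distance table (both the table and the helper are my own illustration), shows where the n! comes from:

```python
from itertools import permutations

# Hypothetical symmetric distance table between four cities,
# indexed as dist[from_city][to_city].
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

def brute_force_tsp(dist):
    """Try every tour starting and ending at city 0: O(n!) work."""
    n = len(dist)
    best_tour, best_cost = None, float("inf")
    for perm in permutations(range(1, n)):   # (n - 1)! permutations
        tour = (0,) + perm + (0,)
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return best_tour, best_cost

print(brute_force_tsp(dist))   # best tour and its total length
```

With 4 cities this checks only 6 tours; with 15 cities it would be over 87 billion.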
This is a truly dreadful level of performance – it's the sort of thing you should only use if you really have to. Generally you don't, and even with the Travelling Salesman Problem there are other algorithms to solve it. Let's back away from this one while we can...
Conclusion

It's easy to write code – as they say – but it's much harder to write good code. The simplest of choices, say using a list instead of a set or hashtable, can make the difference between code that runs well regardless of the amount of data it's processing, and code that runs well on your test systems but then collapses under the stress of the real world.
Scalability problems can occur anywhere in systems, whether on the front end or server-side, and there's no excuse for them. Saying “I've been a developer for X years and I've never needed to know about Big O Notation” isn't something to be proud of. It just means you've either been lucky or, more likely, the problems you caused were encountered and fixed by other developers.
If you can produce scalable code, your systems will run much more efficiently than they otherwise would, and you don't have to resort to throwing hardware, and therefore money, at the problem and hoping it works. Being able to write scalable code and improving system efficiency also looks good to your current employer and, when you put it on your resume, it looks good to prospective employers too. It's a win-win situation.