Iteration is the process of repeatedly executing the same piece of code, usually on different data. This data can be taken from a collection, file, stream or any other source. A source that allows this is called “iterable”, and the object that allows code to iterate across the collection or source is called an “iterator”.
It all seems straightforward so far; let’s take a closer look.
Starting from the basics, this is the way I was originally taught to implement iteration:
var
total: int
begin
total := 0;
array c = // create or assign an array here
for i := 0 to len(c) – 1 do
begin
total := total + c[i]
end;
end.
This was a long time ago, in a country far, far away (no, really!), but the basic principle is the same. The code above (it’s Pascal by the way!) is working through the collection c, from the first to last item, and is accessing the items by index. It then runs some code on that item – in this case it just adds the value to a total. This works, but it’s not particularly efficient if c is a linked list, which it might well be. If that’s the case, then every access to an item is O(n) time. In some implementations every call to the len
function, returning the length of the collection, is also O(n), and happens every time we run around the loop. It’s important to note that, even though there is iteration happening here, there isn’t actually an iterator.
An iterator provides a more abstract way of carrying out iteration. From a software engineering point of view, a little more abstraction is usually a good thing, and can actually make things easier to understand. Iterators are usually coupled fairly tightly with the collection they iterate across. This allows them to be more efficient than they otherwise would be, because they often have access to the internal structure of the collection. For example, an array iterator and linked list iterator might present the same interface, but would each have specialized implementations specific to the collections. Because of this increased coupling, iterators are usually created by the collection object itself, and then used by external code. Sometimes they’re defined in the collection object too. A typical implementation looks like this:
For example, in C++ an iterator across a vector of integers looks something like this:
vector<int> collection = ... // Assign collection here
std::vector<int>::const_iterator iter;
int total = 0;
for (iter = collection.begin(); iter != collection.end(); ++iter)
{
// Do something
total += *iter;
}
From the code, you can see that the iterator is a separate class, albeit defined within the collection class, and instances are returned from the collection’s own begin() and end() methods. The vector class is a standard C++ collection, but if you create your own collection class you should create your own iterator based on the existing classes at: https://www.cplusplus.com/reference/iterator/
It’s worth mentioning at this point that C++ is more advanced than many languages, in that it allows you to create iterators that run in reverse as well as forwards. This is very useful, and it’s unfortunate that many newer languages aren’t that flexible by default.
Using C# is slightly different, in that you need to implement the IEnumerable<T>
interface, which specifies a function GetEnumerator()
returning an enumerator implementing IEnumerator<T>
. For more details refer to the MSDN at https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ienumerator-1?view=net-5.0.
Python iterators are slightly different again: there’s a built-in iter()
function that returns an iterator for any object or collection. If that’s not quite what you need, you can write a custom iterator class, or add iteration functions to a collection class. This example shows a basic Python iterable class which implements its own iterator via the __next__()
function, and raises the StopIteration
exception for every call to __next__()
once the iteration is complete.
class Squares:
def __init__(self, max_count):
"""
Create an instance of the class returning the squares of
the first max_count natural numbers.
:param max_count: The number of squares to return.
"""
self.max_count = max_count
self.value = 0
def __iter__(self):
"""
Return an iterator for this object. This object represents an
iterable collection, but is also an iterator because it
implements the __next__() method.
:return: An iterator for this object.
"""
return self
def __next__(self):
"""
Calculate and return the next value in the series.
:return: The next value, or raise a StopIteration exception
if there are no further calls.
"""
if self.value < self.max_count:
self.value += 1
return self.value * self.value
else:
raise StopIteration
for s in Squares(10):
print(f'{s}')
The example above uses the same class for both iterable object and iterator. This is fine for a simple class that doesn’t do anything too complex, but what about if we want to do something slightly more complicated, like maintain two separate iterators on one collection? If you want multiple iterators, they’re going to have to be separate objects, so you’ll need to have separate classes for the object and the iterators. The __iter__()
function will then return a different iterator object every time it is called.
class Cubes:
def __init__(self, max_count):
"""
Create an instance of the class returning the cubes of
the first max_count natural numbers.
:param max_count: The number of cubes to return.
"""
self.max_count = max_count
self.value = 0
class CubesIterator:
"""
An iterator on a Cubes collection. Separate from the Cubes
class so that we can have more than one iterator simultaneously.
All that this class needs to know is the parent collection, and
the iterator's "position" within the collection. The collection
will handle everything else.
"""
def __init__(self, collection):
self.collection = collection
self.current_position = 0
def __next__(self):
"""
Return the next value from the parent collection.
:return: The next value, or raise a StopIteration exception
if there are no further calls.
"""
self.current_position += 1
return self.collection.item(self.current_position)
def __iter__(self):
"""
Return an iterator for this object. This object represents an
iterable collection, but is also an iterator because it
implements the __next__() method.
So that we can use separate iterators on this collection, each
call to this method returns a new iterator.
:return: An iterator for this object.
"""
return self.CubesIterator(self)
def item(self, position):
"""
Returns the item in the collection representing value.
:param position: The 'position' whose value we want.
:return: The value at the given 'position'
"""
if position <= self.max_count:
return position * position * position
else:
raise StopIteration
for s in Cubes(5):
for t in Cubes(5):
print(f'Cube {s} {t}')
With this code, the dependencies are minimal in that the iterator objects only know about the parent collection and the iterator’s current position within it. The iterators contain no details of the limits or working of the iterable object, and nor should they – that’s the iterable object’s responsibility.
Things get even more interesting when we’re trying to iterate over a slowly-responding object. Normally you’d use asynchronous functions, and obviously the __iter__()
and __next__()
functions are synchronous. To handle this, Python versions after 3.5 use the __aiter__()
and __anext__()
functions. __aiter__()
is a synchronous method returning an asynchronous iterator, and __anext__()
is an asynchronous method returning an awaitable object returning the next value in the collection, or raising a StopAsyncIteration
exception if there are no members left in the collection. This is an example of how they can be used:
class HardMathCollection:
def __init__(self, max_count):
"""
Create an instance of the class returning the result of the
hard calculation on the first max_count natural numbers.
:param max_count: The number of squares to return.
"""
self.max_count = max_count
self.value = 0
def __aiter__(self):
"""
Return an asynchronous iterator for this object. This object
represents an asynchronous iterable collection, but is also
an asynchronous iterator because it implements the __anext__() method.
:return: An asynchronous iterator for this object.
"""
return self
async def do_time_consuming_thing(self, value):
"""
Represents a thing that's so difficult, we've had to make it
asynchronous so we can get on with something else while we wait.
:param value:
:return:
"""
pass
async def __anext__(self):
"""
Calculate and return the next value in the series.
:return: The next value, or raise a StopIteration exception
if there are no further calls.
"""
if self.value < self.max_count:
self.value += 1
return self.do_time_consuming_thing(self.value)
else:
raise StopAsyncIteration
Restrictions
Iterators have limitations though. The data source they’re iterating over should be immutable, even though the objects within it don’t always have to be. For example, once you create an iterator on a collection, adding or removing objects from that collection can, but doesn’t always, cause problems. It really depends on the way the collection and iterator are written; you may be lucky and everything might work correctly, or your code could start raising exceptions and failing in unexpected way. The uncertainty is something you definitely want to avoid.
Having said that, you can often change the objects held within the collection. If you have a collection of objects with, say, an id attribute, you can change this without affecting the iterator. Unless, of course, the language you’re using creates a new copy of the object or changes the order of the collection as a result of this.
This restriction is even more evident when you start using iterators in different threads from the collection, and even each other. In this situation it’s very easy to get into a mess very quickly. If you’re using multiple threads, you need to write thread-safe collections unless they’re already provided or available to you.
Summary
Iterators are an abstraction that hides the specialized code involved in working through a set of data, and running the same code on each item. Like most good abstractions they make the resultant code simpler and more obvious: when you're using an iterator you're not using the iterable data source at all. That means you have no idea of the type of the source the data is coming from - if you're using an array or a set to hold the data, the iterator will return it in exactly the same way. This is a good thing; the iterator takes care of all of the differences, and you only have to concern yourself with the data it returns.
Iterators can also abstract the collection a little more than just running through it. You could implement an iterator that returned the items in a collection sorted in a specific order. Maybe an iterable object whose iterator(s) return the prime factors of the number you initialize it with. Things are improved even more when you realize that, if you do the calculations in the __iter__()
method, iterators allow you to do lazy evaluation with all of the time and memory saving that gives you. What you do with all of this flexibility is really up to you.