Copyright 2017 Jason Ross, All Rights Reserved

User Rating: 0 / 5

Star InactiveStar InactiveStar InactiveStar InactiveStar Inactive
 

We've already seen a simple method for checking whether a particular number is prime. At the moment it's not particularly efficient, but we can improve it later. Now let's look at a way to generate prime numbers.

We'll start with a very old, but reliable method, called the "Sieve of Eratosthenes", or "Eratosthenes Sieve".

Assume that you want to discover all of the prime numbers up to 100. Firstly make a list of all of the natural numbers from 2 up to and including 100; it's probably easier to do this in a grid:

- 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90
91 92 93 94 95 96 97 98 99 100

The first number is 2, which is prime. Work through the numbers and cross out all of the multiples of 2.

- 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90
91 92 93 94 95 96 97 98 99 100

The next number which is not crossed out is 3. This is therefore prime too. Cross out all multiples of 3:

- 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90
91 92 93 94 95 96 97 98 99 100

4 is crossed out, as it's a multiple of 2. 5 is the next number which hasn't been crossed out, so it isn't prime. Cross out all of its multiples.

- 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90
91 92 93 94 95 96 97 98 99 100

Continue in this way until you run out of numbers to cross out. The numbers which are not crossed out are prime, those which are crossed out are not.

- 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90
91 92 93 94 95 96 97 98 99 100

One way to implement this algorithm would be a literal translation in code. We could create an array of numbers, and then flag them as either prime or not. Then we drop the non-prime numbers, and return the primes. This is a reasonable way to work for small ranges of numbers, but as the range increases in size, the amount of memory used increases linearly.

Also, the algorithm visits each value multiple times - once for each of its factors - ideally once a number has been flagged as non-prime, we wouldn't need to bother with it again. Generally the fewer times each number is visited, the more efficient the algorithm is.

Let's start by looking at the problem in a slightly different way. Instead of allocating a potentially huge array and then running through marking off prime multiples, why not just run through the numbers and check each of them against a list of prime factors? We still have to compare each number to each of the prime factors less than its square root*, but once we've found a factor, we can stop the comparisons because we know the numbers isn't prime.

The code below does just that. If there are any prime numbers which are less than or equal to the maximum value supplied, the function performs the following:

Start with 2, and add it to the cache of prime numbers

For each of the following numbers, check to see whether the cache contains a factor of the number. If it contains a factor, the number cannot be prime so move on to the next one. If the cache does not contain a factor, the current number is prime so add it to the cache and return it to the caller.

namespace SoftwarePragmatism
{
  public class EratosthenesGenerator
  {
    public EratosthenesGenerator ()
    {
    }

    #region IPrimeGenerator[System.Int64] implementation

    public IEnumerable<long> Primes (long maximumValue)
    {
      if (maximumValue >= 2L)
      {
        List<long> primes = new List<long> ();

        primes.Add (2L);
        yield return 2L;

        bool isPrime = true;

        for (long currentValue = 3L; currentValue <= maximumValue; currentValue += 2L)
        {
         isPrime = true;

        for (int i = 0; (i < primes.Count) && (primes[i] * primes[i] <= currentValue) && isPrime; ++i)
         {
           if (currentValue % primes[i] == 0)
            {
             isPrime = false;
           }
         }

         if (isPrime)
         {
           primes.Add(currentValue);
           yield return currentValue;
         }
       }
     }
   }

   #endregion
 }
}

Although this method works, it's not ideal - there are some obvious inefficiencies:

  • If you use it to generate one series of prime numbers, and then want to generate another series, it recalculates everything because there's no caching built in.
  • For each potential prime number, every number in the 'primes' collection is tested as a divisor. The 'primes' collection contains the number 2, but because 'currentValue' starts at 3 and is incremented by 2 for each iteration, it can never be even. So, removing 2 from the 'primes' collection removes an unnecessary test for every 'currentValue'.

We also can't be sure just how efficient this method is - we don't know for sure until we've profiled it in some way. This is a separate topic, but I'll cover that later.

There are other ways of generating prime numbers as well as the Sieve of Eratosthenes, and it may be beneficial to use them in some circumstances. So, it's probably worth deciding on an interface which can be used in place of the concrete class that I've implemented here.

Something like the following will do for the moment - we can always change it later:

namespace SoftwarePragmatism
{
 public interface IPrimeGenerator
 {
   IEnumerable<long> Primes (long maximumValue);
 }
}

Then we change the declaration of EratosthenesGenerator to:

namespace SoftwarePragmatism
{
  public class EratosthenesGenerator
  {
   // Other code definitions remain the same.

If we're abstracting the implementation of the prime number generator, we should avoid having to know too much about it at all. We could keep the following way of creating generators:

IPrimeGenerator generator = new EratosthenesGenerator();

But then we need to change every "new" statement whenever we decide to use a different type of generator. If we implement a Factory pattern, where we use one class to generate another, we could use the following:

IPrimeGenerator generator = PrimeGeneratorFactory.CreateGenerator();

This allows us to change the default generator by changing the code inside the CreateGenerator method, and we can overload CreateGenerator to accept a parameter to specify the type of generator to create.

The first version of the factory is:

namespace SoftwarePragmatism
{
  public class PrimeGeneratorFactory
  {
    public PrimeGeneratorFactory()
    {
    }

    /// <summary>
    /// Create a new prime number generator. This is the default method and
    /// returns an <see cref="EratosthenesGenerator"/>.
    /// </summary>
    /// <returns>
    /// A <see cref="IPrimeGenerator"/>
    /// </returns>
    public static IPrimeGenerator CreateGenerator()
    {
      return new EratosthenesGenerator();
    }
  }
}

As usual, this just provides us with something to get us going. We will almost certainly change it later.