GenPrime

From Ferrous Moon Research

Jump to: navigation, search
This article was written by Tycho.
This document is classified ICARS::0, and is for public inspection.

GenPrime is a simple prime number generation algorithm which can be used as a great case-in-point learning experience for how to optimize. This page will largely serve as a sort of 'how to' on optimization. I will be taking the program from the most naïve code to the most optimized.

This article has been criticized because it doesn't show optimization at work, but I steadfastly believe this is not so. Finding the right algorithm is a very very large part in optimization. While there are small tricks, such as eliminating recursive algorithms to get 10-200% performance increases, finding the right algorithm will actually give you the largest speed gains. In the case of GenPrime, finding the correct algorithm gave us over a 9700% speed increase. No joke.

People have a tendency to try to highly optimize the wrong algorithm, which really won't give you much speed improvement. For example, in GenPrime Test 1 (below), you could probably optimize the 75,000+ seconds to be 74,000+. Whoop dee doo. Minor optimizations to an algorithm which takes the wrong approach won't help much. If you are truly focusing on optimizing the function, look for errors in logic. I show this in the optimization sections below, and end up getting a massive speed increase. We go from finding one million primes in 75,683 seconds to finding one million primes in 7.78 seconds. If you choose to argue that this isn't an example of optimization, you are sorely mistaken.

Contents

Properties of Primality

Before we go through the process of writing a prime number generation algorithm, I want to explain the properties of prime numbers which we will use as constraints in our program.

A prime number (or a prime) is a natural number that has exactly two (distinct) natural number divisors. Any natural number greater than 1 has the two divisors 1 and itself, so a prime number has no other divisors.

So by this rule, the first ten primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29.

GenPrime Test 1

This program is far from finished. I will explain both the isPrime() and genPrime() functions only in this first section, just so you understand how it works. Here's the code we're starting with.

isPrime()

 bool isPrime ( int _number )
 {
   if ( _number == 2 ) return true;
   for ( int div = 2; div < _number; div++ )
   {
     if ( _number % div == 0 )
       return false;
   }
   return true;
 }

The isPrime() function takes one input (_number) and tests whether or not the number has any divisors. If it doesn't, it returns 'true', indicating that the number tested is prime. Otherwise, it returns false. It must also return 'true' for primes which are difficult to test for with an algorithm. In this case, we check if _number is 2. If so, it returns true, and if not, it continues with the lengthy primality test.

genPrime()

 int genPrime ( int _maxToFind )
 {
   int count = 0;
   for ( int num = 2; count < _maxToFind; num++ )
   {
     if ( isPrime ( num ) )
     {
       count++;
       continue;
     }
   }
   return count;
 }

The genPrime() function basically loops until it finds '_maxToFind' primes. It may seem odd that we return 'count', since the value of 'count' should always be equal to the value of '_maxToFind' at the end of execution. This isn't a mistake, though, because we have to have some way to make the compiler fail to optimize the loop. If we're timing how long genPrime() takes to find a given number of primes, it will often fail to record the timing if the return value isn't going to change based on the result of a loop. Another way to force it to fail in optimizing genPrime() is to add '__asm nop;' in there, which causes the processor to execute a no-operation instruction. Doing so is a bad practice though, because it slows the program down and gives a falsely high timing result. So in this case, the return value will be 'count' only for the sake of making the compiler believe that the result of the function will be determined by the loop contained inside it.

Timings

0.000270 seconds for 100 primes
0.001104 seconds for 200 primes
0.004537 seconds for 400 primes
0.019099 seconds for 800 primes
0.081883 seconds for 1600 primes
0.353767 seconds for 3200 primes
1.524225 seconds for 6400 primes
8.542095 seconds for 12800 primes
34.141138 seconds for 25600 primes
134.060824 seconds for 51200 primes

Optimization

The timings we have right now are quite pathetic, as you'll see later. It's getting exponentially worse with each test. Using power regression, I have determined that in order to get one million primes, this version would take 75,683 seconds (21.02 hours).

Where is our bottleneck though? Well, let's first look at our isPrime() function and revise it for Test 2. First, it's logically true that if _number is composite then it can be factored into two values, at least one of which is less than or equal to half the size of _number. Doing this alone gives us a function that is 2x faster. Second, we can eliminate half of the remaining checks by only testing one even divisor, since any even number is divisible by 2. So, we try dividing by 2 first, and then test all the odd number divisor candidates. So in theory, we've sped this function to be 4x the speed it was before. Let's test our theory.

GenPrime Test 2

isPrime()

 bool isPrime ( int _number )
 {
   if ( _number == 2 ) return true;
   if ( _number % 2 == 0 ) return false;
   for ( int div = 3; div < _number / 2; div += 2 )
   {
     if ( _number % div == 0 )
       return false;
   }
   return true;
 }

Timings

0.000081 seconds for 100 primes
0.000330 seconds for 200 primes
0.001266 seconds for 400 primes
0.005058 seconds for 800 primes
0.020976 seconds for 1600 primes
0.088921 seconds for 3200 primes
0.380388 seconds for 6400 primes
2.390833 seconds for 12800 primes
9.758408 seconds for 25600 primes
36.595803 seconds for 51200 primes

Optimization

Using power regression, I have determined that in order to get one million primes, this version would take 19,255 seconds (5.35 hours). This is significantly better.

Fantastic. We've managed to speed this up quite a bit. We can make this go even faster yet, though. Let's look at optimizing isPrime() even further. isPrime() is still actually testing too many potential divisors. We made a bad assumption in v2.0, because we assumed that we had to test all the way up to half of the number. But in actuality, we only need to test up to the square root of the number. This should make a magnificent difference in Test 3. Furthermore, the genPrime() function is testing far too many numbers, because we've included even numbers as well. And of course, no even number aside from 2 is prime because even numbers are all divisible by two.

GenPrime Test 3

isPrime()

 bool isPrime ( int _number )
 {
   if ( _number == 2 ) return true;
   if ( _number % 2 == 0 ) return false;
   int limit = (int)sqrt ( (double)_number ) + 1;
   for ( int div = 3; div < limit; div += 2 )
   {
     if ( _number % div == 0 )
       return false;
   }
   return true;
 }

genPrime()

 int genPrime ( int _maxToFind )
 {
   int count = 1;
   for ( int num = 3; count < _maxToFind; num += 2 )
   {
     if ( isPrime ( num ) )
     {
       count++;
       continue;
     }
   }
   return count;
 }

Timings

0.000034 seconds for 100 primes
0.000080 seconds for 200 primes
0.000219 seconds for 400 primes
0.000570 seconds for 800 primes
0.001549 seconds for 1600 primes
0.004190 seconds for 3200 primes
0.011421 seconds for 6400 primes
0.037240 seconds for 12800 primes
0.105892 seconds for 25600 primes
0.290089 seconds for 51200 primes
0.790386 seconds for 102400 primes
2.164165 seconds for 204800 primes
5.977096 seconds for 409600 primes

Optimization

Now this is incredibly fast. It can generate one million primes in 22.15 seconds, and I didn't even have to predict that statistically, because it was practical to actually test it.

We've basically juiced the isPrime() and genPrime() functions for as much performance as possible. So now we have the correct algorithm, but we can gain further performance improvement by adding code to pre-generate around 2500 primes and then iterate through the list for the initial primality test. Doing so allows you to generate one million primes in 7.78 seconds.

Personal tools