Source code for this post.

Note: 7/12/2013 - I edited this post to replace the original Int32 implementation with a faster uint version.

8/8/2013 - added ulong (UInt64) and BigInteger versions of the algorithm described here, in the next post.

I’m working on a SparseArray class that doesn’t use Dictionary under the covers, which allows much better space efficiency, with speeds comparable to Dictionary. I’ll add it to this blog sometime in the near future, but first I’m going to post some of the pieces that went into it. One of the things it does is allow any capacity to be used, but optimizes its behavior when the capacity is a power of 2, or is a prime number. To support using prime capacities, I’ve written a NearestPrimeCeiling method which in turn relies on an IsPrime method. In the past, when I needed a primality test over the range of positive ints, I’ve used one of these quickie methods that uses trial division.

public static bool IsPrimeTrial(this uint value) { if (value < 2) return false; if ((value & 1) == 0) return value == 2; if ((value % 3) == 0) return value == 3; // for smaller values (up to several million), it's faster // to use i*i in the loop termination condition rather than // precomputing the square root of the value. //if (value == uint.MaxValue) return true; for (uint i = 5, skip = 2; (i <= sqrtMax) && (i*i <= value); i+=skip, skip=6-skip) { if (value % i == 0) return false; } return true; } public static bool IsPrimeTrialSqrt(this uint value) { if ((value & 1) == 0) return value == 2; if (value == 1) return false; if ((value % 3) == 0) return value == 3; uint endVal = (uint) ((value > int.MaxValue) ? Math.Sqrt(value) : Math.Sqrt((int)value)); for (uint i = 5, skip = 2; i <= endVal; i+=skip, skip=6-skip) { if (value % i == 0) return false; } return true; }The two methods above both do Primality testing by trial division. A minor difference between the two is that one uses the square of the loop variable to determine the terminal condition, and the other pre-computes the terminal loop value using the time expensive Sqrt method. For values up to around 10 million, the i*i version is faster. Beyond that, the Sqrt version is faster. The difference is only significant at the extreme ends of the range of positive integer values.

Although I was aware of the Miller-Rabin primality test, I put off using it because the trial division methods were ‘good enough’, and I didn't like the fact that Miller-Rabin is a probabilistic test. Both those reasons were very misguided. Compared to Miller-Rabin, trial division is not all that good, and within the ranges of int, uint, long and ulong integers, it’s quite easy to slightly modify the Miller-Rabin algorithm so that it’s deterministic (guaranteed to correctly determine primality). I initially searched for existing C# implementations, but found problems in them such as not operating over the entire range of values because of overflow in the PowMod portion of the algorithm. So, I ended up writing one.

The Miller-Rabin Wikipedia entry adequately describes the algorithm. A key component of the algorithm is randomly choosing witness values. The more witness values you use, the less likely it will return a false positive for primality. However, there are known sets of witness values that are guaranteed to correctly determine primality over limited ranges of values. For example, there are 3-witness sets that cover the entire range of unsigned 32-bit integers, and 7-witness sets that cover unsigned 64-bit integers. By replacing the random selection of witness values with code that uses pre-tested witness values, it’s easy to make a deterministic version of the Miller-Rabin algorithm over these ranges of values. The speed of the algorithm is affected by the number of witness values that are used. So it’s best to use the smallest set of witness values that will give a definitive primality result for the value you’re testing. There is a great website devoted to discovering these witness sets, not just for 32 bit integers, but for 64 bit values as well. I found it very helpful.

The charts below show the speed comparison of trial division to Miller-Rabin. The first chart shows the times (in microseconds) to determine primality of the nearest prime number to the values displayed along the x-axis. This is where trial division is at its slowest, since it must exhaust the possible factors before it can declare the value as prime.

The second chart shows the time (in microseconds) to perform the primality test on 1000 consecutive values less than or equal to the values displayed along the x-axis. Since it takes much less time to determine composite values are not prime than it does to determine if a primes are prime, the average time for checking 1000 consecutive values is greatly affected by the prime values within that range.

As you can see, Miller-Rabin destroys trial division, outperforming it over almost the entire range of unsigned 32-bit integers. These benchmarks were done using a 32-bit build. When compiled for 64-bits, the difference becomes even greater. The trial division times remain about the same, but the Miller-Rabin times improve by around 33%, because there is some ulong operations in the ModPow function.

In both graphs you can see the small jog up in time where the Miller-Rabin algorithm has to switch from using 2 witnesses to 3 at around the 700,000,000 value. From that point to the max range of uint, the execution time is nearly linear, typically taking around 2.75 microseconds to determine that a prime number is prime, and averaging 230 nanoseconds per call to IsPrime when testing consecutive values that are a mix of composites and primes. I ran the tests on a not-new run of the mill computer (P8600 @ 2.4GHz).

The code below is my basic implementation of the Miller-Rabin algorithm. Also included is the ModPow function it needs, implemented in a way that's pretty fast, and safe from the danger of overflows.

private static readonly ulong[] witnesses1 = new ulong[] { 9345883071009581737 }; private const uint witnesses1Max = 341531; private static readonly ulong[] witnesses2 = new ulong[] { 15, 13393019396194701 }; private const uint witnesses2Max = 716169301; private static readonly ulong[] witnesses3 = new ulong[] { 2, 7, 61 }; /// <summary> /// Determines if the specified value is a prime number, using the /// Miller-Rabin algorithm. /// </summary> /// <remarks>This implementation is guaranteed to deterministically /// decide primality for all positive value inputs.</remarks> /// <param name="value">The value to be tested for primality.</param> /// <returns>True if the value is prime, otherwise, false.</returns> public static bool IsPrimeMR(this uint value) { // take care of the simple cases of small primes and the // common composites having those primes as factors if ((value & 1) == 0) return value == 2; if (value == 1) return false; if ((value % 3) == 0) return value == 3; if ((value % 5) == 0) return value == 5; if (((value % 7) == 0) || ((value % 11) == 0) || ((value % 13) == 0) || ((value % 17) == 0) || ((value % 19) == 0) || ((value % 23) == 0) || ((value % 29) == 0) || ((value % 31) == 0) || ((value % 37) == 0) || ((value % 41) == 0) || ((value % 43) == 0)) { return (value <= 43); } // perform Miller-Rabin with most efficient witness array for the value being tested if (value >= witnesses2Max) return InternalIsPrimeMR(value, witnesses3); else if (value >= witnesses1Max) return InternalIsPrimeMR(value, witnesses2); else return InternalIsPrimeMR(value, witnesses1); } /// <summary> /// Determines if the specified odd, >= 3 value is a prime number, using the /// Miller-Rabin algorithm. /// </summary> /// <param name="value">The int value to be tested for primality.</param> /// <param name="witnesses">The witnesses to be used.</param> /// <returns>True if the value is prime, otherwise, false.</returns> private static bool InternalIsPrimeMR(uint value, ulong[] witnesses) { // compute n − 1 as (2^s)·d (where d is odd) uint valLessOne = value - 1; uint d = valLessOne / 2; // we know that value is odd and valLessOne is even, so unroll 1st iter of loop uint s = 1; while((d & 1) == 0) { d /= 2; s++; } // test value against each witness for (uint i = 0; i < witnesses.Length; i++) { uint a; ulong aL = witnesses[i]; if (aL >= value) { aL %= value; if (aL < 2) continue; } a = (uint) aL; if (a == valLessOne) continue; uint x = InternalModPow(a, d, value); // overflow safe version of x = Math.Pow(a, d) % value if (x == 1) continue; for (uint r = 1; (x != valLessOne) && (r < s); r++) { x = (uint)((x * (ulong)x) % value); // overflow safe version of x = (x * x) % value if (x == 1) return false; } if (x != valLessOne) return false; } // witnesses confirm value is prime return true; } /// <summary> /// Compute the mod of a number raised to the specified power. /// </summary> /// <remarks>Overflow safe for all input values. </remarks> /// <param name="b">The base number.</param> /// <param name="e">The exponent applied to the base number.</param> /// <param name="m">The modulo value.</param> /// <returns>The mod of the base number raised to the specified power.</returns> public static uint ModPow(uint b, uint e, uint m) { if (b == 0 || m == 0) return 0; return InternalModPow(b, e, m); } private static readonly uint sqrtMax = (uint) Math.Sqrt(uint.MaxValue); private static uint InternalModPow(uint b, uint e, uint m) { switch (e) { case 0: return 1; case 1: return b % m; case 2: return (b <= sqrtMax) ? (b * b) % m : (uint)((b * (ulong)b) % m); default: uint result = 1; while (true) { if ((e & 1) != 0) { result = (uint)((result * (ulong)b) % m); if (e == 1) return result; } e /= 2; b = (uint)((b * (ulong)b) % m); } } }

The final IsPrime method selects the fastest underlying algorithm based on the value being tested.

/// <summary> /// Determines if the specified value is a prime number. /// </summary> /// <remarks>This method chooses the best algorithm to use based on /// the magnitude of the specified value. For smaller values, a /// simple trial division algorithm is used. For larger values, a /// deterministic version of the Miller-Rabin algorithm is used.</remarks> /// <param name="value">The int value to be tested for primality.</param> /// <returns>True if the value is prime, otherwise, false.</returns> public static bool IsPrime(this uint value) { if ((value & 1) == 0) return value == 2; if (value < 1000000) { return value.IsPrimeTrial(); } else { return value.IsPrimeMR(); } }I did a similar implementation that works with Int32 values, but the uint version was faster, and operates over a larger range. So I scrapped the Int32 version and implemented a little int IsPrime method that just uses the uint version.

public static bool IsPrime(this int value) { if (value <= 1) return false; return ((uint) value).IsPrime(); }

Nice idea alternating 2 and 4 in the naive prime testing. I extended the idea to the primes 2, 3, 5 and 7. I should be about 1/3 faster.

ReplyDeletehttp://pastebin.com/NRvepd5S

I generated the loop body by a simple program. It could be extended to 11 or even 13, but I didn't bother to test if it's worth it.

By the way I'm not a fan of using &1 instead of %2. It shouldn't increase the effectiveness at all (it would take an especially dump optimizer to use division instead of &1). I see you using /2 instead of >>1 (the optimizer should take care of that too).

Your are correct that mod (%), multiplication and division of literals that are powers of 2 are optimized in release builds, so there is no advantage to using the bit fiddling equivalents. I know better than to bother, but old habits die hard. I agree that it's clearer to just use the normal operators.

ReplyDelete