A Quick Tutorial on Pollard's Rho Algorithm

Pollard's Rho Algorithm is a very interesting and quite accessible algorithm for factoring numbers. It is by no means the fastest factoring algorithm, but in practice it outperforms trial division by many orders of magnitude. It is based on very simple ideas that can be used in other contexts as well.

History

The original reference for this algorithm is a paper by John M. Pollard, who has made many contributions to computational number theory.

Pollard, J. M. (1975), "A Monte Carlo method for factorization", BIT Numerical Mathematics 15 (3): 331–334.

Improvements were suggested by Richard Brent in a follow-up paper that appeared in 1980.

Brent, Richard P. (1980), "An Improved Monte Carlo Factorization Algorithm", BIT 20: 176–184, doi:10.1007/BF01933190.

A good reference for this algorithm is the book Introduction to Algorithms by Cormen, Leiserson and Rivest, which discusses integer factorization and Pollard's rho algorithm.

Problem Setup

Let us assume that $N = pq$, where $p$ and $q$ are distinct primes, and that our goal is to find one of them.

Recall the trial division algorithm:

Naive Trial Division Algorithm
```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int N, i;
    /* Read in the number N */
    if (scanf("%d", &N) != 1) return 1;
    printf("You entered N = %d\n", N);
    if (N % 2 == 0) {
        puts("2 is a factor");
        return 0;
    }
    /* Try every odd candidate divisor in turn */
    for (i = 3; i < N; i += 2) {
        if (N % i == 0) {
            printf("%d is a factor\n", i);
            return 0;
        }
    }
    printf("No factor found.\n");
    return 1;
}
```

Let us try an even more atrocious version that uses random numbers. The code below is not perfect, but it will do.

I am feeling very lucky today Algorithm
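```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    int N, i;
    /* Read in the number N */
    if (scanf("%d", &N) != 1 || N < 3) return 1;
    printf("You entered N = %d\n", N);
    srand((unsigned) time(NULL));
    /* Pick one candidate in [2, N-1] (modulo bias ignored) */
    i = 2 + rand() % (N - 2);
    if (N % i == 0) {
        printf("I found %d\n", i);
        return 0;
    }
    printf("go fishing!\n");
    return 1;
}
```

The I am feeling lucky algorithm (no offense to Google) picks a random number between $2$ and $N-1$ and checks whether it divides $N$. What are the odds of success? Very simple: there are precisely two factors to find, $p$ and $q$, among roughly $N$ candidates, so the odds are about $2/N$. Put another way, we have to repeat the I am feeling lucky algorithm approximately $N/2$ times, with fresh random numbers, before we can expect to find a factor.

Improving the Odds with the Birthday Trick

There is a simple trick to improve the odds, and it is a very useful one too. It is called the Birthday Trick. Let us illustrate it. Suppose we pick a number uniformly at random from $1$ to $1000$. The chance that it is exactly $42$ is $1/1000$.

But suppose we modify the problem a little: we pick two random numbers $i$ and $j$ from $1$ to $1000$ and ask whether $|i - j| = 42$. The odds are now roughly $2/1000$. Rather than insist that we pick one number and it be exactly $42$, we only ask that the difference of two numbers be $42$.

What if we pick $k$ numbers $x_1, \ldots, x_k$ and ask whether any two of them differ by exactly $42$? The program below estimates this probability by simulation.

Computing Birthday Paradox Probability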
```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int i, j, k, l, success;
    int nTrials = 100000, nSucc = 0;
    int *p;
    /* Read in the number k */
    printf("Enter k: ");
    if (scanf("%d", &k) != 1) return 1;
    printf("\nYou entered k = %d\n", k);
    if (k < 2) {
        printf("Select a k >= 2 please.\n");
        return 1;
    }
    /* Allocate memory */
    p = malloc(k * sizeof *p);
    if (p == NULL) return 1;
    /* nTrials = number of times to repeat the experiment. */
    for (j = 0; j < nTrials; ++j) {
        success = 0;
        /* Each experiment generates k random numbers and checks whether
           any two of them differ by exactly 42. */
        for (i = 0; i < k && !success; ++i) {
            /* Random number between 1 and 1000 (modulo bias ignored) */
            p[i] = 1 + rand() % 1000;
            /* Compare against every number generated so far */
            for (l = 0; l < i; ++l) {
                if (p[l] - p[i] == 42 || p[i] - p[l] == 42) {
                    success = 1; /* Found a difference of 42 */
                    break;
                }
            }
        }
        if (success) nSucc++; /* Track the number of successes */
    }
    /* Probability estimate = number of successes / number of trials */
    printf("Est. probability is %f\n", (double) nSucc / (double) nTrials);
    /* Do not forget to clean up the memory. */
    free(p);
    return 0;
}
```

You can run the code for various values of $k$.
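How fast does the probability grow with $k$? Here is a rough estimate of our own, which treats the $\binom{k}{2}$ pairs as if they were independent. For a single pair, $958$ ordered pairs $(i, j)$ have $i - j = 42$ and another $958$ have $j - i = 42$, out of $1000^2$ in total:

$$P\big(|x_i - x_j| = 42\big) = \frac{2 \cdot 958}{1000^2} \approx 0.0019$$

$$P\big(\text{some pair differs by } 42\big) \approx 1 - \exp\!\Big(-\tbinom{k}{2} \cdot 0.0019\Big)$$

$$k = 30:\qquad \tbinom{30}{2} = 435, \qquad 1 - e^{-435 \cdot 0.0019} \approx 1 - e^{-0.83} \approx 0.56$$

The simulation is the more trustworthy number; the estimate mainly shows why the odds grow with the number of pairs, roughly $k^2/2$, rather than with $k$ itself.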
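This shows a curious property: although each number is drawn from $1000$ possibilities, a few dozen numbers (somewhere around $k = 30$) are already enough to make a difference of exactly $42$ more likely than not.

Suppose we pick a person at random and ask what the probability is that their birthday is April 1st.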
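Well, the answer is $1/365$ (ignoring leap years). We number the days in the year from $1$ to $365$. Let us take $k$ people at random and ask instead whether any two of them share a birthday.

Exploring the birthday paradox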
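```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int i, j, k, l, success;
    int nTrials = 100000, nSucc = 0;
    int *p;
    printf("Enter k: ");
    if (scanf("%d", &k) != 1 || k < 2) return 1;
    printf("\nYou entered k = %d\n", k);
    p = malloc(k * sizeof *p);
    if (p == NULL) return 1;
    /* Repeat the experiment nTrials times */
    for (j = 0; j < nTrials; ++j) {
        success = 0;
        for (i = 0; i < k && !success; ++i) {
            /* Random birthday between 1 and 365 (modulo bias ignored) */
            p[i] = 1 + rand() % 365;
            /* Check whether two people share a birthday */
            for (l = 0; l < i; ++l) {
                if (p[l] == p[i]) {
                    success = 1;
                    break;
                }
            }
        }
        if (success) nSucc++;
    }
    printf("Est. probability is %f\n", (double) nSucc / (double) nTrials);
    free(p);
    return 0;
}
```

We can see that with $k = 23$ people, the chance that two of them share a birthday is already about $50\%$. If a year has $D$ days, we need only about $\sqrt{D}$ people (more precisely, about $1.18\sqrt{D}$) for even odds of a collision.

Imagine a version of Star Trek where the Enterprise docks on a strange new planet and the crew is unable to find out how long a year is. Captain Kirk and Officer Spock land on the planet and walk over to the birth records. They toss coins to pick people at random and count how many people they need before a shared birthday becomes an even bet. From that count they can back out the number of days in the year, i.e., the planet's revolution period divided by its rotation period.

This is all well and good, you say. How does this help us at all?

Applying the Birthday Paradox to Factoring

Let us go back to the I am feeling lucky algorithm. We are given $N = pq$, and we pick a random $x$ hoping that $x = p$ or $x = q$.

We can ask a different question. Instead of picking just one number, we can pick $k$ numbers $x_1, \ldots, x_k$ and ask whether some difference $x_i - x_j$ equals $p$ or $q$. The difference between the former scheme and the latter is exactly the difference between picking one person and asking whether their birthday falls on April 1st (a specific day), and picking $k$ people and asking whether any two of them share a birthday.

But unfortunately, this does not save us any effort. With $k$ numbers we have about $k^2/2$ pairs to check, and for good odds $k$ must be on the order of $\sqrt{N}$, so we are back to on the order of $N$ checks.

We can do something even better. We can pick numbers $x_1, \ldots, x_k$ and ask whether $\gcd(|x_i - x_j|, N) > 1$. If we ask how many numbers divide $N$, the answer is just two: $p$ and $q$. If we ask how many numbers $x$ in $[1, N-1]$ have $\gcd(x, N) > 1$, the answer is many more: the multiples $p, 2p, \ldots, (q-1)p$ and $q, 2q, \ldots, (p-1)q$. Precisely, there are $(q-1) + (p-1) = p + q - 2$ such numbers.

Moreover, $p$ divides $x_i - x_j$ exactly when $x_i \equiv x_j \pmod{p}$. That is a birthday collision in a "year" of $p$ days, so about $\sqrt{p}$ numbers suffice for good odds. Taking $p$ to be the smaller factor, $p \le \sqrt{N}$, so that is at most about $N^{1/4}$ numbers.

So a simple scheme is as follows:

1. Pick $k$ numbers $x_1, \ldots, x_k$ uniformly at random from $[2, N-1]$.
2. For every pair $i < j$, compute $d = \gcd(|x_i - x_j|, N)$.
3. If some $d$ satisfies $1 < d < N$, then $d$ is a factor of $N$ (either $p$ or $q$).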
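But there is already a problem: we need to pick $k$ on the order of $N^{1/4}$ numbers, keep all of them in memory, and compare every pair. For a 40-digit $N$, that is on the order of $10^{10}$ numbers. To get to Pollard's rho algorithm, we want to do things so that we only ever keep two numbers in memory.

Pollard's Rho Algorithm

Therefore, Pollard's rho algorithm works like this. We would like to generate $x_1, \ldots, x_k$ and check all pairs, but we cannot afford to store them. Instead, we generate the numbers one at a time and check each new number against its predecessor.

We use a function $f$ that generates pseudo-random numbers: applying it repeatedly produces values that look random. Not every function does this, but a common choice is $f(x) = (x^2 + a) \bmod N$, typically with $a = 1$.

We start with $x_1 = 2$ (or some other number) and compute $x_{n+1} = f(x_n)$.

We can start off with a naive algorithm and fix the problems as we go.

Pollard's Rho Algorithm Scheme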
```
x := 2;
while ( .. exit criterion .. )
    y := f(x);
    p := GCD( | y - x | , N );
    if ( p > 1 and p < N ) return "Found factor: p";
    x := y;
end
return "Failed. :-("
```

Let us take an example. In the table below, GCD refers to $\gcd(|x_{n+1} - x_n|, N)$.
You can see that in many cases this works. But in some cases it goes into an infinite loop, because the sequence produced by $f$ cycles: there are only $N$ possible values, so sooner or later some value repeats, and from then on the same values come around again and again.

For example, we can make up a pseudo-random function whose values repeat after just a few steps. In this case, we will keep cycling and never find a factor.

How do we detect that the cycling has happened? Solution #1 is to keep all the numbers seen so far and check each new number against them, but that brings back the memory problem. Solution #2 is a clever algorithm by Floyd.

To illustrate Floyd's algorithm, suppose we are running on a long circular race track. How do we know we have completed one cycle? We could use solution #1 and remember everything we have seen so far. But a cleverer solution is to have two runners, A and B, with B running twice as fast as A.
They start off at the same position, and when B catches up with A again, we know that B has gone around the track at least once more than A.

Pollard's Rho Algorithm Scheme
```
a = 2; b = 2;
do {
    // a runs once
    a = f(a);
    // b runs twice as fast.
    b = f(f(b));
    p = GCD( | b - a | , N );
    if ( p > 1 && p < N ) return "Found factor: p";
} while ( b != a );   // b has caught up with a: the sequence has cycled
return "Failed. :-("
```

If the algorithm fails, we simply pick a new function $f$ (for example, a different constant $a$ in $x^2 + a$) or a new starting value, and try again.

Now we have derived the full Pollard's rho algorithm with Floyd's cycle detection scheme. Hope you were able to follow this presentation. You may find the Wikipedia presentation on cycle detection quite useful. It also explains the use of Brent's cycle-finding algorithm in this context.