Skip to main content

Predictability in a PRNG

Aim

This post details a small experiment I did to see the difference between a Pseudorandom Number Generator (PRNG) and a Cryptographically Secure Pseudorandom Number Generator (CSPRNG)

Introduction

A PRNG produces numbers from a given set making sure the probability distribution of the generated numbers is uniform. A CSPRNG on the other hand, also ensures that the output is unpredictable even when all the previous outputs are known. Cryptographic applications typically require entities like keys, IVs to be unpredictable and that is exactly what a PRNG doesn't guarantee.

To see this in action, I wanted to pick a PRNG and see how it is predictable. A famous PRNG present in many languages' standard library is Mersenne Twister.

Introduction to Mersenne Twister

Mersenne Twister - described in this paper by Makoto Matsumoto et. al. is the basis of Python's random module. It maintains a state \(x\) of 624 32-bit integers. For \(i >= 624\), \(x_i\) is calculated using the following generator equation -

\begin{equation*} x_i = x_{i - 227} \oplus (x_{i - 624}^u|x_{i - 623}^l)A \end{equation*}

where \(x_i^u\) and \(x_i^l\) denote the first bit and last 31 bits of \(x_i\) respectively. \(A\) is a \(32 \times 32\) matrix of \(0\) or \(1\).

The values of \(x_i\) for \(0 <= i < 624\) are populated using a seed. Once that's done, whenever a random number is requested, the next element \(x_i\) is calculated using the above equation and the following operations are performed

\begin{equation*} y_i = x_i \oplus (x_i >> \mathtt{11}) \end{equation*}
\begin{equation*} y_i = y_i \oplus ((y_i << \mathtt{7}) \hspace{3mm} \mathtt{\&} \hspace{3mm} \mathtt{0x9d2c5680}) \end{equation*}
\begin{equation*} y_i = y_i \oplus ((y_i << \mathtt{15}) \hspace{3mm} \mathtt{\&} \hspace{3mm} \mathtt{0xefc60000}) \end{equation*}
\begin{equation*} y_i = y_i \oplus (y_i >> \mathtt{11}) \end{equation*}

and the result \(y_i\) is the output random number.

Predicting MT's output

To predict the next random number, we need to know the state consisting 624 32-bit integers. Since \(x_i\) s are the ones present in the state, we've to use the \(y_i\) to get to \(x_i\) by inverting the above 4 equations. In the following code, invert1 inverts 1st and 4th equations, invert2 inverts 2nd and 3rd equations.

 def invert1(y, shift):
     """Given y = x ^ (x >> shift), return x. y is a seq of bits."""
     x = y[:]
     for index in range(shift, len(y)):
         x[index] ^= x[index - shift]
     return x


 def invert2(y, shift, b):
     """Given y = x ^ ((x << shift) & b), return x. y and b are sequences of bits."""
     x = y[:]
     for i in range(shift + 1, len(y) + 1):
         x[-i] ^= x[shift - i] if b[-i] else 0
     return x

The following code uses the above functions to invert \(y_i\) to \(x_i\)

 def detemper(z):
     """Invert the 4 operations. z is an int. returns an int."""

     # Convert integer to list of bits
     z = list(map(int, list("{:032b}".format(z))))

     b = list(map(int, bin(0x9D2C5680)[2:]))
     c = list(map(int, bin(0xEFC60000)[2:]))

     z = invert1(z, 18)
     z = invert2(z, 15, c)
     z = invert2(z, 7, b)
     z = invert1(z, 11)

     # Convert the list of bits to integer
     return int("".join(list(map(str, z))), 2)

Calling detemper on the last 624 random numbers lets us recreate the state. Now, all we have to do is use the generator equation to calculate the next random number based on this state. It turns out, in Python, we don't need to write code to explicitly calculate the next random number based on the state.

MT in Python's random

Python's random module is a wrapper over the Mersenne Twister generator. The function genrand_int32 in _randommodule.c provides the same random number generator as in section Introduction to Mersenne Twister. The functions like random.randint, random.random build upon the random numbers provided by genrand_int32.

Instead of computing new random numbers on every call, genrand_int32 generates the next 624 random numbers at a time and discards the old state. It then stores an index into the state pointing to the first unused value.

Calling Random.getstate returns genrand_int32 's internal state. Its format is as follows:

(VERSION, (....state of 624 integers...., INDEX), None)

VERSION is the version of the state. It's usually 3 in newer Python versions. INDEX is the above mentioned index. Usually \(0 <= \mathtt{INDEX} <= 623\). A value of 624 means that the entire state is consumed. When that happens, the next call to getrand_int32 regenerates the entire state.

Using this information, we can find some interesting things -

# Create a new PRNG
r = random.Random()
# Find the current value of INDEX
old_index = r.getstate()[1][-1]
# Call one of the module's methods
n = r.random()
new_index = r.getstate()[1][-1]

# the random() method used 2 32-bit integers
assert (new_index - old_index) % 624 == 2

We can see that random method consumes 2 32-bit integers to generate a floating point random number.

Using setstate to predict the random number

We can use Random.setstate method to set the state of the PRNG. If we set the state to (3, (....624 integers...., 624), None, since INDEX == 624, next invocation to genrand_int32 will use the generator equation to compute the next random number. Following is the code -

 def predict_next(old_values):
     """Given 624 past outputs of the PRNG, return the next output"""
     assert len(old_values) == 624

     inverted_values = list(map(detemper, old_values))

     # Instead of using the generator equation, we can
     # make use of setstate function on a new PRNG
     ro = random.Random()
     ro.setstate((3, tuple(inverted_values + [624]), None))
     # Since INDEX is set to 624, ro will use the state to calculate
     # the next random number
     return ro.getrandbits(32)

Entire code is at https://github.com/nitishch/reverse-mt/blob/master/reverse-mt.py

Summary

Using the predict_next function, given past 624 random numbers generated by Python's Mersenne Twister PRNG, we can find out the next random number it would generate. Because of this predictability, a CSPRNG must be used for cryptographic applications. A CSPRNG creates the state from a non-deterministic source like mouse movement, disk activity etc. and has algorithms that ensure unpredictability.