# Predictability in a PRNG

Contents

## Aim

This post details a small experiment I did to see the difference between a Pseudorandom Number Generator (PRNG) and a Cryptographically Secure Pseudorandom Number Generator (CSPRNG)

## Introduction

A PRNG produces numbers from a given set making sure the probability distribution of the generated numbers is uniform. A CSPRNG on the other hand, also ensures that the output is unpredictable even when all the previous outputs are known. Cryptographic applications typically require entities like keys, IVs to be unpredictable and that is exactly what a PRNG doesn't guarantee.

To see this in action, I wanted to pick a PRNG and see how it is predictable. A famous PRNG present in many languages' standard library is Mersenne Twister.

## Introduction to Mersenne Twister

Mersenne Twister - described in this paper by Makoto Matsumoto et. al. is the basis of Python's `random`

module. It maintains a state \(x\) of 624 32-bit integers. For \(i >= 624\), \(x_i\) is calculated using the following generator equation -

where \(x_i^u\) and \(x_i^l\) denote the first bit and last 31 bits of \(x_i\) respectively. \(A\) is a \(32 \times 32\) matrix of \(0\) or \(1\).

The values of \(x_i\) for \(0 <= i < 624\) are populated using a seed. Once that's done, whenever a random number is requested, the next element \(x_i\) is calculated using the above equation and the following operations are performed

and the result \(y_i\) is the output random number.

## Predicting MT's output

To predict the next random number, we need to know the state consisting 624 32-bit integers. Since \(x_i\) s are the ones present in the state, we've to use the \(y_i\) to get to \(x_i\) by inverting the above 4 equations. In the following code, `invert1`

inverts 1st and 4th equations, `invert2`

inverts 2nd and 3rd equations.

The following code uses the above functions to invert \(y_i\) to \(x_i\)

Calling `detemper`

on the last 624 random numbers lets us recreate the state. Now, all we have to do is use the generator equation to calculate the next random number based on this state. It turns out, in Python, we don't need to write code to explicitly calculate the next random number based on the state.

### MT in Python's `random`

Python's `random`

module is a wrapper over the Mersenne Twister generator. The function genrand_int32 in `_randommodule.c`

provides the same random number generator as in section Introduction to Mersenne Twister. The functions like `random.randint`

, `random.random`

build upon the random numbers provided by `genrand_int32`

.

Instead of computing new random numbers on every call, `genrand_int32`

generates the next 624 random numbers at a time and discards the old state. It then stores an index into the state pointing to the first unused value.

Calling `Random.getstate`

returns `genrand_int32`

's internal state. Its format is as follows:

`VERSION`

is the version of the state. It's usually 3 in newer Python versions. `INDEX`

is the above mentioned index. Usually \(0 <= \mathtt{INDEX} <= 623\). A value of 624 means that the entire state is consumed. When that happens, the next call to `getrand_int32`

regenerates the entire state.

Using this information, we can find some interesting things -

We can see that `random`

method consumes 2 32-bit integers to generate a floating point random number.

### Using `setstate`

to predict the random number

We can use `Random.setstate`

method to set the state of the PRNG. If we set the state to `(3, (....624 integers...., 624), None`

, since `INDEX == 624`

, next invocation to `genrand_int32`

will use the generator equation to compute the next random number. Following is the code -

Entire code is at https://github.com/nitishch/reverse-mt/blob/master/reverse-mt.py

## Summary

Using the `predict_next`

function, given past 624 random numbers generated by Python's Mersenne Twister PRNG, we can find out the next random number it would generate. Because of this predictability, a CSPRNG must be used for cryptographic applications. A CSPRNG creates the state from a non-deterministic source like mouse movement, disk activity etc. and has algorithms that ensure unpredictability.