Ticket #3575 (new bug)

Opened 4 years ago

Last modified 2 years ago

mkStdGen and split conspire to make some programs predictable

Reported by: rwbarton Owned by: rrnewton
Priority: normal Milestone: _|_
Component: libraries/random Version: 6.10.1
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

The following program random.hs is intended to produce a line containing 30 random 0s or 1s. It is not an example of the best way to use System.Random, but it looks innocuous enough.

import Control.Monad
import System.Random

print0or1 = do
  g <- newStdGen
  putStr . show . fst $ randomR (0, 1 :: Int) g

main = replicateM_ 30 print0or1 >> putStrLn ""

Let's try running it a thousand times:

rwbarton@functor:/tmp$ ghc-6.10.1 -O2 --make random
[1 of 1] Compiling Main             ( random.hs, random.o )
Linking random ...
rwbarton@functor:/tmp$ for i in `seq 1 1000` ; do ./random >> random.out ; done
rwbarton@functor:/tmp$ sort random.out | uniq | wc -l
60

That's odd... there are 230 possible output lines, but when I tried to generate 1000 random ones, I only got 60 distinct outputs. Why did that happen?

One might think this is due to poor initial seeding of the random number generator (due to the time not changing very much during the test), but this is not the case. Attached is a fancier version of the program which reads an initial seed from /dev/urandom; it exhibits the same behavior.

This phenomenon is not too hard to explain. It is ultimately due to a poor interaction between mkStdGen and split. First, we need to know a bit about the design of System.Random (some statements simplified slightly for this discussion).

  • The state of the RNG consists of two Int32s, s1 and s2.
  • The initial state produced by mkStdGen almost always has s2 equal to 1. (Extremely rarely, it might have s2 equal to 2. We'll ignore this as it doesn't affect the argument.)
  • To generate a random 0 or 1, we first generate a new state using some simple functions s1' = next1(s1), s2' = next2(s2). (Note that s1 and s2 "evolve" independently.) The random value returned is the lowest bit of s1' minus s2'.
  • Splitting the generator (s1, s2) yields the two generators (s1+1, next2(s2)) and (next1(s1), s2-1).

Our program functions as follows.

  • Initialize the generator stored in theStdGen (s1 is some varying value a, s2 is 1).
  • Repeatedly split the generator, replacing it with the first output, and use the second output to generate a 0 or 1.

If we watch theStdGen while our program runs, we will see that s1 is incremented by 1 at each step, while s2 follows the fixed sequence 1, next2(1), next2(next2(1)), etc. The 0 or 1 we output at the kth step is thus the lowest bit of next1(next1(a+k-1)) minus bk, where bk is some fixed sequence. And as k varies, next1(next1(a+k-1)) turns out to be just an arithmetic sequence with fixed difference modulo a fixed prime so its lowest bits are extremely predictable even without knowing a.

This issue can be fixed to some extent, without breaking backwards compatibility, by adding another method (besides mkStdGen) to create a generator, which does not have predictable s2, and using it to initialize the system RNG.

Attachments

random-seed.hs Download (433 bytes) - added by rwbarton 4 years ago.
Alternative example program which reads a seed from /dev/urandom

Change History

Changed 4 years ago by rwbarton

Alternative example program which reads a seed from /dev/urandom

Changed 4 years ago by rwbarton

Thanks to Michael Rice for suggesting that there may be a problem:  http://www.haskell.org/pipermail/haskell-cafe/2009-October/067655.html

Changed 4 years ago by guest

Thread in one long page  here

Changed 4 years ago by simonpj

  • difficulty set to Unknown

Another thread here  http://article.gmane.org/gmane.comp.lang.haskell.cafe/64741/

It concerns Control.Monad.Random rather than System.Random, but Control.Monad.Random in turn may depend on the same underlying infrastructure; anyway it looked related.

Changed 4 years ago by simonpj

Fantastic bug report from Reid Barton! Thanks.

His report, and perhaps the other thread above, identifies serious shortcomings in System.Random. Does anyone feel able to help fix them? I think it's unlikely that GHC HQ will; we're snowed under, and we have zero expertise in random number generators.

Thanks

Simon

Changed 4 years ago by simonpj

See also #3620

Changed 4 years ago by igloo

  • milestone set to _|_

Changed 2 years ago by rrnewton

  • owner set to rrnewton
  • failure set to None/Unknown
Note: See TracTickets for help on using tickets.