Ticket #5133 (closed bug: fixed)

Opened 2 years ago

Last modified 23 months ago

Random instance for Float can generate values out of requested range

Reported by: richardsenington Owned by: rrnewton
Priority: normal Milestone:
Component: libraries/base Version: 7.1
Keywords: System.Random Cc: garrett.mitchener@…, newton@…
Operating System: Linux Architecture: x86
Type of failure: None/Unknown Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets:

Description

Recently I was using the standard random number generator and found it that the program was breaking with an array index out of range.

I was using the Float data type not Double. The range was the basic (0,1) range. The test code is the following.

test :: Float test = let gen = read "1005320695 588518561" :: StdGen?

in fst $ random gen

Change History

  Changed 2 years ago by scpmw

This is probably a rounding issue -- while the documentation specifies a [0,1) range, querying a Float random value gives 1.0, while querying a Double gives 0.99999998. Digging deeper, the  randomIvalDouble implementation is probably to blame: It generates the random Float using from 232 possible integer values, where single-precision Float only has 23 fraction bits and therefore will round quite a few numbers up to 1.0.

  Changed 2 years ago by daniel.is.fischer

Strictly speaking, the documentation says "normally":

-- * For fractional types, the range is normally the semi-closed interval
-- @[0,1)@.

So it's not against the letters of the spec. But it would of course be better if it didn't produce 1. However, avoiding that would cost performance. I'm going to benchmark how much.

  Changed 2 years ago by daniel.is.fischer

Adding a check for a return value of 1.0 and replacing that with 0.0 (so that all Float results are rounded to from approximately the same number of Double results - the round-to-even strategy makes the numbers not quite identical) doesn't cost much performance, actually I found no consistently measurable slowdown in my benchmarks. But I'm not entirely sure that it works reliably,

case realToFrac y of
  x -> x - int2Float (float2Int x)

failed due to using extended precision (so the result of realToFrac wasn't really 1.0, float2Int x was 0, result finally rounded to 1 anyway) when compiled with optimisations but without -msse2. I don't know whether that can also happen with

case realToFrac y of
  1.0 -> 0.0
  x   -> x

It would take someone who knows what's actually going on in the FPU (and code generator).

  Changed 2 years ago by wgmitchener

  • cc garrett.mitchener@… added

I ran into this a while back and forgot to post a bug-- sorry. It's nothing so exotic. If you go down to the function randomIvalDouble in Random.hs line 400 or so (and I'm looking at the one from  http://darcs.haskell.org/packages/random), here's where something is going wrong:

case (randomIvalInteger (toInteger (minBound::Int32), toInteger (maxBound::Int32)) rng) of

(x, rng') ->

let

scaled_x =

fromDouble ((l+h)/2) + fromDouble ((h-l) / realToFrac int32Range) * fromIntegral (x::Int32)

in (scaled_x, rng')

... int32Range = toInteger (maxBound::Int32) - toInteger (minBound::Int32)

But if the pseudo-random integer x happens to be minBound, which is -231, then the scaled_x comes out

1/2 - 231 / (232 - 1) = 231 ( 1/232 - 1/(232 - 1) ) = -1.164e-10 < 0

which is outside the required range. An easy fix might be to add 1 to the min bound used to generate x.

  Changed 2 years ago by simonpj

  • cc newton@… added

Adding Ryan Newton to cc, since he is now the maintainer.

  Changed 2 years ago by rrnewton

  • owner set to rrnewton

  Changed 2 years ago by rrnewton

If you look at the Java float RNG they discuss a problem with using >24 bits of randomness for generating floats:

 http://download.oracle.com/javase/1.4.2/docs/api/java/util/Random.html#nextFloat()

I'm of a mind to use the 24-bit solution that they use (and 53 for doubles) to strive for greater uniformity as well as fixing the bug in this ticket.

By the way, is there a discussion anywhere of why the [0.0,1.0) semi-closed range was chosen? Specifically it seems a bit odd in contrast with randomR which is inclusive (well, optionally). Yet it's true that Python and Java both use [0.0,1.0) so maybe this is standard.

Related: the following unpublished paper discusses an interesting way of generating all possible float representations with uniform probability. Something to think about:

 http://allendowney.com/research/rand/downey07randfloat.pdf

follow-up: ↓ 11   Changed 2 years ago by scpmw

Well, the way this was discovered was a program assuming floor ((fst (random gen) :: Float) * fromIntegral x) < x. That might well be why other languages have chosen this approach as well.

I read that paper as well after commenting... The interesting question is how often the extra bits of information would actually have a positive effect in programs. And whether it's worth it to have the algorithm loop a few times for that.

In my humble opinion, the cleanest solution would be to determine how many fraction bits we need (e.g. Float -> 23, Double -> 52). Then we can generate a random number of the appropriate range, which maps 1:1 to a floating-point representation. Not *all* possible floating point representations, obviously, but at least it would be fast, uniform and in the [0, 1) range.

  Changed 2 years ago by rrnewton

  • status changed from new to merge

  Changed 2 years ago by rrnewton

FYI, the new development repository for this library is here:

 https://github.com/rrnewton/haskell_stdlib_random

Changes will go there before they get pulled to:

 http://darcs.haskell.org/packages/random.git

Feel free to fork the github repo and submit pull requests.

in reply to: ↑ 8   Changed 2 years ago by rrnewton

In my humble opinion, the cleanest solution would be to determine how many fraction bits we need (e.g. Float -> 23, Double -> 52). Then we can generate a random number of the appropriate range, which maps 1:1 to a floating-point representation. Not *all* possible floating point representations, obviously, but at least it would be fast, uniform and in the [0, 1) range.

I agree and that's what I implemented in revision 4cfc44a16bd9. It is indeed faster -- 150 times faster on my machine: 11M floats a second vs 70K! I'd like to hear if others get the same results.

  Changed 23 months ago by igloo

  • status changed from merge to closed
  • resolution set to fixed

Changes pulled into the GHC repo

Note: See TracTickets for help on using tickets.