2014-04-28  Uniform random floats: How to generate a double-precision
floating-point number in [0, 1] uniformly at random given a uniform
random source of bits

(Permalink: https://mumble.net/~campbell/2014/04/28/uniform-random-float)

The naive approach of generating an integer in {0, 1, ..., 2^k - 1} for
some k and dividing by 2^k, as used by, e.g., Boost.Random and GCC
4.8's implementation of C++11 uniform_real_distribution, gives a
nonuniform distribution (both variants are sketched in C below):

- If k is 64, a natural choice for the source of bits, then because the
  set {0, 1, ..., 2^53 - 1} is represented exactly, whereas many
  subsets of {2^53, 2^53 + 1, ..., 2^64 - 1} are rounded to common
  floating-point numbers, outputs in [0, 2^-11) are underrepresented.

  [Update, 2016-08-23: I can't figure out what I meant by the above
  bias.  The probability that the floating-point draw lies in
  [0, 2^-11) should be 2^-11, which is the same as the probability
  that the integer draw lies in {0, 1, ..., 2^53 - 1}, namely
  2^53/2^64 = 2^-11.  Certainly values below 2^-64 are
  underrepresented, as below, but that's a separate issue.]

- If k is 53, in an attempt to avoid nonuniformity due to rounding,
  then numbers in (0, 2^-53) and 1 will never be output.  These outputs
  have very small, but nonnegligible, probability: the Bitcoin network
  today randomly guesses solutions every ten minutes to problems for
  which random guessing has much, much lower probability of success,
  closer to 2^-64.

What is the `uniform' distribution we want, anyway?  It is obviously
not the uniform discrete distribution on the finite set of
floating-point numbers in [0, 1] -- that would be silly.  For our
`uniform' distribution, we would like to imagine[*] drawing a real
number in [0, 1] uniformly at random, and then choosing the nearest
floating-point number to it.

To formalize this, start with the uniform continuous distribution on
[0, 1] in the real numbers, whose measure is mu([a, b]) = b - a for
any [a, b] contained in [0, 1].  Next, let rho be the default
(round-to-nearest/ties-to-even) rounding map from real numbers to
floating-point numbers, so that, e.g., rho(0.50000000000000001) = 0.5.
Note that the preimage under rho of any floating-point number -- that
is, for a floating-point number x, the set of real numbers that will
be rounded to x, or { u \in [0, 1] : rho(u) = x } -- is an interval.
Its measure, mu(rho^-1(x)), is the measure of the set of numbers in
[0, 1] that will be rounded to x, and that is precisely the
probability we want for x in our `uniform' distribution.

Instead of drawing from real numbers in [0, 1] uniformly at random, we
can imagine drawing from the space of infinite sequences of bits
uniformly at random, say (0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, ...), and
interpreting the result as the fractional part of the binary expansion
of a real number in [0, 1]:

  0*(1/2) + 0*(1/4) + 0*(1/8) + 1*(1/16) + 0*(1/32) + 1*(1/64) + ...

Then if we round that number, we'll get a floating-point number with
the desired probability.  But we can't just compute that sum directly,
one term at a time from left to right: once we get to 53 significant
bits, further bits will be rounded away by addition, so the result
will be biased to be `even', i.e. to have zero least significant bit.
And we obviously can't choose uniformly at random from the space of
infinite sequences of bits and round the result.
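For concreteness, the two naive conversions criticized above look
roughly like the following in C.  This is a sketch, not any particular
library's code: random64() stands for a hypothetical caller-supplied
source of 64 uniform random bits.

  #include <stdint.h>

  extern uint64_t random64(void);  /* hypothetical uniform 64-bit source */

  /* k = 64: convert a 64-bit draw and scale by 2^-64.  The
   * integer-to-double conversion rounds draws above 2^53 to nearby
   * representable values, and nothing in (0, 2^-64) can ever appear. */
  double
  naive_k64(void)
  {
          return (double)random64() * 0x1p-64;
  }

  /* k = 53: scale a 53-bit draw by 2^-53.  Every quotient is exact,
   * but the outputs lie on the grid {0, 2^-53, 2*2^-53, ...}: nothing
   * in (0, 2^-53), and never 1. */
  double
  naive_k53(void)
  {
          return (double)(random64() >> 11) * 0x1p-53;
  }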
We cannot draw infinitely many bits, but we can choose a finite
sequence of bits uniformly at random, and, except for a detail about
ties, after a certain constant number of bits, no choice of further
bits can change the floating-point number we get by rounding.  So we
can choose a sufficiently long finite sequence of bits and round that,
knowing that the result of rounding would not change no matter how
many extra bits we appended.

The detail about ties is that if we simply chose, say, 1088 bits --
the least multiple of 64 exceeding 1074, the magnitude of the exponent
of the smallest positive (subnormal) floating-point number, 2^-1074 --
and rounded in the default round-to-nearest/ties-to-even mode, the
result would be biased to be even.  The reason is that the probability
of seeing an apparent tie in a finite sequence of bits is nonzero,
whereas real ties occur only on a set of measure zero: they happen
only when *every* remaining bit in the infinite sequence is also zero.
To avoid this, before rounding, we can simulate a sticky bit by
setting the least significant bit of the finite sequence we chose, so
that what merely looks like a tie is never rounded to even.  (A sketch
of this procedure in C appears at the end of this article.)

One little, perhaps counterintuitive, detail remains: the boundary
values, 0 and 1, have half the probability you might naively expect,
because we exclude from consideration the half of the interval
rho^-1(0) that lies below 0 and the half of the interval rho^-1(1)
that lies above 1.

Some PRNG libraries provide variations on the naive algorithms that
include or omit one or both boundary values, e.g. so that log(x) or
log(1 - x) can be computed safely.  For example, instead of drawing
from {0, 1, ..., 2^53 - 1} and dividing by 2^53, some

- divide by 2^53 - 1 instead, to allow both boundary points;

- draw from {1, 2, ..., 2^53} instead, to omit 0 but allow 1; or

- draw from {-2^52, -2^52 + 1, ..., 2^52 - 2, 2^52 - 1} (i.e., the
  signed rather than unsigned 53-bit integers) and add 1/2 before
  dividing, to omit both boundary points.

These algorithms still have the biases described above, however.
Given a uniform distribution on [0, 1], you can always turn it into a
uniform distribution on (0, 1), (0, 1], or [0, 1) by rejection
sampling if 0 or 1 is a problem (also sketched at the end of this
article).

See for source code.

Thanks to Steve Canon and Alex Barnett for explaining the binary
expansion approach to me and talking the problem over.

[*] For the pedantic who may stumble upon this before reading ahead,
we cannot, of course, assign a meaningful probability to a choice of
real number in [0, 1].  What we can do, however, is imagine that it
were meaningful, and then formalize it in the next sentence with the
correct sense of measure.
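As a concrete illustration of the procedure described above -- draw a
long enough finite sequence of bits, force a sticky bit, and let the
default rounding mode do the rest -- here is a minimal sketch in C.
It is an illustration under stated assumptions, not the article's
accompanying source code: random64() is a hypothetical caller-supplied
source of 64 uniform random bits, and __builtin_clzll (a GCC/Clang
builtin) counts leading zero bits.  Rather than materializing all 1088
bits, it draws 64-bit words lazily, since leading zero words only
lower the exponent and almost never occur.

  #include <math.h>
  #include <stdint.h>

  extern uint64_t random64(void);  /* hypothetical uniform 64-bit source */

  double
  uniform01(void)
  {
          int exponent = -64;      /* result = significand * 2^exponent */
          int zero_words = 0;
          uint64_t significand;
          int shift;

          /* Read leading zero words into the exponent until a 1 bit
           * appears.  If the first 17*64 = 1088 bits are all zero, the
           * real number is below 2^-1088 < 2^-1075 (half the smallest
           * subnormal), so it rounds to zero; with a working random64
           * this effectively never happens. */
          while ((significand = random64()) == 0) {
                  exponent -= 64;
                  if (++zero_words == 17)
                          return 0;
          }

          /* Normalize: shift any leading zero bits into the exponent
           * and refill the vacated low-order bits with fresh random
           * bits, so the significand holds the 64 bits starting at the
           * leading 1. */
          shift = __builtin_clzll(significand);
          if (shift != 0) {
                  exponent -= shift;
                  significand <<= shift;
                  significand |= random64() >> (64 - shift);
          }

          /* Simulate a sticky bit: almost surely there is another 1
           * somewhere in the rest of the infinite bit sequence, so an
           * apparent tie must never be rounded to even. */
          significand |= 1;

          /* The conversion to double rounds the 64-bit significand to
           * 53 bits in the default round-to-nearest/ties-to-even mode;
           * ldexp then applies the binary exponent. */
          return ldexp((double)significand, exponent);
  }

And if, as discussed above, 0 or 1 is unwelcome -- say, because the
caller will compute log(x) or log(1 - x) -- rejection sampling on top
of such a generator gives the open or half-open interval:

  /* Uniform on (0, 1): redraw on the (astronomically rare) boundary
   * values.  Adjust the test to get (0, 1] or [0, 1). */
  double
  uniform01_open(void)
  {
          double x;

          do {
                  x = uniform01();
          } while (x == 0 || x == 1);
          return x;
  }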