sλrthak

on randomness and its implementation

I shall never believe that God plays dice with the world. -- Albert Einstein, 1947

despite einstein's meta-physical objections, the current models of physics, and particularly of quantum theory, are based on the idea that nature does indeed involve random processes.

as much as randomness is abundant, its hard to implement this randomness in programs.

being able to simulate random behavior is necessary, for example, if you want to write a computer game that involves flipping a coin or rolling a die, but is also useful in more practical contexts, as well. programs that simulate such random events are called non-deterministic programs.

getting a computer to behave in a random way involves a certain amount of complexity.

random numbers vs pseudo-random numbers

from a theoretical perspective, a number is random if there is no way to determine in advance what value it will have among a set of equally probable possibilities.

as i described earlier, the idea of random numbers makes intuitive sense but it is a difficult notion to represent inside a computer. but why? well, computers operate by following a sequence of instructions in memory and therefore function in deterministic mode.

now you might be wondering how is it possible to generate unpredictable results by following a deterministic set of rules? after all if a number is the result of a deterministic process, any user should be able to work through that same set of rules and anticipate the computer's response, right?

yet computers do in fact use a deterministic procedure to generate what we call "random" numbers. this strategy somehow works because, even though the user could, in theory, follow the same set of rules and anticipate the computer's response, no one actually bothers to do so (lol lmao even).

random number

turns out in most practical applications, it doesn't matter if the numbers are truly random; all that matters is that the number appears to be random. for numbers to appear random, they should:

  1. behave like random numbers from a statistical point of view
  2. be sufficiently difficult to predict in advance that no user would bother

"random" numbers generated by an algorithmic process inside the computer are referred to as pseudo-random numbers to mention the fact that no truly random activity is involved.

let us see how is this implemented in c/c++

pseudorandom numbers in the standard libraries

the <cstdlib> library exports a low level function called rand that produces pseudo-random numbers.

prototype for rand is

int rand();

each call to rand produces a different value that is difficult for users to predict and therefore appear to be random. the result of rand is guaranteed to be non-negative and no larger than the constant RAND_MAX from <cstdlib>, so each call to rand returns a int between 0 and RAND_MAX, inclusive.

let us print the result of calling rand 10 times to check the vibe

#include <cstdlib>
#include <iomanip>
#include <iostream>
using namespace std;

int main(void) {
  cout << "RAND_MAX: " << RAND_MAX << endl;
  for (size_t i = 0; i < 10; i++) {
    cout << setw(10) << rand() << endl;
  }
  return 0;
}

output:

RAND_MAX: 2147483647
1804289383
 846930886
1681692777
1714636915
1957747793
 424238335
 719885386
1649760492
 596516649
1189641421

as you can see, the value of rand is always positive, and never larger than the value of RAND_MAX the values, moreover, appear to jump around unpredictably within that range, which is exactly what you want from a pseudo-random process.

now, the thing is no matter how many times you will run the executable or even compile it again, you will get these same 10 numbers (what's that? s-seed? hold your horses cowboy, we'll get there soon), so obviously it's not random by any means.

the program produces identical output every time because the designers of the C++ library (and the earlier C libraries on which these libraries are based) decided that rand should return the same random sequence each time a program is run.

on most systems, RAND_MAX is defined to be the largest positive integer which is typically 2,147,483,647, but it may have different values on different systems.

lets be real for a sec, even if you could count on RAND_MAX having that particular value, there are few (if any) apps where what you would need is a random number between 0 and 2,147,483,647. as a user you much more likely to want values that fall into some other range, usually a much smaller one.

common pitfall in the rand function

so in short, we have a function rand in the <cstdlib> library that generates a random number between 0 ad a positive constant RAND_MAX, which is some point on a number line that looks like this:

rand_max number line

to simulate the die roll, for eg, we need to transform that random integer into one of the following discrete outcomes:

die num line

as it happens, there are many bad strategies for performing this transformation.

if you thought of doing something like this:

int die = rand() % 6 + 1;

or in general:

int randomChoice = (rand() % UPPER_LIMIT) + LOWER_LIMIT;

then you are ngmi ong, this looks reasonable on the surface. the rand function always returns a positive integer, so the remainder on division by six must be an integer between 0 and 5, adding 1 to that value gives an integer is in the range of 1 to 6, so what's the problem smarty pants you might ask.

making our own random number library

well the problem is that rand guarantees only that the value it produces is uniformly distributed over the range from 0 to RAND_MAX. there is however, no guarantee that the remainders on division by six will be at all random.

early versions of rand were even weirder that were distributed with the unix generated values between odd and even values, despite the fact that those values did fall uniformly over the range.

what we want to do instead is divide the integers between 0 and RAND_MAX into six equal-sized segments that correspond to the different outcomes, as follows:

partitions

in the more general case, we need to divide the number line between 0 and RAND_MAX into k equal intervals, where k is the number of possible outcomes in the desired range.

the process of transforming the result of the rand function into an integer in a finite range can be understood most easily if you decompose it into the following four-step process:

  1. normalize the integer result from rand by converting it into a floating-point number d in the range 0d<1 .
  2. scale the value d by multiplying it by the size of the desired range, so that it spans the correct numbers of integers.
  3. translate the value by adding in the lower bound so that the range begins at the desired point.
  4. convert number to an integer by calling the function floor from <cmath>.

if the initial call to rand() returns 848,256,064 and RAND_MAX has its most common value of 2,147,483,647.

steps required to generate a random integer in the range 1 to 6 are:

steps

now writing the code to implement this is pretty straightforward if you understood everything till here.

// this function returns a random number in 4 stages:
// 1. generate a real number d in the range [0...1)
// 2. scale the number to the range [0...n) where n is the number of values
// 3. translate the number so that the range starts at appropriate value
// 4. convert the result to the next lower integer with floor()

int randomInteger(int low, int high) {
  // normalising:
  double d = rand() / (double(RAND_MAX) + 1);
  // scaling:
  double s = d * (double(high) - low + 1);
  // translation and converting to int:
  return int(floor(low + s));
}

similarly function for generating random real numbers could be implemented

double randomReal(double low, double high) {
  double d = rand() / (double(RAND_MAX) + 1);
  double s = d * (high - low + 1);
  return low + s;
}

but there is still one issue remaining, no matter how many times you will call the function it'll return the same value and that is not random by any means. at first, it may be hard to understand why a function that is supposed to return a random number always returns the same sequence of values. after all, deterministic behavior of this sort seems to run counter to the whole idea of randomness. there is, however, a good reason for this behavior: programs that behave deterministically are easier to debug

at the same time, it also has to be possible to use rand so that it doesn’t always deliver the same results. to understand how to implement this behavior, it helps to understand how rand works internally. the rand function generates each new random value by applying a set of mathematical calculations to the last value it produced.

because we don’t know what those calculations are, it is best to think of the entire operation as a black box where old numbers go in on one side and new pseudorandom numbers pop out on the other. (i could make another post diving deep in this topic, if you some of y'all are interested in this)

let's use our previous example where we printed 10 seemingly random numbers

RAND_MAX: 2147483647
1804289383
 846930886
1681692777
1714636915
1957747793
 424238335
 719885386
1649760492
 596516649
1189641421

since the first call to rand produces the value 1804289383 the second call to rand corresponds to putting 1804289383 into one end of the black box and having 846930886 pop out on the other side:

black box 1

on the next call to rand the implementation puts 846930886 into the black box, which returns 1681692777:

blackbox2

this same process is repeated on each call to rand. the computation inside the black box is designed so that:

  1. numbers are uniformly distributed over the legal range
  2. sequence goes on for a long time before it begins to repeat

but what about the first call to rand the one that returns 1804289383? the implementation must have a starting point. there must be an integer s0 that goes into the black box and produces 1804289383:

black box 3

this initial value that is used to get the entire process started is called the seed for the random number generator.

in the <cstdlib> library, you can set that seed explicitly by calling srand(seed).

as we know from the multiple runs of the previous program, the C++ library sets the initial seed to a constant value every time a program is started, which is why rand always generates the same sequence of values.

to implement this change, the functions randomInteger and randomReal must first check to see whether the random number seed has already been initialized and, if not, set it to some starting value that would be difficult for users to predict, which is usually taken from the value of the system clock.

now you might be thinking : "senpai, but why the value from the system clock? why not add my body count? its unique :3" so the reason is we need a different value every time our program runs and guess what changes everytime? "time" - you guessed it right!!

in C++, we can retrieve the current value of the system clock by calling the function time and then converting the result to an integer. this allows you to write the following statement, which has the effect of initializing the pseudorandom number generator to an unpredictable point:

srand(int(time(NULL)));

although it requires only a single line, the operation to set the random seed to an unpredictable value based on the system clock is quite obscure.

to ensure that the initialization code doesn’t get executed every time, you need a bool flag to record whether that initialization has been performed.

we'll use this function to initialize a random seed before initializing any of our functions in the library:

void initRandomSeed() {
  static bool initialized = false;
  if (!initialized) {
    srand(int(time(NULL)));
    initialized = true;
  }
}

now we need to use this function in randomInteger and randomReal

updated functions definitions:

int randomInteger(int low, int high) {
  initRandomSeed();
  double d = rand() / (double(RAND_MAX) + 1);
  double s = d * (double(high) - low + 1);
  return int(floor(low + s));
}

double randomReal(double low, double high) {
  initRandomSeed();
  double d = rand() / (double(RAND_MAX) + 1);
  double s = d * (high - low + 1);
  return low + s;
}

and our random number generating library is complete now, i hope you understood the underlying basics :)

random num gen meme

monte carlo integration

let us solve a interesting problem that involves randomness to approximate the value of π. sounds interesting? imagine that you have a dartboard hanging on your wall that consists of a circle painted on a square backdrop, as in the following diagram:

square

what happens if you throw a whole bunch of darts completely randomly, ignoring any darts that miss the board altogether? some of the darts will fall inside the gray circle, but some will be outside the circle in the white corners of the square. if the throws are random, the ratio of the number of darts landing inside the circle to the total number of darts hitting the square should be approximately equal to the ratio between the two areas. the ratio of the areas is independent of the actual size of the dartboard, as illustrated by the formula:

darts falling inside the circledarts falling inside the square = area inside the circlearea inside the square = πr24r2 = π4

to simulate this process in a program, imagine that the dart board is drawn on the standard cartesian coordinate plane with its center at the origin and a radius of 1 unit. the process of throwing a dart randomly at the square can be modeled by generating two random numbers, x and y, each of which lies between –1 and +1. this (x, y) point always lies somewhere inside the square. the point (x, y) lies inside the circle if

x2+y2<1

this condition, however, can be simplified considerably by squaring each side of the inequality, which yields the following more efficient test:

x2+y2<1

if you perform this simulation many times and compute what fraction of the darts fall inside the circle, the result will be an approximation of π4.

solution:

we will simulate throwing 100,000 darts and will use our random number library that we designed to approximate the value of π

#include "random.h"
#include <cmath>
#include <iostream>
using namespace std;

const int LIMIT = 100000;

int main(void) {
  double x, y;
  int sum_circle = 0;
  for (int i = 0; i < LIMIT; i++) {
    x = randomReal(-1, 1);
    y = randomReal(-1, 1);
    if (x*x + y*y < 1) {
      sum_circle++;
    }
  }
  cout << sum_circle / double(LIMIT) << endl;
  cout << M_PI / 4 << endl; // M_PI stores the value of PI
  return 0;
}

output:

0.78468
0.785398

see how close we are to the original value just with randomness? isn't that amazing?

the strategy used in this problem is not particularly accurate, even though it occasionally proves useful as an approximation technique. in mathematics, this technique is called monte carlo integration, after the capital city of monaco, famous for its casinos.

maybe by this blog i could show you how randomness is everywhere in our nature and how can we generate pseudo-random numbers. and to counter the quote by Albert Einstein i'll post another quote of a famous physicist.

Not only that God does play dice, but that He sometimes confuses us by throwing them where they can't be seen. - Stephen Hawking