Level 1: Random Numbers and Computers

Even if you haven't formally studied these topics, you probably have a good intuition of what randomness means: something you can’t accurately predict. If you buy a lottery ticket, you can’t be certain what the winning number will be; all you know is there are only $N$ tickets available and you bought one (or more) of them. The same applies to basically everything – dices, weather, asset prices, digits of $\pi$, quantum physics, genetic mutations, sports results… Even seemingly “non-random” events almost always have variability, measuring error, etc.

One way of looking at this is the following: if we were capable of controlling every single parameter of a process, then we might be able to exactly know what the outcome would be^[1]. For example, on a standard six-sided fair die^[2] roll, we could be able to exactly predict what the number would fall, if we could know in advance the exact position of a hand in 3D space, the exact speed, acceleration and angle of the hand throwing the die, the exact initial die position in the hand, the exact amount of spin the hand does while launching, the exact air friction, the exact dampening of the table… You get the idea. This would mean the process was deterministic instead of random. But this seems a practically impossible task to perform, so we are probably better off with assuming the event to be random.

Random variables and distributions

That does not mean we can’t do anything to better understand these processes. First of all, we can define a random variable as the number that results for throwing a single fair six-sided die. For that variable, we can deduce what the complete set of outcomes is. For the referred die roll, this would be all the possible numbers – the set $\{1,2,3,4,5,6\}$. This is called the sample space. Occurrences of this random process would be elements of the sample space (actual die rolls). Secondly, we can model the probability of each outcome – or subset of outcomes – occurring. This is done via a probability distribution. In this example, we are using the word “fair” to describe a discrete uniform distribution, which essentially means any of the six possible outcomes is equally likely. Thirdly, we can use that distribution to calculate the probability of a certain subset of events happening, by evaluating that distribution. For the geeks, the probability distribution function would be the following:

$$p(k) = \frac{1}{6}, k \in \{1,2,3,4,5,6\}$$

For discrete distributions (also called probability mass functions or PMF), we can compute the probability of a specific value occurring, by evaluating the function (e.g. the probability of rolling a 4 is $p(4) = \frac{1}{6}$). We can also compute cumulative probabilities by summing values up to a specific point - this is called the cumulative distribution function or CDF. Finally, we can use the rule of sum to compute probabilities of disjoint occurrences (i.e. things that cannot both happen at the same time), and the rule of product to compute probabilities of joint events. This set provides us with several tools to calculate all sorts of probabilities. For the simple fair die example, the probability of rolling either a 1 or 6 in one roll is $p(1) + p(6) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$. Moreover, we can model each and every one of the die rolls with a mathematical formula, compute things like mean, variance, median… and calculate probabilities for arbitrary scenarios. In other words, we can foster as much (deterministic) info as possible from the random process.

Random variables (and their corresponding distribution functions) are called discrete if the possible outcomes it can take is countable (i.e. the number rolled by a fair die, the number of tails before the first head in a sequence of coin flips, etc.) and continuous if the possible outcomes are non-countably infinite (i.e. the weight of a random person from the World’s population, the time between customers arriving to a store). There are arbitrarily many probability distribution functions we can use^[3], and more specifically there are many well-studied function families that can be readily used for many phenomena; some of the most important ones will be discussed in Level 3, but for discrete random variables, the discrete uniform, binomial distribution and Poisson are arguably the most important ones, and for continuous random variables, the (continuous) uniform, Normal, Lognormal and Exponential random variables are some of the most iconic.

Simulation and random number generation

One other thing: since we have a mathematical formula for the distribution of the probabilities, we can simulate this process. Simulating a random process means artificially producing concrete events from the sample space with probabilities determined by the distribution function. There are several ways to simulate random variables, but the most used ones rely on generating continuous uniform random variables, i.e., values between 0 and 1 with equal probability, and then applying some transformations to these values. This is what we call random number generation (RNG).

So how do we simulate random values between 0 and 1? Before computers were widely used, people resorted to manually crafting random number tables, which then were published so other people could read them and use those numbers (for example, for random sampling). Do you think you could generate "truly random" numbers? You can try in the GM below:

Many studies have shown that we humans are not very good at generating random numbers. We tend to fall to certain fallacies (for example, on independent runs, we tend to think a specific number that hasn't come up in a long time is "due" and hence we'll try to correct for that - the "Gambler's Fallacy"; we normally don't consider the expected amount of consecutive runs etc.). Fortunately, since we can program computers, we can rely on generating those random values. How "truly random" do you think we can generate numbers with our software applications?

Breaking news: modern computers can simulate seemingly random values, but these are actually deterministic values. We cannot ask a computer to organically generate “truly random” numbers, because computers operate on specific instructions – hence, the moment we tell the computer exactly what to do, we are determining an algorithmic way to generate them. This has profound implications: once we know the mathematical function that generates them, as well as the initial values, we can predict every following value that will be generated, in order. We call these values pseudorandom (pseudo = fake^[4]) numbers, and the algorithms that generate them pseudorandom number generators (PRNG, or casually called RNG)^[5].

Most PRNGs are iterated mathematical functions that generate seemingly unpredictable patterns. Consequently, they have a period after which they start repeating the sequence. At first glance, this sounds a bit lame, and for really "serious" applications, sometimes people need to resort to other means than software-based PRNGs. However, for the majority of uses (including most game development stuff), this is not actually bad news – actually being able to replicate the sequence of pseudorandom numbers is really useful, for example, to be able to debug our game without the "noise" that RNG can create - think, for example, debugging our code for developing procedurally generated dungeons.

However, we need to make sure the generated pseudorandom numbers are sufficiently robust. So, how can one be certain a set of numbers is a “good” (pseudo)random sequence? Well, different test batteries have been designed over the years to assess these PRNGs and determine whether they actually produce “sufficiently random” values. If you're interested you can check test batteries such as DieHarder or TestU01. Some desirable properties of PRNGs are:

All different possible values (or values in a range) should be generated with the same proportion (equidistribution;
Different sequences of numbers should be statistically independent;
The PRNG should be replicable (in the sense mentioned above);
The PRNG should be sufficiently fast for the required use.

The tests generally review equidistribution, degree of linear and non-linear correlation, statistical independence etc., in $n$ dimensions. One can download and apply the test batteries to a particular PRNG. It is worth noting that nearly no PRNG pass all tests. Having said that, the more critical the application, the more robust the PRNG needs to be (for example, there are several PRNGs that can be perfectly valid for day-to-day applications, but are unsuitable for cryptographic applications; also, a standard PRNG might be OK for game with fake money, but not sufficiently good for a casino simulation game where people actually bet their real-life dollars on).

So, now that we know about PRNGs, how does an actual PRNG look like? We’ll have to explore deeper into the dungeon for that. Over there, we’ll build some simple (and not very good) PRNGs in GameMaker and we’ll also take a look at the built-in PRNG GameMaker exposes to the user.

You just gained 1,000 XP! Are you ready for the next level of the dungeon? If so...

^[1] Quantum physicists might disagree.

^[2] Considering our geeky gaming profile, I should have written d6 instead of “standard fair die”.

^[3] Any function can be a probability distribution function (mass function in the case of discrete random variables, or desity function in the case of continuous variables), provided its range (output) is a number between 0 and 1 and its sum or integral over the entire domain (i.e. the set of all possible outcomes) is 1, meaning the probability of any of the possible events of a random process is always 100%.

^[4] See this definition.

^[5] From this moment on, in the context of computer-generated numbers, I use “pseudorandom” and “random” interchangeably.