Skip to main content

Random Number Generation: Inverse Transform Sampling with Python

Following on from my previous post, in which I showed how to generate random normally distributed numbers using the Box-Muller Transform, I want to demonstrate how Inverse Transform Sampling(ITS) can be used to generate random exponentially distributed numbers. The description of the Box-Muller Transform can be found here: https://astroanddata.blogspot.com/2020/06/random-number-generation-box-muller.html.

As discussed in my previous post random numbers appear everywhere in data analysis and knowing how to generate them is an important part of any data scientists tool box.

ITS takes a sample of uniformly distributed numbers and maps them onto a chosen probability density function via the cumulative distribution function (CDF). In our case the chosen probability density function is for an exponential distribution given by,

$P_d(x) = \lambda \exp(-\lambda x)$.

This is a common distribution that describes events that occur independently, continuously and with an average constant rate, $\lambda$. An example of this type of process is the radioactive decay of atoms. The decay processes are described by a half life, $\tau$ which is the time after which a sample of radioactive material will be halved in size. $\tau$ is directly related to the decay rate, $\lambda$ by,

$\lambda = \frac{\ln(2)}{\tau}$.

To perform the ITS we need the inverse cumulative distribution function. The CDF of $X$ evaluated at $x$ is the probability that $X$ will have a value less than or equal to $x$,

$F_X(x) = P(X \leq x)$.

Note the distinction between the probability, $P(x)$ and the probability density function, $P_d(x)$ here. For our problem the CDF is given by,

$F_X(x) = 1 - P(X > x) = 1 - \exp(-\lambda x)$.

We show the probability density function and the CDF in the figure below for a range of decay rates.


The inverse CDF is found by solving $y = F(x)$ which gives us,

$F^{-1}(y) = -\frac{1}{\lambda} \ln(1-y)$,

where $y$ here is a uniformly distributed random number and the value of $F^{-1}(y)$ is the required exponentially distributed random number. We can illustrate now how this works by taking a small sample of uniform random numbers and mapping this onto our exponential CDF using it's inverse with $\lambda = 0.5$ as shown below.



Here we have used ten uniformly distributed random numbers, $y$ and calculated their corresponding $x$ value using the approach described above. The corresponding data points, shown as stars, lie along the line given by the CDF. A larger sample would show this a bit more clearly but we can already see that the numbers are fairly uniform across $y$ and that this maps on to low values of $x$. Exactly the behaviour we would expect of exponentially distributed random numbers!

Now we have confirmed that this approach for generating random numbers works with a simple example we can go ahead and produce a callable Python class to generate exponentially distributed random numbers using the ITS. This is actually a fairly straight forward process and to use a class is perhaps a bit extreme when a function will suffice. However, for consistency with my Box-Muller example we have,
import numpy as np
from uniform_dist import uniform

class exponential():
    def __init__(self, lam, size):
        y = uniform(size).rand
        self.random = -1/lam * np.log(1 - y)
where our exponential random numbers are just given by the inverse CDF evaluated for our uniformly distributed numbers. The uniform distributed numbers are produced using numpy in a separate code and for more details see my post on the Box Muller Transformation linked at the top of the page.

We can call the class for varying decay rates and request different size arrays of random numbers like so,
random = exponential(0.5, 20000).random
random2 = exponential(1, 3000).random
and if we plot the results in a histogram we get the graph shown below.


Looking back out our probability density functions at the very beginning of this article we would expect that a larger decay rate will produce a sample of random numbers closer to $0$ in value. This is exactly what we find and confirms that our code is working!

The ITS is an important and useful tool and as long as the CDF is calculable then it can be used to transform uniformly distributed numbers onto any probability distribution.

The code for this little project is available at: https://github.com/htjb/Blog/tree/master/random_numbers

Thanks for reading! 

Comments

Popular posts from this blog

Random Number Generation: Box-Muller Transform

Knowing how to generate random numbers is a key tool in any data scientists tool box. They appear in multiple different optimisation routines and machine learning processes. However, we often use random number generators built into programming languages without thinking about what is happening below the surface. For example in Python if I want to generate a random number uniformally distributed between 0 and 1 all I need to do is import numpy and use the np.random.uniform() function. Similarly if I want gaussian random numbers to for example simulate random noise in an experiment all I need to do is use np.random.normal(). But what is actually happening when I call these functions? and how do I go about generating random numbers from scratch? This is the first of hopefully a number of blog posts on the subject of random numbers and generating random numbers. There are multiple different methods that can be used in order to do this such as the inverse probability transform method and I

LDL Decomposition with Python

I recently wrote a post on the Cholesky decomposition of a matrix. You can read more about the Cholesky decomposition here;  https://harrybevins.blogspot.com/2020/04/cholesky-decomposition-and-identity.html . A closely related and more stable decomposition is the LDL decomposition which has the form, $\textbf{Q} = \textbf{LDL*}$, where $\textbf{L}$ is a lower triangular matrix with diagonal entries equal to 1, $\textbf{L*}$ is it's complex conjugate and $\textbf{D}$ is a diagonal matrix. Again an LDL decomposition can be performed using the Scipy or numpy linear algebra packages but it is a far more rewarding experience to write the code. This also often leads to a better understanding of what is happening during this decomposition. The relationship between the two decompositions, Cholesky and LDL, can be expressed like so, $\textbf{Q} = \textbf{LDL*} = \textbf{LD}^{1/2}(\textbf{D})^{1/2}\textbf{*L*} = \textbf{LD}^{1/2}(\textbf{LD}^{1/2})\textbf{*}$. A simple way to calcu