Cholesky Decomposition and the Identity Matrix

The Cholesky decomposition of a matrix is a useful tool that can be used to transform a positive-definite matrix into the identity matrix. My interest in this stems from exactly such a use: to normalise an optimisation problem, I am required to remove the complexity from my data matrix.

A Cholesky decomposition of an $n \times n$ positive-definite matrix Q has the form,

$Q = LL^*$

where $L^*$ is the conjugate transpose of $L$, and $L$ is a lower triangular matrix with real, positive diagonal entries.

This decomposition can be achieved easily in Python with both the numpy.linalg and scipy.linalg packages as,

import numpy as np

L = np.linalg.cholesky(Q)

Transforming Q into the identity matrix then follows from,

$Q' = L^{-1} Q (L^{-1})^{T}$,

which returns the identity because, substituting $Q = LL^T$ for real Q, we have $L^{-1} L L^T (L^{-1})^{T} = I$. In Python we can write this as,

A = np.linalg.inv(L)
Q_ = A @ Q @ A.T
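
As an aside, explicitly inverting L is not strictly necessary; the same transformation can be done by solving triangular systems, which is generally cheaper and more numerically stable. A minimal sketch, assuming scipy is available:

from scipy.linalg import solve_triangular

# solve L Y = Q for Y = L^{-1} Q, then L Z = Y^T for Z;
# since Q is symmetric, Z^T = L^{-1} Q (L^{-1})^T = Q'
Y = solve_triangular(L, Q, lower=True)
Q_ = solve_triangular(L, Y.T, lower=True).T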

This is a simple enough task and, for low order n, it reliably returns the identity matrix. However, depending on the scale of the entries in Q, I have found that the accuracy can vary, with the result Q_, hereafter referred to as Q', showing slight deviations from the target matrix.

I therefore wanted to gain a better understanding of the Cholesky decomposition and write my own code to perform this task. I began by inspecting the decomposition further. For a three by three Q matrix the decomposition looks like,

$Q = LL^* =
\begin{pmatrix}
L_{11} & 0 & 0 \\
L_{21} & L_{22} & 0 \\
L_{31} & L_{32} & L_{33}
\end{pmatrix}
\begin{pmatrix}
L_{11} & L_{21} & L_{31} \\
0 & L_{22} & L_{32} \\
0 & 0 & L_{33}
\end{pmatrix}$

and expanding this out results in,

$Q =
\begin{pmatrix}
L_{11}^2 & L_{21}L_{11} & L_{31}L_{11} \\
L_{21}L_{11} & L_{21}^2 + L_{22}^2 &  L_{31}L_{21} + L_{32}L_{22}\\
L_{31}L_{11} & L_{31}L_{21} + L_{32}L_{22}& L_{31}^2+L_{32}^2 + L_{33}^2\\
\end{pmatrix}$

which, by equating with the elements of Q, leads to,

$L =
\begin{pmatrix}
\sqrt{Q_{11}} & 0 & 0 \\
Q_{21}/L_{11} & \sqrt{Q_{22} - L_{21}^2} & 0 \\
Q_{31}/L_{11} & (Q_{32} - L_{31}L_{21})/L_{22} & \sqrt{Q_{33} - L_{31}^2 - L_{32}^2} \\
\end{pmatrix}$.

This can be generalised into the following formulas for the elements of L,

$L_{j, j} = \sqrt{Q_{j,j} - \sum_{k=1}^{j-1}L_{j,k}^2}$,
$L_{i, j} = \frac{1}{L_{j,j}} \bigg(Q_{i, j} - \sum_{k=1}^{j-1}L_{i,k}L_{j,k}\bigg)$ for $i>j$.

These recurrence relations can be coded in Python as,

L = np.zeros(Q.shape)
for i in range(L.shape[0]):
    for j in range(L.shape[1]):
        if i == j:
            # diagonal entries, from the first recurrence relation
            L[i, j] = np.sqrt(Q[i, j] -
                              np.sum([pow(L[j, k], 2) for k in range(j)]))
        elif i > j:
            # below-diagonal entries, from the second recurrence relation
            L[i, j] = (1/L[j, j])*(Q[i, j] -
                                   np.sum([L[i, k]*L[j, k] for k in range(j)]))

where L has the same shape as Q by definition. This code effectively does the same job as numpy's and scipy's 'linalg.cholesky' functions and gives the decomposition of Q.
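
As a quick sanity check, the loop above can be wrapped in a function and compared against numpy on a small positive-definite matrix (the test matrix here is just an assumed example):

import numpy as np

def my_cholesky(Q):
    # lower triangular Cholesky factor via the recurrence relations above
    L = np.zeros(Q.shape)
    for i in range(L.shape[0]):
        for j in range(i + 1):
            s = np.sum([L[i, k]*L[j, k] for k in range(j)])
            if i == j:
                L[i, j] = np.sqrt(Q[i, i] - s)
            else:
                L[i, j] = (Q[i, j] - s)/L[j, j]
    return L

# small assumed positive-definite test matrix
Q = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])
L = my_cholesky(Q)
print(np.allclose(L, np.linalg.cholesky(Q)))  # expect True
print(np.allclose(L @ L.T, Q))                # expect True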

I proceeded to compare the ability of my code to accurately return the identity matrix against that of numpy's Cholesky decomposition. The code for this is available at https://github.com/htjb/Blog/tree/master/Cholesky_decomposition.

To build my Q matrix, and to ensure it is positive-definite before I decompose it and transform it to the identity matrix, I use the following code,

x = np.linspace(40, 120, 100)

# normalise x to the interval [0, 1]
x_ = (x - x.min())/(x.max() - x.min())
N = 9

# polynomial basis matrix with columns x_^0, x_^1, ..., x_^(N-1)
phi = np.empty([len(x), N])
for i in range(len(x)):
    for l in range(N):
        phi[i, l] = x_[i]**l

Q = phi.T @ phi

where N defines the dimension of Q and x is arbitrarily chosen. 'phi' is effectively a matrix of polynomial basis functions, which was just a convenient choice given the larger application of the Cholesky decomposition mentioned at the beginning of this post.

x here is normalised to the interval [0, 1]. I have found that this type of normalisation gives the most consistent results and is the most likely to return the identity matrix exactly for any N and any range of x, as opposed to something like $x/x_{max}$.
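
Putting the pieces together, a minimal end-to-end sketch of the comparison (the helper name identity_check is just illustrative) might look like:

def identity_check(N, x):
    # build Q = phi^T phi from a normalised polynomial basis,
    # decompose it and measure the deviation of Q' from the identity
    x_ = (x - x.min())/(x.max() - x.min())
    phi = x_[:, None]**np.arange(N)
    Q = phi.T @ phi
    A = np.linalg.inv(np.linalg.cholesky(Q))
    Q_ = A @ Q @ A.T
    return np.max(np.abs(Q_ - np.eye(N)))

x = np.linspace(40, 120, 100)
for N in [4, 9, 11]:
    print(N, identity_check(N, x))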

The graphs below show the elements of Q' as produced with L from both my code and numpy's linalg library. The results show that both approaches return L to a sufficient degree of accuracy to transform Q into the identity matrix for N = 4, N = 9 and N = 11.

[Graphs of the elements of Q' for N = 4, N = 9 and N = 11, produced with L from my code and from numpy's linalg library.]
Beyond N = 11 we begin to encounter machine-precision-related issues that cause deviations in Q' from the identity matrix for both my code and numpy. However, the deviations usually appear in the elements of Q' with $i \geq 10$ and are only minor.

The machine precision related issues can also result in Q matrices that 'are not positive definite', in the sense that elements of Q may be sufficiently far below the machine precision limit, usually around 1e-07, that we find Q has negative eigenvalues. These issues can be corrected by adding $10^{-7} \times I$ to Q, where $I$ is the identity matrix; however, this is ill-advised as it can significantly skew the values of Q.
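
For completeness, that correction (sometimes called adding 'jitter') would look something like the sketch below, with the caveat above in mind:

# add a small diagonal offset to force positive-definiteness;
# use with caution, as it perturbs the original Q
eps = 1e-07
Q = Q + eps*np.eye(Q.shape[0])
L = np.linalg.cholesky(Q)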

For practical purposes it is unlikely that we would need to go to a high enough N to encounter these issues. For example, in my case N represents the order of a polynomial fit, and N = 11 is already high enough to be overfitting my data.

This is a fun little project to have a go at, and it definitely leads to a better understanding of the Cholesky decomposition and its relationship to the identity matrix.

Thanks for reading!

