Cholesky Decomposition and the Identity Matrix

The Cholesky decomposition of a matrix is a useful tool which, among other things, can be used to transform a matrix into the identity matrix. My interest in it stems from exactly such a use: to normalise an optimisation problem, I need to remove the complexity from my data matrix.

A Cholesky decomposition of an $n \times n$ positive-definite matrix Q has the form,

$Q = LL^*$

where L is a lower triangular matrix with real, positive diagonal entries and $L^*$ is the conjugate transpose of L.

This decomposition can be achieved easily in Python with both the numpy.linalg and scipy.linalg packages as,

import numpy as np

L = np.linalg.cholesky(Q)
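
As a quick sanity check, and assuming Q is a real positive-definite numpy array, we can confirm that the factor reconstructs Q (for real Q, $L^* = L^T$),

print(np.allclose(L @ L.T, Q))  # expect True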

Transforming Q into the identity matrix then follows from,

$Q' = L^{-1} Q (L^{-1})^{T}$,

since substituting $Q = LL^T$ (Q is real here, so $L^* = L^T$) gives $Q' = L^{-1}LL^T(L^{-1})^T = I$. In Python we can write this as,

A = np.linalg.inv(L)
Q_ = A @ Q @ A.T
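
Explicitly inverting L is convenient but not the most numerically stable approach. As an alternative sketch, assuming scipy is available, we can solve the triangular systems directly instead of forming $L^{-1}$,

from scipy.linalg import solve_triangular

# Y = L^{-1} Q, and then, since Q is symmetric,
# L^{-1} Y.T = L^{-1} Q (L^{-1}).T = Q'
Y = solve_triangular(L, Q, lower=True)
Q_ = solve_triangular(L, Y.T, lower=True)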

This is a simple enough task and, for low order n, it returns the identity matrix. However, depending on the scale of the entries in Q, I have found that the accuracy of the returned identity matrix can vary, with the result Q_, hereafter referred to as Q', having slight deviations from the target matrix.

I therefore wanted to gain a better understanding of the Cholesky decomposition and to write my own code to perform this task. I began by inspecting the decomposition further. For a three-by-three Q matrix the decomposition looks like,

$Q = LL^* =
\begin{pmatrix}
L_{11} & 0 & 0 \\
L_{21} & L_{22} & 0 \\
L_{31} & L_{32} & L_{33}
\end{pmatrix}
\begin{pmatrix}
L_{11} & L_{21} & L_{31} \\
0 & L_{22} & L_{32} \\
0 & 0 & L_{33}
\end{pmatrix}$

and expanding this out results in,

$Q =
\begin{pmatrix}
L_{11}^2 & L_{21}L_{11} & L_{31}L_{11} \\
L_{21}L_{11} & L_{21}^2 + L_{22}^2 &  L_{31}L_{21} + L_{32}L_{22}\\
L_{31}L_{11} & L_{31}L_{21} + L_{32}L_{22}& L_{31}^2+L_{32}^2 + L_{33}^2\\
\end{pmatrix}$

which, by equating with the elements of Q, leads to,

$L =
\begin{pmatrix}
\sqrt{Q_{11}} & 0 & 0 \\
Q_{21}/L_{11} & \sqrt{Q_{22} - L_{21}^2} & 0 \\
Q_{31}/L_{11} & (Q_{32} - L_{31}L_{21})/L_{22} & \sqrt{Q_{33} - L_{31}^2 - L_{32}^2} \\
\end{pmatrix}$.

This can be generalised into the following formulas for the elements of L,

$L_{j, j} = \sqrt{Q_{j,j} - \sum_{k=1}^{j-1}L_{j,k}^2}$,
$L_{i, j} = \frac{1}{L_{j,j}} \bigg(Q_{i, j} - \sum_{k=1}^{j-1}L_{i,k}L_{j,k}\bigg)$ for $i>j$.

These recurrence relations can be coded in Python as,

L = np.zeros(Q.shape)
for i in range(L.shape[0]):
    for j in range(i + 1):  # L is lower triangular, so only j <= i is needed
        if i == j:
            # diagonal entries
            L[i, j] = np.sqrt(Q[i, j] -
                np.sum([L[j, k]**2 for k in range(j)]))
        else:
            # off-diagonal entries, i > j
            L[i, j] = (1/L[j, j])*(Q[i, j] -
                np.sum([L[i, k]*L[j, k] for k in range(j)]))

where L has the same shape as Q by definition. This code does the same job as numpy's and scipy's 'linalg.cholesky' and gives the decomposition of Q.
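
As a quick check, and assuming the same positive-definite Q as above, the hand-rolled factor can be compared directly with numpy's (the Cholesky factor with positive diagonal is unique, so the two should agree),

print(np.allclose(L, np.linalg.cholesky(Q)))  # expect True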

I proceeded to compare the ability of my code to accurately return the identity matrix in comparison with that from numpy's cholesky decomposition. The code for this is available at https://github.com/htjb/Blog/tree/master/Cholesky_decomposition.

To build my Q matrix, and to ensure it is positive-definite before I decompose it and transform it into the identity matrix, I use the following code,

x = np.linspace(40, 120, 100)

# normalise x to the interval [0, 1]
x_ = (x - x.min())/(x.max() - x.min())
N = 9

# matrix of polynomial basis functions, phi[i, l] = x_[i]**l
phi = np.empty([len(x), N])
for i in range(len(x)):
    for l in range(N):
        phi[i, l] = x_[i]**l

Q = phi.T @ phi

where N defines the dimension of Q and x is arbitrarily chosen. 'phi' is effectively a matrix of basis functions and this was just a convenient choice given the larger application of the Cholesky decomposition I have mentioned at the beginning of this post.
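
A quick way to confirm that this Q is numerically positive-definite is to inspect its eigenvalues, as in this minimal sketch,

# all eigenvalues of a positive-definite matrix are strictly positive
print(np.all(np.linalg.eigvalsh(Q) > 0))  # expect True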

x here is normalised to the interval [0, 1]. I have found that this type of normalisation on x gives the most consistent results and is the most likely to return the identity matrix exactly for any N and any range of x, as opposed to, say, something like $x/x_{max}$.

The graphs below show the elements of Q' as produced with L from both my code and numpy's linalg library. The results show that both approaches return L to a sufficient degree of accuracy to transform Q into the identity matrix for N = 4, N = 9 and N = 11.

[Figures: elements of Q' for N = 4, N = 9 and N = 11, computed with L from my code and from numpy's linalg library.]
Beyond N = 11 we begin to encounter machine-precision issues that cause deviations in Q' from the identity matrix for both my code and numpy. However, the deviations usually appear in the elements of Q' with $i \geq 10$ and are only minor.
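
One way to quantify these deviations, sketched here for the Q_ computed above, is to look at the largest absolute difference between Q' and the identity,

# largest absolute deviation of Q' from the identity matrix
print(np.max(np.abs(Q_ - np.identity(N))))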

The machine-precision issues can also result in Q matrices that 'are not positive-definite', in the sense that elements of Q may be sufficiently far below the machine precision limit, usually around 1e-07, that we find Q has negative eigenvalues. These issues can be corrected by adding $10^{-7} \times I$ to Q, where $I$ is the identity matrix; however, this is ill-advised as it can significantly skew the values of Q.
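
For completeness, this correction would look something like the sketch below, though, as noted, it can skew Q,

# add a small 'jitter' term to the diagonal to force positive eigenvalues
Q = Q + 1e-7*np.identity(N)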

For practical purposes it is unlikely that we would need to go to a high enough N to encounter these issues. For example, in my case N represents the order of a polynomial fit, and N = 11 is already high enough to be overfitting my data.

This is a fun little project to have a go at, and it definitely leads to a better understanding of the Cholesky decomposition and its relationship to the identity matrix.

Thanks for reading!

