Visualising data well is an important part of any analysis and a good handle on the Python package Matplotlib is essential for any Python data analyst. I hope to provide a few tutorials on some of the more complex concepts in data visualisation and how to produce clear and tidy graphs with Matplotlib. I will assume some basic knowledge of the Matplotlib package but will try and explain the code as clearly as possible. Comments are always welcome!
I am going to begin with this piece on contour plotting which is an area I have a particular interest in. Specifically I am interested in plotting parameter spaces for fitted functions with contours defined by an objective function like $\chi^2$. We will get to an example of this shortly but I first want to look at how we make contour plots with a simpler example.
Basic Example with Radii
For our simple example we will define variables $x$ and $y$ over a given range and plot the corresponding radius, $Z$ from 0 for each data point $(x, y)$ as contour lines. To define $x$ and $y$ we will use numpy and this is done like so,
import matplotlib.pyplot as plt import numpy as np x = np.arange(-10, 20, 1) y = np.arange(-10, 20, 1)
where we have also imported matplotlib.pyplot in order to produce our contour maps of the radius.
The next stage is to create a np.meshgrid of $x$ and $y$ data corresponding to each data point in the $y - x$ plane which we define with the parameters $X$ and $Y$. We can define the radius using the standard definition from polar coordinates, $Z = \sqrt{x^2 + y^2}$,
X, Y = np.meshgrid(x, y) Z = np.sqrt(X**2 + Y**2)The matrices $X$, $Y$ and $Z$ all have the same dimensions and the elements of $Z$ correspond to the radius evaluated at different combinations of $x$ and $y$ in the matrices $X$ and $Y$.
There are several different ways to produce the corresponding contour plot and here I will demonstrate two. One will produce a colour map contour plot and the other an inline contour map. To illustrate there equivalence I will plot them in a side by side subplot which is defined by,
fig, ax = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(10, 5))The function plt.subplots() returns a fig element which can be used to set general properties of the plot. The ax element of the function is an iterable array where each element corresponds to one of the subplots in the plot. Here there are two subplots arranged in 1 row and 2 columns. The arguments 'sharex' and 'sharey' ensure that the x and y scales are the same in both plots. This is not strictly necessary here as we are plotting the same data in both figures but it is often good practice in subplot graphs to use these key word arguments. It also ensures that only the outer axis on the subplot have tick labels which can be helpful for formatting clear graphs. The argument 'figsize' rescales the graph size from the default. It takes a tuple with each element corresponding to the width and height of the graph in inches respectively.
Now to add the contour plots. For 'ax[0]' corresponding here to the left most axis we have,
cp = ax[0].contour(X, Y, Z) ax[0].clabel(cp, inline=True, fontsize=10, fmt='%1.1f')The 'contour' function produces a line contour map of $Z$ in the $X$ and $Y$ space. We can then label each line with the value of $Z$ at that contour using the 'clabel' function. The 'inline' argument places the labels directly on the contour lines and the 'fontsize' argument determines how big they are. 'fmt' determines how the contour values are displayed. Here we are displaying the contour values as floats with 1 decimal place.
For 'ax[1]' we can plot the contours as a colour map using the 'contourf' function,
cp = ax[1].contourf(X, Y, Z, cmap='autumn') plt.colorbar(cp, label='Radius')We have defined the colour map to be 'autumn'. The function 'colorbar' adds a colour bar with label 'Radius' which displays the values of $Z$ corresponding to each coloured contour.
Finally, we can tidy up the graph for a nicer presentation. The script below will wrap the entire figure inside another plot that has no axis, axis ticks or tick labels. However, we can use this figure to add 'global' labels that cover both subplots.
fig.add_subplot(111, frame_on=False) plt.tick_params(labelcolor="none", bottom=False, left=False) plt.xlabel(r'$X$') plt.ylabel(r'$Y$')We can then remove the white space between the two figures using the 'subplots_adjust' function and save/show the figure.
plt.subplots_adjust(wspace=0) plt.savefig('Basic_example.png') plt.show()The resultant graph is shown below and it illustrates the two approaches to plotting contours. I would recommend building the code and displaying the plot gradually so that you can see what each function and argument is doing. Particularly it might be worth plotting this with and without 'sharex=True' and 'sharey=True' (you can set these to 'False' or remove completely). It is also worth investigating the 'suplots_adjust' function and the contour label options like 'fmt'. The Matplotlib documentation is very extensive with a range of examples and I have not covered all optional arguments for the functions used here. For example the 'levels' argument in both the 'contour' and 'contourf' functions can be used to adjust the frequency of the contour lines or coloured regions. It is worth exploring these a bit more as well and there is a link to the documentation at the end of the article.
The colour map figure appears in the figure below to have a shorter width than the line plot. This is as a consequence of the subplot environment we have used and the presence of the colour bar. A better visualisation can be achieved by using Matplotlibs 'gridspec' environment. However this is out of the scope of this article and I may follow it up with a later post.
From the graph we can see that the value of the contour lines gets larger as we move to larger absolute values of $x$ and $y$ which is what we would expect for the radius $Z$.
$\chi^2$ Example
How can we apply this to parameter spaces and $\chi^2$?
Well first we need to define some data and a fitting function. We define some mock data like so,
x = np.arange(-10, 20, 1) y = 1.4*x**2 + 0.2*x + 4.3*x**3In reality we will not necessarily have any prior knowledge of the data set other than the fact that it looks polynomial in nature. Consequently, we may choose to fit it with a polynomial with powers on $x$ of $1$ and $2$ with two fitting parameters $a$ and $b$. We therefore define our model using the following function,
def fitting(x, a, b): y_fit = a*x**2 + b*x return y_fitTo fit this function to the data we are going to use here the curve_fit function from scipy.optimize which uses the Levenburg-Marquardt fitting routine. The algorithm is based on the Gauss-Newton and Gradient Descent algorithms and is not particularly reliable when there are a large number of local minima present. However, for the purposes of this article it is sufficient and is implemented like so,
from scipy.optimize import curve_fit
p0 = [1, 1] parameters, cov = curve_fit(fitting, x, y, p0=p0)Here 'p0' is the initial parameter guess, 'parameters' are the optimum parameters for the fit and 'cov' is the covariance matrix for the fit. You can read more about the algorithm via the link at the bottom of the article.
If we want to plot the value of $\chi^2$ in the parameter space we need to sample the parameters around their optimum values which were returned from the algorithm. We use np.linspace to produce arrays of the parameter values that have length $100$ and extend $50\%$ either side of the optimum values. We then produce a mesh grid of these two parameter arrays and for each combination of $a$ and $b$ or 'param1' and 'param2' we calculate $\chi^2$ and assign these values to the matrix $Z$. This is performed with the following code,
param1 = np.linspace(parameters[0]*0.5, parameters[0]*1.5, 100) param2 = np.linspace(parameters[1]*0.5, parameters[1]*1.5, 100) X, Y = np.meshgrid(param1, param2) Z = np.empty(X.shape) for i in range(len(X)): for j in range(X.shape[0]): Z[i, j] = (np.sum((y - fitting(x, X[i,j], Y[i,j]))**2))Here we have used the standard definition of $\chi^2$ given by,
$\chi^2 = \sum (y_{data} - y_{fit})^2$.
Finally we can plot the contour map using the following code,
fig = plt.figure() cp = plt.contourf(X, Y, Z, cmap='jet') plt.colorbar(cp, label=r'$\chi^2$') plt.xlabel(r'$a$') plt.ylabel(r'$b$') plt.savefig('chi2_example.png') plt.show()I have chosen to do this using the colour map method as I believe it makes for a nicer visualisation of the data. I have also used a different colour map to illustrate how this can be altered. The resultant graph which shows how the $\chi^2$ value varies with parameters $a$ and $b$ is shown below.
Unsurprisingly our model produces a bad fit but the example serves to illustrate how this style of plot can be achieved. The optimum value is found in the centre of the graph since we have defined our parameter ranges to be $50\%$ either side of the optimum values. However the graph also tells us that the value of $a$ and $b$ are anti-correlated. What this means is if we take arbitrary parameters, $(a_0, b_0)$ in the space and we increase the value of $b_0$ then the value of $a_0$ must decrease in order for the new parameters, $(a_1, b_1)$ to produce the same quality of fit as the original parameters.
This is a powerful tool for visualising parameter spaces and also for diagnosing relationships between parameters in fitted models.
Thanks for reading! Have fun plotting contours!
Further reading:
- Matplotlib Documentation: https://matplotlib.org/3.1.1/contents.html
- Levenberg-Marquardt: https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm
Comments
Post a Comment