
Posts

Showing posts from June, 2020

Subplots with Matplotlib

As already discussed in some of my previous articles, good visualisation of data is essential to getting the associated message across. One aspect of this is the need to plot multiple data sets, or to visualise the same data set in different ways, on the same figure. For example, we may wish to illustrate our data and the residuals after we subtract a fit to that data, all in the same figure. This can be done effectively using matplotlib and the associated subplots environments. Mastery of these tools is something that comes very much with practice and I do not claim to be an expert. However, I have some experience with the environment and I will share the basics with you in this article. In order to use the matplotlib environment we will need to begin by importing matplotlib via import matplotlib.pyplot as plt. We can then proceed to explore what is on offer. plt.subplot() and plt.subplots() So plt.subplots() and plt.subplot() are probably where most people begin to learn about the idea of c
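As a taster of the data-plus-residuals layout described above, here is a minimal sketch of my own (illustrative only, not code from the full post) using plt.subplots() with two stacked panels; the straight-line data and fit are invented for the example:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Invented sample data: a noisy straight line and a linear fit to it
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, x.size)
fit = np.polyval(np.polyfit(x, y, 1), x)

# Two stacked panels sharing the x-axis: data on top, residuals below
fig, (ax_data, ax_res) = plt.subplots(2, 1, sharex=True)
ax_data.plot(x, y, ".", label="data")
ax_data.plot(x, fit, "-", label="fit")
ax_data.legend()
ax_res.plot(x, y - fit, ".")
ax_res.set_xlabel("x")
ax_res.set_ylabel("residual")
fig.savefig("subplots.png")
```

plt.subplots() returns both the figure and the axes objects, which is what makes laying out several panels at once so convenient.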

Binary Search with Python

The Binary Search is an important algorithm for any Data Analyst to know. Frequently, we may need to identify whether a particular value can be found in a data set and, if it can, we may need to know its index. For example, we may have some conditional task that only gets acted upon if our data set features a specific value. Perhaps, as another example, we have collected some data but, having plotted it in some informative way, we know that there is a spurious value in the set that isn't following the expected trend. We can most likely attribute it to a typo, 349 instead of 345 say, when entering our data, but either way we consider it an outlier and we need to identify its position in the list and remove it from the data set. A possible solution is to loop through our data set until we find the entry that equals 349, but our data set could be big and this could be very time consuming. The alternative is to use the Binary Search, which allows us to find the index of our typo in a sorted
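The typo-hunting idea above can be sketched as follows. This is a minimal illustrative implementation of my own, using the standard iterative binary search on a sorted list (the values are invented for the example):

```python
def binary_search(data, target):
    """Return the index of target in the sorted list data, or -1 if absent."""
    lo, hi = 0, len(data) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # look at the middle of the remaining range
        if data[mid] == target:
            return mid
        elif data[mid] < target:
            lo = mid + 1              # discard the lower half
        else:
            hi = mid - 1              # discard the upper half
    return -1

values = [340, 341, 345, 349, 352]    # sorted data containing the typo 349
print(binary_search(values, 349))     # 3
```

Each comparison halves the search range, so the work grows like $\log_2 N$ rather than $N$ for a linear scan.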

Quicksort with Python: Implementation and Comparison to Selection Sort

So in my endeavour to learn some sorting algorithms I have decided to look at implementing the Quicksort algorithm. This has been significantly harder than implementing a Selection Sort, which you can read about in my previous post 'Selection Sort with Python'. Part of the reason it has been hard is because I attempted to do this without properly reading and understanding the algorithm, but also because it is a bit more complicated than a Selection Sort! However, I eventually got the algorithm working and I have had a lot of fun exploring this sorting method! My Saturday afternoon frustration can be thought of as an 'Ode to Reading the Instruction Manual'. The algorithm works by dividing the list to be sorted around a pivot, or partition point, into sub-lists of elements based on whether they are greater than or less than the partition. The process is then repeated with the resultant lists decreasing in length until each sub-list is ordered from lowest to highest. The s
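The divide-around-a-pivot process described above can be sketched compactly in Python. This is a simple list-comprehension version of my own for illustration (real implementations usually partition in place for efficiency):

```python
def quicksort(items):
    """Recursively partition items around a pivot until every sub-list is trivially sorted."""
    if len(items) <= 1:
        return items                              # a list of 0 or 1 elements is already sorted
    pivot = items[len(items) // 2]                # pivot choice: the middle element
    less = [x for x in items if x < pivot]        # everything below the pivot
    equal = [x for x in items if x == pivot]      # the pivot (and any duplicates)
    greater = [x for x in items if x > pivot]     # everything above the pivot
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([33, 10, 55, 71, 29, 3, 18]))  # [3, 10, 18, 29, 33, 55, 71]
```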

Selection Sort With Python

Sorting algorithms are an important aspect of data analysis and many operations require sorted data sets. There is an abundance of different types of sorting algorithms, such as Quicksort, Merge sort and Selection sort to name a few. I thought I would try and learn how to perform a few of these, and I am starting with the Selection sort as it is one of the simplest and easiest to implement. The selection sort algorithm is given as follows: Find the minimum (or maximum) of an unsorted list. Remove the minimum (or maximum) and append it as the first element in the sorted list. Repeat until there are no elements left in the unsorted list. Fairly straightforward, right? The algorithm's simplicity means that it is often favoured over complex alternatives when efficiency is not an issue. It is not a particularly efficient algorithm: the time taken goes like the square of the number of data points, $N$. In other words, it has $\mathcal{O}(N^2)$ time complexity. Don't
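The three steps listed above translate almost line for line into Python. A minimal sketch of my own for illustration:

```python
def selection_sort(data):
    """Sort by repeatedly moving the minimum of the unsorted list to the sorted one."""
    unsorted = list(data)             # copy so the caller's list is untouched
    ordered = []
    while unsorted:                   # repeat until the unsorted list is empty
        smallest = min(unsorted)      # step 1: find the minimum
        unsorted.remove(smallest)     # step 2: remove it from the unsorted list...
        ordered.append(smallest)      # ...and append it to the sorted list
    return ordered

print(selection_sort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```

Each pass scans the whole remaining unsorted list to find the minimum, which is where the $\mathcal{O}(N^2)$ behaviour comes from.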

5 Great Popular Science Books: Get started with Physics and Astrophysics

Popular science books are a great way to satisfy your curiosity in a relaxing way! I have spent many an hour reading some fantastic books and I want to share a few of these with you now. The following books serve as a good introduction to Physics, Astrophysics and some of the more complex concepts in these subject areas. I consider them 'must reads' for every enthusiast and professional. I hope you enjoy them as much as I have! #1 - The Character of Physical Law by Richard Feynman Okay, you should read as much of Feynman's writing as you can get your hands on. That goes without saying, and explains why two of his books appear on this list! However, 'The Character of Physical Law' is a great place to start, not only with Feynman's writing but also with Physics-based popular science books in general. The book itself is based on a series of seven guest lectures given by Feynman at Cornell University in 1964. You can watch recordings of these lectures here:  https:

Contour Plotting with Matplotlib

Visualising data well is an important part of any analysis, and a good handle on the Python package Matplotlib is essential for any Python data analyst. I hope to provide a few tutorials on some of the more complex concepts in data visualisation and how to produce clear and tidy graphs with Matplotlib. I will assume some basic knowledge of the Matplotlib package but will try and explain the code as clearly as possible. Comments are always welcome! I am going to begin with this piece on contour plotting, which is an area I have a particular interest in. Specifically, I am interested in plotting parameter spaces for fitted functions with contours defined by an objective function like $\chi^2$. We will get to an example of this shortly, but I first want to look at how we make contour plots with a simpler example. Basic Example with Radii For our simple example we will define variables $x$ and $y$ over a given range and plot the corresponding radius, $Z$, from 0 for each data point $(x, y)$ as
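The radius example above might look something like the following minimal sketch of my own (the ranges and contour levels are arbitrary choices for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Define x and y over a given range and build a 2D grid from them
x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)
Z = np.sqrt(X**2 + Y**2)  # radius from the origin at each grid point (x, y)

# Draw labelled contours of constant radius
fig, ax = plt.subplots()
cs = ax.contour(X, Y, Z, levels=[0.5, 1.0, 1.5])
ax.clabel(cs)
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("contours.png")
```

The same ax.contour() call works for a $\chi^2$ surface; only the function evaluated on the grid changes.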

Random Number Generation: Inverse Transform Sampling with Python

Following on from my previous post, in which I showed how to generate random normally distributed numbers using the Box-Muller Transform, I want to demonstrate how Inverse Transform Sampling (ITS) can be used to generate random exponentially distributed numbers. The description of the Box-Muller Transform can be found here:  https://astroanddata.blogspot.com/2020/06/random-number-generation-box-muller.html . As discussed in my previous post, random numbers appear everywhere in data analysis, and knowing how to generate them is an important part of any data scientist's toolbox. ITS takes a sample of uniformly distributed numbers and maps them onto a chosen probability density function via the cumulative distribution function (CDF). In our case the chosen probability density function is that of an exponential distribution, given by $P_d(x) = \lambda \exp(-\lambda x)$. This is a common distribution that describes events that occur independently, continuously and with an average constant rate, $\
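For the exponential distribution the mapping can be written in closed form: the CDF is $F(x) = 1 - \exp(-\lambda x)$, so inverting it gives $x = -\ln(1-u)/\lambda$ for a uniform deviate $u$. A minimal sketch of my own (the rate $\lambda = 2$ is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
lam = 2.0                                # rate parameter lambda (arbitrary choice)

u = rng.uniform(0.0, 1.0, 100_000)       # uniformly distributed sample on [0, 1)
x = -np.log(1.0 - u) / lam               # invert the CDF F(x) = 1 - exp(-lam * x)

print(x.mean())                          # should be close to the expected mean 1/lam = 0.5
```

Because $1 - u$ lies in $(0, 1]$, the logarithm is always finite, and the resulting x values follow the exponential density above.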

Random Number Generation: Box-Muller Transform

Knowing how to generate random numbers is a key tool in any data scientist's toolbox. They appear in multiple different optimisation routines and machine learning processes. However, we often use random number generators built into programming languages without thinking about what is happening below the surface. For example, in Python, if I want to generate a random number uniformly distributed between 0 and 1, all I need to do is import numpy and use the np.random.uniform() function. Similarly, if I want Gaussian random numbers, to for example simulate random noise in an experiment, all I need to do is use np.random.normal(). But what is actually happening when I call these functions? And how do I go about generating random numbers from scratch? This is the first of hopefully a number of blog posts on the subject of random numbers and generating random numbers. There are multiple different methods that can be used in order to do this, such as the inverse probability transform method and I
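As a taster, the standard Box-Muller Transform maps two independent uniform deviates $u_1, u_2$ to two independent standard normal deviates via $z_0 = \sqrt{-2\ln u_1}\cos(2\pi u_2)$ and $z_1 = \sqrt{-2\ln u_1}\sin(2\pi u_2)$. A minimal sketch of my own for illustration (not code from the full post):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u1 = 1.0 - rng.uniform(size=n)   # shift (0, 1] so log(u1) is always finite
u2 = rng.uniform(size=n)

# Box-Muller: two uniform deviates -> two independent standard normal deviates
r = np.sqrt(-2.0 * np.log(u1))
z0 = r * np.cos(2.0 * np.pi * u2)
z1 = r * np.sin(2.0 * np.pi * u2)

print(z0.mean(), z0.std())       # should be close to 0 and 1 respectively
```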

The Extragalactic Radio Background

I have recently published an article with colleagues from Manchester University on an updated estimate of the Extragalactic Radio Background (EGRB), or Cosmic Radio Background. The article can be found here:  https://arxiv.org/abs/2004.13596 . What follows is a brief summary of what the EGRB is, how we have estimated it, and the updates we made to a previous estimate by Protheroe and Biermann (1996), which can be found at  https://arxiv.org/abs/astro-ph/9605119 . Further details on our method and the mathematics can be found in the paper. The EGRB is the total contribution of radio emission from Star Forming Galaxies (SFG) and Radio Galaxies to the radio sky, averaged into units of per steradian. It is difficult to measure the EGRB across a wide range of radio frequencies, $10^3 - 10^{11}$ Hz, because the ionosphere prevents measurements at low frequency and emission from our own Galaxy acts as a foreground. In order to take measurements at high frequency the Galactic foreground has to be m