As discussed in some of my previous articles, good visualisation of data is essential to getting the associated message across. One aspect of this is the need to plot multiple data sets, or to visualise the same data set in different ways, on the same figure. For example, we may wish to show our data and the residuals left after subtracting a fit to that data, all in the same figure. This can be done effectively using matplotlib and its subplot tools. Mastery of these tools comes very much with practice, and I do not claim to be an expert. However, I have some experience with them and will share the basics in this article.

To use matplotlib we begin by importing it via

import matplotlib.pyplot as plt

We can then proceed to explore what is on offer.

plt.subplot() and plt.subplots()

plt.subplots() and plt.subplot() are probably where most people begin to learn about the idea of c...
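To make the data-plus-residuals example concrete, here is a minimal sketch using plt.subplots(). The data, the straight-line fit via numpy.polyfit, the variable names and the output filename are all assumptions made for illustration and are not taken from the article itself.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data set: a straight line with a small deterministic wobble
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + 0.5 * np.sin(3 * x)

# Fit a straight line to the data and compute the residuals
slope, intercept = np.polyfit(x, y, 1)
fit = slope * x + intercept
residuals = y - fit

# Two stacked panels that share the x axis: data and fit on top, residuals below
fig, (ax_data, ax_res) = plt.subplots(2, 1, sharex=True, figsize=(6, 6))

ax_data.plot(x, y, "o", label="data")
ax_data.plot(x, fit, "-", label="linear fit")
ax_data.set_ylabel("y")
ax_data.legend()

ax_res.plot(x, residuals, "o")
ax_res.axhline(0.0, color="k", linewidth=0.8)  # zero line for reference
ax_res.set_xlabel("x")
ax_res.set_ylabel("residual")

fig.tight_layout()
fig.savefig("data_and_residuals.png")  # hypothetical output filename
```

In an interactive session you would probably drop the matplotlib.use("Agg") line and call plt.show() instead of saving to a file.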
The Binary Search is an important algorithm for any Data Analyst to know. We frequently need to determine whether a particular value can be found in a data set and, if it can, to know its index. For example, we may have some conditional task that is only acted upon if our data set features a specific value. As another example, perhaps we have collected some data and, having plotted it in some informative way, we know that there is a spurious value in the set that isn't following the expected trend. We can most likely attribute it to a typo when entering our data, 349 instead of 345 say, but either way we consider it an outlier, and we need to identify its position in the list and remove it from the data set. One possible solution is to loop through our data set until we find the entry that equals 349, but our data set could be big and this could be very time consuming. The alternative is to use the Binary Search, which allows us to find the index of our typo in a sorted...
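As a rough illustration of the idea, here is a minimal sketch of an iterative binary search applied to the typo example. The function name, the sample data and the choice to return None when the value is absent are assumptions made for this sketch, not necessarily how the article implements it. Note that it only works if the list is already sorted.

```python
def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or None if it is absent.

    Assumes sorted_values is sorted in ascending order.
    """
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2  # middle of the current search window
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return None


# Hypothetical sorted data set containing the mistyped value 349
data = [338, 340, 341, 343, 345, 346, 349]

index = binary_search(data, 349)
if index is not None:
    # Build a new list without the outlier rather than modifying data in place
    cleaned = data[:index] + data[index + 1:]
```

Each pass through the loop halves the search window, so the number of comparisons grows with the logarithm of the list length rather than with the length itself. For everyday use, Python's standard library bisect module offers the same kind of search on sorted lists.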