Scientific visualisation with Python

James Emberton

ICCS - University of Cambridge

Jack Atkinson

ICCS - University of Cambridge

Outlook

  • Discuss examples of good and bad visualisations
  • Principles of good visualisation
  • Explore plotting with Python, Matplotlib and Cartopy in Jupyter Notebook

  • Part theory, part practical

Learning Outcomes

  1. Tailor your visualisations to your audience
  2. Avoid bad plotting habits
  3. Reinforce coding best practices
  4. How to use colour to best effect
  5. Fundamentals of Matplotlib - plots and animations
  6. Explore Cartopy

Setup

Navigate to:

github.com/Cambridge-ICCS/summer-school-scientific-vis

and follow installation instructions in the README.


This GitHub repo contains this presentation as an HTML file, along with the Jupyter notebook.
To launch Jupyter notebook from terminal:

  • navigate to notebook directory
  • run jupyter notebook

An example of a bad scientific plot

Objective:

Compare p-value and variance data for two populations, highlighting statistical differences between factors

Problems:

  1. Double y-axis
  2. Inverted left y-axis
  3. Y-axes don’t start at zero
  4. Different axis scales on plots for comparison
  5. Different x-axis category orders

Lee, K.A., Thomas, A.M., Bolte, L.A. et al. Cross-cohort gut microbiome associations with immune checkpoint inhibitor response in advanced melanoma. Nat Med 28, 535–544 (2022). https://doi.org/10.1038/s41591-022-01695-5

Different approaches to the same data

Figure SPM.8 in IPCC, 2021

The BBC replotted data from the IPCC 2021 report for a different audience.
Their plot adds colour and annotation to guide interpretation of the data for a non-technical audience.

Let’s get plotting

Anatomy of a Matplotlib plot

‘Figure’ is the canvas on which you add one or more axes

‘Axes’ is the part of the Figure where information is added

‘Artist’ is the base class for all elements that can be added to the axes - 2D line, patch, text

‘Axis’ is the x- or y-axis (not to be confused with axes)
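A minimal sketch of how these four terms map onto objects in code. The data values and label text are arbitrary placeholders:

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.axes import Axes
from matplotlib.axis import XAxis
from matplotlib.figure import Figure

fig, ax = plt.subplots()                 # a Figure (canvas) holding one Axes
(line,) = ax.plot([1, 2, 3], [4, 5, 6])  # plot() returns a list of Line2D artists
label = ax.set_xlabel("x-label")         # a Text artist attached to the x-axis

# Each part of the anatomy is an ordinary Python object
print(type(fig).__name__)       # Figure
print(type(ax.xaxis).__name__)  # XAxis
print(type(line).__name__)      # Line2D
print(isinstance(line, Artist), isinstance(label, Artist))
```

Note that `ax.xaxis` and `ax.yaxis` are the two Axis objects belonging to a single Axes.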

The Matplotlib interface

Pyplot (Implicit)

import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [4, 5, 6], label='data')  # legend() needs a labelled line

plt.title('Title')
plt.xlabel('x-label')
plt.ylabel('y-label')

plt.legend()

plt.show()

Core Matplotlib (Explicit)

import matplotlib.pyplot as plt

fig = plt.figure()

ax = fig.subplots()
ax.plot([1, 2, 3], [4, 5, 6], label='data')  # legend() needs a labelled line

ax.set_title('Title')
ax.set_xlabel('xlabel')
ax.set_ylabel('ylabel')

ax.legend()

plt.show()

Pyplot is a simplified, implicit interface for quick plotting.

The explicit approach uses an object-oriented interface that allows advanced plot customisation.

Matplotlib customisations: JL Ex 1

Plotting NetCDF data with basic Matplotlib: JL Ex 2

Introduction to Cartopy

  • Geospatial plotting package built on top of Matplotlib, PROJ, Shapely and NumPy
  • Projections and transformations
  • Data handling
  • Plotting and visualisation

Examples of Cartopy features: JL Ex 3

Geographic maps

Global coastline map

Feature maps

Using Features and Projections in Cartopy: JL Ex 4

Visualising vector data

Plotting vector data: JL Ex 5
Animating vector data: JL Ex 6

Code organisation

Plotting code is code

Plotting is often done in quick scripts, written in eagerness to see the results.

However, plotting code is still code and treating it as such can save time and tears later.

Easy adaptation

If you have repeated code consider using:

  • loops

Easy adaptation

The numpy.ravel() function can be particularly useful for flattening multi-dimensional ax arrays.

This is OK (1D):

import matplotlib.pyplot as plt
fig, ax = plt.subplots(6)

for axis in ax:
    axis.plot(...

But this will fail (2D), because iterating over a 2D array of axes yields rows (arrays of axes), not individual axes:

import matplotlib.pyplot as plt
fig, ax = plt.subplots(2, 3)

for axis in ax:
    axis.plot(...

Instead use:

import matplotlib.pyplot as plt
fig, ax = plt.subplots(2, 3)

for axis in ax.ravel():
    axis.plot(...
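A runnable version of the pattern above, as a minimal sketch. The sine curves are placeholder data, chosen only so that each panel has something to draw:

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(2, 3)
print(ax.shape)          # (2, 3): a 2D NumPy array of Axes
print(ax[0].shape)       # (3,): iterating over `ax` yields rows like this one
print(ax.ravel().shape)  # (6,): flattened, so each item is a single Axes

x = np.linspace(0, 2 * np.pi, 50)
for i, axis in enumerate(ax.ravel()):
    axis.plot(x, np.sin((i + 1) * x))  # placeholder data
    axis.set_title(f"panel {i}")
```

`ax.flat` offers an equivalent iterator over the same axes.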

Easy adaptation

If you have repeated code consider using:

  • loops
  • functions/subroutines
    • Make axis a parameter of the function
    • When designing functions think about breaking them up according to the matplotlib ‘anatomy’
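
import matplotlib.pyplot as plt
import numpy as np

def line_plot(axis, x, y):
    """Draw a red line on the given axis, with x limits from 0 to max(x)."""
    axis.plot(x, y, c='r')
    axis.set_xlim(0, np.max(x))

...

fig, ax = plt.subplots(2, 3)
x = np.arange(36)
y = np.random.rand(36, 50)  # 50 series of 36 points; the first six are plotted

# One series per panel
for i, axis in enumerate(ax.ravel()):
    line_plot(axis, x, y[:, i])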

Easy adaptation

If you have repeated code consider using:

  • loops
  • functions/subroutines
    • Make axis a parameter of the function
    • When designing functions think about breaking them up according to the matplotlib ‘anatomy’
  • Keep (post-)processing of data separate to plotting
    e.g. calculate_vorticity() and plot_vorticity() routines.
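A sketch of what this separation might look like. Only the two routine names come from the slide: the signatures, the grid, and the solid-body rotation test field are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np


def calculate_vorticity(u, v, x, y):
    """Return the vertical vorticity dv/dx - du/dy on a regular grid.

    Assumes u and v have shape (len(y), len(x)).
    """
    dv_dx = np.gradient(v, x, axis=1)
    du_dy = np.gradient(u, y, axis=0)
    return dv_dx - du_dy


def plot_vorticity(axis, x, y, vorticity, **kwargs):
    """Draw a vorticity field on an existing Axes; no calculations here."""
    mesh = axis.pcolormesh(x, y, vorticity, **kwargs)
    axis.set_xlabel("x")
    axis.set_ylabel("y")
    return mesh


# Illustrative data: solid-body rotation, u = -y and v = x, so vorticity is 2
x = np.linspace(-1.0, 1.0, 21)
y = np.linspace(-1.0, 1.0, 11)
xx, yy = np.meshgrid(x, y)
u, v = -yy, xx

vorticity = calculate_vorticity(u, v, x, y)                 # processing
fig, ax = plt.subplots()
mesh = plot_vorticity(ax, x, y, vorticity, shading="auto")  # plotting
```

Because the calculation lives in its own function, it can be checked against a known answer (here a constant value of 2) without drawing anything.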

Easy adaptation

The **kwargs functionality can be used to grab any extra keyword arguments and wrap them into a dictionary to be passed through wrapper functions.

def acronym(acronym, **kwargs):
    print(f"kwargs dictionary: {kwargs}\n")

    result = acronym
    for arg in kwargs.values():
        result += arg
    return result

print(acronym("ICCS: ", a="Institute", b="Computing", c="Climate", d="Science"))
kwargs dictionary: {'a': 'Institute', 'b': 'Computing', 'c': 'Climate', 'd': 'Science'}

ICCS: InstituteComputingClimateScience
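In a plotting context the same mechanism lets a wrapper forward styling options to Matplotlib untouched. A minimal sketch, where the extended `line_plot` signature and its default colour are illustrative choices and not part of any library:

```python
import matplotlib.pyplot as plt
import numpy as np


def line_plot(axis, x, y, **kwargs):
    """Plot y against x on `axis`, forwarding extra keywords to Axes.plot."""
    kwargs.setdefault("color", "r")  # default colour, overridable by the caller
    (line,) = axis.plot(x, y, **kwargs)
    axis.set_xlim(0, np.max(x))
    return line


fig, ax = plt.subplots()
x = np.arange(36)
default_line = line_plot(ax, x, np.sqrt(x))
styled_line = line_plot(ax, x, 0.5 * x, color="k", linestyle="--", label="linear")
ax.legend()
```

The wrapper never needs to list every option that `Axes.plot` accepts, so new styling can be requested at the call site without editing the function.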

Reuse in future

Place commonly used routines in a file to create your own ‘plotting library’.

This can be particularly useful if you have domain-specific adaptations to the common matplotlib or cartopy functions, and for consistency into the future.

"""my_plotting_lib.py - a collection of useful plotting routines"""

def time_axis_years():
    """Custom formatting of axis ticks in years"""
    ...

def time_axis_days():
    """Custom formatting of axis ticks in days"""
    ...

def align_axis_x(axs, ax_target):
    """Make x-axis of axs align with `ax_target` in figure"""
    ...
...


import my_plotting_lib as myplot

...
...

JL Ex 7

Colormaps

Some Science

Colour can be defined by two things:

  • hue or tint (the ‘colour’ or wavelength)
  • luminosity (how bright/dark the colour is)

In the eye these are processed by different cells:

  • rod cells process achromaticity (luminosity or greyscale)
  • cone cells process hue
    • 2/3 of cone cells process longer wavelengths, giving better perception of warmer colours
    • colour vision deficiency (CVD) is usually an abnormality in one or more types of cone cell
    • The most common CVD is red-green dichromatism (M-cone cell)

Image from Crameri, Shephard, and Heron (2020)

Interpretability and colourblindness

Consider types of colormap in terms of luminosity:

  • Uniform
  • Sequential
  • Cyclic
  • Multi-Sequential

A quick check for how colormaps may be perceived by those with CVD is to examine the grayscale version of the image.
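One way to run this check numerically, as a rough sketch: sample a colormap, convert each colour to an approximate grey level, and see whether brightness changes steadily. The Rec. 601 luma weights used here are a simple stand-in for a proper perceptual lightness model, and `grey_levels` and `is_monotonic` are helper names made up for this example.

```python
import matplotlib
import numpy as np


def grey_levels(cmap_name, n=256):
    """Approximate grey level (0-1) of `n` samples along a named colormap."""
    rgb = matplotlib.colormaps[cmap_name](np.linspace(0, 1, n))[:, :3]
    return rgb @ np.array([0.299, 0.587, 0.114])  # Rec. 601 luma weights


def is_monotonic(values):
    """True if the values never decrease or never increase."""
    steps = np.diff(values)
    return bool(np.all(steps >= 0) or np.all(steps <= 0))


for name in ["gray", "viridis", "jet"]:
    grey = grey_levels(name)
    print(f"{name:8s} monotonic grey ramp: {is_monotonic(grey)}")
```

A ramp that rises and then falls, as happens with a rainbow map such as jet, means that different data values end up with the same grey level.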

Perceptual uniformity

  • Describes how evenly data variation is weighted across the colormap
  • Low uniformity leads to:
    • artificial features
    • washout of real features
  • Always use a perceptually uniform colormap

Image from Crameri, Shephard, and Heron (2020)

Image from Zeller and Rogers (2020)

cmocean

cmocean is a package of colormaps designed by oceanographers to:

  • be perceptually uniform
  • intuitively reflect the data they represent
    • sequential, diverging, or cyclic
    • intuitive colours

Image from Thyng et al. (2016)

Custom colormaps

All default colormaps are designed apart from the data they are used for. To truly get the best representation of our data we may need to tweak or use a custom colormap.

Matplotlib provides tools for tweaking the predefined and defining new colormaps.
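Two of those tools, as a brief sketch: `LinearSegmentedColormap.from_list` builds a new colormap from a list of colours, and `ListedColormap` can hold a slice sampled from an existing one. The colour choices and colormap names below are arbitrary examples.

```python
import matplotlib
import numpy as np
from matplotlib.colors import LinearSegmentedColormap, ListedColormap

# Define a new colormap by interpolating between a list of colours
custom = LinearSegmentedColormap.from_list(
    "white_to_navy", ["white", "skyblue", "navy"]
)

# Tweak a predefined colormap: keep only the upper 70% of viridis
viridis = matplotlib.colormaps["viridis"]
truncated = ListedColormap(
    viridis(np.linspace(0.3, 1.0, 256)), name="viridis_upper"
)

print(custom(0.0))  # RGBA at the low end (white)
print(custom(1.0))  # RGBA at the high end (navy)
```

Either object can then be passed wherever a colormap is expected, for example `ax.pcolormesh(x, y, z, cmap=custom)`.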

Australian population map by Halupka (2020)

Ocean Eddies

Image by Francesca Samsel and M. Petersen, LANL taken from Zeller and Rogers (2020)

Matplotlib Norms

We often use vmin and vmax values to normalise a colormap to a data range:

im = ax.pcolormesh(x, y, z, vmin=273, vmax=300, cmap="cmo.thermal")

The same can be done by creating a Normalize instance and passing it as norm:

import matplotlib.colors as mpc
my_norm = mpc.Normalize(vmin=273, vmax=300)
im = ax.pcolormesh(x, y, z, norm=my_norm, cmap='cmo.solar')

The benefit of this approach is that it opens up a variety of other normalization options:

my_norm = mpc.Normalize(vmin=273, vmax=300, clip=True)

my_norm = mpc.CenteredNorm(vcenter=286.5, halfrange=13.5)

my_norm = mpc.LogNorm(vmin=273, vmax=300)

my_norm = mpc.TwoSlopeNorm(vcenter=290, vmin=273, vmax=300)

my_norm = mpc.PowerNorm(2, vmin=273, vmax=300)
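To see what these norms do, it can help to call them directly: a norm maps data values onto the 0 to 1 range that the colormap then converts to colours. A small sketch using the same temperature range as above; the sample values are arbitrary.

```python
import matplotlib.colors as mpc
import numpy as np

temps = np.array([273.0, 281.5, 290.0, 300.0])  # arbitrary sample values

linear = mpc.Normalize(vmin=273, vmax=300)
two_slope = mpc.TwoSlopeNorm(vcenter=290, vmin=273, vmax=300)
power = mpc.PowerNorm(2, vmin=273, vmax=300)

norms = [("Normalize", linear), ("TwoSlopeNorm", two_slope), ("PowerNorm", power)]
for name, norm in norms:
    print(f"{name:13s}", np.round(norm(temps), 3))
```

TwoSlopeNorm places 290 at the midpoint of the colormap even though it is not the centre of the data range, which is useful when pairing a diverging colormap with an off-centre reference value.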

JL Ex 8

Summary

Learning Outcomes

  1. Tailor your visualisations to your audience
  2. Avoid bad plotting habits
  3. Coding best practices for good plots
  4. How to use colour to best effect
  5. Fundamentals of Matplotlib - plots and animations
  6. Explore Cartopy


“The greatest value of a picture is when it forces us to notice what we never expected to see.”
— John W. Tukey


“Data are just summaries of thousands of stories – tell a few of those stories to help make the data meaningful.”
— Chip & Dan Heath

Thanks for Listening

References

Crameri, Fabio, Grace E Shephard, and Philip J Heron. 2020. “The Misuse of Colour in Science Communication.” Nature Communications 11 (1): 5444.
Halupka, Kerry. 2020. “Beautiful Custom Colormaps with Matplotlib.” https://towardsdatascience.com/beautiful-custom-colormaps-with-matplotlib-5bab3d1f0e72.
Hawkins, Ed. 2022. “Climate Lab Book – Visualisation Resources.” https://www.climate-lab-book.ac.uk/visualisation-resources/.
Solomon, Brad. n.d. “Real Python – Python Plotting With Matplotlib.” https://realpython.com/python-matplotlib-guide/.
Thyng, Kristen M, Chad A Greene, Robert D Hetland, Heather M Zimmerle, and Steven F DiMarco. 2016. “True Colors of Oceanography: Guidelines for Effective and Accurate Colormap Selection.” Oceanography 29 (3): 9–13.
Zeller, Stephanie, and David Rogers. 2020. “Visualizing Science: How Color Determines What We See.” https://eos.org/features/visualizing-science-how-color-determines-what-we-see.