Other information

Schedule

During the week, you can book in a session with one of the RSE team for advice, or to discuss ongoing projects.

Click a session title to see its abstract and suggested pre-requisites.

Wednesday 10th July, Centre for Mathematical Sciences

| Start | End | Track 1 | Track 2 |
| --- | --- | --- | --- |
| 14:00 | 14:15 | Introduction - Please be seated by 2pm sharp | |
| 14:15 | 15:45 | Introduction to Git and GitHub (MR4) | Intermediate Git and GitHub (MR2) |
| 15:45 | 16:15 | Coffee & Tea | |
| 16:15 | 17:45 | Scientific Visualisation (MR2) | Code clinic (MR4 & 5) |

Aromi Pizza and beer from 17:45; board games + Lego

Thursday 11th July, Centre for Mathematical Sciences

Optional running excursion in the morning:

  • 07:00-07:30 - scenic 4k led by Jack Atkinson
  • 07:00-07:30 - scenic but longer and slightly faster 5k led by Dominic Orchard

Both runs start from the Faulkes Gatehouse at the CMS.

| Start | End | Track 1 | Track 2 |
| --- | --- | --- | --- |
| 08:15 | 09:00 | Continental breakfast at the CMS | |
| 09:00 | 10:30 | Introduction to Neural Networks with PyTorch (MR2) | Coupling PyTorch with Fortran via FTorch (MR4) |
| 10:30 | 11:00 | Break - tea, coffee, pastries | |
| 11:00 | 12:00 | Introduction to Neural Networks with PyTorch (MR2) | Code clinic (MR4 & 5) |
| 12:00 | 13:30 | Lunch - Churchill College | |
| 13:30 | 15:00 | OpenMP for GPUs (MR4) | Research Software Engineering Skills (in Python) (MR2) |
| 15:00 | 15:30 | Break - tea, coffee | |
| 15:30 | 17:00 | OpenMP for GPUs (lab) (MR4) | Typing Python with mypy (MR2) |

Dinner at Madingley Hall, Madingley, Cambridge CB23 8AQ. Transport from CMS will depart at 17:15.

Friday 12th July, CMS

| Start | End | Track 1 | Track 2 |
| --- | --- | --- | --- |
| 08:15 | 09:00 | Continental breakfast at the CMS | |
| 09:00 | 10:30 | Introduction to climate and weather modelling (MR2) | Explainable data science with the Fluid language (MR4) |
| 10:30 | 11:00 | Break - tea, coffee, pastries | |
| 11:00 | 12:00 | Introduction to climate and weather modelling (MR2) | What can abstract mathematics tell us about programming climate models? (MR4) |
| 12:00 | 13:30 | Lunch - Churchill College | |
| 13:30 | 15:30 | Profiling and performance testing (MR2) | Introduction to Computational Science in Julia (MR4) |
| 15:30 | 16:00 | Break - tea, coffee | |
| 16:00 | 17:00 | Closing Keynote - Dr Evelina Gabasova - Transformational power of openness: open source in research and beyond (MR2) | |

Cheese and wine tasting session with Cambridge Wine Merchants - a short 15-minute introduction to the wines, followed by wines and nibbles.

This session is aimed at helping participants take their first steps with version control using Git and GitHub. We will learn the basic principles of Git, how to upload our code (or other data) to a remote repository, collaborate on it with colleagues, receive their changes, go back to previous versions, and more.

No more emailing files back and forth, no more "version5.78_final_final_use-this-one"!

This is a hands-on session with live-coding and exercises.

We will use the Unix shell in this course. Previous experience with the shell would be helpful, but we will help you out if you haven't used it before.

Pre-requisites: Install Git on your computer, set up a GitHub account, and configure an SSH key and MFA.
You can follow the steps from here: https://swcarpentry.github.io/git-novice/ as well as https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account.

This session is intended for participants who want to expand their understanding of Git and GitHub. Building on the basic principles of Git (e.g., the commit, pull, and push commands), we will explore the concept of branching, when to use it, and useful tools for interrogating and manipulating branches. We will also learn about the core concepts of GitHub, how they interact, and how they can be used to build effective software development workflows.

This is a hands-on session with live-coding and exercises.

We will use the Unix shell in this course.

Pre-requisites: Attendees will need to have Git installed on their computers, have GitHub accounts, and have SSH keys and MFA set up.

We are assuming that attendees are familiar with Git commands `git add`, `git commit`, `git pull`, `git push`, and `git log`, and the GitHub concepts of Issues and Pull Requests.

The repository used for the exercises will include some simple Python code but understanding Python is not a requirement. However, attendees will need to have working Python 3 installations on their computers.

In this session we will look at viewing scientific data using Python tools. We will cover how to open and access large datasets and prepare them for plotting, e.g. with xarray and (geo)pandas. We will look at libraries that are useful for plotting geospatial data, such as cartopy, regionmask, and cmocean. As well as technical skills, we will discuss considerations for presenting data, such as the use of scales, colourmaps, and labelling. Finally, we will look at examples of structuring matplotlib code to streamline presentation and enable easy re-use.
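
As a small, hedged sketch of the kind of workflow this covers (assuming xarray, cartopy, cmocean, and matplotlib are installed, and using xarray's bundled `air_temperature` tutorial dataset as stand-in data rather than the session's own files):

```python
import cartopy.crs as ccrs
import cmocean
import matplotlib.pyplot as plt
import xarray as xr

# Open a small sample dataset (downloaded on first use) and take a time mean.
ds = xr.tutorial.open_dataset("air_temperature")
mean_air = ds["air"].mean(dim="time")

# Plot on geospatial axes with a perceptually uniform colourmap and clear labelling.
ax = plt.axes(projection=ccrs.PlateCarree())
mean_air.plot(ax=ax, transform=ccrs.PlateCarree(),
              cmap=cmocean.cm.thermal,
              cbar_kwargs={"label": "Air temperature (K)"})
ax.coastlines()
ax.set_title("Time-mean air temperature")
plt.show()
```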

This session will include a general lecture to explain what the current approach to weather and climate modelling is, and how it links to supercomputing. This will be followed by a short practical session using a pre-built model, with some tasks via a Jupyter Notebook.

  1. Fundamentals of dynamics and physics for the atmosphere and ocean
  2. Numerical methods used in weather and climate prediction
  3. The supercomputing challenges in weather and climate simulation
  4. Aspects of Machine Learning
    • ML emulators
    • Improvement of parameterizations
    • Uncertainty quantification
    • ML techniques for operational weather forecasting
The practical session will be based on _Observation System Simulation Experiments for ocean surface pCO2 over the Atlantic Ocean_. Sparse data coverage and the lack of observations covering the full seasonal cycle challenge mapping methods and result in noisy reconstructions of surface ocean pCO2 and disagreements between different models. We explored design options for a future augmented Atlantic-scale observing system that would optimally combine data streams from various platforms and contribute to reducing the bias in reconstructed surface ocean pCO2 fields and sea–air CO2 fluxes.
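
For flavour, here is a small, hedged sketch of the kind of numerical method listed above (a first-order upwind scheme for 1-D linear advection with periodic boundaries); it is purely illustrative and not part of the practical:

```python
import numpy as np

nx = 200                      # number of grid points
dx = 1.0 / nx                 # grid spacing on the periodic domain [0, 1)
c, dt = 1.0, 0.002            # advection speed and time step (CFL = c*dt/dx = 0.4)

x = np.linspace(0.0, 1.0, nx, endpoint=False)
q = np.exp(-200.0 * (x - 0.3) ** 2)   # initial tracer blob

for _ in range(100):
    # Upwind finite difference: information comes from the upstream direction.
    q = q - c * dt / dx * (q - np.roll(q, 1))

print(f"tracer maximum after advection: {q.max():.3f}")
```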

Pre-requisites: The data required for the practical session is here:

Denvil-Sommer, A. (2024). Dataset for OSSE exercise at ICCS Summer School 2024 Cambridge [Data set]. Zenodo. https://doi.org/10.5281/zenodo.12567970

You may wish to look at the article:

Denvil-Sommer, A., Gehlen, M., and Vrac, M.: Observation system simulation experiments in the Atlantic Ocean for enhanced surface ocean pCO2 reconstructions, Ocean Sci., 17, 1011–1030, https://doi.org/10.5194/os-17-1011-2021, 2021.

The code (including notebooks and install instructions) can be found here.

Charts and other visual summaries, curated by journalists and scientists from real-world data and simulations, are how we understand our changing world and the anthropogenic sources of that change. But interpreting these visual outputs is a challenge, even for experts with access to the source code and data. Fluid (f.luid.org) is a new “transparent” programming language, being developed at the Institute of Computing for Climate Science in Cambridge, that can be used to create charts and figures that are linked to data, so a user can interactively discover what visual elements actually represent. This is an opportunity to learn about and experiment with a new programming language designed to make climate science more open, intelligible and accessible.

Category theory is a subfield of mathematics that seeks to expose common underlying structure in other areas of mathematics. It has since also become a foundational technique for understanding logic and programming, with uses both in the semantics of formal languages and as a tool for structuring programs. Many concepts in computer programming can be explained from a category-theoretic perspective, yielding new insights about how to reason about programs and generalise their definitions. In this session, I will give an overview of a few key ideas that have applications to numerical programming tasks familiar in earth systems modelling. This will provide some fresh perspectives on how to structure and reason about programs, both for correctness and efficiency.
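
As a small, hedged taste of the flavour of these ideas (not material from the session itself): mapping a function over a container is a functor, and functors respect composition, so mapping the composed function g∘f in one pass gives the same result as mapping f and then mapping g, while avoiding an intermediate copy of the data:

```python
def f(x):
    return x + 1.0


def g(x):
    return x * x


data = [0.0, 0.25, 0.5, 0.75, 1.0]

two_passes = [g(y) for y in [f(x) for x in data]]  # map f, then map g (intermediate list)
one_pass = [g(f(x)) for x in data]                 # map the composed function once

assert two_passes == one_pass  # guaranteed by the functor composition law
```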

To make the best use of today's massively parallel and heterogeneous (both CPU and GPU) computing resources, we need to use several programming models. OpenMP is an open specification for a directive-based programming model that can take advantage of all the cores on a processor and offload computations to GPUs with only minimal changes to C, C++, or Fortran source code.

This session will serve as an introduction to the OpenMP programming model for GPU acceleration. You will learn how to introduce the directives into your code, and put this into practice using OpenMP to speed up example programs.

Pre-requisites:

  • As we will be running the practical exercises on the Cambridge HPC system, basic linux shell knowledge is expected.
  • We expect basic programming skills, the ability to read C- or Fortran-style code, and the ability to compile and run code on systems using Makefiles.
  • Some familiarity with GPU programming is beneficial but not essential.

Python is the tool of choice for many applications in research, from data processing and analysis to producing plots and figures for publications.

However, much of this code is written to a base standard to achieve a single goal. Further, it is often written in a fluid style as interesting science emerges. Whilst this is fast in the short term, it does not lend itself well to re-use by others (or even the future author!) or to well-written and well-structured code.

In this session we will explore a number of tools and techniques that can easily be applied to improve your code's quality and readability, reduce bugs, and facilitate re-use.
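
As a hedged flavour of the sort of low-effort improvements involved (the functions below are illustrative, not the workshop's own material):

```python
# Before: a terse name, magic numbers, and no documentation.
def calc(t):
    return t * 9 / 5 + 32


# After: a descriptive name, type hints, a docstring, and a named constant
# make the intent clear and the function easier to test and re-use.
FAHRENHEIT_OFFSET = 32.0


def celsius_to_fahrenheit(temp_celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_celsius * 9.0 / 5.0 + FAHRENHEIT_OFFSET


if __name__ == "__main__":
    print(celsius_to_fahrenheit(21.0))  # 69.8
```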

Pre-requisites: For the RSE Skills session, participants should:

  • Have a working Python 3 installation on their system.
  • Ideally clone the workshop repository in advance of the session: https://github.com/Cambridge-ICCS/rse-skills-python
  • Have basic programming skills, the ability to read and follow Python code, and an enthusiasm to learn better practice - it is worth emphasising that many of the concepts map across to other languages besides Python.

Many compiled languages include a 'type checker' as part of their compilation process, which applies automated checks to source code to rule out potential runtime errors due to mismatches in the format of data ('type errors'). The Python language does not include such a check: its types are 'dynamic', with type errors only surfacing when they are encountered at runtime. Python does, however, support type annotations (since Python 3.0), which allow a programmer to insert optional type information into code that external tools can then use to type check a program. This session will teach how to use Python types alongside the mypy tool to rule out program bugs and better document source code. We will also talk about some fundamental concepts in typing and program verification.
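
A minimal, hedged sketch of what this looks like in practice (illustrative only, not the session's exercises; the built-in generic syntax `list[float]` assumes Python 3.9 or later). Running mypy on this file reports the type error on the last line without ever executing the program:

```python
def mean(values: list[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


ok: float = mean([1.0, 2.0, 3.0])  # passes the type checker

bad = mean(["a", "b", "c"])        # mypy flags an incompatible argument type here
```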

Pre-requisites:
Python 3 and mypy should be installed before the session.

This session aims to teach the key theoretical concepts behind machine learning, and offers hands-on training in applying machine learning techniques using PyTorch, along with guidance on structuring resilient and sustainable machine learning code.

We will cover both regression and classification, learning about key concepts and applying them in parallel exercises. By the end, participants will have a good framework for building, training, and running neural networks that could be adapted for their own applications.

We will demonstrate the application of machine learning with examples from the geoscience domain.
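
For orientation, here is a minimal, hedged sketch of the kind of regression workflow covered (illustrative only; the session's own notebooks differ):

```python
import torch
from torch import nn

# Synthetic data: y = 3x + 2 plus a little noise.
x = torch.linspace(-1.0, 1.0, 100).unsqueeze(1)
y = 3.0 * x + 2.0 + 0.1 * torch.randn_like(x)

model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
optimiser = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(500):
    optimiser.zero_grad()
    loss = loss_fn(model(x), y)  # mean squared error on the training data
    loss.backward()              # backpropagate gradients
    optimiser.step()             # update the weights

print(f"final training loss: {loss.item():.4f}")
```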

Required pre-reading: To make the most of the session, we expect participants to arrive with a (minimal) base-level understanding of machine learning concepts. We will also assume knowledge of some basic mathematics and basic Python skills.

Pre-requisites: Participants will have the choice of executing the material on Colab or locally on their own system. The latter will require familiarity with virtual environments and code deployment.

Mathematics and Machine Learning
We will not focus too heavily on the mathematics of ML, but we expect some familiarity with calculus (differentiating a function), matrix algebra (matrix multiplication and representing data as a matrix), and the concept of regression (fitting a function to data).

Neural Networks
High-level concepts can be obtained by watching the [video series by 3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks), at least chapters 1-3.

Python
The course will be taught in Python using PyTorch. Whilst no prior knowledge of PyTorch is expected, we assume users are familiar with the basics of Python 3, including:

  • Basic mathematical operations
  • Writing and using functions
  • The concept of object orientation, i.e. that an object (e.g. a dataset) can have functions/methods associated with it
  • Basic use of the following libraries:
    • numpy for mathematical and array operations
    • matplotlib for plotting and visualisation
    • pandas for storing and accessing tabular data
  • Familiarity with the concept of a Jupyter notebook

A key focus of many scientific computing domains at present is how to use machine learning to enhance and accelerate traditional simulations. Climate science is no exception, with this topic forming part of all VESRI projects. Coupling ML and numerical models, however, presents a number of technical and scientific challenges. FTorch is a library developed by ICCS to couple PyTorch-based machine learning models to Fortran code, with the aim of reducing the burden on scientific researchers. It has already been used in the DataWave and M2LInES projects and further afield. In this workshop we will introduce FTorch and review its capabilities before taking participants through the process of coupling a PyTorch model into a Fortran code in a practical demonstration. There may also be time for questions and discussion from those seeking to use FTorch in their work, and the developers will be available for code clinics and discussions throughout the week. Further information can be found in this video or this video.
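
As a hedged sketch of the Python side of this kind of coupling (the exact FTorch workflow is covered in the session and its documentation), the trained PyTorch model is typically saved in TorchScript form so that it can be loaded from compiled code:

```python
import torch
from torch import nn

# A stand-in for a trained model (hypothetical architecture, for illustration only).
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
model.eval()

example_input = torch.randn(1, 4)          # dummy input fixing the tensor shape
scripted = torch.jit.trace(model, example_input)
scripted.save("model.pt")                  # file to be loaded from the Fortran side
```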

Pre-requisites:

Have you ever found yourself in a position where your code feels slow, but you can't quite put your finger on why?

  • Is it the new system you're running on?
  • The new dependencies installed by your system admin?
  • Or that new awesome feature you pushed to the main branch last week without tests 😳?
Climate software is necessarily complex, often containing thousands of source files and millions of lines of code. These projects are often developed collaboratively by large numbers of scientists over many years. It is no longer possible to know every line of code, every function, and every source file, and we can no longer "just guess" where performance is being lost. This is where profiling comes in. In this tutorial we will cover the basics of profiling -- what it is, what it's used for, and how to understand the output. These basics will be reinforced with demonstrations of two high-performance profilers: Score-P and TAU.
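
As a hedged illustration of the basic idea, here is profiling in miniature using Python's built-in cProfile (the session itself demonstrates Score-P and TAU on compiled codes):

```python
import cProfile
import pstats


def slow_part(n: int) -> float:
    """Deliberately inefficient work so something shows up in the profile."""
    return sum(i ** 0.5 for i in range(n))


def main() -> None:
    for _ in range(50):
        slow_part(100_000)


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time to see where the program really spends its time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```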

Pre-requisites:

  • Bring a code that you would like to profile (we can provide example code but it's always better to use your own)
  • No need to install any profilers or tools prior to the workshop
  • Access to a Unix machine would be ideal (if using Windows, please install Windows Subsystem for Linux (WSL))
  • Optional: Score-P/Cube, Valgrind, a Clang or GNU compiler, the TAU profiler, Python

This introductory tutorial provides a comprehensive overview of the core features and capabilities of the Julia programming language, designed for participants with a foundational understanding of programming concepts. We begin with an introduction to Julia and the interactive Pluto notebook environment, followed by an exploration of functions, primitive and composite data types, generic programming through multiple dispatch, and more. The tutorial then presents several case studies that delve into applications of Julia in scientific computing and machine learning. The final part is a hands-on lab in which we build an Earth energy balance model and train a neural network to solve its differential equation.

Pre-requisites:

Building software tools has become a fundamental aspect of many areas of current research, from environmental modelling to digital humanities. Evelina will talk about how the potential of these tools can be amplified through the principles of open source and open science. Looking at successful and not so successful examples, we will explore the current landscape of open source in academia and research in general: from building collaborative communities to the current struggles to define what open source even means in the world of large language models. On top of that, we will cover some of the best practices for creating robust, reusable and openly accessible tools to maximise the impact of our research work.