Institute of Computing for Climate Science Summer School 2024 - Programme
Schedule
During the week, you can book in a session with one of the RSE team for advice, or to discuss ongoing projects.
The abstract and suggested pre-requisites for each session are listed after the schedule.
Wednesday 10th July, Centre for Mathematical Sciences
Start | End | Track 1 | Track 2 |
---|---|---|---|
14:00 | 14:15 | Introduction - Please be seated by 2pm sharp | |
14:15 | 15:45 | Introduction to Git and GitHub (MR4) | Intermediate Git and GitHub (MR2) |
15:45 | 16:15 | Coffee & Tea | |
16:15 | 17:45 | Scientific Visualisation (MR2) | Code clinic (MR4 & 5) |
Aromi Pizza and beer from 17:45; board games + Lego
Thursday 11th July, Centre for Mathematical Sciences
Optional running excursion in the morning:
- 07:00-07:30 - scenic 4k led by Jack Atkinson
- 07:00-07:30 - scenic but longer and slightly faster 5k led by Dominic Orchard
Both runs start and finish at the Faulkes Gatehouse at the CMS.
Start | End | Track 1 | Track 2 |
---|---|---|---|
08:15 | 09:00 | Continental breakfast at the CMS | |
09:00 | 10:30 | Introduction to Neural Networks with PyTorch (MR2) | Coupling PyTorch with Fortran via FTorch (MR4) |
10:30 | 11:00 | Break - tea, coffee, pastries | |
11:00 | 12:00 | Introduction to Neural Networks with PyTorch (MR2) | Code clinic (MR4 & 5) |
12:00 | 13:30 | Lunch - Churchill College | |
13:30 | 15:00 | OpenMP for GPUs (MR4) | Research Software Engineering Skills (in Python) (MR2) |
15:00 | 15:30 | Break - tea, coffee | |
15:30 | 17:00 | OpenMP for GPUs (lab) (MR4) | Typing Python with mypy (MR2) |
Dinner at Madingley Hall, Madingley, Cambridge CB23 8AQ. Transport from CMS will depart at 17:15.
Friday 12th July, CMS
Start | End | Track 1 | Track 2 |
---|---|---|---|
08:15 | 09:00 | Continental breakfast at the CMS | |
09:00 | 10:30 | Introduction to climate and weather modelling (MR2) | Explainable data science with the Fluid language (MR4) |
10:30 | 11:00 | Break - tea, coffee, pastries | |
11:00 | 12:00 | Introduction to climate and weather modelling (MR2) | What can abstract mathematics tell us about programming climate models? (MR4) |
12:00 | 13:30 | Lunch - Churchill College | |
13:30 | 15:30 | Profiling and performance testing (MR2) | Introduction to Computational Science in Julia (MR4) |
15:30 | 16:00 | Break - tea, coffee | |
16:00 | 17:00 | Closing Keynote - Dr Evelina Gabasova - Transformational power of openness: open source in research and beyond (MR2) |
Cheese and wine tasting session with Cambridge Wine Merchants: a short 15-minute introduction to wines, followed by wines and nibbles.
Introduction to Git and GitHub
This session aims to help participants take their first steps with version control using Git and GitHub. We will learn the basic principles of Git, how we can upload our code (or other data) to a remote repository, collaborate on it with colleagues, receive their changes, go back to previous versions, etc.
No more emailing files back and forth, no more "version5.78_final_final_use-this-one"!
This is a hands-on session with live-coding and exercises.
We will use the Unix shell in this course. Previous experience with using the shell would be helpful, but we will help you out if you haven't used it before.
Pre-requisites: Install Git on your computer, and set up a GitHub account, an SSH key, and MFA.
You can follow the steps from here: https://swcarpentry.github.io/git-novice/ as well as https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account.
Intermediate Git and GitHub
This session is intended for participants who want to expand their understanding of Git and GitHub. Building on the basic principles of Git (e.g., the commit, pull, and push commands), we will explore the concept of branching, when to use it, and useful tools for interrogating and manipulating branches. We will also learn about the core concepts of GitHub, how they interact, and how they can be used to build effective software development workflows.
This is a hands-on session with live-coding and exercises.
We will use the Unix shell in this course.
Pre-requisites: Attendees will need to have Git installed on their computers, have GitHub accounts, and have SSH keys and MFA set up.
We assume that attendees are familiar with the Git commands `git add`, `git commit`, `git pull`, `git push`, and `git log`, and with the GitHub concepts of Issues and Pull Requests.
The repository used for the exercises will include some simple Python code but understanding Python is not a requirement. However, attendees will need to have working Python 3 installations on their computers.
Scientific Visualisation
In this session we will look at viewing scientific data using Python tools. We will cover how to open and access large datasets and prepare them for plotting, e.g. with xarray and (geo)pandas. We will look at libraries that are useful for plotting geospatial data, such as cartopy, regionmask, and cmocean. As well as technical skills, we will discuss considerations for presenting data, such as the use of scales, colourmaps, and labelling. Finally, we will look at examples of structuring matplotlib code for streamlining presentation and enabling easy re-use.
Introduction to climate and weather modelling
This session will include a general lecture to explain what the current approach to weather and climate modelling is, and how it links to supercomputing. This will be followed by a short practical session using a pre-built model, with some tasks via a Jupyter Notebook.
- Fundamentals of dynamics and physics for the atmosphere and ocean
- Numerical methods used in weather and climate prediction
- The supercomputing challenges in weather and climate simulation
- Aspects of Machine Learning
  - ML emulators
  - Improvement of parameterizations
  - Uncertainty quantification
  - ML techniques for operational weather forecasting
Pre-requisites: The data required for the practical session is here:
Denvil-Sommer, A. (2024). Dataset for OSSE exercise at ICCS Summer School 2024 Cambridge [Data set]. Zenodo. https://doi.org/10.5281/zenodo.12567970
You may wish to look at the article: Denvil-Sommer, A., Gehlen, M., and Vrac, M.: Observation system simulation experiments in the Atlantic Ocean for enhanced surface ocean pCO2 reconstructions, Ocean Sci., 17, 1011–1030, https://doi.org/10.5194/os-17-1011-2021, 2021.
The code (including notebooks and install instructions) can be found here.
Explainable data science with the Fluid language
Charts and other visual summaries, curated by journalists and scientists from real-world data and simulations, are how we understand our changing world and the anthropogenic sources of that change. But interpreting these visual outputs is a challenge, even for experts with access to the source code and data. Fluid (f.luid.org) is a new “transparent” programming language, being developed at the Institute of Computing for Climate Science in Cambridge, that can be used to create charts and figures that are linked to data, so that a user can interactively discover what visual elements actually represent. This is an opportunity to learn about and experiment with a new programming language designed to make climate science more open, intelligible and accessible.
What can abstract mathematics tell us about programming climate models?
Category theory is a subfield of mathematics that seeks to expose common underlying structure in other areas of mathematics. It has since also become a foundational technique for understanding logic and programming, with its use both in the semantics of formal languages and as a tool for structuring programs. Many concepts in computer programming can be explained from a category-theoretic perspective, yielding new insights about how to reason about programs and generalise their definitions. In this session, I will give an overview of a few key ideas that have applications to numerical programming tasks familiar in earth systems modelling. This will provide some fresh perspectives about how to structure and reason about programs, both for correctness and efficiency.
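As one illustration of the kind of idea described above (it is not taken from the session material), the functor composition law states that mapping `g` and then `f` over a container gives the same result as mapping their composition once. In numerical code this is what justifies fusing two passes over an array into a single pass. The sketch below is in Python; the helper names `compose`, `to_kelvin` and `anomaly`, and the reference temperature, are hypothetical.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(f: Callable[[B], C], g: Callable[[A], B]) -> Callable[[A], C]:
    """Return the function x -> f(g(x))."""
    return lambda x: f(g(x))


def to_kelvin(celsius: float) -> float:
    return celsius + 273.15


def anomaly(kelvin: float) -> float:
    # Hypothetical reference temperature, chosen only for illustration
    return kelvin - 288.0


temperatures = [14.0, 15.5, 13.2]

# Two traversals of the data: map to_kelvin, then map anomaly
two_passes = list(map(anomaly, map(to_kelvin, temperatures)))

# One traversal: map the composed function (the "fused" version)
one_pass = list(map(compose(anomaly, to_kelvin), temperatures))

print(two_passes == one_pass)
```

The two results agree element by element because each value goes through the same arithmetic in the same order; only the number of traversals differs.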
OpenMP for GPUs
To make the best use of today's massively parallel and heterogeneous (both CPU and GPU) computing resources, we need to use several programming models. OpenMP is an open specification for a directive-based programming model that can take advantage of all the cores on a processor and offload computations to GPUs, making only minimal changes to the C, C++ or Fortran source code.
This session will serve as an introduction to the OpenMP programming model for GPU acceleration. You will learn how to introduce the directives into your code, and put this into practice using OpenMP to speed up example programs.
Pre-requisites:
- As we will be running the practical exercises on the Cambridge HPC system, basic Linux shell knowledge is expected.
- We expect basic programming skills, the ability to read C or Fortran-style code, and the ability to compile and run code on systems using Makefiles.
- Some familiarity with GPU programming is beneficial but not essential.
Research Software Engineering Skills (in Python)
Python is the tool of choice for many applications in research, from data processing and analysis to producing plots and figures for publications.
However, much of this code is written to a base standard to achieve a single goal. Further, it is often written in a fluid style as interesting science appears. Whilst this is fast in the short term, it does not lend itself well to re-use by others (or even the future author!) or to well-written and structured code.
In this session we will explore a number of tools and techniques that can be easily applied to improve your code's quality and readability, reduce bugs, and facilitate re-use.
Pre-requisites: For the RSE Skills session we ask participants to:
- Have a working Python 3 installation on their system.
- Ideally clone the workshop repository in advance of the session: https://github.com/Cambridge-ICCS/rse-skills-python
- Have basic programming skills, the ability to read and follow Python code, and an enthusiasm to learn better practice. It is worth emphasising that many of the concepts will map across to other languages besides Python.
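To give a flavour of the kind of improvement the abstract describes, here is a minimal before-and-after sketch. It is not taken from the workshop repository; the function names and the calculation are hypothetical. The second version adds a descriptive name, type hints, a docstring and an explicit check for bad input, without changing the result.

```python
from typing import Sequence


# "Quick" version: works, but the names and intent are opaque
def f(a, b):
    return sum(a) / len(a) - b


# Cleaner version of the same calculation
def mean_anomaly(values: Sequence[float], baseline: float) -> float:
    """Return the mean of `values` relative to `baseline`.

    Raises ValueError if `values` is empty, instead of failing with an
    unexplained ZeroDivisionError.
    """
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values) - baseline


readings = [1.0, 2.0, 3.0]
print(f(readings, 1.0) == mean_anomaly(readings, 1.0))
```

The behaviour for valid input is identical, so a change like this can be made incrementally to existing research code.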
Typing Python with mypy
Many compiled languages include a 'type checker' as part of their compilation process, which applies automated checks to source code to rule out potential runtime errors due to mismatches in the format of data ('type errors'). The Python language does not include such a check: its types are 'dynamic', with type errors occurring only if encountered at runtime. Python, however, supports type annotations (since Python 3.0), which allow a programmer to insert optional type information into code that external tools can then use to type check a program. This session will teach how to use Python types alongside the mypy tool for ruling out program bugs and better documenting source code. We will also talk about some fundamental concepts in typing and program verification.
Pre-requisites: Python 3 and mypy should be installed before the session.
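As a small sketch of what annotated Python looks like (the function names are hypothetical and not from the session material), the code below adds type annotations to two functions. Saved to a file such as `example.py` (an assumed name), it can be checked with `mypy example.py` without being run.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15


def parse_reading(raw: str) -> float:
    """Turn a raw text reading such as ' 21.5 ' into a number."""
    return float(raw.strip())


# Correct use: the string is parsed before the conversion
kelvin = celsius_to_kelvin(parse_reading(" 21.5 "))
print(kelvin)

# Incorrect use, left commented out so that the file still runs:
#
#     celsius_to_kelvin(" 21.5 ")
#
# Python itself would only fail here at runtime (str + float raises
# TypeError), whereas a type checker such as mypy can report the
# str-versus-float mismatch from the annotations alone.
```

The annotations have no effect on how the program runs; they are information for tools and for readers.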
Introduction to Neural Networks with PyTorch
This session aims to teach the key theoretical concepts behind machine learning, and offers hands-on training in applying machine learning techniques using PyTorch, along with guidance on structuring resilient and sustainable machine learning code.
We will cover both regression and classification, learning about key concepts and applying them in parallel exercises. Once complete, participants will have a good framework for building, training, and running neural nets that could be adapted for their own applications.
We will demonstrate the application of machine learning with examples from the geoscience domain.
Required Pre-Reading: To make the most of the session we expect participants to arrive with a (minimal) base-level understanding of machine learning concepts. In addition to this, we will also assume some basic knowledge of mathematics and Python.
Pre-requisites:
Participants will have the choice of executing the material on Colab or locally on their own system. The latter will require familiarity with virtual environments and code deployment.
Mathematics and Machine Learning
We will not focus on the mathematics of ML too heavily, but we expect some familiarity with calculus (differentiating a function), matrix algebra (matrix multiplication and representing data as a matrix) and the concept of regression (fitting a function to data).
Neural Networks
High-level concepts can be obtained by watching the [video series by 3Blue1Brown](https://www.3blue1brown.com/topics/neural-networks), at least chapters 1-3.
Python
The course will be taught in Python using PyTorch. Whilst no prior knowledge of PyTorch is expected, we assume users are familiar with the basics of Python 3, which include:
- Basic mathematical operations
- Writing and using functions
- The concept of object orientation, i.e. that an object, e.g. a dataset, can have functions/methods associated with it
- Basic use of the following libraries:
  - numpy for mathematical and array operations
  - matplotlib for plotting and visualisation
  - pandas for storing and accessing tabular data
- Familiarity with the concept of a Jupyter notebook
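The mathematical pre-reading above (differentiating a function, and regression as fitting a function to data) can be tied together in one short sketch: fitting a straight line y = w x + b by gradient descent on the mean squared error. This is plain Python rather than PyTorch so that it runs without any installation, and it is not taken from the course material. The data, learning rate and step count are made up for illustration.

```python
# Made-up data lying exactly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def mse_gradients(w: float, b: float) -> tuple:
    """Derivatives of mean((w*x + b - y)^2) with respect to w and b."""
    n = len(xs)
    dw = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


w, b = 0.0, 0.0          # initial guess for slope and intercept
learning_rate = 0.05     # assumed step size, small enough to be stable here

for _ in range(2000):
    dw, db = mse_gradients(w, b)
    # Step "downhill": move each parameter against its gradient
    w -= learning_rate * dw
    b -= learning_rate * db

print(w, b)  # approaches the slope and intercept of the underlying line
```

Training a neural network follows the same loop in outline: compute a loss, differentiate it with respect to the parameters, and update the parameters. Frameworks such as PyTorch compute the derivatives automatically.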
Coupling PyTorch with Fortran via FTorch
A key focus of many scientific computing domains at present is how to use machine learning to enhance and accelerate traditional simulations. Climate science is no exception, with this topic being part of all VESRI projects. However, achieving coupling between ML and numerical models presents a number of technical and scientific challenges. FTorch is a library developed by ICCS to couple PyTorch-based machine learning models to Fortran code with the aim of reducing the burden on scientific researchers. It has already been used in the DataWave and M2LInES projects and further afield. In this workshop we will introduce FTorch and review its capabilities before taking participants through the process of coupling a PyTorch model into a Fortran code in a practical demonstration. There may also be time for questions/discussion from those seeking to use FTorch in their work, and the developers will be available for code clinics and discussions throughout the week. Further information can be found in this video or this video.
Pre-requisites:
- A Python installation, preferably with PyTorch installed via pip in advance
- CMake installed
- Compilers (the GNU suite would be ideal):
  - A Fortran compiler
  - A C compiler
  - A C++ compiler
- Internet access
- Windows users are encouraged to use Windows Subsystem for Linux, or review the Windows guidance on the FTorch documentation in advance.
Profiling and performance testing
Have you ever found yourself in a position where your code feels slow but you can't quite put your finger on why?
- Is it the new system you're running on?
- The new dependencies installed by your system admin?
- Or that new awesome feature you pushed to the main branch last week without tests 😳?
Pre-requisites:
- Bring some code that you would like to profile (we can provide example code but it's always better to use your own)
- No need to install any profilers or tools prior to the workshop
- Access to a Unix machine would be ideal (if using Windows, please install Windows Subsystem for Linux, WSL)
- Optional: Score-P/Cube, Valgrind, Clang/GNU compiler, TAU profiler, Python
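For readers who want a feel for profiling before the session, the sketch below uses `cProfile` and `pstats` from the Python standard library. These are stand-ins chosen because they need no installation; the workshop may well use the other tools listed above instead. The two functions are hypothetical examples of a slow loop and a fast closed-form equivalent.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n: int) -> int:
    """Sum i*i for i in 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_of_squares(n: int) -> int:
    """Same sum via the closed form (n-1) * n * (2n-1) / 6."""
    return (n - 1) * n * (2 * n - 1) // 6


profiler = cProfile.Profile()
profiler.enable()
slow_result = slow_sum_of_squares(200_000)
fast_result = fast_sum_of_squares(200_000)
profiler.disable()

# Collect the report into a string, sorted by cumulative time per function
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists each profiled function with its call count and time, which turns "the code feels slow" into a specific function to look at.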
Introduction to Computational Science in Julia
This introductory tutorial provides a comprehensive overview of the core features and capabilities of the Julia programming language, designed for participants with a foundational understanding of programming concepts. We begin with an introduction to Julia and the interactive Pluto Notebook environment, followed by an exploration of functions, primitive and composite data types, generic programming through multiple dispatch, and more. Afterwards, the tutorial provides several case studies to delve into applications of Julia in scientific computing and machine learning. The last part will be a hands-on lab to build an Earth energy balance model and train a neural network to solve its differential equation.
Pre-requisites:
- Basic experience in programming
- A Julia installation with Pluto.jl. Please follow the setup instructions in the material for this session
- Some knowledge of calculus and linear algebra would be desirable
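As a rough idea of what an Earth energy balance model involves, the sketch below integrates a textbook zero-dimensional model, C dT/dt = S(1 - α)/4 - εσT⁴, with forward Euler steps. It is written in Python rather than Julia and is not the model used in the lab. The albedo, effective emissivity, heat capacity and time step are assumed illustrative values.

```python
SOLAR_CONSTANT = 1361.0             # W m^-2, incoming solar irradiance
ALBEDO = 0.3                        # assumed planetary albedo
EMISSIVITY = 0.61                   # assumed effective emissivity
STEFAN_BOLTZMANN = 5.670374419e-8   # W m^-2 K^-4
HEAT_CAPACITY = 4.0e8               # J m^-2 K^-1, assumed ~100 m ocean mixed layer
SECONDS_PER_DAY = 86400.0


def net_flux(temperature: float) -> float:
    """Absorbed sunlight minus outgoing longwave radiation, in W m^-2."""
    absorbed = SOLAR_CONSTANT * (1.0 - ALBEDO) / 4.0
    emitted = EMISSIVITY * STEFAN_BOLTZMANN * temperature ** 4
    return absorbed - emitted


def integrate(initial_temperature: float, years: float, dt_days: float = 10.0) -> float:
    """Forward-Euler integration of C dT/dt = net_flux(T)."""
    dt = dt_days * SECONDS_PER_DAY
    n_steps = int(years * 365.25 * SECONDS_PER_DAY / dt)
    temperature = initial_temperature
    for _ in range(n_steps):
        temperature += dt * net_flux(temperature) / HEAT_CAPACITY
    return temperature


final_temperature = integrate(273.0, years=200.0)

# Analytic equilibrium, where absorbed and emitted fluxes balance
equilibrium_temperature = (
    SOLAR_CONSTANT * (1.0 - ALBEDO) / (4.0 * EMISSIVITY * STEFAN_BOLTZMANN)
) ** 0.25

print(final_temperature, equilibrium_temperature)
```

With these assumed parameters the integrated temperature relaxes towards the analytic equilibrium, which provides a simple check on the numerical scheme.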