2024-07-10
To access links or follow along on your own device, these slides can be found at:
jackatkinson.net/slides
Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.
Vectors and icons by SVG Repo under CC0 1.0 or FontAwesome under SIL OFL 1.1
Before we start I want to be sure everyone has a working system.
As we move through the following exercise if you experience any issues please ask for help and we will aim to sort you out before the main exercises.
In terminal navigate to the workshop directory, set up a Python virtual environment as you prefer, and install the workshop dependencies:
Navigate to exercises/exercise_00/
where you will see a Python and a Fortran code.
pytorchnet.py
defines a net SimpleNet
that takes an input vector of length 5 and multiplies it by 2.
Note:
SimpleNet is an nn.Module class
with a forward() method
Running:
should produce the output:
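As a reference point, a minimal sketch of such a net (not necessarily the exact contents of pytorchnet.py) might look like:

```python
import torch
from torch import nn


class SimpleNet(nn.Module):
    """Trivial net: doubles a length-5 input vector."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 2 * x


model = SimpleNet()
print(model(torch.arange(5.0)))  # tensor([0., 2., 4., 6., 8.])
```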
First we will check that we have Fortran, C and C++ compilers installed:
Running:
Should produce output similar to:
The file hello_fortran.f90
contains a program to take an input array and call a subroutine to multiply it by two before printing the result.
The subroutine, however, is contained in a separate module, math_mod.f90.
We can compile both of these to produce .o
object files and a .mod
module file using:
We then link these together into an executable ftn_prog
using:
We typically think of Deep Learning as an end-to-end process;
a black box with an input and an output.
Who’s that Pokémon?
\[\begin{bmatrix}\vdots\\a_{23}\\a_{24}\\a_{25}\\a_{26}\\a_{27}\\\vdots\\\end{bmatrix}=\begin{bmatrix}\vdots\\0\\0\\1\\0\\0\\\vdots\\\end{bmatrix}\] It’s Pikachu!
Neural Net by 3Blue1Brown under fair dealing.
Pikachu © The Pokemon Company, used under fair dealing.
Many large scientific models are written in Fortran (or C, or C++).
Much machine learning is conducted in Python.
Mathematical Bridge by cmglee used under CC BY-SA 3.0
PyTorch, the PyTorch logo and any related marks are trademarks of The Linux Foundation.
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
We consider 2 types:
Computational
Developer
At the academic end of research both have an equal effect on ‘time-to-science’.
Torch and PyTorch logos under Creative Commons
Python
env
Python
runtime
xkcd #1987 by Randall Munroe, used under CC BY-NC 2.5
Built on libtorch using iso_c_binding,
adding Fortranic features.
FTorch is available from GitHub:
github.com/Cambridge-ICCS/FTorch
With supporting documentation at:
cambridge-iccs.github.io/FTorch/
To get a copy of FTorch on your system run:
which will create a directory FTorch/
containing the repository contents.
Libtorch is available from the PyTorch homepage:
We can use the version of libtorch that comes with pip-installed PyTorch (see docs).
Standalone libtorch removes Python and aids reproducibility, especially on HPC systems.
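If using the pip-installed PyTorch route, one way to locate the bundled libtorch is to query the wheel for its CMake prefix path (this value can then be passed to CMake when building FTorch):

```python
import torch

# The pip wheel bundles libtorch; this prints the CMake prefix path,
# suitable for passing to -DCMAKE_PREFIX_PATH when building against it.
print(torch.utils.cmake_prefix_path)
```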
The source code for FTorch is contained in the src/
directory. This contains:
ctorch.cpp
- bindings to the libtorch C++ API
ctorch.h
- header file to bind to from Fortran
ftorch.f90
- the Fortran routines we will be calling
These are compiled, linked together, and installed using CMake,
as specified by the CMakeLists.txt
file.
To keep things clean we will do all of our building from a self-contained build/
directory.
We can now execute CMake from here to build the code in the directory above:
In reality we often need to specify a number of additional options:
cmake .. -DCMAKE_BUILD_TYPE=Release \
> -DCMAKE_Fortran_COMPILER=gfortran \
> -DCMAKE_C_COMPILER=gcc \
> -DCMAKE_CXX_COMPILER=g++ \
> -DCMAKE_PREFIX_PATH=<path/to/libtorch/> \
> -DCMAKE_INSTALL_PREFIX=<path/to/install/ftorch/>
Notes:
Use CMAKE_BUILD_TYPE=Debug
to build with debugging symbols (-g)
Assuming everything proceeds successfully CMake will generate a Makefile
for us.
We can run this locally using:
If we specified a particular location to install FTorch we can install there by running:
The above two commands can be combined into a single command using:
Installation will place files in CMAKE_INSTALL_PREFIX/:
include/ contains header and .mod files
lib/ contains CMake and library files (lib64/ on some systems)
Unix systems have .so files whilst Windows has .dll files

Now that we have FTorch installed on the system we can move to writing code that uses it, needing only to link to our installation at compile and runtime.
We will start off with a basic example showing how to couple code in Exercise 1:
Examine exercises/exercise_01/simplenet.py
.
This contains a contrived PyTorch model with a single nn.Linear
layer that will multiply the input by two.
With our virtual environment active we can test this by running the code with:
Input: tensor([0., 1., 2., 3., 4.])
Output: tensor([0., 2., 4., 6., 8.])
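A sketch of such a model is below. Note the assumption: to make a single nn.Linear layer double its input, the weights are fixed to twice the identity with no bias (the workshop's simplenet.py may initialise this differently):

```python
import torch
from torch import nn


class SimpleNet(nn.Module):
    """Single nn.Linear layer that multiplies a length-5 input by two.

    Assumption: weights fixed to 2*I, no bias; the workshop file may differ.
    """

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(5, 5, bias=False)
        with torch.no_grad():
            self.layer.weight.copy_(2.0 * torch.eye(5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer(x)


model = SimpleNet()
with torch.no_grad():
    print(model(torch.tensor([0., 1., 2., 3., 4.])))  # tensor([0., 2., 4., 6., 8.])
```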
To use the net from Fortran we need to save it to TorchScript.
FTorch comes with a handy utility, pt2ts.py, to help with this, located at
FTorch/utils/pt2ts.py.
We will now copy this across to exercise_01/, modify it, and run it to save our model to TorchScript.
Notes:
There are TODO comments where you need to adapt the code.
pt2ts.py expects an nn.Module subclass with a forward() method.
The model can be saved using torch.jit.trace() or torch.jit.script().
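A condensed sketch of the saving step follows, using a stand-in model and a hypothetical output filename (pt2ts.py itself does more, e.g. handling devices and checking outputs):

```python
import torch
from torch import nn


class SimpleNet(nn.Module):
    """Stand-in for the workshop's net: doubles its input."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 2 * x


model = SimpleNet().eval()
dummy_input = torch.ones(5)  # example input required by tracing

# Trace the model to TorchScript; torch.jit.script(model) is the alternative
ts_model = torch.jit.trace(model, dummy_input)
ts_model.save("saved_simplenet_model_cpu.pt")  # hypothetical filename

# Sanity check: reload the TorchScript file and run it
reloaded = torch.jit.load("saved_simplenet_model_cpu.pt")
print(reloaded(torch.arange(5.0)))  # tensor([0., 2., 4., 6., 8.])
```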
We are now in a state to use our saved TorchScript model from within Fortran.
exercises/exercise_01/simplenet_fortran.f90
contains a skeleton code with Fortran arrays to hold input data for the net, and the results returned.
We will modify it to create the necessary data structures and load and call the net.
Use the ftorch module.
Declare torch_tensors and a torch_model to hold the data and net.
Create the torch_tensors from the Fortran arrays and load the net.
Call torch_model_forward to run the net.
Notes:
See simplenet_fortran_sol.f90 for an ideal code.

Once we have modified the Fortran we need to compile the code and link it to FTorch.
This is done in exactly the same way as you would compile and link to any other library, for example NetCDF.
By hand this can be done with:
gfortran -I<path/to-ftorch>/include/ftorch -L<path/to-ftorch>/lib -lftorch -c simplenet_fortran.f90
gfortran -I<path/to-ftorch>/include/ftorch -o simplenet_fortran simplenet_fortran.o -L<path/to-ftorch>/lib -lftorch
In reality, however, we would use a Makefile
or CMake. Examples of both of these are included.
Make:
make
CMake:
To run the code we can use the generated executable:
0.00000000 2.00000000 4.00000000 6.00000000 8.00000000
What we have considered so far is a simple contrived example designed to teach the basics.
However, in reality the codes we will be using are more complex, and full of terrors.
In exercise 2 we will look at an example of how to ideally structure a slightly more complex code setup. For those familiar with climate models this may be nothing new. We make use of the traditional separation into:
Navigate to the exercises/exercise_02/
directory.
Here you will see 2 code directories, good/
and bad/
.
Both perform the same operation:
The exercise is the same for both folders:
Adapt the pt2ts.py script to save the net.
Adapt the Fortran (simplenet_fortran.f90 and fortran_ml_mod.f90) to load and run it.
Supply as an array of tensors, innit.
In pt2ts.py, save the model on the GPU.
When creating torch_tensors, set the device to torch_kCUDA
instead of torch_kCPU.
For multiple GPUs, use the device_index argument.
Use MPI_GATHER() to reduce data across processes.
forpy was used in Espinosa et al. (2022).
Work by Will Chapman of NCAR/M2LInES
As representations of physics, models have inherent, sometimes systematic, biases.
Run CESM for 9 years relaxing hourly to ERA5 observations (data assimilation)
Train CNN to predict anomaly increment at each level
Apply online as part of predictive runs
libtorch
is included on the software stack on Derecho
Derecho by NCAR
torch.autograd
ICCS Research Software Engineers:
Previous Members:
FTorch:
MiMA
CESM
For more information please book an ICCS code clinic, speak to me afterwards, or drop me a message.
Get in touch:
The ICCS received support from