Coupling Machine Learning to Fortran using the FTorch Library

Jack Atkinson

Senior Research Software Engineer
ICCS - University of Cambridge

The ICCS Team and Collaborators (see end)

2024-07-10

Precursors

Slides and Materials

To access links or follow on your own device, these slides can be found at:
jackatkinson.net/slides

Licensing

Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.

Vectors and icons by SVG Repo under CC0 1.0 or Font Awesome under SIL OFL 1.1

Preparation

Exercise 0 – getting on the same page

Before we start I want to be sure everyone has a working system.

As we move through the following exercise, if you experience any issues please ask for help and we will aim to sort you out before the main exercises.


In a terminal, navigate to the workshop directory, set up a Python virtual environment as you prefer, and install the workshop dependencies:

python3 -m venv .venv
source .venv/bin/activate
pip install ./


Navigate to exercises/exercise_00/ where you will see Python and Fortran code.

Python

pytorch_net.py defines a network, SimpleNet, that takes an input vector of length 5 and multiplies it by 2 (a sketch is shown after the notes below).

Note:

  • the nn.Module class
  • the forward() method
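
For orientation, a minimal sketch of what pytorch_net.py might contain is shown below (the exact details in the file may differ):

import torch
import torch.nn as nn


class SimpleNet(nn.Module):
    """A trivial net that doubles an input vector of length 5."""

    def __init__(self):
        super().__init__()

    def forward(self, x):
        # No learned parameters: simply multiply the input by 2
        return 2 * x


if __name__ == "__main__":
    net = SimpleNet()
    x = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0])
    print(f"Input is  {x}.")
    print(f"Output is {net(x)}.")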

Running:

python pytorch_net.py

should produce the output:

Input is  tensor([0., 1., 2., 3., 4.]).
Output is tensor([0., 2., 4., 6., 8.]).

Compilers and CMake

First we will check that we have Fortran, C and C++ compilers installed:

Running:

gfortran --version
gcc --version
g++ --version

should produce output similar to:

GNU Fortran (Homebrew GCC 14.1.0_1) 14.1.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.

and not:

bash: command not found: gfortran


Later on we will also need CMake.

To check this is installed run:

cmake --version

and verify it is >= 3.1.

Fortran

The file hello_fortran.f90 contains a program to take an input array and call a subroutine to multiply it by two before printing the result.

The subroutine itself, however, is contained in a separate module, math_mod.f90.
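
The two files might look something like the following sketch (the subroutine name here is illustrative; see the files themselves for the exact code):

! math_mod.f90
module math_mod
  implicit none
contains
  ! Multiply an array by two, element-wise
  subroutine multiply_by_two(x)
    real, intent(inout) :: x(:)
    x = 2.0 * x
  end subroutine multiply_by_two
end module math_mod

! hello_fortran.f90
program hello_fortran
  use math_mod, only: multiply_by_two
  implicit none
  real :: my_data(5) = [0.0, 1.0, 2.0, 3.0, 4.0]
  print *, "Hello, World!"
  print *, "Input:  ", my_data
  call multiply_by_two(my_data)
  print *, "Output: ", my_data
end program hello_fortran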

We can compile both of these to produce .o object files and .mod module files using:

gfortran -c math_mod.f90 hello_fortran.f90

We then link these together into an executable ftn_prog using:

gfortran -o ftn_prog hello_fortran.o math_mod.o

Running this as:

./ftn_prog

should produce the output:

 Hello, World!
 Input:     0.00000000        1.00000000        2.00000000        3.00000000        4.00000000
 Output:    0.00000000        2.00000000        4.00000000        6.00000000        8.00000000

Motivation

Machine Learning in Science

We typically think of Deep Learning as an end-to-end process;
a black box with an input and an output.

Who’s that Pokémon?

\[
\begin{bmatrix} \vdots \\ a_{23} \\ a_{24} \\ a_{25} \\ a_{26} \\ a_{27} \\ \vdots \end{bmatrix}
=
\begin{bmatrix} \vdots \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ \vdots \end{bmatrix}
\]

It’s Pikachu!

Neural Net by 3Blue1Brown under fair dealing.
Pikachu © The Pokemon Company, used under fair dealing.


Challenges

  • Reproducibility
    • Ensure the net functions the same in situ
  • Re-usability
    • Make ML parameterisations available to many models
    • Facilitate easy re-training/adaptation
  • Language Interoperation

Language interoperation

Many large scientific models are written in Fortran (or C, or C++).
Much machine learning is conducted in Python.

Mathematical Bridge by cmglee used under CC BY-SA 3.0
PyTorch, the PyTorch logo and any related marks are trademarks of The Linux Foundation.
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.

Efficiency

We consider two types of efficiency:

  • Computational
  • Developer

At the academic end of research both have an equal effect on ‘time-to-science’.

  • Don’t re-write nets after you have already trained them
  • Scientists are not all computer scientists
    • Should be simple to learn and deploy
    • May not have access to extensive software support
  • HPC environments want minimal additional dependencies
  • Needs to be as efficient as possible

How it Works

(Py)Torch

Torch

  • an open-source deep learning framework
  • developed at EPFL in Switzerland
  • written in C with a Lua interface

PyTorch

  • an open-source deep-learning framework
  • developed by Meta AI, now part of the Linux Foundation
  • written in C++ with a Python interface
  • a port of Torch (ATen), but also includes Caffe2 etc.

Torch and PyTorch logos under Creative Commons

FTorch

[Figure: the tangle of Python environments and runtimes that FTorch sidesteps — xkcd #1987 by Randall Munroe, used under CC BY-NC 2.5]

libtorch and FTorch

  • libtorch is a C++ library providing an interface into the underlying PyTorch
  • FTorch binds to this using the iso_c_binding intrinsic module, adding Fortranic features (see the sketch below)
  • We utilise shared memory (on CPU), reducing data-transfer overheads
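
As a flavour of the mechanism (a generic illustration, not FTorch's actual binding code), here is how Fortran can call a C function through iso_c_binding; FTorch declares similar interfaces for the functions in ctorch.h:

program c_binding_demo
  use, intrinsic :: iso_c_binding, only: c_char, c_size_t, c_null_char
  implicit none

  interface
    ! Bind to the C standard library's strlen() function
    function strlen(str) bind(c, name="strlen") result(n)
      import :: c_char, c_size_t
      character(kind=c_char), intent(in) :: str(*)
      integer(c_size_t) :: n
    end function strlen
  end interface

  ! C expects null-terminated strings
  print *, strlen("FTorch"//c_null_char)   ! prints 6
end program c_binding_demo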

Installing FTorch

Clone the git repository

FTorch is available from GitHub:
github.com/Cambridge-ICCS/FTorch

With supporting documentation at:
cambridge-iccs.github.io/FTorch/

To get a copy of FTorch on your system run:

git clone https://github.com/Cambridge-ICCS/FTorch.git

which will create a directory FTorch/ containing the repository.

Get Libtorch (optional)

Libtorch is available from the PyTorch homepage.

We can use the version of libtorch that comes with pip-installed PyTorch (see docs).
Standalone libtorch removes Python and aids reproducibility, especially on HPC systems.

Build the code

The source code for FTorch is contained in the src/ directory. This contains:

  • ctorch.cpp - Bindings to the libtorch C++ API
  • ctorch.h - header file to bind to from Fortran
  • ftorch.f90 - the Fortran routines we will be calling


These are compiled, joined together, and installed using CMake.

  • a tool to simplify the build process for users
  • accommodates different machines and setups
  • controlled by the CMakeLists.txt file

Build the code

To keep things clean we will do all of our building from a self-contained build/ directory.

cd FTorch/src/
mkdir build
cd build

We can now execute CMake to build here using the code in the above directory:

cmake .. -DCMAKE_BUILD_TYPE=Release

Build the code

In reality we often need to specify a number of additional options:

cmake .. -DCMAKE_BUILD_TYPE=Release \
>        -DCMAKE_Fortran_COMPILER=gfortran \
>        -DCMAKE_C_COMPILER=gcc \
>        -DCMAKE_CXX_COMPILER=g++ \
>        -DCMAKE_PREFIX_PATH=<path/to/libtorch/> \
>        -DCMAKE_INSTALL_PREFIX=<path/to/install/ftorch/>


Notes:

  • The Fortran compiler should match that being used to build your code.
  • We need gcc >= 9
  • If debugging, set the build type to Debug to compile with -g
  • Prefix path is wherever libtorch is on your system
  • Install prefix can be set to anywhere. Defaults may require root/admin access

Build the code

Assuming everything proceeds successfully, CMake will generate a Makefile for us.


We can run this locally using:

cmake --build .

If we specified a particular install location for FTorch, we complete the installation by running:

cmake --install .


The above two commands can be combined into a single command using:

cmake --build . --target install

The FTorch installation

Installation will place files in CMAKE_INSTALL_PREFIX/:

  • include/ contains header and mod files
  • lib/ contains cmake and library files
    • this could be called lib64/ on some systems
    • UNIX will use .so files whilst Windows has .dll files

Basic coupling

Exercise 1

Now that we have FTorch installed on the system we can move to writing code that uses it, needing only to link to our installation at compile and runtime.


We will start off with a basic example showing how to couple code in Exercise 1:

  1. Design and train a PyTorch model.
  2. Save the PyTorch model to TorchScript.
  3. Write Fortran code using FTorch to call the saved model.
  4. Compile and run the code, linking to FTorch.

PyTorch

Examine exercises/exercise_01/simplenet.py.


This contains a contrived PyTorch model with a single nn.Linear layer that will multiply the input by two.


With our virtual environment active we can test this by running the code with:

python3 simplenet.py
Input:  tensor([0., 1., 2., 3., 4.])
Output: tensor([0., 2., 4., 6., 8.])
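
One plausible form of such a model (the real simplenet.py may differ in its details) is a bias-free nn.Linear whose weight matrix is fixed to twice the identity:

import torch
from torch import nn


class SimpleNet(nn.Module):
    """A 5 -> 5 linear layer that doubles its input."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(5, 5, bias=False)
        with torch.no_grad():
            # Weight matrix = 2 * identity, so y = 2x
            self.linear.weight.copy_(2.0 * torch.eye(5))

    def forward(self, x):
        return self.linear(x)


if __name__ == "__main__":
    model = SimpleNet()
    x = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0])
    with torch.no_grad():  # avoid printing grad_fn with the output
        print("Input: ", x)
        print("Output:", model(x))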

Saving to TorchScript

To use the net from Fortran we need to save it to TorchScript.


FTorch comes with a handy utility pt2ts.py to help with this located at
FTorch/utils/pt2ts.py.


We will now copy this across to exercise_01/, modify it, and run it to save our code to TorchScript.

Saving to TorchScript

Notes:

  • there are handy TODO comments where you need to adapt the code
  • pt2ts.py expects an nn.Module subclass with a forward() method
  • there are two options to save (sketched after this list):
    • tracing using torch.jit.trace()
      This passes a dummy tensor through the model, recording the operations performed.
      It is the simplest approach.
    • scripting using torch.jit.script()
      This converts the Python code directly to TorchScript.
      It is more complicated, but necessary for advanced features and/or control-flow operations.
  • A summary of the TorchScript model can be printed from Python
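
A condensed sketch of what the relevant part of pt2ts.py does — the import and file names here are assumptions; follow the TODO comments in the actual script:

import torch

# Assumes simplenet.py (exercise 1) defines SimpleNet
from simplenet import SimpleNet

model = SimpleNet()
model.eval()

# Option 1: tracing - pass a dummy tensor through the model,
# recording the operations performed
dummy_input = torch.ones(5)
traced_model = torch.jit.trace(model, dummy_input)
traced_model.save("saved_simplenet_traced.pt")

# Option 2: scripting - convert the Python code itself,
# preserving data-dependent control flow
scripted_model = torch.jit.script(model)
scripted_model.save("saved_simplenet_scripted.pt")

# A summary of the TorchScript model can be printed
print(traced_model.code)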

Calling from Fortran

We are now in a state to use our saved TorchScript model from within Fortran.

exercises/exercise_01/simplenet_fortran.f90 contains a skeleton code with Fortran arrays to hold the input data for the net and the results it returns.


We will modify it to create the necessary data structures and to load and call the net (see the sketch after this list):

  • import the ftorch module
  • create torch_tensors and a torch_model to hold the data and net
  • map Fortran data from arrays to the torch_tensors
  • call torch_model_forward to run the net
  • clean up
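
A minimal sketch of the resulting program, assuming the model was saved as saved_simplenet_model_cpu.pt; exact subroutine signatures vary between FTorch versions, so check the API documentation and the solution file:

program simplenet_fortran
  use ftorch  ! provides torch_tensor, torch_model, torch_kCPU, etc.
  implicit none

  real, dimension(5), target :: in_data, out_data
  integer, parameter :: layout(1) = [1]  ! map array dim 1 to tensor dim 1
  type(torch_tensor), dimension(1) :: in_tensors, out_tensors
  type(torch_model) :: model

  in_data = [0.0, 1.0, 2.0, 3.0, 4.0]

  ! Load the TorchScript net saved by pt2ts.py
  call torch_model_load(model, "saved_simplenet_model_cpu.pt", torch_kCPU)

  ! Wrap the Fortran arrays as Torch tensors (shared memory, no copy)
  call torch_tensor_from_array(in_tensors(1), in_data, layout, torch_kCPU)
  call torch_tensor_from_array(out_tensors(1), out_data, layout, torch_kCPU)

  ! Run the net
  call torch_model_forward(model, in_tensors, out_tensors)
  print *, out_data

  ! Clean up
  call torch_delete(in_tensors(1))
  call torch_delete(out_tensors(1))
  call torch_delete(model)
end program simplenet_fortran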

Calling from Fortran

Notes:

  • See the solution file simplenet_fortran_sol.f90 for a model answer.
  • For more information on the subroutines and the API, see the online API documentation.

Building the code

Once we have modified the Fortran we need to compile the code and link it to FTorch.

This is done in exactly the same way as you would compile and link to any other library, for example NetCDF.

By hand this can be done with:

gfortran -I<path/to/ftorch>/include/ftorch -c simplenet_fortran.f90
gfortran -o simplenet_fortran simplenet_fortran.o -L<path/to/ftorch>/lib -lftorch


In reality, however, we would use a Makefile or CMake. Examples of both are included.

Make:

  • Modify the Makefile to set the location of FTorch
  • run make

CMake:

  • create a build directory and build using CMake, similarly to the main library

Running the code

To run the code we can use the generated executable:

./simplenet_fortran
   0.00000000       2.00000000       4.00000000       6.00000000       8.00000000

Further Details

Exercise 2: Larger code considerations

What we have considered so far is a simple contrived example designed to teach the basics.

However, in reality the codes we will be using are more complex, and full of terrors.

  • We will be calling the net repeatedly over the course of many iterations
  • Reading in the net and weights from file is expensive
    • Don’t do this at every step!

Exercise 2: Larger code considerations

In exercise 2 we will look at an example of how best to structure a slightly more complex code setup (a sketch follows the list below). For those familiar with climate models this may be nothing new. We make use of the traditional separation into:

  • initialisation,
  • update, and
  • finalise subroutines.
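
A hedged sketch of the "good" structure — the module and routine names are assumptions based on the exercise files, and FTorch signatures may differ between versions:

module fortran_ml_mod
  use ftorch
  implicit none
  private
  public :: ml_init, ml_iter, ml_final

  ! The loaded net persists between calls
  type(torch_model) :: model

contains

  ! Initialisation: read the net and weights from file ONCE
  subroutine ml_init(model_file)
    character(len=*), intent(in) :: model_file
    call torch_model_load(model, model_file, torch_kCPU)
  end subroutine ml_init

  ! Update: called every step; only wraps arrays and runs the net
  subroutine ml_iter(in_data, out_data)
    real, intent(in),  target :: in_data(:)
    real, intent(out), target :: out_data(:)
    type(torch_tensor), dimension(1) :: in_t, out_t
    integer, parameter :: layout(1) = [1]
    call torch_tensor_from_array(in_t(1), in_data, layout, torch_kCPU)
    call torch_tensor_from_array(out_t(1), out_data, layout, torch_kCPU)
    call torch_model_forward(model, in_t, out_t)
    call torch_delete(in_t(1))
    call torch_delete(out_t(1))
  end subroutine ml_iter

  ! Finalise: free the net at the end of the run
  subroutine ml_final()
    call torch_delete(model)
  end subroutine ml_final

end module fortran_ml_mod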

Exercise 2

Navigate to the exercises/exercise_02/ directory.

Here you will see 2 code directories, good/ and bad/.

Both perform the same operation:

  • run the simplenet from exercise 1 10,000 times
  • increment the input vector at each step
  • accumulate the sum of the output vector

Exercise 2

The exercise is the same for both folders:

  • run the pre-prepared pt2ts.py script to save the net.
  • inspect the code to see how it works:
    • both have a main program in simplenet_fortran.f90
    • both have FTorch code extracted to a module fortran_ml_mod.f90
      • bad is in a single routine
      • good is split into init, iter, and finalise
  • modify the Makefile to link to FTorch and build the codes
  • time the codes and observe the difference

Multiple inputs and outputs

Supply as an array of tensors, innit.
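
Slightly more formally: if the net's forward() takes two tensors (and returns one), declare the tensor arrays with one element per argument. A fragment, assuming a setup as in exercise 1 and hypothetical arrays input_a, input_b, and output:

! Inside a program set up as in exercise 1:
type(torch_tensor), dimension(2) :: in_tensors   ! one entry per forward() input
type(torch_tensor), dimension(1) :: out_tensors

call torch_tensor_from_array(in_tensors(1), input_a, layout, torch_kCPU)
call torch_tensor_from_array(in_tensors(2), input_b, layout, torch_kCPU)
call torch_tensor_from_array(out_tensors(1), output, layout, torch_kCPU)
call torch_model_forward(model, in_tensors, out_tensors)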

GPU Acceleration

  • FTorch automatically has access to GPU acceleration through the PyTorch backend
  • When running pt2ts.py, save the model on the GPU
    • Guidance provided in the file
  • When creating torch_tensors, set the device to torch_kCUDA instead of torch_kCPU (see the snippet after this list)
  • To target a specific device, supply the device_index argument
  • CPU-GPU data transfer cannot be avoided entirely; use MPI_GATHER() to reduce the number of transfers
  • For more details see the online documentation
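
A minimal illustration of the change, reusing the tensor-creation call from exercise 1 (the device_index value here is illustrative):

! As for CPU, but with the device type (and optionally the index) changed:
call torch_tensor_from_array(in_tensors(1), in_data, layout, torch_kCUDA, device_index=0)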

Applications and Case Studies

MiMA - proof of concept

  • The origins of FTorch
    • Emulation of existing parameterisation
    • Coupled to an atmospheric model using forpy in Espinosa et al. (2022)
    • Prohibitively slow and hard to implement
    • The authors asked for a faster, user-friendly implementation that could be used in future studies.


  • Follow up paper using FTorch: Uncertainty Quantification of a Machine Learning Subgrid-Scale Parameterization for Atmospheric Gravity Waves (Mansfield and Sheshadri 2024)
    • “Identical” offline networks have very different behaviours when deployed online.

ICON

  • Icosahedral Nonhydrostatic Weather and Climate Model
    • Developed by DKRZ (Deutsches Klimarechenzentrum)
    • Used by the DWD and Meteo-Swiss
  • Interpretable multiscale Machine Learning-Based Parameterizations of Convection for ICON (Heuer et al. 2023)
    • Train a U-Net convection scheme on high-resolution simulation data
    • Deploy in ICON via FTorch coupling
    • Evaluate physical realism (causality) using SHAP values
    • Online stability improved when non-causal relations are eliminated from the net

CESM - Bias Correction

Work by Will Chapman of NCAR/M2LInES

  • As representations of physics, models have inherent, sometimes systematic, biases.

  • Run CESM for 9 years, relaxing hourly to ERA5 observations (data assimilation)

  • Train a CNN to predict the anomaly increment at each level
    • targeting just the MJO region
    • targeting globally

  • Apply online as part of predictive runs

CESM coupling

  • The Community Earth System Model
  • Part of CMIP (Coupled Model Intercomparison Project)
  • Make it easy for users
    • FTorch integrated into the build system (CIME)
    • libtorch is included on the software stack on Derecho
      • Improves reproducibility

Derecho by NCAR

Future Work

  • Online learning
  • Automatic differentiation via torch.autograd
  • MPS, XPU, and other GPU device support

Thanks

ICCS Research Software Engineers:

  • Chris Edsall - Director
  • Marion Weinzierl - Senior
  • Jack Atkinson - Senior
  • Matt Archer - Senior
  • Tom Meltzer - Senior
  • Surbhi Ghoel
  • Tianzhang Cai
  • Joe Wallwork
  • Amy Pike
  • James Emberton
  • Dominic Orchard - Director/Computer Science

Previous Members:

  • Paul Richmond - Sheffield
  • Jim Denholm - AstraZeneca

FTorch:

  • Jack Atkinson
  • Simon Clifford - Cambridge RSE
  • Athena Elafrou - Cambridge RSE, now NVIDIA
  • Elliott Kasoar - STFC
  • Joe Wallwork
  • Tom Meltzer

MiMA:

  • Minah Yang - NYU, DataWave
  • Dave Conelly - NYU, DataWave

CESM:

  • Will Chapman - NCAR/M2LInES
  • Jim Edwards - NCAR
  • Paul O’Gorman - MIT, M2LInES
  • Judith Berner - NCAR, M2LInES
  • Qiang Sun - U Chicago, DataWave
  • Pedram Hassanzadeh - U Chicago, DataWave
  • Joan Alexander - NWRA, DataWave

Thanks for Listening

For more information please book an ICCS code clinic, speak to me afterwards, or drop me a message.

The ICCS received support from

References

Espinosa, Zachary I, Aditi Sheshadri, Gerald R Cain, Edwin P Gerber, and Kevin J DallaSanta. 2022. “Machine Learning Gravity Wave Parameterization Generalizes to Capture the QBO and Response to Increased CO2.” Geophysical Research Letters 49 (8): e2022GL098174.
Heuer, Helge, Mierk Schwabe, Pierre Gentine, Marco A Giorgetta, and Veronika Eyring. 2023. “Interpretable Multiscale Machine Learning-Based Parameterizations of Convection for ICON.” arXiv Preprint arXiv:2311.03251.
Mansfield, Laura A, and Aditi Sheshadri. 2024. “Uncertainty Quantification of a Machine Learning Subgrid-Scale Parameterization for Atmospheric Gravity Waves.” Authorea Preprints.