An Introduction to Generative AI for RSEs

Basic principles and tooling

Tom Meltzer
Matt Archer

2026-03-30

Plan

  • ML basics
  • Transformer inference and agentic AI overview
  • Tooling overview and code demo
  • Opencode intro and configuration
  • Create your tool call with MCP + Skills

ML: The basics

  • Loss function: measures model error
  • Gradient descent: moves weights down the loss surface
  • Backpropagation: computes gradients layer by layer
  • Overfitting: memorises training data, fails to generalise

ML: The basics

%%{init: {"flowchart": {"nodeSpacing": 15, "rankSpacing": 18}, "theme": "base", "themeVariables": {"edgeLabelBackground": "#ffffff"}}}%%
flowchart TD
    A([Training data]) --> B["Forward pass"]
    B --> C[Compute loss]
    C --> D[Backpropagation]
    D --> E[Update weights]
    E -->|repeat| B
    C -->|loss small enough| F([Done ✓])

    classDef io     fill:#6c757d,stroke:#495057,color:#fff
    classDef step   fill:#4b9cd3,stroke:#2c6e9e,color:#fff
    classDef key    fill:#003b6f,stroke:#001f3f,color:#fff

    class A,F io
    class B,C step
    class D,E key

LLMs: Tokenisation

LLMs cannot process raw text, it must first be converted to numbers.

  • Text is split into sub-word tokens using a learned vocabulary
  • Each token is assigned a unique integer ID
  • Common words are single tokens; rare words split into pieces

“I went to the”

235285 “I” 3806 “▁went” 576 “▁to” 573 “▁the”

LLMs: Embeddings encode meaning

The embedding matrix maps tokens to vectors; directions encode meaning.

Attention enriches each vector with context: bank (river) vs bank (finance)

3Blue1Brown, Deep Learning Ch. 5

Transformers: Inference I

Transformers: Inference I

Convert text to tokens

Transformers: Inference I

Convert text to tokens

Predict new tokens

Transformers: Inference I

Convert text to tokens

Predict new tokens

Convert tokens to text

Transformers: Inference II

Transformers: Inference II

“I went to the”

I ▁went ▁to ▁the

Transformers: Inference II

“backpropagation”

▁back prop agation

3 tokens

Transformers: Inference II

Transformers: Inference II

Transformers: Inference II

Token ID Embedding (2048 dims)
I 235285 [ 0.21, -0.83, 0.54, 0.12, … ]
▁went 3806 [ -0.44, 0.31, 0.09, -0.77, … ]
▁to 576 [ 0.67, 0.02, -0.51, 0.38, … ]
▁the 573 [ 0.55, -0.19, 0.73, -0.02, … ]

Transformers: Inference II

Transformers: Inference II

transformer block

Transformers: Inference II

transformer block

Self-attention

I went to the

“I” ↔︎ “went”: subject-verb   |   “went” → “to”: verb-preposition

Self-attention

“The bank by the river was steep”

“bank” attends strongly to “river” - meaning is of a riverbank, not financial

Transformers: Inference II

Transformers: Inference II

▁the → [ 0.55, -0.19, 0.73, … ]

× unembedding matrix (2048 × 256k)

→ logits for every token in vocab

→ softmax → sample

Token Probability
▁library 31%
▁store 18%
▁park 12%
▁doctor 8%

Only “the” matters - it has been contextualised by attention.

Transformers: Inference II

Transformers: Inference II

Transformers: Inference II

[235285] [3806] [576] [573] → 4376 “▁library”

[235285] [3806] [576] [573] [4376] → 736 “▁this”

full sequence fed back in each loop

Transformers: Inference II

Transformers: Inference II

[235285, 3806, 576, 573, 4376, 736]

↓ vocab lookup

“I went to the library this”

just a lookup table, the inverse of tokenisation

Transformers: Summary

  • Tokenise: text → sub-word token IDs
  • Embed: token IDs → dense vectors (static meaning)
  • Self-attention: enrich each vector with context (dynamic meaning)
      • MLP × N layers: transform representations
  • Predict & sample: last token’s vector × unembedding matrix → next token ID
  • Autoregressive loop: append token, feed full sequence back in
  • Decode: token IDs → text (lookup table)

Transformers: Summary

  • The model is completely stateless
  • All context is in the text fed to it, there is no memory
  • Each forward pass re-processes the full sequence
  • Longer contexts are more expensive: attention is O(n²)

LLM Hello World

import os
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM

login(token=os.environ["HF_API_KEY"], add_to_git_credential=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

input_text = "I went to the"
input_ids = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**input_ids, max_new_tokens=10, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0]))

LLM Hello World

import os
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM

login(token=os.environ["HF_API_KEY"], add_to_git_credential=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

input_text = "I went to the"
input_ids = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**input_ids, max_new_tokens=10, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0]))
  • huggingface_hub / transformers: The platform and library where the ML community collaborates on models, datasets, and applications.

LLM Hello World

import os
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM

login(token=os.environ["HF_API_KEY"], add_to_git_credential=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

input_text = "I went to the"
input_ids = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**input_ids, max_new_tokens=10, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0]))
  • huggingface_hub / transformers: The platform and library where the ML community collaborates on models, datasets, and applications.
  • Login: Register with Hugging Face and obtain a key to download hosted models.

LLM Hello World

import os
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM

login(token=os.environ["HF_API_KEY"], add_to_git_credential=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

input_text = "I went to the"
input_ids = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**input_ids, max_new_tokens=10, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0]))
  • huggingface_hub / transformers: The platform and library where the ML community collaborates on models, datasets, and applications.
  • Login: Register with Hugging Face and obtain a key to download hosted models.
  • Load tokenizer & model: Downloads Google Gemma-2b and its matching tokenizer.

LLM Hello World

import os
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM

login(token=os.environ["HF_API_KEY"], add_to_git_credential=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

input_text = "I went to the"
input_ids = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**input_ids, max_new_tokens=10, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0]))
  • huggingface_hub / transformers: The platform and library where the ML community collaborates on models, datasets, and applications.
  • Login: Register with Hugging Face and obtain a key to download hosted models.
  • Load tokenizer & model: Downloads Google Gemma-2b and its matching tokenizer.
  • Tokenise input: Converts your text into a tensor of token IDs the model can read.

LLM Hello World

import os
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM

login(token=os.environ["HF_API_KEY"], add_to_git_credential=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

input_text = "I went to the"
input_ids = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**input_ids, max_new_tokens=10, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0]))
  • huggingface_hub / transformers: The platform and library where the ML community collaborates on models, datasets, and applications.
  • Login: Register with Hugging Face and obtain a key to download hosted models.
  • Load tokenizer & model: Downloads Google Gemma-2b and its matching tokenizer.
  • Tokenise input: Converts your text into a tensor of token IDs the model can read.
  • Generate & decode: Model predicts tokens autoregressively; tokenizer converts them back to text.

Agentic AI

AI Agents are essentially some scaffolding / loop around LLM(s):

  • LLM: The reasoning engine. It can plan, evaluate, or decide whether to “act” or “answer.”
  • System Prompt: Defines the persona, available tools, and operational boundaries.
  • Working memory: Maintains the state, including history, tool outputs, and the current goal.
  • Tools: External capabilities like web search, code execution, APIs, or connecting to RAG.

Agentic AI

%%{init: {"flowchart": {"nodeSpacing": 40, "rankSpacing": 80}, "theme": "base", "themeVariables": {"edgeLabelBackground": "#ffffff"}}}%%
flowchart LR
    S([System prompt]) --> C
    H([Prompt]) --> C["Context window"]
    C --> L["LLM: generate text"]
    L --> D{Tool call?}
    D -->|yes| T["Tools (incl. RAG)"]
    T -->|result appended| C
    D -->|no| O([Output])

    classDef io      fill:#6c757d,stroke:#495057,color:#fff
    classDef core    fill:#003b6f,stroke:#001f3f,color:#fff
    classDef support fill:#4b9cd3,stroke:#2c6e9e,color:#fff
    classDef decision fill:#e9c46a,stroke:#f4a261,color:#000

    class S,H,O io
    class L core
    class C,T support
    class D decision

Self Study Resources

Further Reading

Foundational concepts and Transformers

Further Reading

Reinforcement Learning & Alignment

Gen-AI Concerns

  • Safety
  • Ethical
  • Environmental

Safety

And these are just from a software perspective…

Ethical

Environmental

Opinion

  • GenAI usage has parallels to HPC
  • If genAI can help science – I want to make it:
    • greener
    • safer
    • more ethical

Tools and Workflows

Opencode (CLI)

In this half of the training we will make use of opencode

Concepts can also be applied to similar tools e.g., VSCode, GitHub Copilot CLI etc.

Opencode Installation

Installation instructions here

  • Linux
curl -fsSL https://opencode.ai/install | bash
  • Mac
brew install anomalyco/tap/opencode
  • Windows (download .exe)

Opencode Configuration

We now need to configure opencode to run self-hosted LLMs

  1. Add API key to .basrhc (or equivalent) e.g.,
    export CAMLLM_API_KEY=sk-XXXXXXXXXXXXXXXXXXXXXX
  2. Configure opencode (see next slide)

Note

If you already have access to UoC’s LiteLLM (https://llm.hpc.cam.ac.uk) you can create one from the virtual keys page: Virtual Keys \(\rightarrow\) Create New Key.

Opencode Configuration

  • edit/create ~/.config/opencode/opencode.json
~/.config/opencode/opencode.json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "cam-llm": {
      "options": {
        "baseURL": "https://llm.hpc.cam.ac.uk/v1",
        "apiKey": "{env:CAMLLM_API_KEY}"
      },
      "models": {
        "mistralai/Devstral-2-123B-Instruct-2512": {
          "name": "mistralai/Devstral-2-123B-Instruct-2512"
        },
        "Qwen/Qwen3-VL-30B-A3B-Instruct": {
          "name": "Qwen/Qwen3-VL-30B-A3B-Instruct",
          "modalities": { "input": ["text", "image"], "output": ["text"] }
        }
      }
    }
  },
  "permission": {
    "bash": { "*": "ask" },
    "edit": { "*": "allow" }
  }
}

Context Engineering

  • LLMs are powerful, but suffer from context bloat
  • Context window is finite resource
  • LOTR + Hobbit ~ 750k tokens / 100k LOC ~ 1M tokens
Model Name Context Size
Claude 4.6 Opus 1M
Gemini 3.1 Pro 1M – 10M
GPT-5.3-Codex 400k
Devstral-2-123B-Instruct-2512 256k

Solution

To resolve this issue, Anthropic open-sourced 2 methods:

MCP

  • Open-source standard
  • Connect LLMs to external systems

MCP examples

For example, opencode supports 11 built-in skills (see docs)

Note

LLMs can answer questions, but cannot interact with your system.

MCP example (add)

We will build our own using fastMCP

mcp-numbers.py
from fastmcp import FastMCP

mcp = FastMCP(name="mcp-numbers")

@mcp.tool
def add(a: int, b: int) -> int:
  """Add two numbers"""
  return a + b

if __name__ == "__main__":
  mcp.run()

MCP example (add)

  1. Now let’s add mcp-numbers to our opencode configuration
  2. Follow instructions in mcp/README.md
  3. Running /status in opencode should display
  4. Try Use mcp tool "numbers_add" to add 4 and -1

MCP examples (netcdf)

  • What about a more interesting example…
  • Can we give LLM power to inspect netcdf .nc files?
  • Let’s try with MCP.

MCP examples (netcdf)

  • Inspect file mcp/mcp-netcdf.py
mcp/mcp-netcdf.py
# /// script
# dependencies = [
#   "netCDF4",
#   "fastmcp",
# ]
# ///

import netCDF4
from fastmcp import FastMCP

# Initialize the FastMCP server
mcp = FastMCP("nc-mcp")


@mcp.tool()
def get_variables(path: str) -> str:
    """
    Reads a NetCDF file from the given path and returns its variables.

    Args:
        path: The absolute or relative path to the NetCDF (.nc) file.

    Returns:
        A string representation of the NetCDF file's variables.
    """
    try:
        # Open the dataset
        dset = netCDF4.Dataset(path)

        # Capture the variables as a string to return to the client
        variables_output = ", ".join(dset.variables.keys())

        # Close the dataset to free up resources
        dset.close()

        return variables_output

    except FileNotFoundError:
        return f"Error: Could not find the file at path: {path}"
    except Exception as e:
        return f"Error reading NetCDF file: {str(e)}"


@mcp.tool()
def get_variable_shape(path: str, variable_name: str) -> dict:
    """
    Reads a NetCDF file from the given path and returns the shape of a specific
    variable.

    Args:
        path: The absolute or relative path to the NetCDF (.nc) file.
        variable_name: The name of the variable to get the shape for.

    Returns:
        A dictionary containing the shape of the specified variable.
        Example: {'temperature': (365, 180, 360)}
        Returns an error if the variable is not found.
    """
    pass


if __name__ == "__main__":
    mcp.run()

MCP examples (netcdf)

  • mcp/mcp-netcdf.py contains 2 MCP tools
    • netcdf_get_variables
    • netcdf_get_variable_shape (to be implemented)
  • Try using netcdf_get_variables on file simple.nc

MCP examples (netcdf)

  • Implement netcdf_get_variable_shape
  • See stub in mcp/mcp-netcdf.py
mcp/mcp-netcdf.py
@mcp.tool()
def get_variable_shape(path: str, variable_name: str) -> dict:
    """
    Reads a NetCDF file from the given path and returns the shape of a specific
    variable.
    ...
    """
    pass

(15 minutes for exercise)

Skills

  • Define reusable behavior via SKILL.md definitions
  • Agent skills let LLMs discover reusable instructions
  • Skills are loaded on-demand
  • Skills are “just” markdown files

Anatomy of a Skill

  • Many genAI tools support skills e.g., Claude code, opencode, codex etc.

Note

opencode requires that skills are stored in a specific set of locations (A full list can be found here). We will focus on these:

  • Project config: .opencode/skills/<name>/SKILL.md
  • Global config: ~/.config/opencode/skills/<name>/SKILL.md
<name>/               # Required: unique skill name
├── SKILL.md          # Required: instructions + metadata
├── scripts/          # Optional: executable code
├── references/       # Optional: documentation
└── assets/           # Optional: templates, resources

Skills Example (netcdf)

  • Let’s refactor our netcdf MCP tool as a skill
  • Follow instructions in skill/README.md:
cd project/root/GenAI-teaching
mkdir -p .opencode/skills/netcdf
ln -sf $(pwd)/skill/netcdf/SKILL.md .opencode/skills/netcdf/
  • Run /skills in opencode to check registration

Skills Example (netcdf)

skill/netcdf/SKILL.md
---
name: netcdf-processing
description: Use this skill for any operations involving NetCDF (.nc) files, including inspecting metadata, reading variable shapes, extracting data slices, or generating new NetCDF datasets.
---

# What I do

This skill provides guidance for inspecting and generating NetCDF files using
standard command-line utilities. Use these commands to understand dataset
structures before writing extraction scripts.

# When to use this skill

Use this skill whenever a user mentions climate data, multidimensional arrays,
.nc files, or atmospheric datasets.

## Workflow Decision Tree

- **Inspecting Schema**: Use `ncdump -h` first to understand dimensions.
- **Data Access**: If the file is large, only request specific variable slices (don't read entire arrays into context).
- **Creating Files**: Use `ncgen` for small CDL templates or `netCDF4` Python scripts for large datasets.

## Viewing Metadata with `ncdump`
`ncdump` is the standard tool for converting NetCDF binary files into
human-readable text (CDL format).

* **View Header Only (Recommended):** Displays dimensions, variables, and attributes without printing raw data.
    ```bash
    ncdump -h filename.nc
    ```
* **View Specific Variable:** Look at the data for a single variable (e.g., 'temperature').
    ```bash
    ncdump -v temperature filename.nc
    ```
* **Coordinate Formatting:** Use `-c` to see the header plus the values of coordinate variables (lat, lon, time).
    ```bash
    ncdump -c filename.nc
    ```

## Creating Files with `ncgen`
`ncgen` takes a text-based CDL file and compiles it into a binary `.nc` file.

* **Generate Binary from CDL:**
    ```bash
    ncgen -o output_file.nc input_text.cdl
    ```

Skills Example (netcdf)

Try running the following command

Note

Disable netcdf MCP server before trying to test the skill. They may conflict.

Skills Exercise

  • Create your own SKILL.md
  • Register it in opencode
  • Try using it

(15 minutes for exercise)

Skills vs. MCP

So how do I choose between Skills vs MCP?

MCP Server SKILL.md (Instruction)
Primary Purpose Tool calling – Interact with external services or perform short actions. Domain Expertise – Provides workflows, rules, and domain knowledge.
Context/Loading Loaded immediately into context window (regardless of query) reducing effective context window size. Lazy loaded when needed. Will still impact context window.
Timeout Timeout ~ 1-2 minutes. Ideal for short, quick function calls No timeout.

Taking it further

  • Live demo of profiling skill

Taking it further

  • opencode and other genAI tools often support agents/sub-agents (see docs)
  • Agents are specialized AI assistants that can be configured for specific tasks and workflows
  • They allow you to create focused tools with custom prompts, models, and tool access
  • More markdown 👀

Sub-Agent

  • Let’s create a sub-agent to generate PR messages
  • Use opencode agent create
  • Try creating your own
  • Modify it and see what difference it makes

(15 minutes for exercise)

Thanks for listening

References

Achiam, Joshua. 2018. Spinning up in Deep Reinforcement Learning. OpenAI. https://spinningup.openai.com.
DeepLearning.AI. 2024. Build and Train an LLM with JAX. Online course, DeepLearning.AI. https://learn.deeplearning.ai/courses/build-and-train-an-llm-with-jax/lesson/gy364z/introduction.
Hugging Face. 2022. Deep Reinforcement Learning Course. Online course. https://huggingface.co/learn/deep-rl-course.
Karpathy, Andrej. 2022. Neural Networks: Zero to Hero. YouTube playlist. https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ.
Sanderson, Grant. 2017. Neural Networks. YouTube playlist, 3Blue1Brown. https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi.
Stanford University. 2021. CS25: Transformers United. Stanford University Course. https://web.stanford.edu/class/cs25/.