Best practices
Time limits
By default, GitHub Actions lets a job run for up to 360 minutes (6 hours) before cancelling it. This can be extremely wasteful when, for example, the code has stalled. It is therefore good practice to set a shorter time limit after which the job is cancelled. This limit should be an over-estimate of the expected run time, so that the job still passes when the code is working as expected.
The following GitHub Actions workflow syntax sets a 10-minute time limit on a
job called test-ubuntu-serial that uses the ubuntu-latest runner:
jobs:
  test-ubuntu-serial:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    # <Further job definition>
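The `timeout-minutes` key can also be set on individual steps, which is useful when one step (typically the test run) is the one most likely to stall. The following is a minimal sketch, assuming a Python project with a requirements.txt file that is tested with Pytest; the step names, Python version and 5-minute step limit are illustrative choices, not recommendations:

```yaml
jobs:
  test-ubuntu-serial:
    runs-on: ubuntu-latest
    # Upper bound for the whole job, including checkout and setup
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: pip install -r requirements.txt pytest
      - name: Run test suite
        # Tighter bound for the step most likely to stall
        timeout-minutes: 5
        run: pytest
```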
In most cases, your job will hopefully complete and pass within the allotted time limit. If it does not, the cause is usually one of the following:
There is an intermittent or random issue (e.g., a connection issue with a web resource, a hardware malfunction, or a cosmic ray). In such cases, it may be sufficient to re-run the workflow manually: click the red cross indicating the workflow failure, select the offending job, click the ‘Re-run jobs’ button near the top of the page, and then select ‘Re-run failed jobs’. (See the Rerun only failed tests section for more details.)
The job has stalled due to an issue on your branch. Addressing this will require debugging your change.
Your change legitimately makes the tests or documentation build take longer to run. In this case, you will need to increase the time limit.
Concurrency
Suppose you or one of your collaborators has triggered a GitHub Actions
workflow and it has jobs that are still running. If someone triggers the same
workflow again (e.g., by pushing a commit) then it usually doesn’t make sense
for the original workflow to continue running, given that the subsequent one
corresponds to the more recent version of the branch. In such cases, it can make
sense to configure concurrency so that in-progress runs are cancelled.
This can be achieved with the following syntax:
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
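The `concurrency` key can be placed at the top level of a workflow file, where it applies to the whole workflow, or under an individual job. A minimal sketch of the top-level placement, reusing the job from the Time limits section (the workflow name and triggers are assumptions for illustration):

```yaml
name: MyTestSuite
on:
  push:
    branches: [ "main" ]
  pull_request:

# One group per workflow and PR number (or ref, for pushes); a newer run in
# the same group cancels any run of that group that is still in progress
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  test-ubuntu-serial:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      # <Further steps>
```

Because `github.event.pull_request.number` is empty for `push` events, the group for pushes to main falls back to `github.ref`, so all pushes to the same branch share one group.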
Fail-fast
Sometimes workflow jobs contain several steps that are executed in series, for example both formatting checks and test suite runs. For such jobs, it is important to order the steps so that failures are detected as early as possible, i.e., a fail-fast policy. Assuming that formatting checks execute significantly faster than the test suite, failing fast is achieved by putting the formatting check before the test suite run.
A good general ordering is as follows:
Formatting/style/linting checks
Other static analysis checks
Unit test suites
System/integration test suites
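The ordering above can be sketched as a single job for a Python project. This is a minimal sketch under several assumptions made only for illustration: ruff is used for linting and formatting, mypy for other static analysis, and the Pytest suite is split into hypothetical tests/unit and tests/integration directories. Steps run in order and, by default, the job stops at the first failing step, so the cheapest checks fail first:

```yaml
jobs:
  checks-and-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: pip install -r requirements.txt ruff mypy pytest
      # 1. Formatting/style/linting checks (fastest)
      - name: Lint and format check
        run: |
          ruff check .
          ruff format --check .
      # 2. Other static analysis checks
      - name: Type check
        run: mypy .
      # 3. Unit test suites
      - name: Unit tests
        run: pytest tests/unit
      # 4. System/integration test suites (slowest)
      - name: Integration tests
        run: pytest tests/integration
```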
Triggers
Suppose you want to make a small change such as fixing typos or updating Markdown files. In a naive CI setup, this will trigger the full suite, involving tasks such as running tests, building documentation, and running static analysis tools. This can be extremely wasteful, especially for large repositories with long-running test suites. One way to avoid this is to make use of triggers. You might be familiar with triggers for pushes to particular branches, pushes to open PRs, and manual runs from the Actions tab, as demonstrated below:
name: MyTestSuite
on:
  # Triggers the workflow on pushes to the "main" branch, e.g., PR merges
  push:
    branches: [ "main" ]
  # Triggers the workflow on pushes to open pull requests
  pull_request:
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:
In the following, we restrict attention to pull_request triggers, because it
usually makes sense to run the full workflow when merging into main, and the
workflow_dispatch option already allows for finer-grained control over
specific jobs. Returning to the case of updating a Markdown file, the workflow
above will still run if a commit that only changes a Markdown file is pushed to
an open PR. To avoid this unnecessary test run, we can specify a list of paths
for the files whose changes should trigger the test suite. For example, in
Python we could use:
name: MyPythonTestSuite
on:
  # Triggers the workflow on pushes to open pull requests with code changes
  pull_request:
    paths:
      - '.github/workflows/test_suite_python.yml'
      - '**.py'
      - 'requirements.txt'
where test_suite_python.yml is the name of the workflow file itself. Here
the workflow will only be run if the file itself, any Python source files, or
the repository’s requirements.txt dependencies file change.
For compiled languages such as C, it is often also a good idea to include files related to the build system. For example:
name: MyCTestSuite
on:
  # Triggers the workflow on pushes to open pull requests with code changes
  pull_request:
    paths:
      - '.github/workflows/test_suite_c.yml'
      - '**.c'
      - '**.h'
      - '**CMakeLists.txt'
which covers the workflow file itself, source code with extension .c, header
files with extension .h, and CMake build files.
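GitHub Actions also offers the inverse filter, `paths-ignore`, which lists files whose changes should not trigger the workflow; `paths` and `paths-ignore` cannot both be used for the same event. A minimal sketch that skips the test suite when a push only touches Markdown files or a documentation directory (the `docs/**` pattern is an assumption about the repository layout):

```yaml
name: MyPythonTestSuite
on:
  pull_request:
    # Run for any change except those limited to Markdown files and docs/
    paths-ignore:
      - '**.md'
      - 'docs/**'
```

An allow-list (`paths`) never triggers on unexpected file types, whereas a deny-list (`paths-ignore`) errs on the side of running; which is easier to maintain depends on whether the set of relevant or irrelevant files is smaller.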
Warning
Often administrators configure repository settings such that certain (or all)
workflows are required to pass (on the latest commit) before a PR is merged.
In this case, untriggered workflows can become problematic: a required check
whose workflow was skipped by a path filter remains in a ‘Pending’ state and
blocks the merge. One way to get around this issue is to add
pull_request_review triggers to any workflows that need to be run before
merging, so that they are triggered whenever a review is received. A downside
of this workaround is that it can itself lead to unnecessary workflow runs.
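A minimal sketch of this workaround, reusing the path filters from the Python example above. The `pull_request` event keeps its `paths` filter, while the `pull_request_review` event (path filters are only available for `push`, `pull_request` and `pull_request_target`) runs the workflow whenever a review is submitted:

```yaml
name: MyPythonTestSuite
on:
  # Runs only when relevant files change
  pull_request:
    paths:
      - '.github/workflows/test_suite_python.yml'
      - '**.py'
      - 'requirements.txt'
  # Runs whenever a review is submitted, so that required checks are
  # reported even if no matching files changed
  pull_request_review:
    types: [ submitted ]
```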
Separation of concerns
The information on failing fast above is useful for workflows containing several stages, but further improvements can be made if those stages are independent of one another. In that case, we can apply a separation-of-concerns approach: split the workflow into several workflows, each with its own triggers.
For example, for Python code that uses the ruff static analysis tool for linting and formatting, the triggers could take the form:
name: MyPythonStaticAnalysisWorkflow
on:
  # Triggers the workflow on pushes to open pull requests with code changes
  pull_request:
    paths:
      - '.github/workflows/static_analysis_python.yml'
      - '**.py'
where static_analysis_python.yml is the filename of the workflow. Here, the
workflow will only be triggered when commits are pushed that include changes to
the workflow configuration or Python source code.
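The job definition for this workflow might look as follows. This is a minimal sketch, assuming ruff is installed from PyPI; the job name, Python version and time limit are illustrative:

```yaml
jobs:
  ruff:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      # Only ruff is needed; the project's own dependencies are not installed
      - name: Install ruff
        run: pip install ruff
      - name: Lint
        run: ruff check .
      - name: Check formatting
        run: ruff format --check .
```

Because this workflow never installs the project's dependencies or runs its tests, a change that only fails linting is reported without paying for the test suite.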
Similarly, if Fortran documentation is built using FORD, the triggers could take the form:
name: BuildDocs
on:
  # Triggers the workflow on pushes to open pull requests with documentation changes
  pull_request:
    paths:
      - '.github/workflows/build_docs_ford.yml'
      - '**.md'
      - 'pages/*'
The above can be extended to separate out CPU vs. GPU test suites, test suites on different operating systems (e.g., Ubuntu, macOS, Windows), and JOSS paper rendering, for example. Having separated concerns in this way, the overall number of jobs that run can be reduced, provided contributors do not modify several different parts of the repository in the same change.
Note
In some cases it can be a good thing for contributors to edit multiple different types of files in the same change. For example, it is good practice to update documentation in line with changes to source code.
Skip CI
GitHub Actions supports skipping workflows that would otherwise be triggered by
push or pull_request events if any of the following strings is included in
the commit message (for pull requests, the message of the HEAD commit):
[skip ci], [ci skip], [no ci], [skip actions], [actions skip]
The warning in the Triggers section applies here too: a required workflow skipped in this way leaves its check pending. You should therefore not use this notation in the final commit of a PR before requesting reviews.
See the GitHub documentation page for more details.
Rerun only failed tests
When re-running jobs in GitHub Actions from the Actions tab, there are two options: ‘Re-run all jobs’ and ‘Re-run failed jobs’. The latter is more energy efficient, because jobs that have already passed are not repeated, and so is preferred.
Some testing frameworks support similar features when running tests locally. For
example, Pytest has pytest --last-failed (or
pytest --lf) and CTest has
ctest --rerun-failed. These can cut down both the energy consumption and
turn-around time when debugging code or test changes.
Debugging
If your code fails during a CI run, it can sometimes be hard to find the issue without pushing a series of attempted fixes, each of which triggers the CI again and thus wastes energy. If the issue is not in a test suite that can easily be rerun locally (for example, using Pytest), a tool like act can be used to run your CI pipeline locally in a container.
Alternatively, you can interact with your GitHub Actions jobs using action-tmate (tmate being a fork of tmux). This enables you to connect via SSH to the runner machine that the job is running on.
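A minimal sketch of how an action-tmate session might be added to a workflow, loosely following the pattern in the action's README. The `debug_enabled` input name is hypothetical, and `limit-access-to-actor` assumes the person who triggered the run has SSH public keys registered on their GitHub account. The session only opens when an earlier step has failed in a manually dispatched run with debugging requested, and its own time limit prevents a forgotten session from keeping the runner busy:

```yaml
name: MyTestSuite
on:
  pull_request:
  workflow_dispatch:
    inputs:
      # Hypothetical input: tick this box in the Actions tab to debug
      debug_enabled:
        type: boolean
        description: 'Start a tmate session if a step fails'
        required: false
        default: false

jobs:
  test-ubuntu-serial:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      # <Build and test steps>
      - name: Setup tmate session
        # inputs.debug_enabled is empty (false) for pull_request runs
        if: ${{ failure() && inputs.debug_enabled }}
        uses: mxschmitt/action-tmate@v3
        # Bound the interactive session to limit wasted runner time
        timeout-minutes: 15
        with:
          limit-access-to-actor: true
```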
Test PRs
Once triggers are set up properly, it is also worth reviewing how you debug and change PRs. Rather than making small changes to a big PR and checking whether your CI passes, it can be more energy efficient to separate out a smaller PR containing just those changes. This will then (hopefully) trigger a smaller set of tests instead of all those affected by the big PR.