SSC Coding Guidelines

General Guidelines

The general guidelines provide an overview of sustainable software development and good scientific practice in research software engineering. They set out a general framework that researchers should adhere to when writing scientific software. However, each programming language introduces its own flavor, so the recommended tools can be language-specific. Please check the guidelines for the language that you are using. Currently, C++, Python and Julia are available. Please open an issue to request guides for further programming languages or other updates.

Version control

Any software that is actively developed should be under version control. Version control allows you to keep track of any changes to your software and review the history at any time. It also allows you to track issues and work collaboratively. A version-control system such as git for your software should be the default. To host your remote repository, different providers are available such as GitHub, GitLab, and Bitbucket.

Content of a repository

A repository is the central storage location for software source code, documentation and other related files. Through a repository and a version-control system, changes to the software are tracked and managed.

A software repository should contain the following files:

The following files are optional:

GitHub etiquette

If you use GitHub or GitLab, you should follow the respective etiquette as appropriate. A standard way of developing software under git version control is git-flow. A lightweight version of it is github-flow, which the SSC recommends and predominantly uses.

Never push directly to the main branch! The git and GitHub workflow entails making changes in branches and running (and passing) automated checks (see below), such as a code linter, code formatter, code quality reviewer, unit tests, further tests, and code test coverage, before merging with main. A merge with main is preceded by a Pull Request, in which teams and collaborators can review the code and which triggers the automated checks. These measures keep the main branch functional and deliverable at all times.

Committing Code: Commit messages

When committing code to the repository, use meaningful commit messages that explain to others (and to yourself) what you did and that can be used to “tell a story”. The commits should be grouped in Pull Requests (PRs), each of which ideally serves only one purpose. The changes in the line of development in the PR should be summarized in the PR description. You may also link issues in the PR.

One way to structure your commit messages is to use conventional commits. When you merge a PR that has multiple commits (as is usually the case) with the main branch, it is recommended that you squash and merge, so that the commits in this line of development are grouped together.
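As an illustration of the commit message structure, the following is a minimal sketch of a checker for the header line of a conventional commit (`type(scope)!: description`). The function name `is_conventional_header` and the list of allowed types are assumptions made for this example; the Conventional Commits specification itself only gives `feat` and `fix` a defined meaning and permits other types.

```python
import re

# Types commonly used with conventional commits. This set is an assumption
# for the example; adapt it to whatever your project agrees on.
ALLOWED_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
)

# Header layout: type, optional "(scope)", optional "!" for breaking
# changes, then a colon, a single space, and a non-empty description.
HEADER_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional_header(header: str) -> bool:
    """Return True if the first line of a commit message matches the layout."""
    match = HEADER_PATTERN.match(header)
    return match is not None and match.group("type") in ALLOWED_TYPES
```

A check like this could run in a commit hook or in CI; for example, `is_conventional_header("fix(io): handle empty input files")` is accepted, while `is_conventional_header("Fixed stuff")` is rejected.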

Continuous integration (CI) / continuous delivery (CD)

It is recommended to use CI in Pull (Merge) requests, to run automated checks that ensure your code is adhering to certain standards and passing tests. Different CI/CD integration tools are available, such as GitHub actions, GitLab CI, Jenkins, and Travis CI, to name a few.

Depending on the scope and computational demands of your software, the checks you include can comprise pre-commit, Sonarcloud, codecov, and unit/regression/system/… tests (see the language-specific recommendations). You may also integrate dependabot alerts and snyk for automated PRs about new dependency releases and vulnerabilities. GitGuardian is a tool that helps you keep your code and environment secrets safe.

To learn about CI and CD on GitHub, see here. GitHub actions are free for public repositories and are subject to a quota for private repositories. One of the big advantages of CI is that you can run checks in different environments (operating systems, dependency versions, etc.).

Documentation

Good documentation is essential for ensuring users know what to do with the code, and are applying it to the correct use cases. It will save time in the long run, as code by itself is rarely obvious, and it will help outline dependencies. Depending on the programming language used, we recommend different tools like doxygen, sphinx, or using markdown/a wiki.
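For Python code, much of the API reference can be generated from docstrings. The sketch below shows a function documented in the NumPy docstring style, which Sphinx can render through its `sphinx.ext.napoleon` extension. The function `moving_average` is a hypothetical example and not part of any SSC package.

```python
def moving_average(values, window):
    """Compute the simple moving average of a sequence.

    Parameters
    ----------
    values : list of float
        Input samples in order.
    window : int
        Number of consecutive samples averaged per output value.
        Must be between 1 and ``len(values)``.

    Returns
    -------
    list of float
        The ``len(values) - window + 1`` window averages.

    Raises
    ------
    ValueError
        If ``window`` is outside the valid range.
    """
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Average each run of `window` consecutive samples.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

Stating the parameters, return value and raised exceptions next to the code keeps the reference documentation close to the implementation, so it is more likely to be updated together with it.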

The documentation can be hosted on ecosystem-wide platforms, for example readthedocs for Python projects, or more generally using GitHub pages or GitLab. To learn how to set up GitHub pages, see here.

The documentation should be structured into Tutorials, How-to guides, API references, and Explanations (see the Diataxis documentation guide). The essential content of the documentation comprises:

Testing

(Automated) testing ensures that errors are detected early and that results are reproducible. Implementing the tests and keeping them updated requires effort, but it will pay off in the long run. When planning your software, you can also follow a test-driven development philosophy, which makes structuring your code easier. Keep in mind that reproducibility is a requirement of good scientific practice. Automated testing is also what makes continuous integration/continuous delivery (deployment) (CI/CD) possible.
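As a minimal sketch of what automated tests can look like, the example below uses Python's standard-library `unittest` module. The function `normalize` and the test class `TestNormalize` are hypothetical and serve only as illustration; the language-specific guidelines may recommend a different test framework.

```python
import unittest


def normalize(values):
    """Scale a list of numbers so that they sum to 1 (hypothetical example)."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


class TestNormalize(unittest.TestCase):
    def test_sums_to_one(self):
        # The defining property of the result.
        self.assertAlmostEqual(sum(normalize([1, 2, 3, 4])), 1.0)

    def test_preserves_ratios(self):
        # Relative sizes of the inputs must be unchanged.
        result = normalize([1, 3])
        self.assertAlmostEqual(result[0], 0.25)
        self.assertAlmostEqual(result[1], 0.75)

    def test_zero_sum_raises(self):
        # Invalid input should fail loudly rather than return nonsense.
        with self.assertRaises(ValueError):
            normalize([0, 0])
```

Saved in a file whose name starts with `test`, these tests can be run locally with `python -m unittest` and with the same command in a CI job. Each test covers one behaviour (a property of the result, a known value, an error case), so a failure points directly at what broke.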

Code review

Code review, that is, involving your collaborators in any changes you make to the code and vice versa, increases the chance of errors being detected early. It also aids knowledge transfer and keeps you up to date on what features are being implemented. Code review is easy to incorporate into your workflow through GitHub pull requests.