
SSC Coding Guidelines

AI-assisted research software development

Disclaimer: We try to keep this page updated, but given the pace of the field, we cannot promise that it reflects the latest developments.

For a list of AI-assisted code contribution policies, please take a look at this repo.

In the planning and possible solution research phase

In the prototyping phase: reach of the software = 1 person

The following are minimal best practices for any software that impacts at most one person. As soon as results from that software are published, its reach is larger than one person!

In the development phase: reach of the software > 1 person

Here the software achieves a higher reach and impact. As soon as the software and its produced output impact more than one person, the following applies in addition to the above:

Please make sure to also check the ethical and legal concerns below, as they address the important topics of AI slop and copyright.

Additional best practices for research software development

Code review

You are responsible for the generated code and should review it carefully. You can also ask the AI to review code critically, or alternate roles with it: the AI suggests code, you refactor it, and the AI then reviews the result to scan for issues.

Code documentation

It is perfectly acceptable to use AI to help you write good documentation for your project. Make sure that the resulting documentation is factually correct, that references are included, and that it is not overly verbose or written in an “AI tone”. This requires some effort on your part to review and rework all the generated text. It is also good practice to note where you used AI for support, see above.

The models underlying AI coding tools were trained on large amounts of open-source code, and each of these libraries is published under a specific license. A model can reproduce code that exactly matches code in these libraries. It is therefore advisable to verify that generated code does not violate the copyright or license of code or other material that was part of the training data. In practice, this is quite difficult to ensure. GitHub Copilot does offer a setting that makes it refrain from including suggestions that match public code.

Additionally, there is an ongoing debate about whether code produced by software meets the originality standard required for copyright protection. Copyright may be achievable if the prompting of the AI tool is sufficiently original. See for example here.

Ethical use

AI assistance can lead to a large amount of content or code being produced in a very short time. When this output is low-quality and unreviewed, it is currently referred to as “AI slop”. The intentional creation of AI slop is widely considered unethical, for several reasons:

  1. the creator is responsible for the content and should aim for the best possible quality;
  2. mass low-quality content can drown out and distract from original, high-quality content;
  3. it may create an additional burden on others who are for some reason forced to consume the content, for example in open-source collaborative work.

To reduce AI slop, review the created content critically and remove anything that merely states the obvious. Include your own reasoning rather than the AI summary. It is better to be clear and succinct than to achieve perfect grammar.

It is very easy to produce AI slop unintentionally:

  1. by adding changes that are unrelated to the problem you are currently trying to solve, thus creating an incomprehensible line of development;
  2. by over-relying on the AI tool, for example for the automatic generation of commit messages and pull request summaries.

Prompt suggestions and agent instructions

It is good practice to use somewhat standardized instructions for projects. There are different ways to tell your AI tool or agent how it should behave:

  1. Copilot prompt: Place them in the .github folder as copilot-instructions.md (see here). For an example, look at Apache Airflow.

  2. Codex prompt: Place an AGENTS.md file in the respective folder as described here.

  3. Claude CLI prompt: Place a CLAUDE.md file in your repository root. An example can be found here.

  4. Antigravity prompt: Place the instructions in the .agents/rules folder as described here.

In most tools, you can place an AGENTS.md file in your repository root. It is also possible to define skills and provide further context like here. Be aware that this poses further security risks.
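Because each tool reads its instructions from a different location, it can help to check which instruction files a repository actually contains before relying on them. The following is a minimal sketch in Python, not part of any of the tools named above: the function name `find_instruction_files` is hypothetical, and the list of candidate paths is an assumption that you should adjust to the tools your project uses.

```python
from pathlib import Path


def find_instruction_files(repo_root, candidates):
    """Return the candidate paths (relative to repo_root) that exist as files."""
    root = Path(repo_root)
    return [rel for rel in candidates if (root / rel).is_file()]


if __name__ == "__main__":
    # Hypothetical selection of instruction files; extend it with the
    # locations your own tools read from.
    candidates = [".github/copilot-instructions.md", "CLAUDE.md"]
    for rel in find_instruction_files(".", candidates):
        print(f"found agent instructions: {rel}")
```

Run from the repository root, the script prints one line per instruction file it finds. Since such files steer what an agent does, listing them is also a quick way to see which ones deserve the same review as code.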