The AutoFix patchflow aims to automatically fix code vulnerabilities in your repository.

How to run?

You can run it as follows:

patchwork AutoFix

by default you will need to provide the openai_api_key and the github_api_key you can pass them as arguments:

patchwork AutoFix openai_api_key=<Your_API_KEY> github_api_key=<Your_GH_Token>

What it does?

The AutoFix patchflow will first scan the code in your repository using an open-source scanner Semgrep. It will then take the vulnerabilities that are detected by the scan and create a prompt to be sent to gpt-3.5-turbo to generate the fix for the vulnerabilities. You can check the default prompt template. The generated fixes are then committed to the repository under a new branch and finally a pull request is created for the user to review and merge the changes.

Configuration

The following are the default configurations that can be modified by the user to adapt the AutoFix patchflow to their needs. All the options can be set both via CLI arguments and and the yaml config file.

Model

You can choose any LLM API as long as it has an OpenAI API compatible chat completions endpoint. Just update the default values of the following options:

- model: gpt-3.5-turbo
- client_base_url: https://api.openai.com/v1

E.g. to use Meta’s CodeLlama model from HuggingFace you can set:

client_base_url: https://api-inference.huggingface.co/models/codellama/CodeLlama-70b-Instruct-hf/v1
model: codellama/CodeLlama-70b-Instruct-hf
model_temperature: 0.2
model_top_p: 0.95
model_max_tokens: 2000

and pass your HuggingFace token in the openai_api_key option.

You can also use llama.cpp to run inference on CPU locally. Just install the llama-cpp-python package and run their OpenAI compatible web server as described here with the command:

python3 -m llama_cpp.server --hf_model_repo_id TheBloke/deepseek-coder-6.7B-instruct-GGUF --model 'deepseek-coder-6.7b-instruct.Q4_K_M.gguf' --chat_format chatml

Once the local server is running you can set:

client_base_url: http://localhost:8000/v1
model: TheBloke/deepseek-coder-6.7B-instruct-GGUF
model_temperature: 0.2
model_top_p: 0.95
model_max_tokens: 1000

and use the local model for inference.

Context size

By default we chunk the code to context_size tokens to pass on to the LLM. You can change the default value by setting:

context_size: 1000

in general we have found that a larger context_size doesn’t necessarily lead to better fixes.

Semgrep extra args

You can pass any additional arguments to the Semgrep scanner using the semgrep_extra_args option as follows:

semgrep_extra_args: --config auto

SARIF input

We can use any SAST scanner that can export the results in SARIF format. Just set the sarif_file_path to the SARIF file and AutoFix will use the information there to generate the fixes. Otherwise, we will do a scan using Semgrep.

sarif_file_path: /path/to/sarif/file

Vulnerability limit

By default we process and fix only vulnerability_limit number of issues. This is to avoid making large number of LLMs calls and to keep the generated PR manageable. You can set the value to your preference:

vulnerability_limit: 10

set this value to a negative number (e.g. -1) to fix all vulnerabilities found in the scan.

Severity

You can also set the severity of the vulnerabilities that you want to fix. Severity is derived from the information available in the SARIF file and can take values of ‘unknown’, ‘note’/‘info’, ‘warning’/‘low’, ‘medium’, ‘error’/‘high’, and ‘critical’. E.g.

severity: medium

means that all medium and above vulnerabilities will be fixed.

Compatibility

You can also set the compatibility threshold of the suggested fixes to be included in the pull request. Compatibility can take values of ‘unknown’, ‘low’, ‘medium’, and ‘high’. E.g.

compatibility: medium

means that all fixes with mediuam and above threshold will be included in the PR.

Manage PRs

In addition, there are options to let you manage the PRs as you like, by setting a branch_prefix, or disabling the creation of new branches with disable_branch (commits will be made on the current branch). You can also disable PR creation with disable_pr or force push commits to existing PR with force_pr_creation.

branch_prefix: Auto-Generated-Fixes
disable_branch: false
disable_pr: false
force_pr_creation: false

Scanner

By default we use the open-source Semgrep scanner that comes with a community rule set for finding vulnerabilities. You can also do a semgrep login before running the AutoFix patchflow to get access to your own custom or pro rules if you have a Semgrep account (available for free). In addition, you can use any SAST scanner that outputs results in the standard SARIF format. Just pass your scan results with the following:

sarif_file_path: /path/to/your/sarif.json

Prompt template

You can update the default prompt template. Note the use of variables {{messageText}} and {{affectedCode}}. They are generated by the steps within the AutoFix patchflow and replaced by the actual values during the execution. Also, remember to keep the output format as given in the default prompt as it is used to extract information that is used by the steps after the model response is processed. The following output format is expected at the moment:

A. Commit message:
<brief summary of the diff>

B. Change summary:
<description of the changes made in the diff>

C. Compatibility Risk:
<Low, Medium, High> 

D. Fixed Code:
<original code with the vulnerability now fixed>

Examples

Here are some example PRs generated with the AutoFix patchflow: