GenerateDocstring
Generate docstrings for your code using LLMs
The GenerateDocstring patchflow automatically documents methods in your code by generating docstrings.
How to run?
You can run it as follows:
patchwork GenerateDocstring
By default you will need to provide the openai_api_key and the github_api_key. You can pass them as arguments:
patchwork GenerateDocstring openai_api_key=<Your_API_KEY> github_api_key=<Your_GH_Token>
What it does?
The GenerateDocstring patchflow starts from a base_path (default is the current directory) and recursively traverses the directory to extract methods from the source code files. It then uses them to create a prompt that is sent to gpt-3.5-turbo to update the source code with docstrings. You can check the default prompt template. The updated files are then committed to the repository under a new branch, and finally a pull request is created for the user to review and merge the changes.
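For instance, an undocumented method like the one below might be rewritten as follows. This is a hypothetical Python example; the actual wording of the docstring depends on the model and the prompt template:

# Before: an undocumented method picked up by the patchflow
def slugify(title, separator="-"):
    return separator.join(title.lower().split())

# After: the same method with a generated docstring (illustrative output)
def slugify(title, separator="-"):
    """Convert a title into a URL-friendly slug.

    Args:
        title (str): The text to convert.
        separator (str): The string used to join words. Defaults to "-".

    Returns:
        str: The lowercased, separator-joined slug.
    """
    return separator.join(title.lower().split())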
Configuration
The following are the default configurations that can be modified by the user to adapt the GenerateDocstring patchflow to their needs. All the options can be set both via CLI arguments and the yaml config file.
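As a sketch, a yaml config file collecting the options described below could look like this. The keys are assumed to mirror the CLI argument names; the exact file name and location depend on how you invoke patchwork:

openai_api_key: <Your_API_KEY>
github_api_key: <Your_GH_Token>
model: gpt-3.5-turbo
client_base_url: https://api.openai.com/v1
base_path: ./
rewrite_existing: false
branch_prefix: Auto-Generated-Docstrings
disable_branch: false
disable_pr: false
force_pr_creation: false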
Model
You can choose any LLM API as long as it has an OpenAI API compatible chat completions endpoint. Just update the default values of the following options:
- model: gpt-3.5-turbo
- client_base_url: https://api.openai.com/v1
E.g. to use Meta’s CodeLlama model from HuggingFace you can set:
client_base_url: https://api-inference.huggingface.co/models/codellama/CodeLlama-70b-Instruct-hf/v1
model: codellama/CodeLlama-70b-Instruct-hf
model_temperature: 0.2
model_top_p: 0.95
model_max_tokens: 2000
and pass your HuggingFace token in the openai_api_key option.
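Put together, the full CLI invocation would look something like this (placeholders are yours to fill in):

patchwork GenerateDocstring client_base_url=https://api-inference.huggingface.co/models/codellama/CodeLlama-70b-Instruct-hf/v1 model=codellama/CodeLlama-70b-Instruct-hf model_temperature=0.2 model_top_p=0.95 model_max_tokens=2000 openai_api_key=<Your_HF_Token> github_api_key=<Your_GH_Token>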
You can also use llama.cpp to run inference on CPU locally. Just install the llama-cpp-python package and run their OpenAI compatible web server, as described in the llama-cpp-python documentation, with the command:
python3 -m llama_cpp.server --hf_model_repo_id TheBloke/deepseek-coder-6.7B-instruct-GGUF --model 'deepseek-coder-6.7b-instruct.Q4_K_M.gguf' --chat_format chatml
Once the local server is running you can set:
client_base_url: http://localhost:8000/v1
model: TheBloke/deepseek-coder-6.7B-instruct-GGUF
model_temperature: 0.2
model_top_p: 0.95
model_max_tokens: 1000
and use the local model for inference.
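Before running the patchflow, you can verify that the local server is up by calling its OpenAI compatible chat completions endpoint directly. This is a hypothetical smoke test; adjust the host and port if you changed the server defaults:

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "TheBloke/deepseek-coder-6.7B-instruct-GGUF", "messages": [{"role": "user", "content": "Hello"}]}'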
Base Path
You can set the base path from which to begin the docstring generation. E.g.
base_path: ./GitHub/example-python/
Rewrite Existing
You can choose whether to retain the existing docstrings or replace them with newly generated ones. E.g.
rewrite_existing: true
means that existing docstrings in methods will be replaced by the generated ones.
Manage PRs
In addition, there are options to let you manage the PRs as you like: setting a branch_prefix, disabling the creation of new branches with disable_branch (commits will be made on the current branch), disabling PR creation with disable_pr, or force pushing commits to an existing PR with force_pr_creation.
branch_prefix: Auto-Generated-Docstrings
disable_branch: false
disable_pr: false
force_pr_creation: false
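For example, to generate docstrings as commits on the current branch without opening a PR, you could run (a hypothetical invocation combining the options above):

patchwork GenerateDocstring disable_branch=true disable_pr=true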
Prompt template
You can update the default prompt template. The basic prompt that generates the docstrings has "id": "generate_docstring". Note the use of the variable {{affectedCode}}: its actual value is filled in during the execution of the GenerateDocstring patchflow. The expected output response is the complete content of the file with the docstrings added.
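As a sketch, a customized prompt template entry might look like the following. Only the "id" and the {{affectedCode}} variable are taken from the default template; the surrounding JSON structure and the prompt wording shown here are assumptions to illustrate the shape of the file:

[
  {
    "id": "generate_docstring",
    "prompts": [
      {
        "role": "system",
        "content": "You are a senior developer who writes clear docstrings."
      },
      {
        "role": "user",
        "content": "Add docstrings to every method in the following file and respond with the complete file content:\n{{affectedCode}}"
      }
    ]
  }
]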
Examples
Here are some example PRs generated with the GenerateDocstring patchflow: