Developers deal with code-change-related tasks daily, e.g., reviewing code.
Pre-trained code and code-change-oriented models have been adapted to help
developers with such tasks. Recently, large language models (LLMs) have shown
their effectiveness in code-related tasks. However, existing LLMs for code
focus on general code syntax and semantics rather than the differences between
two code versions. Thus, it is an open question how LLMs perform on
code-change-related tasks.
To answer this question, we conduct an empirical study using LLMs with more than
1B parameters on three code-change-related tasks, i.e., code review
generation, commit message generation, and just-in-time comment update, with
in-context learning (ICL) and parameter-efficient fine-tuning (PEFT, including
LoRA and prefix-tuning). We observe that the performance of LLMs is poor
without examples and generally improves with examples, but more examples do not
always lead to better performance. LLMs tuned with LoRA perform comparably to
the state-of-the-art small pre-trained models. Larger models are not always
better, but the Llama 2 and Code Llama families are always the best. The best
LLMs outperform small pre-trained models on code changes that only modify
comments and perform comparably on other code changes.
We suggest that future work on code-change-related tasks focus more on guiding
LLMs to learn knowledge specific to changes in code rather than changes in
comments.
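The study contrasts prompting without examples against in-context learning (ICL), where a few demonstrations are placed in the prompt before the query. As a rough illustration only, the sketch below shows one way a k-shot prompt for commit message generation could be assembled. The `Example` type, the `build_icl_prompt` function, the "Diff:/Commit message:" template, and the sample diffs are all hypothetical; they are not the prompt format or data used in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Example:
    """A hypothetical (code diff, commit message) demonstration pair."""
    diff: str
    message: str


def build_icl_prompt(demos: list[Example], query_diff: str, k: int) -> str:
    """Assemble a k-shot prompt: up to k demonstrations, then the query diff.

    The template below is an illustrative assumption, not the paper's format.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    parts = []
    # Each demonstration shows a diff followed by its reference message.
    for demo in demos[:k]:
        parts.append(f"Diff:\n{demo.diff}\nCommit message: {demo.message}\n")
    # The query ends with an open slot that the LLM is expected to complete.
    parts.append(f"Diff:\n{query_diff}\nCommit message:")
    return "\n".join(parts)


# Hypothetical demonstrations and query, for illustration only.
demos = [
    Example("- x = 1\n+ x = 2", "Bump default value of x"),
    Example("- # TODO: fix\n+ # Handles empty input", "Update outdated comment"),
]
query = "- retry = 3\n+ retry = 5"

zero_shot = build_icl_prompt(demos, query, k=0)
two_shot = build_icl_prompt(demos, query, k=2)
```

Setting `k=0` gives the no-example case; raising `k` adds demonstrations, which the study reports generally helps but does not always improve results as more are added.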
Exploring the Capabilities of LLMs for Code Change Related Tasks
Empirical study shows that large language models perform poorly on code-change-related tasks without examples, but some parameter-efficient fine-tuning methods can improve their performance, and models like Llama 2 and Code Llama outperform smaller models on these tasks.
- Year: 2024
- Venue: arXiv 2024
- Authors: 6
- Hosting: Abstract only
Attribution
- Abstract & full text: arxiv.org/abs/2407.02824
- TL;DR: Semantic Scholar