Prompting GPT-5 on Scrum Certification Questions: An Empirical Accuracy Study

Large Language Models (LLMs) are increasingly used in Agile Software Development for documentation, coaching, and training. As practitioners adopt these tools to prepare for certifications such as Professional Scrum Master (PSM), a key question is whether LLMs can reliably reason about Scrum, a framework whose normative, well-defined rules are set out in the Scrum Guide (2020). This paper examines how different prompting techniques affect the factual accuracy of LLM responses to Scrum certification-style questions. GPT-5 answered a dataset of 993 validated, PSM-aligned questions using three techniques: zero-shot, chain-of-thought, and with-source citation. All three techniques achieved certification-level accuracy above 85%, with the citation-based variant performing best (89.1%) and yielding the lowest error rate. Correct answers concentrated in well-defined topics, such as Definition of Done, Events, and Product Backlog Management, and in single-answer multiple-choice items. Multi-select questions and more interpretive areas, such as Scrum Team and Product Value, were answered less consistently. Among the questions on which at least one technique failed (16.2% of the dataset), errors clustered into three categories: misalignment with the Scrum Guide (28%), content outside its scope (34%), and outdated or biased interpretations (38%). Overall, prompting techniques produced modest but consistent improvements, particularly by reducing misinterpretation and version drift, supporting more reliable use of LLMs in Agile learning and certification preparation.
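The abstract reports two distinct quantities: per-technique accuracy, and the share of questions on which at least one technique failed (16.2%). The second is a union over techniques, so it can exceed every individual error rate. The following is a minimal Python sketch under stated assumptions: correctness is recorded as one boolean per question per technique, aligned by question index. The names `results`, `accuracy`, and `any_failure_rate` and the toy data are hypothetical and are not the study's code or results.

```python
from typing import Dict, List

# Hypothetical per-question correctness flags, one list per prompting technique,
# aligned by question index (True = answered correctly). Toy data, not study results.
results: Dict[str, List[bool]] = {
    "zero_shot":        [True, True, False, True, True],
    "chain_of_thought": [True, False, True, True, True],
    "with_citation":    [True, True, True, True, False],
}


def accuracy(flags: List[bool]) -> float:
    """Fraction of questions a single technique answered correctly."""
    return sum(flags) / len(flags)


def any_failure_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of questions on which at least one technique answered incorrectly."""
    # zip(*...) regroups the flags per question across all techniques.
    failed = [not all(flags) for flags in zip(*results.values())]
    return sum(failed) / len(failed)


if __name__ == "__main__":
    for technique, flags in results.items():
        print(f"{technique}: {accuracy(flags):.1%}")
    print(f"at least one technique failed: {any_failure_rate(results):.1%}")
```

On this toy data, each technique misses only one question in five, yet three of the five questions have at least one failure, which is why the "at least one technique failed" subset is larger than any single technique's error set.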