QCircuitBench

Description

QCircuitBench is an environment for evaluating an agent's ability to design and implement quantum algorithms. Agents are given quantum computing tasks (e.g., Bernstein-Vazirani, Grover's search, Shor's algorithm) and must produce correct quantum circuit implementations in OpenQASM 3.0 with Qiskit-based post-processing code.

This OpenReward implementation is ported from the Harbor Framework implementation originally made by Estel Yang.

Capabilities

Designing quantum circuits for standard algorithms
Implementing circuits in OpenQASM 3.0
Writing post-processing code with Qiskit and AerSimulator
Debugging and testing quantum programs

Compute Requirements

Agents are given a sandboxed environment with bash access, file editing tools, and a Qiskit runtime. Sandbox size is 1 CPU and 2 GB RAM.

License

CC BY 4.0.

Tasks

There is one split in this environment:

Test: 28 quantum circuit tasks

Tasks cover algorithms including Bernstein-Vazirani, Deutsch-Jozsa, Grover's search, Shor's factoring, quantum Fourier transform, Simon's algorithm, and others, at varying qubit sizes.
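For tasks such as Bernstein-Vazirani, the post-processing step usually reduces to reading a bitstring out of measurement counts. The sketch below is illustrative only and uses the standard library alone. It assumes counts arrive as a dictionary mapping bitstrings to shot counts (the shape Qiskit's get_counts returns) and that the task wants qubit 0 first, so the little-endian Qiskit string is reversed. The function names and the reverse_bits flag are hypothetical, not part of the benchmark's interface; each task's instruction.md defines the actual convention.

```python
from typing import Mapping


def most_likely_bitstring(counts: Mapping[str, int]) -> str:
    """Return the bitstring observed most often in the measurement counts."""
    if not counts:
        raise ValueError("counts is empty")
    return max(counts.items(), key=lambda item: item[1])[0]


def recover_secret(counts: Mapping[str, int], reverse_bits: bool = True) -> str:
    """Recover a candidate secret string from Bernstein-Vazirani-style counts.

    Assumption: Qiskit prints classical bit 0 as the rightmost character,
    so reversing yields the string ordered from qubit 0 upward.
    """
    # Drop any spaces that separate multiple classical registers.
    bits = most_likely_bitstring(counts).replace(" ", "")
    return bits[::-1] if reverse_bits else bits


# Hypothetical counts from a noisy 3-qubit run.
example_counts = {"110": 1019, "010": 3, "111": 2}
print(recover_secret(example_counts))  # prints 011
```

Taking the most frequent outcome rather than the only outcome keeps the routine usable if the simulation includes noise.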

Reward Structure

This is a multi-turn, sandbox-based environment. The agent writes a solution.py file containing OpenQASM 3.0 code, then calls submit_answer to trigger verification. The verifier parses the agent's quantum circuit and computes state fidelity against a ground truth circuit using Qiskit's state_fidelity function.

1.0: The agent's circuit produces a quantum state identical to the expected state.
Between 0.0 and 1.0: Partial credit based on the state fidelity between the agent's output state and the ground truth state.
0.0: Invalid QASM syntax, a missing solution, or a circuit whose output state has zero fidelity with the ground truth state.
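To make the scoring rule concrete, here is a minimal sketch of how fidelity could map to a reward for pure states. It is not the verifier's code: the verifier uses Qiskit's state_fidelity on parsed circuits, whereas this version computes |⟨expected|actual⟩|² by hand using only the standard library. The names pure_state_fidelity and reward_from_states are hypothetical, and treating a missing state (None) as reward 0.0 is an assumption standing in for the invalid-syntax and missing-solution cases.

```python
import math
from typing import Optional, Sequence


def normalize(state: Sequence[complex]) -> list:
    """Scale a vector of amplitudes to unit norm."""
    norm = math.sqrt(sum(abs(a) ** 2 for a in state))
    if norm == 0:
        raise ValueError("state vector has zero norm")
    return [a / norm for a in state]


def pure_state_fidelity(expected: Sequence[complex], actual: Sequence[complex]) -> float:
    """Fidelity between two pure states: squared magnitude of their inner product."""
    if len(expected) != len(actual):
        raise ValueError("state vectors must have the same dimension")
    e, a = normalize(expected), normalize(actual)
    overlap = sum(x.conjugate() * y for x, y in zip(e, a))
    return abs(overlap) ** 2


def reward_from_states(expected: Sequence[complex],
                       actual: Optional[Sequence[complex]]) -> float:
    """Map fidelity to a reward in [0, 1]; a missing state scores 0.0 (assumption)."""
    if actual is None:
        return 0.0
    # Clamp to guard against floating-point drift slightly outside [0, 1].
    return min(1.0, max(0.0, pure_state_fidelity(expected, actual)))


# Two-qubit examples: a Bell state as ground truth, |00> as a partial match.
bell = [1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2)]
zero = [1, 0, 0, 0]
print(round(reward_from_states(bell, zero), 6))
```

Because fidelity depends only on the magnitude of the overlap, two states that differ by a global phase receive the same score in this sketch.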

Data

Each task directory contains an instruction.md with the problem specification and a tests/ directory with verification scripts. Task data is stored on the OpenReward platform.

Tools

Tool	Description
`bash`	Execute shell commands in the sandbox.
`str_replace`	Replace a unique string in a file.
`view`	View file contents or list directory contents.
`create_file`	Create a new file with specified content.
`submit_answer`	Submit work for automated verification. Triggers test execution and returns reward.

Time Horizon

QCircuitBench is a multi-turn environment. Agents read task instructions, write quantum circuits and post-processing code, test their solutions, and submit for verification.

Environment Difficulty

QCircuitBench is a challenging benchmark. The original paper evaluates LLMs on quantum algorithm design and finds that semantic correctness (producing functionally correct circuits) remains difficult even for frontier models:

Model	QASM Syntax (5-shot)	Semantic Correctness (5-shot)
GPT-4o	0.578	0.201
Llama3-8B	0.460	0.032
Qwen 2.5	0.431	0.100
DeepSeek-R1	0.177	0.010
Human baseline	0.686	0.137

The paper also reports that LLMs exhibit consistent error patterns in quantum algorithm design and that fine-tuning does not always outperform few-shot learning.

Other Environment Requirements

There are no further environment requirements; QCircuitBench works out of the box with the OpenReward endpoint without any external API keys.

Safety

Agents in QCircuitBench write and execute quantum computing code in a sandboxed environment. The environment does not present direct safety risks.

Citations

@inproceedings{yang2024qcircuitbench,
  author    = {Yang, Rui and Wang, Ziruo and Gu, Yuntian and Chen, Tianyi and Liang, Yitao and Li, Tongyang},
  title     = {QCircuitBench: A Large-Scale Dataset for Benchmarking Quantum Algorithm Design},
  booktitle = {NeurIPS 2025 Datasets and Benchmarks Track},
  year      = {2024},
  url       = {https://arxiv.org/abs/2410.07961}
}