DAComp-DE

DAComp-DE is a benchmark of 110 data engineering tasks that require repository-level engineering on industrial schemas, including designing and building multi-stage SQL pipelines.

Type: RL Env
Runtime: ORS
License: unknown
Size: 110 tasks
Published: Mar 2026

DAComp-DE

OpenReward Environment

Description

DAComp-DE (Data Agent Competition: Data Engineering) is an environment for evaluating AI agents on multi-stage data engineering tasks. Agents build, extend, or design dbt-style SQL pipelines. The 110 tasks fall into three sub-types:

  • DE-Impl (30 tasks): Build a complete SQL pipeline from scratch (staging → intermediate → marts).
  • DE-Evol (50 tasks): Modify or extend an existing pipeline to meet new requirements.
  • DE-Arch (30 tasks): Design a comprehensive data architecture blueprint in YAML.
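
To make the staging → intermediate → marts layering concrete, here is a minimal sketch of a three-layer pipeline. Everything in it is an assumption made for illustration: the table names (`raw_orders`, `stg_orders`, `int_customer_orders`, `mart_customer_revenue`), the columns, and the rows are hypothetical, and Python's built-in `sqlite3` stands in for DuckDB so the example runs without extra dependencies. Real DAComp-DE tasks operate on much larger, repository-level schemas.

```python
import sqlite3

# Hypothetical raw source rows: (order_id, customer, amount_cents, status).
RAW_ORDERS = [
    (1, " Alice ", 1250, "COMPLETE"),
    (2, "bob", 800, "complete"),
    (3, "Alice", 450, "cancelled"),
    (4, "Bob", 1200, "complete"),
]

# One SELECT per model, listed in dependency order (staging first, marts last).
MODELS = [
    # Staging: clean and standardise raw columns, no business logic yet.
    ("stg_orders", """
        SELECT order_id,
               lower(trim(customer)) AS customer,
               amount_cents / 100.0  AS amount,
               lower(status)         AS status
        FROM raw_orders
    """),
    # Intermediate: apply a business rule (completed orders only) and aggregate.
    ("int_customer_orders", """
        SELECT customer,
               count(*)    AS order_count,
               sum(amount) AS revenue
        FROM stg_orders
        WHERE status = 'complete'
        GROUP BY customer
    """),
    # Marts: the consumer-facing table with derived metrics.
    ("mart_customer_revenue", """
        SELECT customer,
               order_count,
               revenue,
               revenue / order_count AS avg_order_value
        FROM int_customer_orders
    """),
]


def build_pipeline(conn):
    """Load the raw table, then materialise each model as a table."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", RAW_ORDERS)
    for name, select_sql in MODELS:
        conn.execute(f"CREATE TABLE {name} AS {select_sql}")


conn = sqlite3.connect(":memory:")
build_pipeline(conn)
mart_rows = conn.execute(
    "SELECT * FROM mart_customer_revenue ORDER BY customer"
).fetchall()
print(mart_rows)  # [('alice', 1, 12.5, 12.5), ('bob', 2, 20.0, 10.0)]
```

Each layer reads only from the layer before it, which is what lets a grader inspect staging, intermediate, and marts tables separately.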

Capabilities

  • SQL pipeline construction (DuckDB, dbt-style layers)
  • Repository exploration and modification
  • Data architecture design (YAML blueprints)
  • Python scripting and data tooling

Compute Requirements

  • Sandbox: 2 CPU / 4GB memory per session
  • LLM evaluation: OpenAI API access (gpt-5-mini) for DE-Arch scoring only

License

MIT License

Tasks

Sub-type  Split  Count  Description
DE-Impl   test   30     Build SQL pipeline from scratch
DE-Evol   test   50     Extend existing SQL pipeline
DE-Arch   test   30     Design architecture blueprint

Reward Structure

DE-Impl/Evol (Deterministic, 0–100 scale)

Row-hash multiset comparison of each output table against the gold DuckDB database, with layer-weighted scoring:

  • Staging: 15%
  • Intermediate: 25%
  • Marts: 60%
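
The scoring idea above can be sketched as follows. This is an illustration, not the environment's grader: the hash encoding (`repr` of the row tuple fed to SHA-256), the rule that a layer's score is the fraction of its gold tables reproduced exactly, and all table names are assumptions. Only the three layer weights and the 0 to 100 scale come from the description.

```python
import hashlib
from collections import Counter

# Layer weights as stated in the list above.
LAYER_WEIGHTS = {"staging": 0.15, "intermediate": 0.25, "marts": 0.60}


def row_hash(row):
    """Hash one row to a hex digest (assumed encoding: repr of the value tuple)."""
    return hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()


def table_matches(pred_rows, gold_rows):
    """Multiset equality: row order is ignored, duplicate counts must agree."""
    return Counter(map(row_hash, pred_rows)) == Counter(map(row_hash, gold_rows))


def pipeline_score(pred, gold, layer_of):
    """Return a score on a 0 to 100 scale.

    pred and gold map table name -> list of rows; layer_of maps table name -> layer.
    Assumption: a layer's score is the fraction of its gold tables matched exactly,
    and a layer with no gold tables contributes nothing.
    """
    total = 0.0
    for layer, weight in LAYER_WEIGHTS.items():
        tables = [t for t in gold if layer_of[t] == layer]
        if not tables:
            continue
        matched = sum(t in pred and table_matches(pred[t], gold[t]) for t in tables)
        total += weight * matched / len(tables)
    return round(100 * total, 2)


# Hypothetical example: staging and intermediate match, the mart does not.
layer_of = {"stg_orders": "staging", "int_orders": "intermediate", "mart_revenue": "marts"}
gold = {
    "stg_orders": [(1, "a"), (2, "b"), (2, "b")],
    "int_orders": [("a", 1), ("b", 2)],
    "mart_revenue": [("a", 10.0)],
}
pred = {
    "stg_orders": [(2, "b"), (1, "a"), (2, "b")],  # same rows, different order
    "int_orders": [("a", 1), ("b", 2)],
    "mart_revenue": [("a", 9.0)],                  # wrong value
}
score = pipeline_score(pred, gold, layer_of)
print(score)  # 40.0
```

Comparing multisets rather than sets means a pipeline that accidentally drops or introduces duplicate rows still fails the table, while row ordering does not matter. With marts weighted at 60%, correct upstream layers alone cannot lift a submission above 40.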

DE-Arch (LLM-judged, 0–100 scale)

An LLM evaluates the YAML blueprint against the task's rubric with evidence-based scoring.
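
One way to read "evidence-based scoring" is that the judge must cite a passage from the blueprint for every rubric point it awards. The sketch below shows only that aggregation step, with no LLM call. The `RubricItem` and `Judgement` types, the verbatim-evidence check, and the sample rubric are all hypothetical; the actual rubric format and judge prompt are not described here.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    criterion: str
    max_points: int


@dataclass
class Judgement:
    criterion: str
    points: int
    evidence: str  # passage the judge quotes from the blueprint


def arch_score(rubric, judgements, blueprint_text):
    """Aggregate per-criterion judgements into a 0 to 100 score.

    Assumption: points count only when the quoted evidence appears verbatim
    in the blueprint, and awarded points are clamped to the item's maximum.
    """
    by_criterion = {j.criterion: j for j in judgements}
    possible = sum(item.max_points for item in rubric)
    earned = 0
    for item in rubric:
        j = by_criterion.get(item.criterion)
        if j is None or not j.evidence or j.evidence not in blueprint_text:
            continue  # no verifiable evidence, no credit
        earned += max(0, min(j.points, item.max_points))
    return round(100 * earned / possible, 2) if possible else 0.0


# Hypothetical blueprint and rubric, for illustration only.
blueprint = """\
layers:
  staging:
    - stg_orders
  marts:
    - mart_customer_revenue
tests:
  - unique: order_id
"""
rubric = [
    RubricItem("Defines a staging layer", 4),
    RubricItem("Specifies incremental loading", 3),
    RubricItem("Declares data quality tests", 3),
]
judgements = [
    Judgement("Defines a staging layer", 4, "stg_orders"),
    Judgement("Specifies incremental loading", 3, "materialized: incremental"),
    Judgement("Declares data quality tests", 2, "unique: order_id"),
]
result = arch_score(rubric, judgements, blueprint)
print(result)  # 60.0
```

The second judgement earns nothing because its quoted evidence does not occur in the blueprint, which is the kind of unsupported award an evidence requirement is meant to filter out.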

Data

  • Source: Hugging Face (dacomp-de, dacomp-de-gold)
  • DE: 110 task repositories, 80 gold DuckDB databases, 30 architecture rubrics

Tools

Tool    Description
bash    Execute bash commands in the sandbox (Python, SQL, DuckDB, file I/O)
submit  Submit work for evaluation (YAML for DE-Arch, triggers pipeline run for DE-Impl/Evol)

Time Horizon

Multi-turn. DE-Impl: 20–50 tool calls, DE-Evol: 10–30, DE-Arch: 5–15.

Environment Difficulty

Even state-of-the-art agents achieve success rates under 20% on DE-Impl/Evol.

Other Environment Requirements

  • OpenAI API key for DE-Arch LLM evaluation
  • OpenReward API key for sandbox access

Safety

Tasks involve synthetic/public data engineering schemas. No sensitive personal data. Sandboxes are network-isolated.

Citations

@misc{lei2025dacomp,
      title={DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle},
      author={Fangyu Lei and Jinxiang Meng and Yiming Huang and Junjie Zhao and Yitong Zhang and Jianwen Luo and Xin Zou and Ruiyi Yang and Wenbo Shi and Yan Gao and Shizhu He and Zuo Wang and Qian Liu and Yang Wang and Ke Wang and Jun Zhao and Kang Liu},
      year={2025},
      eprint={2512.04324},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.04324},
}