DAComp-DE

Description

DAComp-DE (Data Agent Competition — Data Engineering) is an environment for evaluating AI agents on multi-stage data engineering tasks. Agents build, extend, or design dbt-style SQL pipelines.

DE-Impl (30 tasks): Build a complete SQL pipeline from scratch (staging → intermediate → marts).
DE-Evol (50 tasks): Modify or extend an existing pipeline to meet new requirements.
DE-Arch (30 tasks): Design a comprehensive data architecture blueprint in YAML.
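
The staging → intermediate → marts layering can be sketched with a toy example. This is only an illustration, not a task from the benchmark: the table and column names (`raw_orders`, `stg_orders`, `int_paid_orders`, `mart_customer_revenue`) are hypothetical, and Python's built-in `sqlite3` stands in for DuckDB so the snippet has no external dependencies. The SQL is kept generic and should port to DuckDB with little or no change.

```python
import sqlite3

# sqlite3 is a stand-in for DuckDB here (assumption, for a dependency-free sketch).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE raw_orders (
    order_id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT
);
INSERT INTO raw_orders VALUES
    (1, ' alice ', 1250, 'PAID'),
    (2, 'bob',      800, 'paid'),
    (3, 'alice',    450, 'refunded');

-- Staging: clean and normalize raw columns, one row in, one row out.
CREATE VIEW stg_orders AS
SELECT order_id,
       TRIM(customer)       AS customer,
       amount_cents / 100.0 AS amount,
       LOWER(status)        AS status
FROM raw_orders;

-- Intermediate: business logic shared by downstream models.
CREATE VIEW int_paid_orders AS
SELECT order_id, customer, amount
FROM stg_orders
WHERE status = 'paid';

-- Marts: aggregated, consumer-facing table.
CREATE TABLE mart_customer_revenue AS
SELECT customer, COUNT(*) AS paid_orders, SUM(amount) AS revenue
FROM int_paid_orders
GROUP BY customer;
""")

rows = con.execute(
    "SELECT customer, paid_orders, revenue "
    "FROM mart_customer_revenue ORDER BY customer"
).fetchall()
print(rows)
```

Each layer reads only from the layer before it, which is why an error in staging tends to propagate into every downstream table.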

Capabilities

SQL pipeline construction (DuckDB, dbt-style layers)
Repository exploration and modification
Data architecture design (YAML blueprints)
Python scripting and data tooling

Compute Requirements

Sandbox: 2 CPU / 4 GB memory per session
LLM evaluation: OpenAI API access (gpt-5-mini) for DE-Arch scoring only

License

MIT License

Tasks

Sub-type	Split	Count	Description
DE-Impl	test	30	Build SQL pipeline from scratch
DE-Evol	test	50	Extend existing SQL pipeline
DE-Arch	test	30	Design architecture blueprint

Reward Structure

DE-Impl/Evol (Deterministic, 0–100 scale)

Each output table is compared against the corresponding table in the gold DuckDB database using a row-hash multiset comparison. Scores are weighted by pipeline layer:

Staging: 15%
Intermediate: 25%
Marts: 60%
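
A minimal sketch of how such a score could be computed is shown below. It is not the benchmark's actual scorer. It assumes that a table either matches or does not (no partial credit), that each layer's score is the fraction of its gold tables that match, and that rows are hashed via `repr()`. The function names and the `layer -> table -> rows` input layout are hypothetical. Only the layer weights come from the description above.

```python
import hashlib
from collections import Counter

# Layer weights in percent, taken from the reward description above.
LAYER_WEIGHTS = {"staging": 15, "intermediate": 25, "marts": 60}


def row_hash(row):
    """Hash one row. Using repr() as the serialization is an assumption."""
    return hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()


def table_matches(pred_rows, gold_rows):
    """Multiset comparison: row order is ignored, duplicate counts are not."""
    return Counter(map(row_hash, pred_rows)) == Counter(map(row_hash, gold_rows))


def score(pred, gold):
    """Return a 0-100 score; pred and gold map layer -> table name -> rows."""
    total = 0.0
    for layer, weight in LAYER_WEIGHTS.items():
        gold_tables = gold.get(layer, {})
        if not gold_tables:
            continue  # nothing to grade in this layer
        pred_tables = pred.get(layer, {})
        hits = sum(
            name in pred_tables and table_matches(pred_tables[name], rows)
            for name, rows in gold_tables.items()
        )
        # Assumed aggregation: fraction of matching tables per layer.
        total += weight * hits / len(gold_tables)
    return total


gold = {
    "staging": {"stg_orders": [(1, "a"), (2, "b")]},
    "intermediate": {"int_orders": [(1, 10)]},
    "marts": {"fct_orders": [(1, 10), (1, 10)]},
}
pred = {
    "staging": {"stg_orders": [(2, "b"), (1, "a")]},  # reordered rows still match
    "intermediate": {"int_orders": [(1, 10)]},
    "marts": {"fct_orders": [(1, 10)]},  # a missing duplicate row breaks the match
}
print(score(pred, gold))
```

Under these assumptions the example earns the staging and intermediate weights but not the marts weight. Comparing multisets rather than sets means a pipeline that accidentally deduplicates or duplicates rows is still penalized.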

DE-Arch (LLM-judged, 0–100 scale)

An LLM judge evaluates the submitted YAML blueprint against the task's rubric, using evidence-based scoring.

Data

Source: HuggingFace (dacomp-de, dacomp-de-gold)
DE: 110 task repositories, 80 gold DuckDB databases, 30 architecture rubrics

Tools

Tool	Description
`bash`	Execute bash commands in the sandbox (Python, SQL, DuckDB, file I/O)
`submit`	Submit work for evaluation (takes a YAML blueprint for DE-Arch; triggers a pipeline run for DE-Impl/Evol)

Time Horizon

Multi-turn. DE-Impl: 20–50 tool calls, DE-Evol: 10–30, DE-Arch: 5–15.

Environment Difficulty

Even state-of-the-art agents achieve success rates under 20% on DE-Impl/Evol.

Other Environment Requirements

OpenAI API key for DE-Arch LLM evaluation
OpenReward API key for sandbox access

Safety

Tasks use synthetic or public data engineering schemas and contain no sensitive personal data. Sandboxes are network-isolated.

Citations

@misc{lei2025dacomp,
      title={DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle},
      author={Fangyu Lei and Jinxiang Meng and Yiming Huang and Junjie Zhao and Yitong Zhang and Jianwen Luo and Xin Zou and Ruiyi Yang and Wenbo Shi and Yan Gao and Shizhu He and Zuo Wang and Qian Liu and Yang Wang and Ke Wang and Jun Zhao and Kang Liu},
      year={2025},
      eprint={2512.04324},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.04324},
}