DeepSynth
Description
DeepSynth is an environment for evaluating program synthesis from input-output examples. Agents are given examples of integer and integer-list transformations and must write a Python function that implements the transformation, generalizing beyond the shown examples.
Capabilities
- Pattern recognition from input-output examples
- Program synthesis / function induction
- Integer and list manipulation
- Generalization from examples to unseen inputs
Compute Requirements
No special compute requirements. The environment is lightweight and runs without a sandbox.
Tasks
- Train split: 1,353 tasks (T=1: 21, T=2: 464, T=3: 455, T=4: 413)
- Test split: 792 tasks (T=1: 24, T=2: 99, T=3: 487, T=4: 96, T=5: 86)
- Difficulty levels range from T=1 (single operation) to T=5 (five chained operations); T=5 tasks appear only in the test split
- Each task has 5 visible I/O examples and ~20 hidden test cases
Tasks are drawn from the DeepCoder dataset, which covers integer list transformations using ~40 primitives (sorting, filtering, mapping, zipping, scanning, etc.).
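To make the task format concrete, here is a minimal sketch of what a T=3 task and its solution could look like. The task, the examples, and the function name `solve` are made up for illustration and are not taken from the dataset; the three steps only loosely mirror DeepCoder-style filter, map, and sort primitives.

```python
from typing import List, Tuple


def solve(xs: List[int]) -> List[int]:
    """Hypothetical T=3 program: filter positives, double them, then sort."""
    positives = [x for x in xs if x > 0]   # step 1: filter
    doubled = [x * 2 for x in positives]   # step 2: map
    return sorted(doubled)                 # step 3: sort


# Made-up visible I/O examples in (input, expected output) form.
examples: List[Tuple[List[int], List[int]]] = [
    ([3, -1, 2], [4, 6]),
    ([-5, -2], []),
    ([1, 7, 0, 4], [2, 8, 14]),
]

# True when the candidate reproduces every visible example.
all_pass = all(solve(inp) == out for inp, out in examples)
```

Matching the visible examples is necessary but not sufficient: the function also has to agree with the underlying program on inputs that were never shown.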
Reward Structure
Binary reward: 1.0 if the submitted function passes ALL hidden test cases, 0.0 otherwise. Fully verifiable — no LLM grader.
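The all-or-nothing rule can be sketched as follows. This is an illustrative reimplementation, not the environment's grader: the name `binary_reward`, the `(args, expected)` case format, and the choice to count a raised exception as a failure are all assumptions.

```python
from typing import Any, Callable, Sequence, Tuple

# Assumed case format: a tuple of positional arguments plus the expected output.
Case = Tuple[Tuple[Any, ...], Any]


def binary_reward(fn: Callable[..., Any], hidden_cases: Sequence[Case]) -> float:
    """Return 1.0 only if fn matches every hidden case, else 0.0."""
    for args, expected in hidden_cases:
        try:
            if fn(*args) != expected:
                return 0.0  # a single mismatch zeroes the reward
        except Exception:
            return 0.0      # assumption: a crash counts as a failed case
    return 1.0
```

Because one failing case is enough to score 0.0, a function that overfits the five visible examples earns nothing if it diverges on any hidden input.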
Data
- Source: DeepCoder dataset (Balog et al., ICLR 2017)
- Enriched with hidden test cases generated from ground-truth DSL programs
Tools
- test(code) — Test Python code against the visible I/O examples. Returns pass/fail per example. Non-terminal.
- submit(code) — Submit final Python code for grading against hidden test cases. Terminal action, one attempt only.
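For intuition, the per-example feedback from the test tool can be approximated locally. The sketch below is a stand-in under stated assumptions, not the tool's actual implementation: the helper `run_visible_tests`, the entry-point name `solve`, and the list-of-booleans return format are all hypothetical.

```python
from typing import Any, List, Tuple

# Assumed example format: a tuple of positional arguments plus the expected output.
Example = Tuple[Tuple[Any, ...], Any]


def run_visible_tests(code: str, examples: List[Example],
                      entry_point: str = "solve") -> List[bool]:
    """Run candidate code against visible examples; one pass/fail flag each."""
    namespace: dict = {}
    exec(code, namespace)        # illustration only: this runs arbitrary code
    fn = namespace[entry_point]  # assumption: the candidate defines `solve`
    results = []
    for args, expected in examples:
        try:
            results.append(fn(*args) == expected)
        except Exception:
            results.append(False)  # assumption: a crash counts as a failure
    return results


# Made-up candidate and examples with two inputs (a list and an integer).
candidate = """
def solve(xs, k):
    return [x + k for x in xs]
"""
visible: List[Example] = [
    (([1, 2, 3], 1), [2, 3, 4]),
    (([], 5), []),
    (([0], -2), [-2]),
]
results = run_visible_tests(candidate, visible)
```

Since submit allows only one attempt, a loop like this is where an agent would normally iterate before committing to a final answer.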
Time Horizon
Multi-turn. Agents can iterate with the test tool before submitting. A typical episode uses 1-5 tool calls.
Environment Difficulty
- T=1: Easy (single operation, e.g., sort, map, filter)
- T=2: Medium (two chained operations)
- T=3: Hard (three chained operations, often with multiple inputs)
- T=4 and T=5: Very hard (four or five chained operations, complex compositions)
Safety
No safety concerns — tasks involve only integer and list manipulation.
Citations
@inproceedings{Balog2017,
author = {Balog, Matej and Gaunt, Alexander L. and Brockschmidt, Marc and Nowozin, Sebastian and Tarlow, Daniel},
title = {DeepCoder: Learning to Write Programs},
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2017},
url = {https://arxiv.org/abs/1611.01989}
}
@inproceedings{Fijalkow2022,
author = {Fijalkow, Nathana{\"{e}}l and Lagarde, Guillaume and Matricon, Th{\'{e}}o and Ellis, Kevin and Ohlmann, Pierre and Potta, Akarsh},
title = {Scaling Neural Program Synthesis with Distribution-based Search},
booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
volume = {36},
number = {6},
pages = {6623--6630},
year = {2022},
doi = {10.1609/aaai.v36i6.20616}
}