Existing evaluations of tabular synthesis models rely primarily on low-order statistics and downstream task performance, leaving multivariate causal relationships that go beyond pairwise correlations largely unmeasured. We argue that a systematic evaluation of high-order structural information is a crucial first step in addressing this issue in tabular data synthesis. In this paper, we present high-order structural causal information as a natural form of prior knowledge and introduce a benchmark framework to evaluate tabular synthesis models. The framework generates benchmark datasets from a flexible range of data generation processes; tabular synthesis models are then trained on these datasets and evaluated. We propose multiple benchmark tasks, high-order metrics, and causal inference tasks as downstream tasks for evaluating the quality of the synthetic data generated by the trained models. Our experiments demonstrate the effectiveness of the benchmark framework in evaluating a model's ability to capture high-order structural causal information. Furthermore, our benchmarking results provide an initial assessment of state-of-the-art tabular synthesis models. These results reveal significant gaps between ideal and actual performance and highlight how baseline methods differ. We position the framework as a controlled diagnostic benchmark for causal fidelity, complementing existing low-order and downstream evaluations. We open-source the benchmark framework, including code, data, and documentation, to support further research in this area.
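To make concrete why pairwise statistics can miss multivariate structure, the following is a minimal sketch under stated assumptions, not the benchmark's own data generation process or metrics. It uses a hypothetical three-variable toy structural causal model (C := A xor B) and a hypothetical "synthesizer" that only preserves each column's marginal distribution. All function names here are illustrative.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def sample_xor_scm(n, rng):
    """Toy SCM (assumption for illustration): A, B fair coins, C := A xor B."""
    rows = []
    for _ in range(n):
        a = rng.randint(0, 1)
        b = rng.randint(0, 1)
        rows.append((a, b, a ^ b))
    return rows


def sample_independent_marginals(rows, rng):
    """Hypothetical synthesizer: shuffles each column independently,
    so marginals are kept exactly but all joint structure is destroyed."""
    cols = [list(col) for col in zip(*rows)]
    for col in cols:
        rng.shuffle(col)
    return list(zip(*cols))


def max_abs_pairwise_corr(rows):
    """Largest absolute Pearson correlation over the three column pairs."""
    cols = list(zip(*rows))
    pairs = [(0, 1), (0, 2), (1, 2)]
    return max(abs(pearson(cols[i], cols[j])) for i, j in pairs)


def xor_consistency(rows):
    """Fraction of rows that satisfy the structural equation C = A xor B."""
    return sum(1 for a, b, c in rows if c == (a ^ b)) / len(rows)


rng = random.Random(0)
real = sample_xor_scm(20000, rng)
fake = sample_independent_marginals(real, rng)

print("real: max |pairwise corr| =", round(max_abs_pairwise_corr(real), 3),
      "| xor consistency =", xor_consistency(real))
print("fake: max |pairwise corr| =", round(max_abs_pairwise_corr(fake), 3),
      "| xor consistency =", round(xor_consistency(fake), 3))
```

In this toy setting both tables show near-zero pairwise correlations, so a pairwise check cannot tell them apart, while the three-variable consistency check separates them: every real row satisfies the structural equation, and roughly half of the shuffled rows do.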
Causality for Tabular Data Synthesis: A High-Order Structure Causal Benchmark Framework
- Year
- 2024
- Hosting
- Excerpt only (CC-BY-NC-SA-4.0)
Attribution
- Abstract & full text
- arxiv.org/abs/2406.08311 (CC-BY-NC-SA-4.0)
- TL;DR
- Semantic Scholar