0

A Note on Statistically Accurate Tabular Data Generation Using Large Language Models

Probability-driven prompting using large language models improves the statistical fidelity of synthetic tabular data by accurately capturing complex feature dependencies.

Year
2025
Venue
arXiv 2025
Authors
1
Hosting
Abstract onlyARXIV-DEFAULT

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2505.02659v2ARXIV-DEFAULT
TL;DR
Semantic Scholar
Attribution policy →

Abstract

Large language models (LLMs) have shown promise in synthetic tabular data generation, yet existing methods struggle to preserve complex feature dependencies, particularly among categorical variables. This work introduces a probability-driven prompting approach that leverages LLMs to estimate conditional distributions, enabling more accurate and scalable data synthesis. The results highlight the potential of prompting probability distributions to enhance the statistical fidelity of LLM-generated tabular data.

Authors

1