
From Unstructured Data to Demand Counterfactuals: Theory and Practice


Year: 2026
arXiv: arxiv.org/abs/2601.05374

Abstract

Empirical models of multi-product demand rely on low-dimensional product representations to capture substitution patterns, increasingly using proxies built from unstructured data. When proxies are imperfect, standard workflows yield biased counterfactuals and invalid inference. We develop a practical toolkit to address these issues. Our methods apply to market-level and/or individual data, require minimal additional computation, provide simple standard-error formulas, and accommodate proxies from fine-tuned models. Further, we propose diagnostics to assess proxy quality. Our methods yield meaningful improvements in predicting substitution in empirically calibrated simulations and in an application where we assess counterfactual prediction performance against a ground truth.
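The claim that imperfect proxies bias downstream estimates can be illustrated with the classical errors-in-variables setup. The sketch below is a deliberately simplified linear toy, not the paper's method or its demand model. It assumes a single latent characteristic `x`, an outcome linear in `x`, and two proxies (`proxy_a`, `proxy_b`) that equal `x` plus independent, equal-variance noise, standing in for, say, representations from two different models. Regressing on one proxy shrinks the slope toward zero by the reliability ratio var(x) / (var(x) + var(noise)). Instrumenting one proxy with the other, a textbook remedy swapped in here purely for illustration, recovers the true slope, and the correlation between the two proxies estimates the reliability ratio, a crude analogue of a proxy-quality diagnostic. All function names and parameter values are hypothetical.

```python
import numpy as np


def simulate(n=200_000, beta=2.0, noise_sd=1.0, seed=0):
    """Simulate an outcome driven by a latent characteristic and two noisy proxies of it."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)                  # latent "true" product characteristic
    y = beta * x + rng.normal(0.0, 1.0, n)       # outcome depends on the latent characteristic
    proxy_a = x + rng.normal(0.0, noise_sd, n)   # hypothetical proxy from one model
    proxy_b = x + rng.normal(0.0, noise_sd, n)   # hypothetical proxy with independent error
    return y, proxy_a, proxy_b


def ols_slope(y, z):
    """Naive OLS slope of y on a single regressor z (with intercept)."""
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ zc))


def iv_slope(y, z, w):
    """Just-identified IV slope: regress y on z, using w as the instrument."""
    wc = w - w.mean()
    return float(wc @ (y - y.mean()) / (wc @ (z - z.mean())))


def reliability(proxy_a, proxy_b):
    """With independent, equal-variance errors, the correlation of two proxies
    estimates var(x) / (var(x) + var(noise))."""
    return float(np.corrcoef(proxy_a, proxy_b)[0, 1])


if __name__ == "__main__":
    y, a, b = simulate()
    print("naive OLS slope:", ols_slope(y, a))   # attenuated toward zero
    print("IV slope:       ", iv_slope(y, a, b))  # close to the true beta
    print("reliability:    ", reliability(a, b))
```

With the defaults (unit variances, so a reliability ratio of 0.5), the naive slope should land near 1.0 instead of the true 2.0. In nonlinear demand models the distortion generally does not reduce to a single scaling factor, so this toy only conveys the direction of the problem.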