A brand whose customers use both ChatGPT and Claude for product recommendations faces a strategic choice: a single optimization playbook, or one per provider? Across 215 commercially-framed prompts in four measurement batches, the two providers disagree on which brands they recommend roughly two-thirds of the time (cross-provider recommendation Jaccard 0.35, below the 0.50-0.61 same-prompt rerun baseline). The picks diverge. But when neither provider recommends a brand, we classify the failure into one of three modes -- discoverability (the brand never reaches the model), compellingness (it reaches the model but isn't mentioned), or positioning (it's mentioned but not recommended) -- and on 7,763 such joint failures, both providers diagnose the same failure mode 95.1% of the time (clustered 95% CI [94.3%, 95.7%]). Agreement rises monotonically with falling brand prominence, from 81% [78.2%, 84.0%] on category leaders to 99.6% [99.3%, 99.9%] on long-tail regional brands. The two providers reach their picks by measurably different generative routes -- Anthropic recommends from priors 43-52% of the time, OpenAI 8-29% -- but they converge on the failure diagnosis where it matters most for the long tail. Work that addresses the diagnosed failure mode lifts visibility on both providers; positioning - and content-level work for category leaders is more provider-specific.
Divergent Recommendations, Convergent Diagnoses: Cross-Provider Failure-Mode Convergence in AI Commercial Recommendation
A brand whose customers use both ChatGPT and Claude for product recommendations faces a strategic choice: a single optimization playbook, or one per provider? Across 215 commercially-framed prompts in four measurement batches, the two providers disagree on which brands they…
- Preview

- Year
- 2026
- Hosting
- Full text hostedCC-BY-4.0
Cite
Notes
Only stored in your browser.
Attribution
- Abstract & full text
- arxiv.org/abs/2606.26116CC-BY-4.0
- TL;DR
- Semantic Scholar