0

Trip+: Benchmarking Agents in Personalized Interactive Travel Planning

Interactive travel planning has become a popular use case for language models. Agents are deployed to manage evolving preferences and unexpected disruptions over multiple turns. Such settings require models to make complex, profile-conditioned planning decisions.

Preview
Year
2026
Hosting
Abstract onlyARXIV-DEFAULT

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2606.21169ARXIV-DEFAULT
TL;DR
Semantic Scholar
Attribution policy →

Abstract

Interactive travel planning has become a popular use case for language models. Agents are deployed to manage evolving preferences and unexpected disruptions over multiple turns. Such settings require models to make complex, profile-conditioned planning decisions. However, existing benchmarks often evaluate feasibility, personalization, or interaction in relatively isolated settings. We therefore introduce Trip+ to measure the ability of agents to plan travel holistically. In Trip+, given traveler profiles and dynamic interactions, agents must generate and revise minute-level itineraries. End-to-end traveler experiences are evaluated via an LLM-based simulator, enabling the assessment of subjective metrics like fatigue. Our scenarios range from simple request resolutions to complex environment-driven replanning. We evaluate 18 LMs and find a consistent gap in experiential quality. Models favor technically feasible but exhausting itineraries that diverge sharply from profiled traveler preferences.