Two-Action Apple Tasting with Switching Costs

We study the two-action apple-tasting problem with switching costs against an oblivious adversary. In an equivalent normalized formulation, at each round the learner chooses between a revealing action and a blind action: the revealing action gives reward 0 and reveals the hidden value x_t\in[-1,1] of the blind action; the blind action gives reward x_t but reveals nothing. The learner pays one unit whenever they switches actions, and regret is measured against the best fixed action in hindsight. General feedback-graph algorithms with switching costs give \widetilde O(T^{2/3}) regret guarantees for this problem. The two-action apple-tasting graph was the natural candidate for the missing Ω(T^{2/3}) obstruction in the switching-cost classification: such a lower bound would have transferred to a large family of still-unclassified feedback graphs. We prove that this obstruction is not there: the oblivious minimax expected regret for this problem satisfies [ \frac{1}{2\sqrt3}\cdot\sqrt T \le R_T^\star \le 2\sqrt{3}\cdot \sqrt{T}. ]