Continual Learning for Instruction Following from Realtime Feedback

A contextual bandit learning approach improves an instruction-following agent's accuracy over time by leveraging binary user feedback during collaborative interactions.

Year
2022
Venue
continual-learning-for-instruction-following
Authors
2
Hosting
Abstract only (ARXIV-DEFAULT)

Attribution

Abstract & full text
arxiv.org/abs/2212.09710v2 (ARXIV-DEFAULT)
TL;DR
Semantic Scholar

Abstract

We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions. During interaction, human users instruct an agent using natural language, and provide realtime binary feedback as they observe the agent following their instructions. We design a contextual bandit learning approach, converting user feedback to immediate reward. We evaluate through thousands of human-agent interactions, demonstrating 15.4% absolute improvement in instruction execution accuracy over time. We also show our approach is robust to several design variations, and that the feedback signal is roughly equivalent to the learning signal of supervised demonstration data.
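
The abstract describes the learning setup only at a high level, so the following is a minimal sketch of a generic contextual bandit update driven by binary feedback, not the authors' implementation. Everything in it is an assumption made for illustration: the linear softmax policy, the +1/-1 reward mapping in `feedback_to_reward`, the learning rate, and the simulated user in `run_simulation` that approves an action only when it matches the context.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feedback_to_reward(positive):
    """Map binary user feedback to an immediate reward (assumed +1 / -1)."""
    return 1.0 if positive else -1.0


class LinearBanditPolicy:
    """Hypothetical linear softmax policy over a small discrete action set."""

    def __init__(self, n_features, n_actions, lr=0.1):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def action_probs(self, context):
        scores = [sum(wi * xi for wi, xi in zip(row, context)) for row in self.w]
        return softmax(scores)

    def sample(self, context, rng):
        probs = self.action_probs(context)
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def update(self, context, action, reward):
        # Policy-gradient step for a single bandit round:
        # the gradient of log pi(action | context) with respect to w[a]
        # is (1[a == action] - pi(a | context)) * context.
        probs = self.action_probs(context)
        for a, row in enumerate(self.w):
            coeff = (1.0 if a == action else 0.0) - probs[a]
            for i, xi in enumerate(context):
                row[i] += self.lr * reward * coeff * xi


def run_simulation(rounds=200, seed=0):
    """Train against a simulated user who approves only the matching action."""
    rng = random.Random(seed)
    policy = LinearBanditPolicy(n_features=2, n_actions=2, lr=0.1)
    for t in range(rounds):
        target = t % 2  # alternate between the two one-hot contexts
        context = [1.0 if i == target else 0.0 for i in range(2)]
        action = policy.sample(context, rng)
        positive = action == target  # stand-in for the user's binary feedback
        policy.update(context, action, feedback_to_reward(positive))
    return policy


if __name__ == "__main__":
    trained = run_simulation()
    print(trained.action_probs([1.0, 0.0]))
    print(trained.action_probs([0.0, 1.0]))
```

Each round uses only the feedback for the action actually taken, which is what distinguishes the bandit setting from supervised demonstrations, where the correct action is given directly.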
