0

One-Shot Imitation under Mismatched Execution

RHyME uses optimal transport to align human and robot demonstrations for policy training, improving task success without paired data.

Year
2024
Venue
arXiv 2024
Authors
5
Hosting
Abstract onlyARXIV-DEFAULT

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2409.06615v6ARXIV-DEFAULT
TL;DR
Semantic Scholar
Attribution policy →

Abstract

Human demonstrations as prompts are a powerful way to program robots to do long-horizon manipulation tasks. However, translating these demonstrations into robot-executable actions presents significant challenges due to execution mismatches in movement styles and physical capabilities. Existing methods for human-robot translation either depend on paired data, which is infeasible to scale, or rely heavily on frame-level visual similarities that often break down in practice. To address these challenges, we propose RHyME, a novel framework that automatically pairs human and robot trajectories using sequence-level optimal transport cost functions. Given long-horizon robot demonstrations, RHyME synthesizes semantically equivalent human videos by retrieving and composing short-horizon human clips. This approach facilitates effective policy training without the need for paired data. RHyME successfully imitates a range of cross-embodiment demonstrators, both in simulation and with a real human hand, achieving over 50% increase in task success compared to previous methods. We release our code and datasets at https://portal-cornell.github.io/rhyme/.

Authors

5