PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM

PoseLess uses synthetic training data and transformer-based decoders to map 2D images to joint angles for robot hand control, achieving zero-shot generalization to real-world scenarios and cross-morphology transfer.

Year
2025
Venue
arXiv 2025
Authors
4

Abstract & full text
arxiv.org/abs/2503.07111v2

Abstract

This paper introduces PoseLess, a novel framework for robot hand control that eliminates the need for explicit pose estimation by directly mapping 2D images to joint angles using projected representations. Our approach leverages synthetic training data generated through randomized joint configurations, enabling zero-shot generalization to real-world scenarios and cross-morphology transfer from robotic to human hands. By projecting visual inputs and employing a transformer-based decoder, PoseLess achieves robust, low-latency control while addressing challenges such as depth ambiguity and data scarcity. Experimental results demonstrate competitive performance in joint angle prediction accuracy without relying on any human-labelled dataset.
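
To make the data-generation idea in the abstract concrete, the following is a minimal sketch of sampling randomized joint configurations and scoring a prediction against them. The joint names, the limits, and the helper functions are all hypothetical; the abstract does not specify the hand model, the renderer, or the evaluation metric, so mean squared error is used here only as a plausible stand-in. In a full pipeline each sampled configuration would be rendered to a 2D image to form an (image, joint angles) training pair; that rendering step is omitted.

```python
import random

# Hypothetical joint limits in degrees. These names and ranges are
# illustrative assumptions, not values taken from the paper.
JOINT_LIMITS = {
    "thumb_flex": (0.0, 90.0),
    "index_flex": (0.0, 100.0),
    "middle_flex": (0.0, 100.0),
    "wrist_pitch": (-45.0, 45.0),
}


def sample_joint_config(limits, rng):
    """Draw one joint configuration uniformly within each joint's limits."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in limits.items()}


def make_joint_dataset(n, limits, seed=0):
    """Generate n random joint configurations (the labels of a synthetic set)."""
    rng = random.Random(seed)
    return [sample_joint_config(limits, rng) for _ in range(n)]


def mean_squared_error(pred, target):
    """Average squared difference between predicted and target joint angles."""
    return sum((pred[k] - target[k]) ** 2 for k in target) / len(target)


if __name__ == "__main__":
    dataset = make_joint_dataset(3, JOINT_LIMITS, seed=42)
    for config in dataset:
        print({name: round(angle, 1) for name, angle in config.items()})
```

Because the labels come from the sampler itself, no human annotation is required, which is consistent with the abstract's claim that the method does not rely on a human-labelled dataset.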