Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

In this report, we present Hy-Embodied-0.5-VLA, abbreviated as HyVLA-0.5, an end-to-end system that spans the full robot learning stack: data collection, model design, continued pre-training and supervised fine-tuning, RL post-training, and real-world deployment.

Open

Preview
Year: 2026
ArXiv: arxiv.org/abs/2606.14409
Hosting: Abstract onlyARXIV-DEFAULT

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text: arxiv.org/abs/2606.14409ARXIV-DEFAULT
TL;DR: Semantic Scholar

Attribution policy →