Attribution
VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting
arXiv 2025
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
from 2 papers
Chaoyou Fu
Haoyu Cao
Xing Sun
Yunhang Shen
Bin Luo
Caifeng Shan
Cheng Qian
Chi Yan
Chu Wu
Deqiang Jiang