Attribution
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
arXiv 2025
Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model
arXiv 2024
from 2 papers
Liangtao Shi
Richang Hong
Ting Liu
Yue Hu
Linfeng Zhang
Xiantao Hu