Haibo Wang

Attribution

4 papers

Authored papers

Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding

arXiv 2026

Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering

arXiv 2024

Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge

arXiv 2024

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

arXiv 2024

No known affiliations.

from 4 papers

Weifeng Ge

Lifu Huang

Zhiyang Xu

Chenghang Lai

Qifan Wang

Shizhe Diao

Yixin Cao

Yixuan Sun

Yu Cheng

Yufan Zhou