Ziyue Wang
- Papers: 16
Authored papers (16)
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
arXiv 2026
Enabling Stroke-Level Structural Analysis of Hieroglyphic Scripts without Language-Specific Priors
arXiv 2026
EgoLife: Towards Egocentric Life Assistant
CVPR 2025
MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow
arXiv 2025
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding
arXiv 2025
SurgRAW: Multi-Agent Workflow with Chain-of-Thought Reasoning for Surgical Intelligence
arXiv 2025
ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking
arXiv 2025
SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking
arXiv 2025
Visual Abstract Thinking Empowers Multimodal Reasoning
arXiv 2025
DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms
arXiv 2025
SegNet4D: Efficient Instance-Aware 4D Semantic Segmentation for LiDAR Point Cloud
arXiv 2024
Long Context Transfer from Language to Vision
arXiv 2024
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
arXiv 2024
Model Composition for Multimodal Large Language Models
arXiv 2024
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion
arXiv 2024
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
arXiv 2023
Affiliations
Frequent co-authors
10 from 16 papers