Boyuan Sun
- Papers: 10
Authored papers (10)
See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding
arXiv 2026
GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
arXiv 2026
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding
arXiv 2025
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
arXiv 2025
Depth Anything at Any Condition
arXiv 2025
LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding
arXiv 2025
Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness
arXiv 2025
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
arXiv 2025
Towards RAW Object Detection in Diverse Conditions
CVPR 2025 1
CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation
CVPR 2024 1
Frequent co-authors: 10, from 10 papers