YaoWei Wang
- Papers: 26
Authored papers (26)
VideoVista-CulturalLingo: 360$^\circ$ Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
arXiv 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
arXiv 2025
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
CVPR 2025 1
PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models
arXiv 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
arXiv 2025
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents
arXiv 2025
HLFormer: Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
ICCV 2025
VMamba: Visual State Space Model
arXiv 2024
vHeat: Building Vision Models upon Heat Conduction
arXiv 2024
OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
arXiv 2024
Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition
arXiv 2024
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
arXiv 2024
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding
arXiv 2024
Towards Visual Grounding: A Survey
arXiv 2024
Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance
arXiv 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
arXiv 2024
M$^3$GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation
arXiv 2024
Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation
arXiv 2024
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
arXiv 2023
Strip-MLP: Efficient Token Interaction for Vision MLP
ICCV 2023 1
CiteTracker: Correlating Image and Text for Visual Tracking
ICCV 2023 1
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
arXiv 2023
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation
arXiv 2023
HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors
arXiv 2022
Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples
CVPR 2023 1
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration
CVPR 2023 1
Frequent co-authors
10 from 26 papers