Yansong Tang
- Papers: 25
Authored papers
Segment Anything with Motion, Geometry, and Semantic Adaptation for Complex Nonlinear Visual Object Tracking
arXiv 2026
TAIHRI: Task-Aware 3D Human Keypoints Localization for Close-Range Human-Robot Interaction
arXiv 2026
Meta-CoT: Enhancing Granularity and Generalization in Image Editing
arXiv 2026
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
arXiv 2025
GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting
arXiv 2025
FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios
arXiv 2025
VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning
arXiv 2025
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
arXiv 2025
KV-Edit: Training-Free Image Editing for Precise Background Preservation
arXiv 2025
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams
arXiv 2025
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
arXiv 2024
Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction
arXiv 2024
Q-VLM: Post-training Quantization for Large Vision-Language Models
arXiv 2024
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
arXiv 2024
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
CVPR 2024
Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
CVPR 2024
VoCo-LLaMA: Towards Vision Compression with Large Language Models
CVPR 2025
Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
ICCV 2023
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
Efficient Meshy Neural Fields for Animatable Human Avatars
arXiv 2023
Segment and Caption Anything
CVPR 2024
MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
arXiv 2022
FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation
ICCV 2023
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
CVPR 2022