Xu Sun
- Papers: 30
Authored papers
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
arXiv 2026
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
arXiv 2025
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning
arXiv 2025
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
arXiv 2025
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
arXiv 2025
Next Block Prediction: Video Generation via Semi-Auto-Regressive Modeling
arXiv 2025
TEMPLE: Temporal Preference Learning of Video LLMs via Difficulty Scheduling and Pre-SFT Alignment
arXiv 2025
Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence
arXiv 2025
UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?
arXiv 2025
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
arXiv 2024
VidTwin: Video VAE with Decoupled Structure and Dynamics
CVPR 2025 1
TempCompass: Do Video LLMs Really Understand Videos?
arXiv 2024
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
arXiv 2024
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
arXiv 2024
Temporal Reasoning Transfer from Text to Video
arXiv 2024
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
arXiv 2024
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
CVPR 2024 1
Can Language Models Understand Physical Concepts?
arXiv 2023
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
arXiv 2023
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
arXiv 2023
TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
arXiv 2023
Towards Codable Watermarking for Injecting Multi-bits Information to LLMs
arXiv 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
arXiv 2023
A Survey on In-context Learning
arXiv 2022
Delving into the Openness of CLIP
arXiv 2022
Well-classified Examples are Underestimated in Classification with Deep Neural Networks
arXiv 2021
Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models
NAACL 2021 4
RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models
EMNLP 2021 11
An Adaptive and Momental Bound Method for Stochastic Learning
arXiv 2019
Frequent co-authors
10 from 30 papers