Xu Sun

Papers
30

Authored papers

TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions

arXiv 2026

2026

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

arXiv 2025

2025

SpaceR: Reinforcing MLLMs in Video Spatial Reasoning

arXiv 2025

2025

VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?

arXiv 2025

2025

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

arXiv 2025

2025

Next Block Prediction: Video Generation via Semi-Auto-Regressive Modeling

arXiv 2025

2025

TEMPLE: Temporal Preference Learning of Video LLMs via Difficulty Scheduling and Pre-SFT Alignment

arXiv 2025

2025

Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence

arXiv 2025

2025

UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?

arXiv 2025

2025

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents

arXiv 2024

2024

VidTwin: Video VAE with Decoupled Structure and Dynamics

CVPR 2025 1

2024

TempCompass: Do Video LLMs Really Understand Videos?

arXiv 2024

2024

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

arXiv 2024

2024

PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension

arXiv 2024

2024

Temporal Reasoning Transfer from Text to Video

arXiv 2024

2024

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

arXiv 2024

2024

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

CVPR 2024 1

2023

Can Language Models Understand Physical Concepts?

arXiv 2023

2023

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

arXiv 2023

2023

Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition

2023

Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning

arXiv 2023

2023

TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

arXiv 2023

2023

Towards Codable Watermarking for Injecting Multi-bits Information to LLMs

arXiv 2023

2023

MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning

arXiv 2023

2023

A Survey on In-context Learning

arXiv 2022

2022

Delving into the Openness of CLIP

arXiv 2022

2022

Well-classified Examples are Underestimated in Classification with Deep Neural Networks

arXiv 2021

2021

Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

NAACL 2021 4

2021

RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

EMNLP 2021 11

2021

An Adaptive and Momental Bound Method for Stochastic Learning

arXiv 2019

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 30 papers