Yang Shi
Authored papers (19)
Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos
arXiv 2026
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
arXiv 2026
LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV
arXiv 2026
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
arXiv 2026
Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?
arXiv 2026
VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining
arXiv 2026
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
arXiv 2026
CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation
arXiv 2026
Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization
arXiv 2026
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
arXiv 2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
arXiv 2025
Monet: Reasoning in Latent Visual Space Beyond Images and Language
arXiv 2025
Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
arXiv 2025
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
arXiv 2025
MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning
arXiv 2025
EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents
arXiv 2025
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
arXiv 2025
RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark
arXiv 2025
OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
arXiv 2025
Frequent co-authors: 10 (from 19 papers)