Yixuan Li
- Papers: 45
Authored papers (45)
Advancing Open-source World Models
arXiv 2026
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
arXiv 2026
Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models
arXiv 2026
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models
arXiv 2025
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
arXiv 2025
Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach
arXiv 2025
The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text
arXiv 2025
HunyuanVideo 1.5 Technical Report
arXiv 2025
MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues
arXiv 2025
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
arXiv 2025
LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge Signals
arXiv 2025
ACVUBench: Audio-Centric Video Understanding Benchmark
arXiv 2025
HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
arXiv 2025
ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting
ICCV 2025
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding
arXiv 2025
Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
arXiv 2025
Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation
arXiv 2025
Clean First, Align Later: Benchmarking Preference Data Cleaning for Reliable LLM Alignment
arXiv 2025
AdaptiveLog: An Adaptive Log Analysis Framework with the Collaboration of Large and Small Language Model
arXiv 2025
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation
arXiv 2024
AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation
arXiv 2024
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
arXiv 2024
Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare
arXiv 2024
ARGS: Alignment as Reward-Guided Search
arXiv 2024
PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning
arXiv 2024
PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing
arXiv 2024
HYPO: Hyperspherical Out-of-Distribution Generalization
arXiv 2024
How Does Unlabeled Data Provably Help Out-of-Distribution Detection?
arXiv 2024
Understanding the Learning Dynamics of Alignment with Human Feedback
arXiv 2024
HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection
arXiv 2024
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
arXiv 2024
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
arXiv 2024
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey
arXiv 2024
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models
arXiv 2024
OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection
arXiv 2023
Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection
arXiv 2023
Is Fine-tuning Needed? Pre-trained Language Models Are Near Perfect for Out-of-Domain Detection
arXiv 2023
BEVPlace: Learning LiDAR-based Place Recognition using Bird's Eye View Images
ICCV 2023 1
Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment
CVPR 2023 1
InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint
arXiv 2023
MOS: Towards Scaling Out-of-distribution Detection for Large Semantic Space
CVPR 2021 1
MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
ICCV 2021 10
Energy-based Out-of-distribution Detection
NeurIPS 2020 12
Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks
Convergent Learning: Do different neural networks learn the same representations?
arXiv 2015