Yixuan Li

Papers
45

Authored papers

45

Advancing Open-source World Models

arXiv 2026

2026

Innovator-VL: A Multimodal Large Language Model for Scientific Discovery

arXiv 2026

2026

Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models

arXiv 2026

2026

video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models

arXiv 2025

2025

MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems

arXiv 2025

2025

Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach

arXiv 2025

2025

The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text

arXiv 2025

2025

HunyuanVideo 1.5 Technical Report

arXiv 2025

2025

MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues

arXiv 2025

2025

video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

arXiv 2025

2025

LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge Signals

arXiv 2025

2025

ACVUBench: Audio-Centric Video Understanding Benchmark

arXiv 2025

2025

HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

arXiv 2025

2025

ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting

ICCV 2025

2025

Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding

arXiv 2025

2025

Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning

arXiv 2025

2025

Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation

arXiv 2025

2025

Clean First, Align Later: Benchmarking Preference Data Cleaning for Reliable LLM Alignment

arXiv 2025

2025

AdaptiveLog: An Adaptive Log Analysis Framework with the Collaboration of Large and Small Language Model

arXiv 2025

2025

DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation

arXiv 2024

2024

AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation

arXiv 2024

2024

Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models

arXiv 2024

2024

Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

arXiv 2024

2024

ARGS: Alignment as Reward-Guided Search

arXiv 2024

2024

PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning

arXiv 2024

2024

PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing

arXiv 2024

2024

HYPO: Hyperspherical Out-of-Distribution Generalization

arXiv 2024

2024

How Does Unlabeled Data Provably Help Out-of-Distribution Detection?

arXiv 2024

2024

Understanding the Learning Dynamics of Alignment with Human Feedback

arXiv 2024

2024

HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection

arXiv 2024

2024

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

arXiv 2024

2024

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

arXiv 2024

2024

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

arXiv 2024

2024

Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

arXiv 2024

2024

OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection

arXiv 2023

2023

Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection

arXiv 2023

2023

Is Fine-tuning Needed? Pre-trained Language Models Are Near Perfect for Out-of-Domain Detection

arXiv 2023

2023

BEVPlace: Learning LiDAR-based Place Recognition using Bird's Eye View Images

ICCV 2023

2023

Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment

CVPR 2023

2023

InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint

arXiv 2023

2023

MOS: Towards Scaling Out-of-distribution Detection for Large Semantic Space

CVPR 2021

2021

MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions

ICCV 2021

2021

Energy-based Out-of-distribution Detection

NeurIPS 2020

2020

Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks

2017

Convergent Learning: Do different neural networks learn the same representations?

arXiv 2015

2015

Affiliations

No known affiliations.
