Xiangyu Yue
- Papers: 46
Authored papers (46)
OpenGame: Open Agentic Coding for Games
arXiv 2026
BitDance: Scaling Autoregressive Generative Models with Binary Tokens
arXiv 2026
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery
arXiv 2026
Gen-Searcher: Reinforcing Agentic Search for Image Generation
arXiv 2026
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook
arXiv 2026
Exploring Reasoning Reward Model for Agents
arXiv 2026
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
arXiv 2026
Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models
arXiv 2026
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
arXiv 2025
HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States
arXiv 2025
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning
arXiv 2025
Native-Resolution Image Synthesis
arXiv 2025
Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model
arXiv 2025
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
arXiv 2025
AdaTooler-V: Adaptive Tool-Use for Images and Videos
arXiv 2025
NaTex: Seamless Texture Generation as Latent Color Diffusion
arXiv 2025
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
arXiv 2025
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
arXiv 2025
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
arXiv 2025
SpaceVista: All-Scale Visual Spatial Reasoning from mm to km
arXiv 2025
SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines
arXiv 2025
VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning
arXiv 2025
Video-R1: Reinforcing Video Reasoning in MLLMs
arXiv 2025
Unleashing Vecset Diffusion Model for Fast Shape Generation
ICCV 2025
Multimodal Long Video Modeling Based on Temporal Dynamic Context
arXiv 2025
OneThinker: All-in-one Reasoning Model for Image and Video
arXiv 2025
Transition Models: Rethinking the Generative Learning Objective
arXiv 2025
LATTICE: Democratize High-Fidelity 3D Generation at Scale
arXiv 2025
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
arXiv 2025
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
arXiv 2024
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations
arXiv 2024
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
CVPR 2025
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions
arXiv 2024
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
arXiv 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
CVPR 2024
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
CVPR 2025
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
arXiv 2024
Explore the Limits of Omni-modal Pretraining at Scale
arXiv 2024
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
arXiv 2024
ImageBind-LLM: Multi-modality Instruction Tuning
arXiv 2023
Meta-Transformer: A Unified Framework for Multimodal Learning
arXiv 2023
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
arXiv 2023
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
arXiv 2023
Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models
ICCV 2023
Beating Backdoor Attack at Its Own Game
ICCV 2023
OneLLM: One Framework to Align All Modalities with Language
CVPR 2024
Affiliations
Frequent co-authors
10 from 46 papers