Xiangyu Yue

Papers
46

Authored papers

46

OpenGame: Open Agentic Coding for Games

arXiv 2026

2026

BitDance: Scaling Autoregressive Generative Models with Binary Tokens

arXiv 2026

2026

InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery

arXiv 2026

2026

Gen-Searcher: Reinforcing Agentic Search for Image Generation

arXiv 2026

2026

The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

arXiv 2026

2026

Exploring Reasoning Reward Model for Agents

arXiv 2026

2026

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

arXiv 2026

2026

Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models

arXiv 2026

2026

NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification

arXiv 2025

2025

HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States

arXiv 2025

2025

Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning

arXiv 2025

2025

Native-Resolution Image Synthesis

arXiv 2025

2025

Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model

arXiv 2025

2025

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

arXiv 2025

2025

AdaTooler-V: Adaptive Tool-Use for Images and Videos

arXiv 2025

2025

NaTex: Seamless Texture Generation as Latent Color Diffusion

arXiv 2025

2025

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

arXiv 2025

2025

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

arXiv 2025

2025

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

arXiv 2025

2025

SpaceVista: All-Scale Visual Spatial Reasoning from mm to km

arXiv 2025

2025

SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines

arXiv 2025

2025

VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning

arXiv 2025

2025

Video-R1: Reinforcing Video Reasoning in MLLMs

arXiv 2025

2025

Unleashing Vecset Diffusion Model for Fast Shape Generation

ICCV 2025

2025

Multimodal Long Video Modeling Based on Temporal Dynamic Context

arXiv 2025

2025

OneThinker: All-in-one Reasoning Model for Image and Video

arXiv 2025

2025

Transition Models: Rethinking the Generative Learning Objective

arXiv 2025

2025

LATTICE: Democratize High-Fidelity 3D Generation at Scale

arXiv 2025

2025

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

arXiv 2025

2025

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

arXiv 2024

2024

Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations

arXiv 2024

2024

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

CVPR 2025

2024

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

arXiv 2024

2024

Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?

arXiv 2024

2024

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

CVPR 2024

2024

RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models

CVPR 2025

2024

AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

arXiv 2024

2024

Explore the Limits of Omni-modal Pretraining at Scale

arXiv 2024

2024

Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines

arXiv 2024

2024

ImageBind-LLM: Multi-modality Instruction Tuning

arXiv 2023

2023

Meta-Transformer: A Unified Framework for Multimodal Learning

arXiv 2023

2023

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

arXiv 2023

2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

arXiv 2023

2023

Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models

ICCV 2023

2023

Beating Backdoor Attack at Its Own Game

ICCV 2023

2023

OneLLM: One Framework to Align All Modalities with Language

CVPR 2024

2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 46 papers