Xiaoye Qu

Papers
43

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

43

GEMS: Agent-Native Multimodal Generation with Memory and Skills

arXiv 2026

2026

Toward Efficient Agents: Memory, Tool learning, and Planning

arXiv 2026

2026

XSkill: Continual Learning from Experience and Skills in Multimodal Agents

arXiv 2026

2026

π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

arXiv 2026

2026

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

arXiv 2026

2026

Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model

arXiv 2026

2026

LatentMem: Customizing Latent Memory for Multi-Agent Systems

arXiv 2026

2026

Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning

arXiv 2026

2026

Learning to Reason under Off-Policy Guidance

arXiv 2025

2025

Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts

arXiv 2025

2025

OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning

arXiv 2025

2025

Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

arXiv 2025

2025

Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models

arXiv 2025

2025

SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards

arXiv 2025

2025

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

arXiv 2025

2025

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

arXiv 2025

2025

VideoSSR: Video Self-Supervised Reinforcement Learning

arXiv 2025

2025

A Survey of Reinforcement Learning for Large Reasoning Models

arXiv 2025

2025

ExGRPO: Learning to Reason from Experience

arXiv 2025

2025

FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting

arXiv 2025

2025

Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

arXiv 2025

2025

Spotlight on Token Perception for Multimodal Reinforcement Learning

arXiv 2025

2025

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

arXiv 2025

2025

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

arXiv 2025

2025

Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think

CVPR 2025 1

2025

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

arXiv 2025

2025

Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning

arXiv 2025

2025

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

arXiv 2025

2025

VA-π: Variational Policy Alignment for Pixel-Aware Autoregressive Generation

arXiv 2025

2025

LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

arXiv 2024

2024

CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling

arXiv 2024

2024

Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

arXiv 2024

2024

Timo: Towards Better Temporal Reasoning for Language Models

arXiv 2024

2024

Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning

arXiv 2024

2024

LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training

arXiv 2024

2024

Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

arXiv 2024

2024

Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints

ICCV 2025

2024

ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM

arXiv 2024

2024

Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning

arXiv 2024

2024

SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information

arXiv 2024

2024

Mirror: A Universal Framework for Various Information Extraction Tasks

arXiv 2023

2023

MIRACLE: Towards Personalized Dialogue Generation with Latent-Space Multiple Personal Attribute Control

arXiv 2023

2023

Enhancing Low-Resource Relation Representations through Multi-View Decoupling

arXiv 2023

2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 43 papers