Xiangyu Zhao
- Papers: 42
Authored papers
RISE-Video: Can Video Generators Decode Implicit World Rules?
arXiv 2026
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation
arXiv 2026
GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing
arXiv 2026
Kimi K2.5: Visual Agentic Intelligence
arXiv 2026
The Best of the Two Worlds: Harmonizing Semantic and Hash IDs for Sequential Recommendation
arXiv 2025
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
arXiv 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
arXiv 2025
Towards Multi-Granularity Memory Association and Selection for Long-Term Conversational Agents
arXiv 2025
FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement
arXiv 2025
Redundancy Principles for MLLMs Benchmarks
arXiv 2025
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
arXiv 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arXiv 2025
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
arXiv 2025
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
arXiv 2025
TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework
arXiv 2025
SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence
arXiv 2025
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
arXiv 2025
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
arXiv 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
arXiv 2025
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
arXiv 2025
GenExam: A Multidisciplinary Text-to-Image Exam
arXiv 2025
MM-IFEngine: Towards Multimodal Instruction Following
arXiv 2025
TAPO: Task-Referenced Adaptation for Prompt Optimization
arXiv 2025
Training-free LLM Merging for Multi-task Learning
arXiv 2025
ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph
arXiv 2025
G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models
arXiv 2024
NoteLLM-2: Multimodal Large Representation Models for Recommendation
arXiv 2024
Large Language Model Distilling Medication Recommendation Model
arXiv 2024
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation
arXiv 2024
ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems
arXiv 2024
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation
arXiv 2024
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection
arXiv 2024
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
arXiv 2024
Extracting polygonal footprints in off-nadir images with Segment Anything Model
arXiv 2024
Pre-train, Align, and Disentangle: Empowering Sequential Recommendation with Large Language Models
arXiv 2024
Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention
arXiv 2024
Bridging Relevance and Reasoning: Rationale Distillation in Retrieval-Augmented Generation
arXiv 2024
Large Language Models for Generative Information Extraction: A Survey
arXiv 2023
Multi-Task Recommendations with Reinforcement Learning
arXiv 2023
EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs
arXiv 2023
MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion
arXiv 2023
Building a 3-Player Mahjong AI using Deep Reinforcement Learning
arXiv 2022
Frequent co-authors
10 from 42 papers