Xiangyu Zhao

Papers
42

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

42

RISE-Video: Can Video Generators Decode Implicit World Rules?

arXiv 2026

2026

Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation

arXiv 2026

2026

GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing

arXiv 2026

2026

Kimi K2.5: Visual Agentic Intelligence

arXiv 2026

2026

The Best of the Two Worlds: Harmonizing Semantic and Hash IDs for Sequential Recommendation

arXiv 2025

2026

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

arXiv 2025

2025

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

arXiv 2025

2025

Towards Multi-Granularity Memory Association and Selection for Long-Term Conversational Agents

arXiv 2025

2025

FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement

arXiv 2025

2025

Redundancy Principles for MLLMs Benchmarks

arXiv 2025

2025

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

arXiv 2025

2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

arXiv 2025

2025

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

arXiv 2025

2025

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM

arXiv 2025

2025

TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework

arXiv 2025

2025

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

arXiv 2025

2025

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

arXiv 2025

2025

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

arXiv 2025

2025

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

arXiv 2025

2025

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

arXiv 2025

2025

GenExam: A Multidisciplinary Text-to-Image Exam

arXiv 2025

2025

MM-IFEngine: Towards Multimodal Instruction Following

arXiv 2025

2025

TAPO: Task-Referenced Adaptation for Prompt Optimization

arXiv 2025

2025

Training-free LLM Merging for Multi-task Learning

arXiv 2025

2025

ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph

arXiv 2025

2025

G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models

arXiv 2024

2024

NoteLLM-2: Multimodal Large Representation Models for Recommendation

arXiv 2024

2024

Large Language Model Distilling Medication Recommendation Model

arXiv 2024

2024

UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation

arXiv 2024

2024

ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems

arXiv 2024

2024

Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation

arXiv 2024

2024

An Open and Comprehensive Pipeline for Unified Object Grounding and Detection

arXiv 2024

2024

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

arXiv 2024

2024

Extracting polygonal footprints in off-nadir images with Segment Anything Model

arXiv 2024

2024

Pre-train, Align, and Disentangle: Empowering Sequential Recommendation with Large Language Models

arXiv 2024

2024

Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention

arXiv 2024

2024

Bridging Relevance and Reasoning: Rationale Distillation in Retrieval-Augmented Generation

arXiv 2024

2024

Large Language Models for Generative Information Extraction: A Survey

arXiv 2023

2023

Multi-Task Recommendations with Reinforcement Learning

arXiv 2023

2023

EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs

arXiv 2023

2023

MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion

arXiv 2023

2023

Building a 3-Player Mahjong AI using Deep Reinforcement Learning

arXiv 2022

2022

Affiliations

No known affiliations.

Frequent co-authors

10, from 42 papers