Zhiyang Xu
- Papers: 11
Authored papers
Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding
arXiv 2026
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
arXiv 2025
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
arXiv 2025
LaTtE-Flow: Layerwise Timestep-Expert Flow-based Transformer
arXiv 2025
LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models
arXiv 2025
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
arXiv 2024
UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers
arXiv 2024
The Art of SOCRATIC QUESTIONING: Recursive Thinking with Large Language Models
arXiv 2023
MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks
arXiv 2023
AMELI: Enhancing Multimodal Entity Linking with Fine-Grained Attributes
arXiv 2023
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
arXiv 2022
Frequent co-authors
10 from 11 papers