Zeyuan Chen
- Papers: 13
Authored papers
X-Dyna: Expressive Dynamic Human Image Animation
CVPR 2025
VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents
arXiv 2025
GTA1: GUI Test-time Scaling Agent
arXiv 2025
UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG
arXiv 2025
CoDA: Coding LM via Diffusion Adaptation
arXiv 2025
ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models
arXiv 2024
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
arXiv 2024
"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
arXiv 2023
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
arXiv 2023
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
arXiv 2023
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
arXiv 2023
MGTBench: Benchmarking Machine-Generated Text Detection
arXiv 2023
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
arXiv 2022
Affiliations
Frequent co-authors
10 from 13 papers