Jinyi Hu
- Papers: 12
Authored papers
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer
arXiv 2025
EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents
arXiv 2025
NVILA: Efficient Frontier Visual Language Models
CVPR 2025
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
arXiv 2024
GUICourse: From General Vision Language Models to Versatile GUI Agents
arXiv 2024
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer
arXiv 2024
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
arXiv 2024
Exploring Perceptual Limitation of Multimodal Large Language Models
arXiv 2024
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
arXiv 2023
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
CVPR 2024
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
arXiv 2023
Fuse It More Deeply! A Variational Transformer with Layer-Wise Latent Variable Inference for Text Generation