Chunyuan Li

Authored papers

41

LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence

arXiv 2026

2026

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

arXiv 2026

2026

LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

arXiv 2025

2025

Seed1.5-VL Technical Report

arXiv 2025

2025

Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning

arXiv 2025

2025

LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training

arXiv 2025

2025

LLaVA-OneVision: Easy Visual Task Transfer

arXiv 2024

2024

Long Context Transfer from Language to Vision

arXiv 2024

2024

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

arXiv 2024

2024

TrustLLM: Trustworthiness in Large Language Models

arXiv 2024

2024

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

arXiv 2024

2024

MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine

arXiv 2024

2024

Graphic Design with Large Multimodal Model

arXiv 2024

2024

Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

arXiv 2024

2024

Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

arXiv 2024

2024

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

arXiv 2024

2024

An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

arXiv 2023

2023

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

arXiv 2023

2023

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

arXiv 2023

2023

A Simple Framework for Open-Vocabulary Segmentation and Detection

ICCV 2023

2023

Visual In-Context Prompting

CVPR 2024

2023

Instruction Tuning with GPT-4

arXiv 2023

2023

Semantic-SAM: Segment and Recognize Anything at Any Granularity

arXiv 2023

2023

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

arXiv 2023

2023

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

arXiv 2023

2023

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

arXiv 2023

2023

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

arXiv 2023

2023

GLIGEN: Open-Set Grounded Text-to-Image Generation

CVPR 2023

2023

Towards Building the Federated GPT: Federated Instruction Tuning

arXiv 2023

2023

HallE-Control: Controlling Object Hallucination in Large Multimodal Models

arXiv 2023

2023

Parameter-efficient Model Adaptation for Vision Transformers

arXiv 2022

2022

Generalized Decoding for Pixel, Image, and Language

CVPR 2023

2022

Focal Modulation Networks

arXiv 2022

2022

RegionCLIP: Region-based Language-Image Pretraining

CVPR 2022

2021

LAFITE: Towards Language-Free Training for Text-to-Image Generation

arXiv 2021

2021

Florence: A New Foundation Model for Computer Vision

arXiv 2021

2021

Contrastive Attraction and Contrastive Repulsion for Representation Learning

2021

Few-shot Natural Language Generation for Task-Oriented Dialog

Findings of the Association for Computational Linguistics 2020

2020

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

ECCV 2020

2020

POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training

EMNLP 2020

2020

Measuring the Intrinsic Dimension of Objective Landscapes

2018

Affiliations

No known affiliations.
