Dongdong Chen

Papers
24

Authored papers

24

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

arXiv 2026

2026

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

arXiv 2026

2026

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

arXiv 2026

2026

Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation

arXiv 2026

2026

OmniVid: A Generative Framework for Universal Video Understanding

CVPR 2024

2024

ToolBridge: An Open-Source Dataset to Equip LLMs with External Tool Capabilities

arXiv 2024

2024

Olympus: A Universal Task Router for Computer Vision Tasks

CVPR 2025

2024

Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation

arXiv 2024

2024

LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation

arXiv 2024

2024

Designing a Better Asymmetric VQGAN for StableDiffusion

arXiv 2023

2023

AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control

ICCV 2023

2023

Equivariant Multi-Modality Image Fusion

CVPR 2024

2023

Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

arXiv 2023

2023

Diversity-Aware Meta Visual Prompting

CVPR 2023

2023

Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting

ICCV 2023

2023

Reduce Information Loss in Transformers for Pluralistic Image Inpainting

CVPR 2022

2022

Semantic Image Synthesis via Diffusion Models

arXiv 2022

2022

CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet

arXiv 2022

2022

Vector Quantized Diffusion Model for Text-to-Image Synthesis

CVPR 2022

2021

Dynamic Head: Unifying Object Detection Heads with Attentions

CVPR 2021

2021

HairCLIP: Design Your Hair by Text and Reference Image

CVPR 2022

2021

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

2021

Florence: A New Foundation Model for Computer Vision

arXiv 2021

2021

CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields

CVPR 2022

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 24 papers