JianFeng Wang
- Papers: 21
Authored papers
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
arXiv 2024
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
arXiv 2024
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
arXiv 2024
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
arXiv 2024
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
arXiv 2024
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
CVPR 2024 1
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design
arXiv 2023
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
arXiv 2023
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
arXiv 2023
Segment and Caption Anything
CVPR 2024 1
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
arXiv 2023
Interfacing Foundation Models' Embeddings
arXiv 2023
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
arXiv 2022
Generalized Decoding for Pixel, Image, and Language
CVPR 2023 1
GRiT: A Generative Region-to-text Transformer for Object Understanding
arXiv 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Exploring Discrete Diffusion Models for Image Captioning
arXiv 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
arXiv 2022
Florence: A New Foundation Model for Computer Vision
arXiv 2021
End-to-End Semi-Supervised Object Detection with Soft Teacher
ICCV 2021 10
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
arXiv 2021
Affiliations
Frequent co-authors
10 from 21 papers