Yue Cao
- Papers: 24
Authored papers
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model
arXiv 2026
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
arXiv 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
arXiv 2025
MAGI-1: Autoregressive Video Generation at Scale
arXiv 2025
Sequential Diffusion Language Models
arXiv 2025
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
arXiv 2024
MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
arXiv 2024
Ax-to-Grind Urdu: Benchmark Dataset for Urdu Fake News Detection
arXiv 2024
EVA-CLIP: Improved Training Techniques for CLIP at Scale
arXiv 2023
SegGPT: Segmenting Everything In Context
arXiv 2023
CapsFusion: Rethinking Image-Text Data at Scale
CVPR 2024 1
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
arXiv 2023
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
arXiv 2023
IRAD: Implicit Representation-driven Image Resampling against Adversarial Attacks
arXiv 2023
Revisiting Discriminative vs. Generative Classifiers: Theory and Implications
arXiv 2023
Deep Incubation: Training Large Models by Divide-and-Conquering
ICCV 2023 1
SimMIM: A Simple Framework for Masked Image Modeling
CVPR 2022 1
Video Swin Transformer
CVPR 2022 1
Self-Supervised Learning with Swin Transformers
arXiv 2021
ParaSCI: A Large Scientific Paraphrase Dataset for Longer Paraphrase Generation
EACL 2021 2
Global Context Networks
arXiv 2020
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning
CVPR 2021 1
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
ICLR 2020 1
Bayesian active learning for optimization and uncertainty quantification in protein docking
arXiv 2019
Frequent co-authors
10 from 24 papers