Yue Cao

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

24

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

arXiv 2026

2026

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

arXiv 2025

2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

arXiv 2025

2025

MAGI-1: Autoregressive Video Generation at Scale

arXiv 2025

2025

Sequential Diffusion Language Models

arXiv 2025

2025

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

arXiv 2024

2024

MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

arXiv 2024

2024

Ax-to-Grind Urdu: Benchmark Dataset for Urdu Fake News Detection

arXiv 2024

2024

EVA-CLIP: Improved Training Techniques for CLIP at Scale

arXiv 2023

2023

SegGPT: Segmenting Everything In Context

arXiv 2023

2023

CapsFusion: Rethinking Image-Text Data at Scale

CVPR 2024 1

2023

One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale

arXiv 2023

2023

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

arXiv 2023

2023

IRAD: Implicit Representation-driven Image Resampling against Adversarial Attacks

arXiv 2023

2023

Revisiting Discriminative vs. Generative Classifiers: Theory and Implications

arXiv 2023

2023

Deep Incubation: Training Large Models by Divide-and-Conquering

ICCV 2023 1

2022

SimMIM: A Simple Framework for Masked Image Modeling

CVPR 2022 1

2021

Video Swin Transformer

CVPR 2022 1

2021

Self-Supervised Learning with Swin Transformers

arXiv 2021

2021

ParaSCI: A Large Scientific Paraphrase Dataset for Longer Paraphrase Generation

EACL 2021 2

2021

Global Context Networks

arXiv 2020

2020

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning

CVPR 2021 1

2020

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

ICLR 2020 1

2019

Bayesian active learning for optimization and uncertainty quantification in protein docking

arXiv 2019

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 24 papers