Xinlong Wang

Papers
28

Authored papers

28

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

arXiv 2026

2026

Uniform Discrete Diffusion with Metric Path for Video Generation

arXiv 2025

2025

Emu3.5: Native Multimodal Models are World Learners

arXiv 2025

2025

Unified Vision-Language-Action Model

arXiv 2025

2025

OmniGen2: Exploration to Advanced Multimodal Generation

arXiv 2025

2025

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models

ICCV 2025

2025

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

arXiv 2024

2024

Emu3: Next-Token Prediction is All You Need

arXiv 2024

2024

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

arXiv 2024

2024

Autoregressive Video Generation without Vector Quantization

arXiv 2024

2024

Diffusion Feedback Helps CLIP See Better

arXiv 2024

2024

Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation

arXiv 2024

2024

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

CVPR 2025

2024

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

arXiv 2024

2024

EVA-CLIP: Improved Training Techniques for CLIP at Scale

arXiv 2023

2023

GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation

arXiv 2023

2023

JudgeLM: Fine-tuned Large Language Models are Scalable Judges

arXiv 2023

2023

SegGPT: Segmenting Everything In Context

arXiv 2023

2023

Generative Multimodal Models are In-Context Learners

CVPR 2024

2023

3D-GPT: Procedural 3D Modeling with Large Language Models

arXiv 2023

2023

Tokenize Anything via Prompting

arXiv 2023

2023

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

arXiv 2023

2023

Fine-Grained Visual Prompting

NeurIPS 2023

2023

Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

arXiv 2023

2023

CapsFusion: Rethinking Image-Text Data at Scale

CVPR 2024

2023

Poseur: Direct Human Pose Regression with Transformers

arXiv 2022

2022

Conditional Positional Encodings for Vision Transformers

arXiv 2021

2021

End-to-End Video Instance Segmentation with Transformers

CVPR 2021

2020

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 28 papers)