Gang Yu
- Papers: 47
Authored papers
Step-Audio-R1.5 Technical Report
arXiv 2026
OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens
arXiv 2026
RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models
arXiv 2026
Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks
arXiv 2026
SkillNet: Create, Evaluate, and Connect AI Skills
arXiv 2026
GEditBench v2: A Human-Aligned Benchmark for General Image Editing
arXiv 2026
PixelSmile: Toward Fine-Grained Facial Expression Editing
arXiv 2026
Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting
arXiv 2026
Step1X-Edit: A Practical Framework for General Image Editing
arXiv 2025
OmniSVG: A Unified Scalable Vector Graphics Generation Model
arXiv 2025
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets
arXiv 2025
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
arXiv 2025
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization
arXiv 2025
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers
arXiv 2025
MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent
ICCV 2025
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models
arXiv 2025
StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians
arXiv 2025
WithAnyone: Towards Controllable and ID Consistent Image Generation
arXiv 2025
Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models
arXiv 2025
iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation
arXiv 2025
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
arXiv 2025
REASONEDIT: Towards Reasoning-Enhanced Image Editing Models
arXiv 2025
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
arXiv 2025
Step-Audio 2 Technical Report
arXiv 2025
UniVerse-1: Unified Audio-Video Generation via Stitching of Experts
arXiv 2025
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
arXiv 2025
RegionE: Adaptive Region-Aware Generation for Efficient Image Editing
arXiv 2025
Learning an Efficient Multi-Turn Dialogue Evaluator from Multiple Judges
arXiv 2025
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
arXiv 2025
Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model
arXiv 2025
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
arXiv 2024
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
CVPR 2025
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
arXiv 2024
MeshXL: Neural Coordinate Field for Generative 3D Foundation Models
arXiv 2024
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
arXiv 2024
AppAgent: Multimodal Agents as Smartphone Users
arXiv 2023
MotionGPT: Human Motion as a Foreign Language
NeurIPS 2023
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models
CVPR 2024
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
arXiv 2023
StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation
arXiv 2023
Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation
FaceStudio: Put Your Face Everywhere in Seconds
arXiv 2023
StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
arXiv 2023
IT3D: Improved Text-to-3D Generation with Explicit View Synthesis
arXiv 2023
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
arXiv 2023
NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results
arXiv 2022
Frequent co-authors: 10 (from 47 papers)