Yang You
- Papers
- 39
Authored papers
SceneTeract: Agentic Functional Affordances and VLM Grounding in 3D Scenes
arXiv 2026
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
arXiv 2025
Enhance-A-Video: Better Generated Video for Free
arXiv 2025
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
arXiv 2025
Region-Adaptive Sampling for Diffusion Transformers
arXiv 2025
DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation
arXiv 2025
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
arXiv 2025
Robot Learning from a Physical World Model
arXiv 2025
Neural-Driven Image Editing
arXiv 2025
REPA Works Until It Doesn't: Early-Stopped, Holistic Alignment Supercharges Diffusion Training
arXiv 2025
SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation
arXiv 2025
Recurrent Diffusion for Large-Scale Parameter Generation
arXiv 2025
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
arXiv 2025
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy
arXiv 2025
Info-Coevolution: An Efficient Framework for Data Model Coevolution
arXiv 2025
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
arXiv 2024
Real-Time Video Generation with Pyramid Attention Broadcast
arXiv 2024
DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers
arXiv 2024
Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning
arXiv 2024
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation
arXiv 2024
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
arXiv 2024
Neural Network Diffusion
arXiv 2024
RPMArt: Towards Robust Perception and Manipulation for Articulated Objects
arXiv 2024
ASCIIEval: Benchmarking Models' Visual Perception in Text Strings via ASCII Art
arXiv 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
arXiv 2024
Can a large language model be a gaslighter?
arXiv 2024
CAME: Confidence-guided Adaptive Memory Efficient Optimization
arXiv 2023
Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline
InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning
arXiv 2023
Dataset Quantization
ICCV 2023
DREAM: Efficient Dataset Distillation by Representative Matching
ICCV 2023
Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models
ICCV 2023
Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching
arXiv 2023
MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID
CVPR 2023
MLLMs-Augmented Visual-Language Representation Learning
arXiv 2023
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
arXiv 2021
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
arXiv 2021
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes