Shuicheng Yan
- Papers: 65
Authored papers (65)
Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation
arXiv 2026
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome
arXiv 2026
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook
arXiv 2026
DrawMotion: Generating 3D Human Motions by Freehand Drawing
arXiv 2026
Anisotropic Modality Align
arXiv 2026
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
arXiv 2026
The Trinity of Consistency as a Defining Principle for General World Models
arXiv 2026
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
arXiv 2025
MemEvolve: Meta-Evolution of Agent Memory Systems
arXiv 2025
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
arXiv 2025
Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models
arXiv 2025
WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes
arXiv 2025
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
arXiv 2025
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
arXiv 2025
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
arXiv 2025
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models
arXiv 2025
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
arXiv 2025
SAIL-VL2 Technical Report
arXiv 2025
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
arXiv 2025
Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs
arXiv 2025
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
arXiv 2025
On Path to Multimodal Generalist: General-Level and General-Bench
arXiv 2025
Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling
arXiv 2025
Cradle: Empowering Foundation Agents Towards General Computer Control
arXiv 2024
Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model
arXiv 2024
Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket
arXiv 2024
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
arXiv 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
arXiv 2024
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
arXiv 2024
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
arXiv 2024
Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
arXiv 2024
Point Cloud Mamba: Point Cloud Learning via State Space Model
arXiv 2024
EditWorld: Simulating World Dynamics for Instruction-Following Image Editing
arXiv 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
arXiv 2024
UniVST: A Unified Framework for Training-free Localized Video Style Transfer
arXiv 2024
Poison-splat: Computation Cost Attack on 3D Gaussian Splatting
arXiv 2024
MVGamba: Unify 3D Content Generation as State Space Sequence Modeling
arXiv 2024
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
arXiv 2024
DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries
arXiv 2024
UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs
arXiv 2024
Two are better than one: Context window extension with multi-grained self-injection
arXiv 2024
Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration
arXiv 2024
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
ICCV 2023 1
Towards Garment Sewing Pattern Reconstruction from a Single Image
arXiv 2023
Instant3D: Instant Text-to-3D Generation
arXiv 2023
Skywork: A More Open Bilingual Foundation Model
arXiv 2023
Better Diffusion Models Further Improve Adversarial Training
arXiv 2023
Efficient Diffusion Policies for Offline Reinforcement Learning
Generative Table Pre-training Empowers Models for Tabular Prediction
arXiv 2023
ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection
Bag of Tricks for Training Data Extraction from Language Models
arXiv 2023
On Calibrating Diffusion Probabilistic Models
BAFFLE: A Baseline of Backpropagation-Free Federated Learning
arXiv 2023
Improving and Benchmarking Offline Reinforcement Learning Algorithms
arXiv 2023
Nonparametric Generative Modeling with Conditional Sliced-Wasserstein Flows
arXiv 2023
EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine
arXiv 2022
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
arXiv 2022
Inception Transformer
arXiv 2022
Position-guided Text Prompt for Vision-Language Pre-training
CVPR 2023 1
Mugs: A Multi-Granular Self-Supervised Learning Framework
arXiv 2022
Robustness and Accuracy Could Be Reconcilable by (Proper) Definition
arXiv 2022
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
ICCV 2021 10
Deep Long-Tailed Learning: A Survey
arXiv 2021
ConvBERT: Improving BERT with Span-based Dynamic Convolution
NeurIPS 2020 12
PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer
Affiliations
Frequent co-authors
10 from 65 papers