Rui Wang
- Papers: 85
Authored papers
LoL: Longer than Longer, Scaling Video Generation to Hour
arXiv 2026
Hybrid Policy Distillation for LLMs
arXiv 2026
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
arXiv 2026
Enhancing Spatial Understanding in Image Generation via Reward Modeling
arXiv 2026
AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding
arXiv 2026
Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models
arXiv 2026
FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance
arXiv 2026
SA-VLA: Spatially-Aware Flow-Matching for Vision-Language-Action Reinforcement Learning
arXiv 2026
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
arXiv 2026
DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning
arXiv 2026
Step1X-Edit: A Practical Framework for General Image Editing
arXiv 2025
Seed1.5-VL Technical Report
arXiv 2025
PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation
arXiv 2025
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
arXiv 2025
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
arXiv 2025
Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks
arXiv 2025
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
arXiv 2025
FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models
arXiv 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arXiv 2025
WithAnyone: Towards Controllable and ID Consistent Image Generation
arXiv 2025
C2LLM Technical Report: A New Frontier in Code Retrieval via Adaptive Cross-Attention Pooling
arXiv 2025
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
arXiv 2025
GDKVM: Echocardiography Video Segmentation via Spatiotemporal Key-Value Memory with Gated Delta Rule
arXiv 2025
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
arXiv 2025
REASONEDIT: Towards Reasoning-Enhanced Image Editing Models
arXiv 2025
Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents
arXiv 2025
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
arXiv 2025
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
arXiv 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
arXiv 2025
UniTTS: An end-to-end TTS system without decoupling of acoustic and semantic information
arXiv 2025
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
arXiv 2025
SkyReels-A2: Compose Anything in Video Diffusion Transformers
arXiv 2025
SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers
arXiv 2025
Learning to Normalize on the SPD Manifold under Bures-Wasserstein Geometry
Do Large Language Models Truly Understand Geometric Structures?
arXiv 2025
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
arXiv 2025
xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations
arXiv 2025
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
ICCV 2025
See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning
arXiv 2025
Wavelet Diffusion Neural Operator
arXiv 2024
AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction
arXiv 2024
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts
arXiv 2024
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
arXiv 2024
VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition
arXiv 2024
Nyonic Technical Report
arXiv 2024
Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding
arXiv 2024
Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training
arXiv 2024
MakeupAttack: Feature Space Black-box Backdoor Attack on Face Recognition via Makeup Transfer
arXiv 2024
Segment as You Wish -- Free-Form Language-Based Segmentation for Medical Images
arXiv 2024
GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding
arXiv 2024
Restoring Images in Adverse Weather Conditions via Histogram Transformer
arXiv 2024
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
Longhorn: State Space Models are Amortized Online Learners
arXiv 2024
AffineQuant: Affine Transformation Quantization for Large Language Models
arXiv 2024
AVT2-DWF: Improving Deepfake Detection with Audio-Visual Fusion and Dynamic Weighting Strategies
arXiv 2024
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code
arXiv 2023
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
arXiv 2023
Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
arXiv 2023
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
arXiv 2023
FaceStudio: Put Your Face Everywhere in Seconds
arXiv 2023
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
arXiv 2023
Clustering-Aware Negative Sampling for Unsupervised Sentence Representation
arXiv 2023
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading
arXiv 2023
MELA: Multilingual Evaluation of Linguistic Acceptability
arXiv 2023
Penalty Decoding: Well Suppress the Self-Reinforcement Effect in Open-Ended Text Generation
arXiv 2023
PACO: Parts and Attributes of Common Objects
CVPR 2023
EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers
arXiv 2023
AlignDet: Aligning Pre-training and Fine-tuning in Object Detection
ICCV 2023
Exploring Human-Like Translation Strategy with Large Language Models
arXiv 2023
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
CVPR 2024
AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration
ICCV 2023
Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective
CVPR 2023
POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities
arXiv 2023
LidarGait: Benchmarking 3D Gait Recognition with Point Clouds
CVPR 2023
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios
arXiv 2022
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT
arXiv 2022
Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation
ACL 2022
CASSPR: Cross Attention Single Scan Place Recognition
ICCV 2023
4Seasons: Benchmarking Visual SLAM and Long-Term Localization for Autonomous Driving in Challenging Conditions
arXiv 2022
EfficientTDNN: Efficient Architecture Search for Speaker Recognition
arXiv 2021
Meta-Learning Dynamics Forecasting Using Task Inference
Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond
arXiv 2020
Multi-Domain Dialogue Acts and Response Co-Generation
Accuracy Prediction with Non-neural Model for Neural Architecture Search
arXiv 2020
Frequent co-authors
10 from 85 papers