Lei Wang
Papers: 55
Authored papers
WaDi: Weight Direction-aware Distillation for One-step Image Synthesis
arXiv 2026
RefAlign: Representation Alignment for Reference-to-Video Generation
arXiv 2026
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome
arXiv 2026
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
arXiv 2026
Understanding and Enforcing Weight Disentanglement in Task Arithmetic
arXiv 2026
MARS: Enabling Autoregressive Models Multi-Token Generation
arXiv 2026
Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models
arXiv 2026
From Perception to Action: An Interactive Benchmark for Vision Reasoning
arXiv 2026
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation
arXiv 2026
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
arXiv 2025
AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
arXiv 2025
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
arXiv 2025
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
arXiv 2025
CoDiff: Conditional Diffusion Model for Collaborative 3D Object Detection
arXiv 2025
ChiseLLM: Unleashing the Power of Reasoning LLMs for Chisel Agile Hardware Development
arXiv 2025
Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability
CVPR 2025
HunyuanVideo 1.5 Technical Report
arXiv 2025
POLARIS: Projection-Orthogonal Least Squares for Robust and Adaptive Inversion in Diffusion Models
arXiv 2025
HunyuanImage 3.0 Technical Report
arXiv 2025
A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?
arXiv 2025
One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models
arXiv 2025
Truth in the Few: High-Value Data Selection for Efficient Multi-Modal Reasoning
arXiv 2025
Scalable Chain of Thoughts via Elastic Reasoning
arXiv 2025
Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning
arXiv 2024
All in an Aggregated Image for In-Image Learning
arXiv 2024
ThinK: Thinner Key Cache by Query-Driven Pruning
arXiv 2024
Quo Vadis, Anomaly Detection? LLMs and VLMs in the Spotlight
arXiv 2024
CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds
arXiv 2024
Attention-driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models without Fine-Tuning
arXiv 2024
Optimizing Calibration by Gaining Aware of Prediction Correctness
arXiv 2024
STEMO: Early Spatio-temporal Forecasting with Multi-Objective Reinforcement Learning
arXiv 2024
YuLan: An Open-source Large Language Model
arXiv 2024
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
arXiv 2024
MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
arXiv 2024
SATO: Stable Text-to-Motion Framework
arXiv 2024
A Survey on Large Language Model based Autonomous Agents
arXiv 2023
In-context Autoencoder for Context Compression in a Large Language Model
arXiv 2023
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
arXiv 2023
User Behavior Simulation with Large Language Model based Agents
arXiv 2023
Towards Semi-supervised Learning with Non-random Missing Labels
ICCV 2023
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction
ICCV 2023
Empower Your Model with Longer and Better Context Comprehension
arXiv 2023
T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large Language Model Signals for Science Question Answering
arXiv 2023
Hierarchical Spatio-Temporal Representation Learning for Gait Recognition
ICCV 2023
Enhancing Sample Utilization through Sample Adaptive Augmentation in Semi-Supervised Learning
ICCV 2023
Roll With the Punches: Expansion and Shrinkage of Soft Label Selection for Semi-supervised Fine-Grained Learning
arXiv 2023
Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites
arXiv 2023
YAYI-UIE: A Chat-Enhanced Instruction Tuning Framework for Universal Information Extraction
arXiv 2023
Adaptive Multi-head Contrastive Learning
arXiv 2023
HiH: A Multi-modal Hierarchy in Hierarchy Network for Unconstrained Gait Recognition
arXiv 2023
S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
arXiv 2022
MutexMatch: Semi-Supervised Learning with Mutex-Based Consistency Regularization
RDA: Reciprocal Distribution Alignment for Robust Semi-supervised Learning
arXiv 2022
COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models
arXiv 2022
GaitMM: Multi-Granularity Motion Sequence Learning for Gait Recognition
arXiv 2022