Yu Wang
- Papers: 77
Authored papers (77)
Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey
arXiv 2026
CocoaBench: Evaluating Unified Digital Agents in the Wild
arXiv 2026
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
arXiv 2026
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook
arXiv 2026
AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation
arXiv 2026
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
arXiv 2026
Urban Socio-Semantic Segmentation with Vision-Language Reasoning
arXiv 2026
AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization
arXiv 2026
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning
arXiv 2026
RLinf-USER: A Unified and Extensible System for Real-World Online Policy Learning in Embodied AI
arXiv 2026
Can Large Language Models Reinvent Foundational Algorithms?
arXiv 2026
Agentic Reasoning for Large Language Models
arXiv 2026
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
arXiv 2025
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
arXiv 2025
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
arXiv 2025
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
arXiv 2025
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
arXiv 2025
Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey
arXiv 2025
RM-R1: Reward Modeling as Reasoning
arXiv 2025
VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments
arXiv 2025
VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate
arXiv 2025
Knowledge Homophily in Large Language Models
arXiv 2025
Sleep-time Compute: Beyond Inference Scaling at Test-time
arXiv 2025
EfficientLLM: Efficiency in Large Language Models
arXiv 2025
Megrez-Omni Technical Report
arXiv 2025
MIRIX: Multi-Agent Memory System for LLM-Based Agents
arXiv 2025
Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
arXiv 2025
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
arXiv 2025
EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis
arXiv 2025
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
arXiv 2025
VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation
arXiv 2025
MedS$^3$: Towards Medical Small Language Models with Self-Evolved Slow Thinking
arXiv 2025
WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages
arXiv 2025
Holistic Semantic Representation for Navigational Trajectory Generation
arXiv 2025
Mixture of Structural-and-Textual Retrieval over Text-rich Graph Knowledge Bases
arXiv 2025
DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation
arXiv 2025
VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models
arXiv 2025
M+: Extending MemoryLLM with Scalable Long-Term Memory
arXiv 2025
RARE: Retrieval-Augmented Reasoning Modeling
arXiv 2025
Personalized Graph-Based Retrieval for Large Language Models
arXiv 2025
SVDC: Consistent Direct Time-of-Flight Video Depth Completion with Frequency Selective Fusion
CVPR 2025
A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment
arXiv 2025
Retrieval-Augmented Generation with Graphs (GraphRAG)
arXiv 2024
MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts
arXiv 2024
LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
arXiv 2024
LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K
arXiv 2024
DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language Models
arXiv 2024
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
arXiv 2024
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
arXiv 2024
Hybrid Fourier Score Distillation for Efficient One Image to 3D Object Generation
arXiv 2024
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models
ICCV 2025
Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study
arXiv 2024
PiCO: Peer Review in LLMs based on the Consistency Optimization
arXiv 2024
CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios
arXiv 2024
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
arXiv 2024
LVCHAT: Facilitating Long Video Comprehension
arXiv 2024
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
arXiv 2024
LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization
arXiv 2024
DynaSaur: Large Language Agents Beyond Predefined Actions
arXiv 2024
Evaluating Quantized Large Language Models
arXiv 2024
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
CVPR 2025
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
arXiv 2024
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
arXiv 2024
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
arXiv 2024
MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception
arXiv 2024
MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation
arXiv 2024
Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives
arXiv 2024
LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination
arXiv 2023
OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control
arXiv 2023
Learning Concise and Descriptive Attributes for Visual Recognition
ICCV 2023
Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation
ICCV 2023
Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models
arXiv 2023
A Topological Perspective on Demystifying GNN-Based Link Prediction Performance
arXiv 2023
Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation
arXiv 2023
LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models
arXiv 2023
Spatial-temporal Concept based Explanation of 3D ConvNets
CVPR 2023
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Affiliations
Frequent co-authors
10 from 77 papers