Yue Wang
- Papers
- 53
Authored papers (53)
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
arXiv 2026
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
arXiv 2026
Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception
arXiv 2026
RealWonder: Real-Time Physical Action-Conditioned Video Generation
arXiv 2026
Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?
arXiv 2026
LOME: Learning Human-Object Manipulation with Action-Conditioned Egocentric World Model
arXiv 2026
Representation Fréchet Loss for Visual Generation
arXiv 2026
MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments
arXiv 2025
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
arXiv 2025
SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending
arXiv 2025
MAI-UI Technical Report: Real-World Centric Foundation GUI Agents
arXiv 2025
ETP-R1: Evolving Topological Planning with Reinforcement Fine-tuning for Vision-Language Navigation in Continuous Environments
arXiv 2025
Robot Learning from a Physical World Model
arXiv 2025
Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving
arXiv 2025
BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs
arXiv 2025
Latent Denoising Makes Good Visual Tokenizers
arXiv 2025
DiffuSETS: 12-lead ECG Generation Conditioned on Clinical Text Reports and Patient-Specific Information
arXiv 2025
StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following
arXiv 2025
Redefining Machine Translation on Social Network Services with Large Language Models
arXiv 2025
NatureLM: Deciphering the Language of Nature for Scientific Discovery
arXiv 2025
UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces
arXiv 2025
EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis
CVPR 2025 1
HUGSIM: A Real-Time, Photo-Realistic and Closed-Loop Simulator for Autonomous Driving
arXiv 2024
OmniRe: Omni Urban Scene Reconstruction
arXiv 2024
InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds
arXiv 2024
Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning
arXiv 2024
Wavelet Diffusion Neural Operator
arXiv 2024
Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey
arXiv 2024
Large Spatial Model: End-to-end Unposed Images to Semantic 3D
arXiv 2024
Yi: Open Foundation Models by 01.AI
arXiv 2024
Aria: An Open Multimodal Native Mixture-of-Experts Model
arXiv 2024
Denoising Vision Transformers
arXiv 2024
Towards Realistic Scene Generation with LiDAR Diffusion Models
arXiv 2024
Learning Temporally Consistent Video Depth from Video Diffusion Priors
CVPR 2025 1
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
arXiv 2024
Yuan 2.0-M32: Mixture of Experts with Attention Router
arXiv 2024
RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation
arXiv 2024
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks
arXiv 2024
Aria-UI: Visual Grounding for GUI Instructions
arXiv 2024
Extrapolated Urban View Synthesis Benchmark
ICCV 2025
CodeT5+: Open Code Large Language Models for Code Understanding and Generation
arXiv 2023
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM
arXiv 2023
Better Neural PDE Solvers Through Data-Free Mesh Movers
arXiv 2023
SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving
arXiv 2023
On Uni-Modal Feature Learning in Supervised Multi-Modal Learning
arXiv 2023
GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training
arXiv 2023
MathChat: Converse to Tackle Challenging Math Problems with LLM Agents
arXiv 2023
A Language Agent for Autonomous Driving
arXiv 2023
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
arXiv 2022
VectorMapNet: End-to-end Vectorized HD Map Learning
arXiv 2022
DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries
arXiv 2021
Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?
ECCV 2020 8
Dynamic Graph CNN for Learning on Point Clouds
arXiv 2018
Affiliations
Frequent co-authors
10 from 53 papers