Yue Wang

Papers
53

Authored papers

53

InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields

arXiv 2026

2026

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

arXiv 2026

2026

Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception

arXiv 2026

2026

RealWonder: Real-Time Physical Action-Conditioned Video Generation

arXiv 2026

2026

Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

arXiv 2026

2026

LOME: Learning Human-Object Manipulation with Action-Conditioned Egocentric World Model

arXiv 2026

2026

Representation Fréchet Loss for Visual Generation

arXiv 2026

2026

MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments

arXiv 2025

2025

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

arXiv 2025

2025

SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending

arXiv 2025

2025

MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

arXiv 2025

2025

ETP-R1: Evolving Topological Planning with Reinforcement Fine-tuning for Vision-Language Navigation in Continuous Environments

arXiv 2025

2025

Robot Learning from a Physical World Model

arXiv 2025

2025

Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving

arXiv 2025

2025

BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs

arXiv 2025

2025

Latent Denoising Makes Good Visual Tokenizers

arXiv 2025

2025

DiffuSETS: 12-lead ECG Generation Conditioned on Clinical Text Reports and Patient-Specific Information

arXiv 2025

2025

StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following

arXiv 2025

2025

Redefining Machine Translation on Social Network Services with Large Language Models

arXiv 2025

2025

NatureLM: Deciphering the Language of Nature for Scientific Discovery

arXiv 2025

2025

UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces

arXiv 2025

2025

EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis

CVPR 2025 1

2025

HUGSIM: A Real-Time, Photo-Realistic and Closed-Loop Simulator for Autonomous Driving

arXiv 2024

2024

OmniRe: Omni Urban Scene Reconstruction

arXiv 2024

2024

InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds

arXiv 2024

2024

Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning

arXiv 2024

2024

Wavelet Diffusion Neural Operator

arXiv 2024

2024

Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey

arXiv 2024

2024

Large Spatial Model: End-to-end Unposed Images to Semantic 3D

arXiv 2024

2024

Yi: Open Foundation Models by 01.AI

arXiv 2024

2024

Aria: An Open Multimodal Native Mixture-of-Experts Model

arXiv 2024

2024

Denoising Vision Transformers

arXiv 2024

2024

Towards Realistic Scene Generation with LiDAR Diffusion Models

arXiv 2024

2024

Learning Temporally Consistent Video Depth from Video Diffusion Priors

CVPR 2025 1

2024

AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions

arXiv 2024

2024

Yuan 2.0-M32: Mixture of Experts with Attention Router

arXiv 2024

2024

RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation

arXiv 2024

2024

HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks

arXiv 2024

2024

Aria-UI: Visual Grounding for GUI Instructions

arXiv 2024

2024

Extrapolated Urban View Synthesis Benchmark

ICCV 2025

2024

CodeT5+: Open Code Large Language Models for Code Understanding and Generation

arXiv 2023

2023

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

arXiv 2023

2023

Better Neural PDE Solvers Through Data-Free Mesh Movers

arXiv 2023

2023

SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving

arXiv 2023

2023

On Uni-Modal Feature Learning in Supervised Multi-Modal Learning

arXiv 2023

2023

GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training

arXiv 2023

2023

MathChat: Converse to Tackle Challenging Math Problems with LLM Agents

arXiv 2023

2023

A Language Agent for Autonomous Driving

arXiv 2023

2023

CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning

arXiv 2022

2022

VectorMapNet: End-to-end Vectorized HD Map Learning

arXiv 2022

2022

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries

arXiv 2021

2021

Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?

ECCV 2020 8

2020

Dynamic Graph CNN for Learning on Point Clouds

arXiv 2018

2018

Affiliations

No known affiliations.

Frequent co-authors

10 (from 53 papers)