Zhiding Yu
- Papers: 20
Authored papers (20)
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
arXiv 2026
Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline
arXiv 2026
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models
arXiv 2025
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
arXiv 2025
Slow-Fast Architecture for Video Multi-Modal Large Language Models
arXiv 2025
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
arXiv 2024
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
arXiv 2024
Fully Attentional Networks with Self-emerging Token Labeling
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation
arXiv 2024
LITA: Language Instructed Temporal-Localization Assistant
arXiv 2024
FB-BEV: BEV Representation from Forward-Backward View Transformations
ICCV 2023 1
VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion
CVPR 2023 1
Prismer: A Vision-Language Model with Multi-Task Experts
arXiv 2023
FocalFormer3D : Focusing on Hard Instance for 3D Object Detection
arXiv 2023
SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving
arXiv 2023
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models
arXiv 2022
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions
CVPR 2022 1
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
NeurIPS 2021 12
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers
CVPR 2022 1
Partial Convolution based Padding
arXiv 2018
Frequent co-authors
10 from 20 papers