Xuelong Li
- Papers: 25
Authored papers (25)
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
arXiv 2025
SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model
arXiv 2025
Hume: Introducing System-2 Thinking in Visual-Language-Action Model
arXiv 2025
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding
arXiv 2025
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach
arXiv 2025
RecTok: Reconstruction Distillation along Rectified Flow
arXiv 2025
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables
arXiv 2025
Exploring the Potential of Encoder-free Architectures in 3D LMMs
arXiv 2025
Technical Report of TeleChat2, TeleChat2.5 and T1
arXiv 2025
Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation
arXiv 2025
Learning Manipulation by Predicting Interaction
arXiv 2024
SentenceVAE: Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context
arXiv 2024
StreakNet-Arch: An Anti-scattering Network-based Architecture for Underwater Carrier LiDAR-Radar Imaging
arXiv 2024
Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding
arXiv 2024
LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
arXiv 2024
Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training
arXiv 2024
Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding
arXiv 2024
AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots
arXiv 2024
CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning
arXiv 2024
Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models
arXiv 2023
Behavior Contrastive Learning for Unsupervised Skill Discovery
arXiv 2023
Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance
arXiv 2023
Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs
arXiv 2023
Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction
ICCV 2023
Frequent co-authors
10 from 25 papers