Li Zhang
- Papers: 40
Authored papers (40)
Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining
arXiv 2026
SGDrive: Scene-to-Goal Hierarchical World Cognition for Autonomous Driving
arXiv 2026
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia
arXiv 2025
Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning
CVPR 2025 1
Reinforcing Action Policies by Prophesying
arXiv 2025
TexVerse: A Universe of 3D Objects with High-Resolution Textures
arXiv 2025
Reasoning in Space via Grounding in the World
arXiv 2025
Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation
arXiv 2025
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning
arXiv 2025
Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study
arXiv 2025
CodeSwift: Accelerating LLM Inference for Efficient Code Generation
arXiv 2025
UniScene: Unified Occupancy-centric Driving Scene Generation
CVPR 2025 1
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
arXiv 2024
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents
arXiv 2024
LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation
arXiv 2024
Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement
arXiv 2024
S-Agents: Self-organizing Agents in Open-ended Environments
arXiv 2024
CAMixerSR: Only Details Need More "Attention"
CVPR 2024 1
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving
arXiv 2024
A Survey of Resource-efficient LLM and Multimodal Foundation Models
arXiv 2024
DroidCall: A Dataset for LLM-powered Android Intent Invocation
arXiv 2024
Brain3D: Generating 3D Objects from fMRI
arXiv 2024
On the Limit of Language Models as Planning Formalizers
arXiv 2024
PDDLEGO: Iterative Planning in Textual Environments
arXiv 2024
What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance
arXiv 2024
Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review
arXiv 2023
CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning
arXiv 2023
Faithful Chain-of-Thought Reasoning
arXiv 2023
Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping
arXiv 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
CVPR 2023 1
PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection
ICCV 2023 1
Causal Reasoning of Entities and Events in Procedural Texts
arXiv 2023
Exploring the Curious Case of Code Prompts
arXiv 2023
Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting
arXiv 2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Softmax-free Linear Transformers
arXiv 2022
Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data
ACL 2022 5
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
CVPR 2021 1
Improving Text-to-SQL Evaluation Methodology
Learning a Deep Embedding Model for Zero-Shot Learning
Frequent co-authors: 10 (from 40 papers)