Joyce Chai
- Papers: 29
Authored papers
MolmoAct2: Action Reasoning Models for Real-world Deployment
arXiv 2026
RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies
arXiv 2026
Fast Spatial Memory with Elastic Test-Time Training
arXiv 2026
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
CVPR 2025
AimBot: A Simple Auxiliary Visual Cue to Enhance Spatial Awareness of Visuomotor Policies
arXiv 2025
4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time
arXiv 2025
Proactive Assistant Dialogue Generation from Streaming Egocentric Videos
arXiv 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
arXiv 2025
Next-Embedding Prediction Makes Strong Vision Learners
arXiv 2025
Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors
arXiv 2025
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
arXiv 2024
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
CVPR 2025
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences
arXiv 2024
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions
arXiv 2024
Multi-Object Hallucination in Vision-Language Models
arXiv 2024
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use
arXiv 2024
CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
Inversion-Free Image Editing with Natural Language
arXiv 2023
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent
arXiv 2023
Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties
arXiv 2023
Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation
arXiv 2023
World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models
arXiv 2023
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?
arXiv 2023
From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning
arXiv 2023
Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue
arXiv 2023
DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents
arXiv 2022
DANLI: Deliberative Agent for Following Natural Language Instructions
arXiv 2022
Hierarchical Task Learning from Language Instructions with Unified Transformers and Self-Monitoring
Findings (ACL) 2021 8
CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models
arXiv 2021