Joyce Chai

Papers
29

Authored papers

29

MolmoAct2: Action Reasoning Models for Real-world Deployment

arXiv 2026

2026

RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies

arXiv 2026

2026

Fast Spatial Memory with Elastic Test-Time Training

arXiv 2026

2026

Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

CVPR 2025 1

2025

AimBot: A Simple Auxiliary Visual Cue to Enhance Spatial Awareness of Visuomotor Policies

arXiv 2025

2025

4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time

arXiv 2025

2025

Proactive Assistant Dialogue Generation from Streaming Egocentric Videos

arXiv 2025

2025

Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation

arXiv 2025

2025

Next-Embedding Prediction Makes Strong Vision Learners

arXiv 2025

2025

Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors

arXiv 2025

2025

Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

arXiv 2024

2024

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

CVPR 2025 1

2024

DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences

arXiv 2024

2024

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

arXiv 2024

2024

Multi-Object Hallucination in Vision-Language Models

arXiv 2024

2024

Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use

arXiv 2024

2024

CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation

2023

Inversion-Free Image Editing with Natural Language

arXiv 2023

2023

LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent

arXiv 2023

2023

Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties

arXiv 2023

2023

Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation

arXiv 2023

2023

World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models

arXiv 2023

2023

Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?

arXiv 2023

2023

From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning

arXiv 2023

2023

Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue

arXiv 2023

2023

DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents

arXiv 2022

2022

DANLI: Deliberative Agent for Following Natural Language Instructions

arXiv 2022

2022

Hierarchical Task Learning from Language Instructions with Unified Transformers and Self-Monitoring

Findings (ACL) 2021 8

2021

CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 29 papers