VQA Therapy: Exploring Answer Differences by Visually Grounding Answers

The study introduces VQAAnswerTherapy, a dataset that visually grounds each unique answer to each visual question, and proposes two new problems: predicting whether a visual question has a single answer grounding, and localizing all of its answer groundings.

Year
2023
Venue
ICCV 2023
Authors
3

Attribution

Abstract & full text
arxiv.org/abs/2308.11662v2
TL;DR
Semantic Scholar

Abstract

Visual question answering is a task of predicting the answer to a question about an image. Given that different people can provide different answers to a visual question, we aim to better understand why with answer groundings. We introduce the first dataset that visually grounds each unique answer to each visual question, which we call VQAAnswerTherapy. We then propose two novel problems of predicting whether a visual question has a single answer grounding and localizing all answer groundings. We benchmark modern algorithms for these novel problems to show where they succeed and struggle. The dataset and evaluation server can be found publicly at https://vizwiz.org/tasks-and-datasets/vqa-answer-therapy/.
