The CLEVR dataset has been used extensively for language-grounded visual reasoning in the Machine Learning (ML) and Natural Language Processing (NLP) domains. We present a graph parser library for CLEVR that provides functionality for extracting object-centric attributes and relationships, and for constructing structural graph representations for dual modalities. Structural, order-invariant representations enable geometric learning and can aid downstream tasks such as language grounding to vision, robotics, compositionality, interpretability, and computational grammar construction. We provide three extensible main components (parser, embedder, and visualizer) that can be tailored to suit specific learning setups. We also provide out-of-the-box functionality for seamless integration with popular deep graph neural network (GNN) libraries. Additionally, we discuss downstream usage and applications of the library, and how it can accelerate research in the NLP community.
CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes
A graph parser library for the CLEVR dataset facilitates the extraction and representation of attributes and relationships for dual-modal learning, aiding in language grounding and other tasks through integration with deep graph neural networks.
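The extraction step described above can be illustrated with a minimal sketch. It is not the library's API: the function name `scene_to_graph`, the attribute list, and the toy scene are hypothetical. It also assumes the CLEVR scene JSON layout, in which `relationships[rel][i]` lists the indices of the objects that stand in relation `rel` to object `i`. It uses only the Python standard library.

```python
from typing import Any, Dict, List, Tuple

# Object-centric attributes kept on each node (assumed CLEVR field names).
ATTRIBUTES = ("shape", "color", "size", "material")

Nodes = Dict[int, Dict[str, Any]]
Edges = List[Tuple[int, int, str]]


def scene_to_graph(scene: Dict[str, Any]) -> Tuple[Nodes, Edges]:
    """Turn one CLEVR-style scene dict into (nodes, labelled directed edges).

    Hypothetical helper for illustration only. An edge (s, t, rel) reads
    "object s is `rel` of object t", under the assumed JSON layout.
    """
    nodes: Nodes = {
        idx: {key: obj[key] for key in ATTRIBUTES if key in obj}
        for idx, obj in enumerate(scene.get("objects", []))
    }
    edges: Edges = []
    for relation, per_object in scene.get("relationships", {}).items():
        for target, sources in enumerate(per_object):
            for source in sources:
                edges.append((source, target, relation))
    return nodes, edges


# Toy two-object scene (made up for this example, not taken from the dataset).
toy_scene = {
    "objects": [
        {"shape": "cube", "color": "red", "size": "large", "material": "metal"},
        {"shape": "sphere", "color": "blue", "size": "small", "material": "rubber"},
    ],
    "relationships": {
        "left": [[1], []],   # object 1 is left of object 0
        "right": [[], [0]],  # object 0 is right of object 1
    },
}

nodes, edges = scene_to_graph(toy_scene)
print(nodes)
print(edges)
```

Because the nodes are keyed by index and the edges form an unordered collection of labelled pairs, the result does not depend on the order in which relations are listed, which is the order-invariance property the abstract refers to. A node and edge list of this kind could then be converted into the input format expected by a GNN library.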
- Year
- 2020
- Venue
- EMNLP (NLPOSS) 2020 11
- Authors
- 2
- Hosting
- Abstract only
Attribution
- Abstract & full text
- arxiv.org/abs/2009.09154v2
- TL;DR
- Semantic Scholar