0

LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos

A novel framework called LifelongMemory leverages multiple pre-trained models to address long-range temporal dependencies in egocentric video NLQ tasks, achieving competitive performance through integrated captioning and query generation.

Year
2023
Venue
arXiv 2023
Authors
3
Hosting
Abstract onlyARXIV-DEFAULT

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2312.05269v3ARXIV-DEFAULT
TL;DR
Semantic Scholar
Attribution policy →

Abstract

In this paper we introduce LifelongMemory, a new framework for accessing long-form egocentric videographic memory through natural language question answering and retrieval. LifelongMemory generates concise video activity descriptions of the camera wearer and leverages the zero-shot capabilities of pretrained large language models to perform reasoning over long-form video context. Furthermore, LifelongMemory uses a confidence and explanation module to produce confident, high-quality, and interpretable answers. Our approach achieves state-of-the-art performance on the EgoSchema benchmark for question answering and is highly competitive on the natural language query (NLQ) challenge of Ego4D. Code is available at https://github.com/agentic-learning-ai-lab/lifelong-memory.

Authors

3