
LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild

This paper evaluates the image geolocation capabilities of multimodal language models on a new dataset, finding that closed-source models outperform open-source models, which can reach comparable performance through fine-tuning.

Year
2024
Venue
arXiv 2024
Authors
6
Hosting
Abstract only (ARXIV-DEFAULT)

Attribution

Abstract & full text
arxiv.org/abs/2405.20363 (ARXIV-DEFAULT)
TL;DR
Semantic Scholar

Abstract

Image geolocation is a critical task in various image-understanding applications. However, existing methods often fail when analyzing challenging, in-the-wild images. Inspired by the exceptional background knowledge of multimodal language models, we systematically evaluate their geolocation capabilities using a novel image dataset and a comprehensive evaluation framework. We first collect images from various countries via Google Street View. Then, we conduct training-free and training-based evaluations on closed-source and open-source multimodal language models. Our findings indicate that closed-source models demonstrate superior geolocation abilities, while open-source models can achieve comparable performance through fine-tuning.
