DISCO: Document Intelligence Suite for COmparative Evaluation

DISCO evaluates OCR pipelines and vision-language models for document intelligence tasks, revealing varying performance across document types and highlighting the importance of task-aware approach selection.

Year
2026
Venue
arXiv 2026
Authors
4

Abstract & full text
arxiv.org/abs/2603.23511

Abstract

Document intelligence requires accurate text extraction and reliable reasoning over document content. We introduce DISCO, a Document Intelligence Suite for COmparative Evaluation, that evaluates optical character recognition (OCR) pipelines and vision-language models (VLMs) separately on parsing and question answering across diverse document types, including handwritten text, multilingual scripts, medical forms, infographics, and multi-page documents. Our evaluation shows that performance varies substantially across tasks and document characteristics, underscoring the need for complexity-aware approach selection. OCR pipelines are generally more reliable for handwriting and for long or multi-page documents, where explicit text grounding supports text-heavy reasoning, while VLMs perform better on multilingual text and visually rich layouts. Task-aware prompting yields mixed effects, improving performance on some document types while degrading it on others. These findings provide empirical guidance for selecting document processing strategies based on document structure and reasoning demands.
