Attribution
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding
arXiv 2025
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
arXiv 2021
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
from 3 papers
Cha Zhang
Dinei Florencio
Furu Wei
Lei Cui
Tengchao Lv
Dan Roth
Guoxin Wang
Jianwei Yang
Jingye Chen
John Corring