Attribution
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
arXiv 2021
from 1 paper
Bokyung Son
Wonjae Kim