MABViT -- Modified Attention Block Enhances Vision Transformers

Recent studies have demonstrated the effectiveness of Gated Linear Units (GLU) in enhancing transformer models, particularly in Large Language Models (LLMs). Additionally, utilizing a parallel configuration within each Transformer block rather than the conventional serialized…

Year: 2023
ArXiv: arxiv.org/abs/2312.01324
URL: arxiv.org/abs/2312.01324v2
Hosting: External source, license unknown

Attribution

Abstract & full text: arxiv.org/abs/2312.01324v2
TL;DR: Semantic Scholar
