0

MABViT -- Modified Attention Block Enhances Vision Transformers

Recent studies have demonstrated the effectiveness of Gated Linear Units (GLU) in enhancing transformer models, particularly in Large Language Models (LLMs). Additionally, utilizing a parallel configuration within each Transformer block rather than the conventional serialized…

Year
2023
Hosting
External sourcelicense unknown

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2312.01324v2
TL;DR
Semantic Scholar
Attribution policy →