MABViT -- Modified Attention Block Enhances Vision Transformers
Recent studies have demonstrated the effectiveness of Gated Linear Units (GLU) in enhancing transformer models, particularly in Large Language Models (LLMs). Additionally, utilizing a parallel configuration within each Transformer block rather than the conventional serialized…
- Year
- 2023
- Hosting
- External sourcelicense unknown
Cite
Notes
Only stored in your browser.