
Gated Multimodal Units for Information Fusion

The Gated Multimodal Unit (GMU) model improves multilabel genre classification by integrating different data modalities through multiplicative gates, outperforming other fusion strategies on a newly released large-scale dataset.

Year
2017
Venue
arXiv 2017
Authors
4

Attribution

Abstract & full text
arxiv.org/abs/1702.01992
TL;DR
Semantic Scholar

Abstract

This paper presents a novel model for multimodal learning based on gated neural networks. The Gated Multimodal Unit (GMU) model is intended to be used as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities. The GMU learns to decide how modalities influence the activation of the unit using multiplicative gates. It was evaluated on multilabel movie genre classification using each movie's plot and poster. The GMU improved the macro F-score over single-modality approaches and outperformed other fusion strategies, including mixture-of-experts models. Along with this work, the MM-IMDb dataset is released; to the best of our knowledge, it is the largest publicly available multimodal dataset for movie genre prediction.
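The abstract says the GMU uses multiplicative gates to decide how much each modality contributes to the unit's activation. Below is a minimal pure-Python sketch of a two-modality gated unit. It assumes the formulation h_v = tanh(W_v x_v), h_t = tanh(W_t x_t), z = sigmoid(W_z [x_v; x_t]), h = z * h_v + (1 - z) * h_t, with biases omitted. The helper names, feature sizes, and random weights are hypothetical and serve only as illustration. Check the full text at arxiv.org/abs/1702.01992 for the exact model and training setup.

```python
import math
import random


def tanh_layer(weights, x):
    """Bias-free dense layer followed by tanh: returns tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gmu_bimodal(x_v, x_t, W_v, W_t, W_z):
    """Sketch of a two-modality gated unit (assumed formulation, no biases).

    h_v = tanh(W_v x_v)            projection of modality 1 (e.g. poster)
    h_t = tanh(W_t x_t)            projection of modality 2 (e.g. plot)
    z   = sigmoid(W_z [x_v; x_t])  gate computed from both raw inputs
    h   = z * h_v + (1 - z) * h_t  element-wise gated mix
    """
    h_v = tanh_layer(W_v, x_v)
    h_t = tanh_layer(W_t, x_t)
    x_cat = list(x_v) + list(x_t)  # concatenate the two input vectors
    z = [sigmoid(sum(w * xi for w, xi in zip(row, x_cat))) for row in W_z]
    h = [zi * hv + (1.0 - zi) * ht for zi, hv, ht in zip(z, h_v, h_t)]
    return h, z


def random_matrix(rows, cols, rng):
    """Hypothetical small random weights, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Usage example with hypothetical feature sizes.
rng = random.Random(0)
d_v, d_t, d_h = 4, 3, 2
x_v = [rng.random() for _ in range(d_v)]  # stand-in for visual features
x_t = [rng.random() for _ in range(d_t)]  # stand-in for text features
W_v = random_matrix(d_h, d_v, rng)
W_t = random_matrix(d_h, d_t, rng)
W_z = random_matrix(d_h, d_v + d_t, rng)
h, z = gmu_bimodal(x_v, x_t, W_v, W_t, W_z)
print("gate z:", z)
print("fused h:", h)
```

Because each gate value lies in (0, 1), every element of h is a convex combination of the two modality projections. The mix is therefore set per example and per dimension by the learned gate rather than by a fixed fusion weight.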
