Simple and Effective Multimodal Learning Based on Pre-Trained Transformer Models

Transformer-based models have garnered attention because of their success in natural language processing and in several other fields, such as image recognition and automatic speech recognition. Beyond models trained on unimodal information, many transformer-based models have been proposed for multimodal information. A common problem encountered in multimodal learning is the insufficiency of multimodal training data. To address this problem, this study proposes a simple and effective method that uses 1) unimodal pre-trained transformer models as encoders for each modal input and 2) a set of transformer layers to fuse their output representations. The proposed method is evaluated in several experiments on two common benchmarks: the CMU Multimodal Opinion Sentiment Intensity (CMU-MOSI) dataset and the Multimodal IMDb (MM-IMDb) dataset.

The proposed model exhibits state-of-the-art performance on both benchmarks and is robust against reductions in the amount of training data.
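The abstract does not spell out the fusion details, but a minimal sketch of the described recipe might look like the following. It assumes HuggingFace-style pre-trained encoders whose forward pass returns a `last_hidden_state` tensor (e.g., a BERT text encoder and a wav2vec-style audio encoder, both with 768-dimensional hidden states); the class name `MultimodalFusion`, the token-level concatenation, the learnable [CLS] token, and the linear prediction head are illustrative assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn


class MultimodalFusion(nn.Module):
    """Sketch of the described approach: unimodal pre-trained encoders
    feed a small stack of Transformer layers that fuse their output
    representations (assumed design, not the paper's exact one)."""

    def __init__(self, text_encoder, audio_encoder, d_model=768,
                 n_fusion_layers=2, n_heads=8, n_outputs=1):
        super().__init__()
        self.text_encoder = text_encoder    # e.g., a pre-trained BERT
        self.audio_encoder = audio_encoder  # e.g., a pre-trained wav2vec-style model
        fusion_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(fusion_layer,
                                            num_layers=n_fusion_layers)
        # Learnable [CLS] token whose fused state summarizes both modalities
        # (an assumption; a pooled representation would also work).
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        self.head = nn.Linear(d_model, n_outputs)

    def forward(self, text_inputs, audio_inputs):
        # Each pre-trained encoder yields a token sequence of shape
        # (batch, seq_len, d_model).
        h_text = self.text_encoder(**text_inputs).last_hidden_state
        h_audio = self.audio_encoder(**audio_inputs).last_hidden_state
        # Concatenate the two modalities along the sequence axis and prepend
        # the [CLS] token; self-attention in the fusion layers then attends
        # across modalities for free.
        cls = self.cls.expand(h_text.size(0), -1, -1)
        fused = self.fusion(torch.cat([cls, h_text, h_audio], dim=1))
        return self.head(fused[:, 0])  # predict from the fused [CLS] state
```

In this sketch cross-modal interaction comes entirely from standard self-attention over the concatenated token sequences. One plausible reading of the abstract's data-efficiency claim is that keeping the large pre-trained encoders frozen (or lightly fine-tuned) and training only the small fusion stack and head leaves few parameters that depend on scarce multimodal data.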
