Research Repository

Multimodal Deep Features Fusion For Video Memorability Prediction

Leyva, Roberto and Doctor, Faiyaz and Garcia Seco De Herrera, Alba and Sahab, Sohail Multimodal Deep Features Fusion For Video Memorability Prediction. In: MediaEval, 2019-10-27 - 2019-10-29, Sophia Antipolis, France. (In Press)

mediaEval2019.pdf - Submitted Version

Abstract

This paper describes a multimodal feature fusion approach for predicting short- and long-term video memorability, where the goal is to design a system that automatically predicts scores reflecting the probability of a video being remembered. The approach performs early fusion of text, image, and video features. Text features are extracted using a Convolutional Neural Network (CNN); image features are extracted with an FBResNet152 pre-trained on ImageNet; and video features are extracted using a 3DResNet152 pre-trained on Kinetics 400. We use Fisher Vectors to obtain a single fixed-length vector for each video, which overcomes the need for a non-fixed global vector representation to handle temporal information. The fusion approach demonstrates good predictive performance and superior regression correlation compared with standard features.
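The abstract's two key ideas, Fisher Vector aggregation of per-frame features into one fixed-length vector, followed by early fusion (concatenation) with text and image features, can be sketched as below. This is a minimal illustration, not the authors' implementation: the feature dimensions, GMM size, and random placeholder features are assumptions, and the Fisher Vector uses only the standard first-order (mean-gradient) terms.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(frame_feats, gmm):
    """Simplified first-order Fisher Vector: soft-assignment-weighted
    mean deviations under a fitted diagonal-covariance GMM, followed by
    the usual power and L2 normalisation."""
    q = gmm.predict_proba(frame_feats)          # (T, K) soft assignments
    T = frame_feats.shape[0]
    grads = []
    for k in range(gmm.n_components):
        diff = (frame_feats - gmm.means_[k]) / np.sqrt(gmm.covariances_[k])
        g = (q[:, k, None] * diff).sum(axis=0) / (T * np.sqrt(gmm.weights_[k]))
        grads.append(g)
    fv = np.concatenate(grads)                  # fixed length K * D
    fv = np.sign(fv) * np.sqrt(np.abs(fv))      # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)    # L2 normalisation

rng = np.random.default_rng(0)
# Toy stand-ins for the real extractors (dimensions are illustrative):
video_frames = rng.normal(size=(30, 8))         # per-frame 3D-CNN features, T=30, D=8
text_vec = rng.normal(size=(16,))               # placeholder text-CNN features
image_vec = rng.normal(size=(24,))              # placeholder image-CNN features

gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
gmm.fit(rng.normal(size=(500, 8)))              # GMM fitted on a toy feature pool

video_vec = fisher_vector(video_frames, gmm)    # length K*D = 32, regardless of T
fused = np.concatenate([text_vec, image_vec, video_vec])  # early fusion
print(fused.shape)
```

The point of the Fisher Vector step is visible in the shapes: a video of any number of frames T maps to the same K*D-dimensional vector, so the fused representation has a fixed size suitable for a standard regressor.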

Item Type: Conference or Workshop Item (Paper)
Additional Information: Published proceedings: _not provided_
Divisions: Faculty of Science and Health > Computer Science and Electronic Engineering, School of
Depositing User: Elements
Date Deposited: 27 Jan 2020 10:52
Last Modified: 27 Jan 2020 11:15
URI: http://repository.essex.ac.uk/id/eprint/26580
