Research Repository

Multimodal deep features fusion for video memorability prediction

Leyva, R and Doctor, F and Seco de Herrera, AG and Sahab, S (2019) Multimodal deep features fusion for video memorability prediction. In: UNSPECIFIED, ? - ?.

mediaEval2019.pdf - Submitted Version

Abstract

This paper describes a multimodal feature fusion approach for predicting short-term and long-term video memorability, where the goal is to design a system that automatically predicts scores reflecting the probability of a video being remembered. The approach performs early fusion of text, image, and video features. Text features are extracted using a Convolutional Neural Network (CNN), image features are extracted with an FBResNet152 pre-trained on ImageNet, and video features are extracted using a 3DResNet152 pre-trained on Kinetics 400. We use Fisher Vectors to obtain a single fixed-length vector for each video, which avoids the need for a variable-length global representation to handle temporal information. The fusion approach demonstrates good predictive performance and superior regression correlation compared with standard features.
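The encoding idea in the abstract can be sketched as follows: frame-level descriptors of a video are summarised by gradients of a Gaussian Mixture Model (the standard Fisher Vector), yielding one fixed-length vector per video regardless of its frame count, which is then concatenated with the other modality features (early fusion). This is a minimal illustrative sketch, not the authors' implementation: the feature dimensions, GMM size, and the random placeholder features standing in for the CNN text, FBResNet152 image, and 3DResNet152 video descriptors are all assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Encode a variable-length set of frame descriptors as one fixed-length
    Fisher Vector (gradients w.r.t. diagonal-GMM means and variances)."""
    X = np.atleast_2d(descriptors)
    N = X.shape[0]
    Q = gmm.predict_proba(X)                       # (N, K) posteriors
    mu, var, w = gmm.means_, gmm.covariances_, gmm.weights_
    diff = (X[:, None, :] - mu[None, :, :]) / np.sqrt(var)[None, :, :]
    g_mu = (Q[:, :, None] * diff).sum(0) / (N * np.sqrt(w)[:, None])
    g_var = (Q[:, :, None] * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_var.ravel()])   # length 2*K*d
    fv = np.sign(fv) * np.sqrt(np.abs(fv))         # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)       # L2 normalisation

rng = np.random.default_rng(0)
pool = rng.normal(size=(500, 8))                   # hypothetical training descriptors
gmm = GaussianMixture(n_components=4, covariance_type="diag",
                      random_state=0).fit(pool)

video_frames = rng.normal(size=(37, 8))            # one video, any number of frames
fv_video = fisher_vector(video_frames, gmm)        # fixed 2*4*8 = 64 dims

text_feat = rng.normal(size=16)                    # placeholder CNN text feature
image_feat = rng.normal(size=32)                   # placeholder FBResNet152 feature
fused = np.concatenate([text_feat, image_feat, fv_video])  # early fusion
print(fused.shape)
```

A videos with 37 frames and one with 370 frames both map to the same 64-dimensional Fisher Vector, which is what makes simple concatenation-based early fusion possible across modalities.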

Item Type: Conference or Workshop Item (Paper)
Additional Information: Published proceedings: CEUR Workshop Proceedings
Divisions: Faculty of Science and Health > Computer Science and Electronic Engineering, School of
Depositing User: Elements
Date Deposited: 27 Jan 2020 10:52
Last Modified: 05 Apr 2021 18:15
URI: http://repository.essex.ac.uk/id/eprint/26580
