Savran Kızıltepe, Rukiye (2022) Spatiotemporal Features and Deep Learning Methods for Video Classification. PhD thesis, University of Essex.
Abstract
Classifying human actions in real-world video is one of the most important topics in computer vision, and it has been an interesting and challenging research problem in recent decades. It is used in many applications, such as video retrieval, video surveillance, human-computer interaction, robotics, and health care, so robust, fast, and accurate action recognition systems are in high demand. Deep learning techniques developed for action recognition in still images can be extended to the video domain. Nonetheless, deep learning solutions for two-dimensional image data cannot be applied directly to video because of its larger scale and temporal nature: each frame carries spatial information, while the sequence of frames carries temporal information. This study therefore focused on both spatial and temporal features, aiming to improve the accuracy of human action recognition from videos by making use of spatiotemporal information. In this thesis, several deep learning architectures were proposed to model both spatial and temporal components. Firstly, a novel deep neural network was developed for video classification by combining Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). Secondly, an action template-based keyframe extraction method was proposed, in which temporal clues between action regions were used to extract more informative keyframes. Thirdly, a novel decision-level fusion rule was proposed to better combine the spatial and temporal aspects of videos in two-stream networks. Finally, an extensive investigation was conducted into how information from feature-level and decision-level fusion can be combined to improve video classification performance in multi-stream neural networks.
Extensive experiments were conducted with the proposed methods, and the results showed that video classification architectures require both spatial and temporal information, and that employing temporal information effectively in multi-stream deep neural networks is crucial for improving video classification accuracy.
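The abstract distinguishes feature-level fusion from decision-level fusion of the spatial and temporal streams. The following minimal sketch illustrates only the generic, textbook form of each: feature fusion concatenates per-stream feature vectors before a shared classifier, while decision fusion combines each stream's class probabilities after classification. It is not the novel fusion rule proposed in the thesis. The function names, the equal default weight `w_spatial=0.5`, and the example logits are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the predicted class)."""
    return max(range(len(values)), key=lambda i: values[i])


def feature_fusion(spatial_feat: Sequence[float],
                   temporal_feat: Sequence[float]) -> List[float]:
    """Generic early (feature-level) fusion: concatenate the two streams'
    feature vectors so a single shared classifier can be applied afterwards."""
    return list(spatial_feat) + list(temporal_feat)


def decision_fusion(spatial_logits: Sequence[float],
                    temporal_logits: Sequence[float],
                    w_spatial: float = 0.5) -> List[float]:
    """Generic late (decision-level) fusion: a weighted average of the
    per-stream class probabilities. The weight is a hypothetical choice,
    not the rule proposed in the thesis."""
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    w_temporal = 1.0 - w_spatial
    return [w_spatial * a + w_temporal * b
            for a, b in zip(p_spatial, p_temporal)]


if __name__ == "__main__":
    # Hypothetical logits for three action classes from each stream.
    spatial = [2.0, 1.0, 0.1]
    temporal = [0.5, 2.5, 0.2]
    fused = decision_fusion(spatial, temporal)
    print("spatial-only prediction:", argmax(softmax(spatial)))
    print("temporal-only prediction:", argmax(softmax(temporal)))
    print("fused prediction:", argmax(fused))
    print("concatenated features:", feature_fusion([0.3, 0.7], [0.1]))
```

In this toy input the two streams disagree, and the averaged probabilities side with the more confident temporal stream. This is one simple way a fusion step can let temporal evidence override a weaker spatial vote. Learned or rule-based weightings, such as the one developed in the thesis, replace the fixed average used here.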
Item Type: | Thesis (PhD) |
---|---|
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
Depositing User: | Rukiye Savran Kiziltepe |
Date Deposited: | 27 Jun 2022 16:41 |
Last Modified: | 27 Jun 2022 16:41 |
URI: | http://repository.essex.ac.uk/id/eprint/33068 |
Available files
Filename: PhD_RukiyeSavranKiziltepe_27June2022.pdf