ThermalAttackNet: Are CNNs Making It Easy to Perform Temperature Side-Channel Attack in Mobile Edge Devices?

Abstract: Side-channel attacks remain a challenge to information flow control and security in mobile edge devices to this day. One such important security flaw could be exploited through temperature side-channel attacks, where heat dissipation and propagation from the processing cores are observed over time in order to deduce security flaws. In this paper, we study how computer vision-based convolutional neural networks (CNNs) could be used to mount a temperature (thermal) side-channel attack on different Linux governors in a mobile edge device utilizing a multi-processor system-on-chip (MPSoC). We also designed a power- and memory-efficient CNN model that is capable of performing a thermal side-channel attack on the MPSoC and can be used by industry practitioners and academics as a benchmark to design methodologies to secure against such an attack in MPSoCs.


Introduction
Recently, mobile devices have become an integral part of daily life. These mobile devices are utilized to run different types of applications, including video calling, web browsing, gaming, and navigation; hence, energy-efficient processing on these battery-powered mobile devices is of utmost importance [1,2]. Mobile cloud computing, where most of the computations happen in the cloud (also known as Cloud Offloading) [3], is considered to be a potential solution for energy-efficient processing. However, applications that need privacy and security, such as a banking app or a secure data storage app, are often processed on the mobile device instead of being offloaded to the cloud. Moreover, as mobile edge computing becomes more and more ubiquitous, security issues in these mobile devices become paramount. Such mobile devices have to face hostile security threats [4-7], such as physical, logical/software-based, and side-channel/lateral attacks. Amongst these, the side-channel attack [4-6] is a popular security threat due to ease of access to the physical hardware: attacks are performed by observing the properties and behavior of the system, such as power consumption, thermal dissipation, and electromagnetic emission.
Comparatively, far fewer documented studies address temperature (thermal)-based side-channel attacks [4-6] in mobile edge devices. Most of these mobile devices come equipped with a heterogeneous multi-processor system-on-chip (MPSoC), which consists of multiple heterogeneous processors on a single chip, capable of processing different types of applications to cater for the performance and energy efficiency of the executing applications. Due to an increase in the usage of MPSoCs [2,8-13] in mobile edge devices and a rise in studies on thermal side-channel attacks [6,14,15], it is crucial that side-channel attacks on such platforms be addressed with utmost importance [15].
To explore the feasibility of a thermal side-channel attack on a real commercial mobile device, we designed a new type of attack based on a computer vision-based Convolutional Neural Network (CNN). Among all the fields of neural network-based machine learning and pattern recognition, computer vision-based neural networks, especially CNNs and Deep Learning (DL) [16,17], are comparatively well studied and mature. Recently, CNN models have achieved high prediction accuracy in applied fields on several real-life challenges, such as traffic categorization [18,19], human rights violation [20], and weather forecasting [21]. Given this high success rate in recognizing patterns, we utilized a CNN model-based attack. To perform the attack, we chose 4 of the 25 most common passwords of 2017 and 2018 [22,23], as surveyed by the Internet security firm SplashData: 123456, passw0rd, 111111, and football. We then executed AES-256 [24] encryption on a text file using these passwords on an Odroid XU4 development board [25] running the ondemand Linux governor [26] and recorded the thermal behavior of the CPUs. Using transfer learning, we trained a ResNet model [27], a CNN pre-trained on ImageNet, with the graphical representation of the thermal behavior (as shown in Figures 1a and 2a). ResNet achieved a training prediction accuracy of 46.88% and a testing prediction accuracy of 31.99%, which means that ResNet predicts the correct password in roughly one out of every three attempts on average, compared with a chance level of 25% for four candidate passwords. Figures 1b and 2b show the region of interest on the graphical representation of the thermal behavior which is used by the CNN model to predict the password.
In order to determine whether ResNet is classifying the thermal data based on the features of the thermal peaks, we utilized Gradient-weighted Class Activation Mapping (Grad-CAM) [28] to visualize which areas of the graphical data the CNN focused on to predict the password used in the encryption process. In Figures 1b and 2b, the area highlighted (heat map) with shades of yellow/red is the active region where the CNN is looking to determine the password used. In the heat map, the regions range from blue to red, where blue marks the least active region and red the most active one. The observations from the aforementioned figures indicate that vision-based CNNs could be successfully utilized to perform a thermal side-channel attack, and, to the best of our knowledge, this is the first documented study to do so. In summary, this paper makes the following contributions.

1. Design and explore thermal side-channel attack using computer vision-based CNN models.
2. Evaluate popular CNN models and their accuracy in predicting password for different Linux governors.
3. Design and implementation of a power- and memory-efficient CNN model, ThermalAttackNet, to perform thermal side-channel attack on a real consumer mobile device. The main motive to design and implement a computer vision-based CNN model to perform thermal side-channel attack is to provide a benchmark that could be used by industry practitioners and researchers to improve security against such an attack in mobile devices utilizing MPSoCs.

Convolutional Neural Networks and Deep Learning
A Deep Learning (DL) model [29] consists of an input layer, several intermediate (hidden) layers, which are stacked on top of each other, and an output layer. The raw values of the data features are fed into the input layer, which is the first layer of the model. In each of the hidden layers, a mathematical operation called convolution is applied to extract specific features, which are then utilized to predict the label of the raw data in the last (output) layer of the DL network. Most of the time, if a model utilizes an input layer, a hidden layer, and an output layer, then the model is denoted as a Convolutional Neural Network (CNN) model or simply ConvNet. Only if such a model uses many stacked hidden layers is it denoted as a DL model or Deep Neural Network (DNN).

Pre-Trained Networks and Transfer Learning
A conventional approach to enable training of a DNN/CNN on relatively small datasets is to use a model pre-trained on a very large dataset, and then use that CNN as an initialization for the applicative task of interest. Such a method of training is called "transfer learning" [30], and we have followed the same principle. The chosen CNN models mentioned in Section 3.3 are pre-trained on ImageNet [29]. For the proposed attack, we have utilized the following popular pre-trained CNN models: VGG (VGG19) [31], ResNet (ResNet152v2) [27], MobileNet (MobileNetv2) [32], and NASNet (NASNetMobile) [33].

Hardware & Software Setup for Experiments
We chose the Odroid XU4 [25] board to execute the attack in order to verify the effect of thermal side-channel exploitation. The Odroid XU4 employs the Samsung Exynos 5422 [34] MPSoC, which is popularly used in Samsung mobile devices, especially the Samsung Galaxy S5. The Odroid XU4 is a representative development board for the Galaxy S5 smartphone. The Exynos 5422 MPSoC contains clusters of big (4 Cortex-A15) and LITTLE (4 Cortex-A7) cores. This MPSoC provides a per-cluster DVFS feature, where the big core cluster has 19 frequency scaling levels, ranging from 200 MHz to 2000 MHz in steps of 100 MHz, and the LITTLE cluster has 13 frequency scaling levels, ranging from 200 MHz to 1400 MHz in steps of 100 MHz.
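The level counts quoted above follow directly from the frequency ranges and the 100 MHz step. A minimal sketch of that arithmetic, where the helper name `dvfs_levels` is our own and not part of any tool used in the experiments:

```python
def dvfs_levels(min_mhz, max_mhz, step_mhz=100):
    """Return every selectable frequency (in MHz) from min to max, inclusive."""
    return list(range(min_mhz, max_mhz + 1, step_mhz))


# Ranges as stated in the text for the Exynos 5422 clusters.
big_levels = dvfs_levels(200, 2000)
little_levels = dvfs_levels(200, 1400)

print(len(big_levels), len(little_levels))
```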
The Odroid XU4 was running Ubuntu MATE version 14.04 (Linux Odroid Kernel: 3.10.105). While the attack was performed, the average ambient temperature of the room was 21 °C. When we executed the attack, we changed the governor [26] between conservative, ondemand, performance, interactive, and powersaver to study which Linux governor is more vulnerable to such an attack.
Brief descriptions of the different types of governors are as follows:
• ondemand: Sets the operating frequency of the CPU depending on the CPU utilization. With this governor, the operating frequency is set to maximum whenever there is any load on the CPU.
• conservative: A fork of the ondemand governor that also sets the operating frequency of the CPU depending on the CPU utilization. It differs from ondemand by increasing or decreasing the operating frequency of the CPU gradually based on the CPU utilization.
• performance: Sets the operating frequency of the CPU to the highest frequency within the user-specified minimum and maximum frequency.
• powersaver: In contrast to performance, this governor sets the operating frequency of the CPU to the lowest frequency within the user-specified minimum and maximum frequency.
• interactive: Dynamically scales the CPU operating frequency in response to the CPU utilization. Interactive is significantly more responsive than ondemand because it scales the operating frequency over the course of time to the maximum frequency based on the CPU utilization.
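The difference between these policies can be illustrated with a toy model. The sketch below is not kernel code: the function `next_frequency`, the thresholds of 0.8 and 0.2, and the proportional rule used for ondemand below the threshold are all illustrative assumptions. The interactive governor is left out because its behavior depends on timing tunables that the descriptions above do not specify.

```python
def next_frequency(governor, utilization, current_mhz, levels,
                   up_threshold=0.8, down_threshold=0.2):
    """Toy model of how a governor picks the next CPU frequency.

    utilization is the CPU load in [0, 1]; levels is a sorted list of MHz
    values that contains current_mhz. Thresholds are assumed values.
    """
    lowest, highest = levels[0], levels[-1]
    idx = levels.index(current_mhz)
    if governor == "performance":
        return highest
    if governor == "powersaver":
        return lowest
    if governor == "ondemand":
        # Jump straight to the maximum under heavy load ...
        if utilization > up_threshold:
            return highest
        # ... otherwise pick the lowest level that covers the load (assumed rule).
        target = utilization * highest
        return next(f for f in levels if f >= target)
    if governor == "conservative":
        # Move at most one level per decision, in either direction.
        if utilization > up_threshold:
            return levels[min(idx + 1, len(levels) - 1)]
        if utilization < down_threshold:
            return levels[max(idx - 1, 0)]
        return current_mhz
    raise ValueError(f"governor not modelled: {governor}")


levels = list(range(200, 2001, 100))  # big-cluster levels from the text
for name in ("performance", "powersaver", "ondemand", "conservative"):
    print(name, next_frequency(name, 0.9, 200, levels))
```

Under a sudden load, the toy ondemand and performance policies reach the top frequency in one step, whereas conservative climbs one level at a time, which is the behavioral difference the descriptions above draw.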

Dataset and CNN Model
To generate a dataset of thermal behavior, we chose 4 of the most common passwords (123456, passw0rd, 111111, and football) and used the AES-256 encryption algorithm to encrypt a text file using these passwords. For each password, the encryption was performed on the same text file 500 times. AES-256 was chosen because of its popularity. The encryption operations were performed on CPU 7, which is one of the big CPUs (Cortex-A15) of the Exynos 5422 MPSoC, while one of the LITTLE cores (CPU 3) snooped the operating temperature data of the big CPU. After the temperature data for each password were collected, we transformed the data points into a graphical representation so that they could be fed to a pre-trained CNN for training and prediction.
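The step from temperature samples to an image can be sketched as follows. This is only one possible rasterization under our own assumptions: the function `trace_to_image`, the binary grid format, and the sample values are hypothetical, and the plotting method actually used to produce the figures is not reproduced here.

```python
def trace_to_image(temps, height=8):
    """Rasterize a 1-D temperature trace into a height x len(temps) binary grid.

    Row 0 is the top of the image (hottest sample), the last row the bottom.
    """
    lo, hi = min(temps), max(temps)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat trace
    image = [[0] * len(temps) for _ in range(height)]
    for col, t in enumerate(temps):
        # Scale the sample to a row index, flipped so hotter values sit higher.
        level = round((t - lo) / span * (height - 1))
        image[height - 1 - level][col] = 1
    return image


# Hypothetical temperature samples in degrees Celsius (not measured data).
samples = [45.0, 46.5, 52.0, 60.0, 58.5, 50.0, 46.0]
for row in trace_to_image(samples, height=6):
    print("".join("#" if px else "." for px in row))
```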

Training CNN to Predict Password
We chose a CNN model pre-trained on the 1000 classes (labels) of the ImageNet database, such as eskimo dog, madagascar cat, cougar, and lifeboat, removed its classifier module, and modified it to predict our chosen password classes. We fine-tuned [35] the CNN model by adding a new randomly initialized classifier (output layer) and training only the last fully connected layer: all the layers of the base model are frozen (frozen layers are shown in gray in Figure 4), and the last fully connected layer is unfrozen (unfrozen layers are shown in green in Figure 4). Freezing a layer means that no updates are made to its weights during the training process. The new output layer of the model is then trained to take the lower-level features passed through the network and map them to the desired output classes (passwords), using optimization techniques such as stochastic gradient descent (SGD) [36]. SGD is an iterative optimization algorithm that estimates the error gradient for the CNN model during the training process and updates the weights of the model using back-propagation [37].
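The idea of freezing the base and training only a new classifier with SGD can be illustrated without a deep learning framework. In the sketch below, `frozen_features` is a hypothetical stand-in for the frozen pre-trained base (a fixed mapping that is never updated), the new head is a softmax classifier, and the toy data, learning rate, and epoch count are assumptions chosen for illustration. It is not the training script used in the experiments.

```python
import math
import random


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frozen_features(x):
    """Stand-in for the frozen base: a fixed mapping with no trainable weights."""
    return [x[0], x[1], x[0] * x[1], 1.0]  # last entry acts as a bias input


def train_head(samples, num_classes, epochs=200, lr=0.1, seed=0):
    """Train only the new classifier head with SGD; the base is never updated."""
    rng = random.Random(seed)
    dim = len(frozen_features(samples[0][0]))
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for _ in range(num_classes)]
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            feats = frozen_features(x)
            probs = softmax([sum(w * f for w, f in zip(row, feats))
                             for row in weights])
            for c in range(num_classes):
                # Cross-entropy gradient with respect to the logit of class c.
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * feats[j]
    return weights


def predict(weights, x):
    feats = frozen_features(x)
    scores = [sum(w * f for w, f in zip(row, feats)) for row in weights]
    return scores.index(max(scores))


# Toy data: four classes standing in for the four password labels.
toy = [((1.0, 1.0), 0), ((-1.0, 1.0), 1), ((-1.0, -1.0), 2), ((1.0, -1.0), 3)]
head = train_head(toy, num_classes=4)
print([predict(head, x) for x, _ in toy])
```

Because only `weights` changes during training, the gradient never has to be propagated into the base, which is what makes this form of fine-tuning cheap on small datasets.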

ThermalAttackNet: Proposed CNN Architecture
Since most of the pre-trained CNNs come with several fully connected layers, using such a model consumes a lot of memory space on the device, as well as power. In order to overcome these challenges, we designed a CNN model, named ThermalAttackNet, which performs similarly to popular CNNs (ResNet, VGG, NASNet, and MobileNet) while consuming comparatively less power and memory. Given that the graphical representation of thermal behavior (as shown in Figure 3) consists of regular temperature peaks characterized by edges, we designed the CNN to extract such features as accurately as possible. The architecture of ThermalAttackNet is illustrated in Figure 5. ThermalAttackNet consists of 6 convolutional layers (denoted by Conv2D in Figure 5), and we discard the fully connected layers in favor of retaining higher resolution feature maps at the deepest output layer. This also reduces the number of parameters (only 48,804) used in ThermalAttackNet compared to ResNet, VGG, NASNet, and MobileNet (as shown in Table 1). In Figure 5, X is a variable batch size, which depends on the implementation of the model, and C is the number of output classes, which is 4 (passwords) in our case. Each convolutional layer (Conv2D) performs convolution with a filter bank to produce a set of feature maps, and then an element-wise rectified-linear non-linearity (ReLU), max(0, x), is applied. Following that, max-pooling (denoted as MaxPooling2D in Figure 5) is used to achieve translation invariance over small spatial shifts in the input image. Table 1 shows the comparison between ThermalAttackNet and other popular models. Note: ThermalAttackNet is trained on the thermal dataset with data augmentation to improve its training. The following data augmentation approaches were applied to the dataset: Horizontal and Vertical Shift, Random Zoom, and Shear Intensity.
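The three operations that make up each stage (convolution with a filter, ReLU, and max-pooling) can be traced by hand on a tiny input. The sketch below is a single-channel, pure-Python illustration; the 1x2 edge filter and the 5x5 input are assumptions chosen to show how a rising edge is picked up, and they are not the filters learned by ThermalAttackNet.

```python
def conv2d_valid(image, kernel):
    """Single-channel 2-D cross-correlation with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Element-wise max(0, x)."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max-pooling; rows/columns that do not fill a window are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Assumed input: a rising edge between the second and third columns.
rising = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_filter = [[-1, 1]]  # responds positively to a left-to-right increase

pooled = max_pool2d(relu(conv2d_valid(rising, edge_filter)))
print(pooled)
```

Shifting the edge by one column inside the same pooling window leaves the pooled output unchanged, which is the small-shift invariance that max-pooling provides.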

Experimental and Evaluation Results
From the 500 graphical data items for each password label, we set aside 100 for cross-validation testing, whereas 75% of the remaining 400 items for each password label (300 items) were used for training, and the remaining 25% (100 items) were used for validation during the training period. Validation data is used to provide an unbiased evaluation of a model fit on the training dataset while tuning hyperparameters of the model. Table 2 shows the training prediction accuracy, and Table 3 shows the testing prediction accuracy achieved by MobileNetv2, NASNetMobile, ResNetv2, VGG19, and ThermalAttackNet, respectively, on different Linux governors: conservative (cons.), ondemand (ond.), performance (perf.), interactive (inter.), and powersaver (pow.).
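The split described above can be written down in a few lines. In this sketch the function name `split_per_label`, the fixed seed, and the shuffling step are our own assumptions; the text does not state whether the items were shuffled before splitting.

```python
import random


def split_per_label(items, test_count=100, train_fraction=0.75, seed=0):
    """Split one label's items into (train, validation, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption
    test = shuffled[:test_count]
    remaining = shuffled[test_count:]
    train_count = int(len(remaining) * train_fraction)
    return remaining[:train_count], remaining[train_count:], test


# 500 placeholder item identifiers for a single password label.
train, validation, test = split_per_label(range(500))
print(len(train), len(validation), len(test))
```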

Which CNN Model Is Best at Predicting Password
In Table 2, we can see that MobileNetv2 achieves the highest training prediction accuracy of 69.6875% for the performance governor; however, for the same governor, the testing prediction accuracy drops to 25.7499% (see Table 3). Since testing prediction accuracy is more important in determining whether the CNN is able to predict accurately, based on Table 3, ResNet152v2 achieves the best prediction accuracy of 31.999%. Therefore, among the compared CNN models, ResNet152v2 is best at predicting the password using our proposed thermal side-channel attack.
Which governor is least secure: From Table 3, it is evident that the ondemand governor is the least secure of the evaluated Linux governors if ResNet152v2 is used as the model for the attack.

Power Consumption of CNNs
The average power consumption (in watts) during inference while utilizing ResNet152v2, MobileNetv2, VGG19, NASNetMobile, and ThermalAttackNet on the ondemand governor is 10.69, 9.56, 10.67, 8.79, and 7.63, respectively. Given that ThermalAttackNet is a fraction of the size of popular CNNs (see Table 1) while predicting with accuracy close to that of other popular CNNs (see Table 3), utilizing ThermalAttackNet for such an attack on the device is more power efficient.
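To make the comparison concrete, the relative saving of ThermalAttackNet against each model can be derived from the figures above. The helper `relative_saving` is our own; only the wattages come from the text.

```python
# Average inference power in watts on the ondemand governor, as quoted above.
power_watts = {
    "ResNet152v2": 10.69,
    "MobileNetv2": 9.56,
    "VGG19": 10.67,
    "NASNetMobile": 8.79,
    "ThermalAttackNet": 7.63,
}


def relative_saving(baseline, candidate):
    """Percentage by which candidate draws less power than baseline."""
    return (baseline - candidate) / baseline * 100.0


ours = power_watts["ThermalAttackNet"]
savings = {
    name: relative_saving(watts, ours)
    for name, watts in power_watts.items()
    if name != "ThermalAttackNet"
}
for name, pct in savings.items():
    print(f"{name}: {pct:.1f}% less power with ThermalAttackNet")
```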

Extensive Evaluation on a Commercial Mobile Device
To evaluate the efficacy of ThermalAttackNet in predicting passwords via the thermal side-channel on a commercial device, we extended the evaluation by encrypting a text file, as mentioned in Section 3.2, on the Exynos 5422 MPSoC, which is utilized in Samsung Galaxy devices, with the 25 most commonly used passwords of 2018 [23]. To make the attack more realistic, as it could be performed by an attacker or malicious program, the encryption was performed more than 400 times for each password, and the number of thermal records differed between passwords. Figure 6 shows the training accuracy (Figure 6a) and loss (Figure 6b) for 140 epochs. An epoch is one complete pass of the machine learning model over the entire training dataset. Figure 7 shows the confusion matrix [38], which is a table that is used to describe the performance of a classifier or classification model on a set of test data for which the true values of the classes are known. In Figure 7, the primary axis represents the 25 most used passwords, and the X-axis represents the prediction performance against the same respective classes (25 most used passwords). From the confusion matrix (Figure 7), we can see that the prediction accuracy of ThermalAttackNet is 100%, which means it predicted the correct one of the 25 most used passwords for every encrypted text, thus demonstrating the efficacy of such an attack by a malicious person or program.
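For readers unfamiliar with the confusion matrix, the sketch below shows how one is built and how overall accuracy is read off its diagonal. The labels are made-up toy values for three classes, not the results reported in Figure 7, and the convention of rows for true classes and columns for predicted classes is an assumption.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. predicted correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy labels for three classes (illustrative only).
truth = [0, 0, 1, 1, 2, 2]
guess = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(truth, guess, num_classes=3)
for row in cm:
    print(row)
print(accuracy(cm))
```

A classifier with 100% accuracy produces a matrix whose off-diagonal entries are all zero.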

Discussion & Future Works
The ondemand governor is a dynamic in-kernel cpufreq governor that can change the CPU operating frequency depending on the CPU utilization. Here, cpufreq is the subsystem of the Linux kernel that allows the operating frequency to be explicitly set on the processors. On the other hand, the performance governor sets the operating frequency of the CPUs to the highest possible frequency within a user-specified range. From Table 3, we can see that the ondemand and performance governors are the most vulnerable. This is because, when the operating frequency of the CPU is set very high, the high power consumption significantly increases heat dissipation on the CPU [2,8,9,11-14,39], which creates a peak in temperature on the CPU.
Because the ondemand and performance governors are more vulnerable to attacks similar to the proposed one, mobile edge devices employing such governors should adopt some form of software or hardware mechanism that either masks the peak temperature reached during the encryption process or prevents the temperature from peaking during encryption.

Conclusions
In this paper, we studied the accuracy of different CNN models, ResNet152v2, MobileNetv2, VGG19, and NASNetMobile, in predicting passwords by exploiting thermal side-channel attacks for different Linux governors in mobile MPSoCs. Based on empirical data, the ondemand governor is the least secure of the evaluated Linux governors if ResNet152v2 is used as the CNN model for the attack. We also proposed a power-efficient CNN, ThermalAttackNet, which predicts passwords almost as well as the ResNet152v2 CNN but in a more power-efficient manner, while consuming less disk storage on the device.

Code Availability
The program code to implement the attack and generate the dataset can be accessed from https://github.com/somdipdey/ThermalAttackNet (accessed on 25 May 2021).