Detection of Thermal Covert Channel Attacks Based on Classification of Components of the Thermal Signal Features

In response to the growing security challenges that thermal covert channel (TCC) attacks impose on many-core systems, a number of threshold-based detection methods have been proposed. In this paper, we show that these threshold-based detection methods are inadequate for detecting TCCs that use advanced signaling and specific modulation techniques. Since the frequency representation of a TCC signal is found to have multiple side lobes, this important feature can be exploited to enhance TCC detection. To this end, we present a pattern-classification-based TCC detection method using an artificial neural network trained with a large volume of spectrum traces of TCC signals. After proper training, this classifier is applied at runtime to infer the presence of TCCs, should they exist. The proposed detection method achieves a detection accuracy of 99%, even in the presence of the stealthiest TCCs known to date. Because of its low runtime overhead (< 0.187%) and low energy overhead (< 0.072%), the proposed detection method can be indispensable in fighting TCC attacks in many-core systems. With such high accuracy in detecting TCCs, powerful countermeasures, such as those based on dynamic voltage and frequency scaling (DVFS), can justifiably be applied to neutralize any malicious core participating in a TCC attack.


INTRODUCTION
Among the wide range of security challenges facing today's many-core chips, thermal covert channel attacks are particularly dangerous and difficult to deal with, all the more so as numerous innovative methods and techniques have been developed to enhance the transmission performance and/or stealthiness of TCCs. For instance, by switching the encoding scheme from non-return-to-zero (NRZ) [1] to Manchester code [2], a TCC sees its bit error rate (BER) drop from 11% [3] to 1% [2], while its throughput rises to 5 bits per second (bps). With hyper-threading now commonplace in modern many-core systems, the TCC described in [2] can even deliver a transmission rate higher than 45 bps. In [3], a TCC attack on a real computer was demonstrated. In another study [4], cloud users sharing the same FPGA chips were shown to be vulnerable to leakage of secret information through a TCC. These TCC attacks on real machines demonstrate the scope and seriousness of the threat.
To fight TCC attacks, countermeasures based on dynamic voltage and frequency scaling (DVFS) or on noise jamming, which are intended to block any meaningful thermal signal transmission, can be applied [5], [6]. Both countermeasures rely on threshold-based methods to detect the existence of a TCC: a TCC is deemed present if the amplitude of the thermal signal exceeds a well-defined threshold.
As TCCs become stealthier with reduced signal amplitudes, threshold-based detection methods become inadequate for discovering TCC attacks. For example, consider a TCC that must generate a high temperature within a signal period $t_b$. If the heat-up time is reduced from $0.5t_b$ (Fig. 1(a)) to $0.1t_b$ (Fig. 1(b)) and the cool-down time is increased from $0.5t_b$ (Fig. 1(a)) to $0.9t_b$ (Fig. 1(b)), the signal amplitude becomes so low that the threshold-based detection schemes proposed in [5], [6] fail to detect any TCC, pushing the detection accuracy down to almost 0, as indicated in Fig. 1(c). Countermeasures with such low detection accuracy are unacceptable, since blindly generating thermal noise or applying DVFS to cores, especially those running legitimate applications, leads to significant performance loss. In this paper, we define the ratio of the heat-up time to the cool-down time as $\alpha$; the value of $\alpha$ is central to the design of the improved stealthy TCC.
Although the improved stealthy TCC in Fig. 1(b) can circumvent threshold-based detection, its signals are found to have multiple high-amplitude side lobes, a fairly consistent feature that can be exploited for TCC detection. Accordingly, in this paper we present a pattern recognition algorithm that classifies (detects) TCC attacks, distinguishing them from thermal noise. The proposed pattern-classification-based TCC detection method detects all known TCC attacks and their variants with high accuracy. The contributions of this paper are as follows.
1) An improved stealthy TCC is designed such that existing threshold-based detection methods are no longer effective.
2) The frequency components of TCC signals are analyzed, and features drawn from the side lobes are exploited for TCC detection. A neural-network-based classifier is developed, trained, validated, and tested for TCC detection.
3) Experimental results show that the proposed detection scheme detects TCCs with an accuracy as high as 99% at very low runtime overhead.
The remainder of this paper is organized as follows. Section 2 reviews previous work on thermal covert channel attacks. Sections 3 and 4 present the design and the detection of the improved stealthy TCC, respectively. In Section 5, the proposed detection scheme is experimentally verified and the results are reported. Finally, Section 6 concludes this paper.

ON DETECTION OF THERMAL COVERT CHANNELS

Baseline Thermal Covert Channels
Using heat as the communication medium and requiring no shared resources (e.g., cache and memory), thermal covert channel attacks can be launched in many-core systems [2], [3], [5], [7] more easily than other types of covert channels. A typical thermal covert channel model includes a transmitter, a receiver, and a defender, as shown in Fig. 2. As shown in Fig. 2(a), a TCC attack consists of a pair of transmitter and receiver programs. The transmitter resides in the secure zone of the system and can obtain sensitive data. The receiver resides in the non-secure zone and has no direct access to the sensitive data. The transmitter encodes the bit stream of the sensitive data into temperature variations. For example, bit '1' is encoded by a rise and fall in temperature, and bit '0' by staying at a low temperature, as shown in Fig. 2(b). The thermal signals are generated either by running computation-intensive code (to heat up) or by keeping the core idle (to cool down). The receiver at the other end of the communication link reads its thermal sensor and decodes the sensitive data originating from the transmitter.
The transmitter program of a TCC can be implanted, through software updates or other means [9], into a secure zone supported by technologies such as ARM TrustZone [1] and Intel Software Guard Extensions (SGX) [8]. For example, before a user application is loaded into its own SGX enclave, the transmitter code can be injected into the application as a Trojan [9], so that the transmitter runs in the secure zone and has access to private data. Once implanted in the secure zone, a TCC program can leak private data by deliberately manipulating chip temperatures [2], [3], [5], [6]. The thermal signals can be obtained either by directly reading the thermal sensors through the MSR (Model Specific Register) software interface [10] or by reading the temperature files exposed by a commonly installed temperature-monitoring utility (e.g., CoreTemp [11] in Linux). Modern on-chip thermal sensors are finely tuned, with resolutions such as 1 °C [10] and 0.12 °C [12].
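As a minimal sketch of the file-based path, the Python snippet below reads one Linux sysfs temperature file. It assumes the standard sysfs convention, used by the /sys/class/thermal/thermal_zone*/temp files and by the hwmon temp*_input files of the coretemp driver, of reporting an integer in millidegrees Celsius. The helper names read_celsius and quantize are hypothetical; quantize merely models a sensor resolution such as 1 °C or 0.12 °C.

```python
from pathlib import Path


def read_celsius(path: str) -> float:
    """Read a sysfs temperature file that holds integer millidegrees Celsius."""
    raw = Path(path).read_text().strip()
    return int(raw) / 1000.0


def quantize(temp_c: float, resolution_c: float) -> float:
    """Round a temperature to the nearest multiple of the sensor resolution."""
    return round(temp_c / resolution_c) * resolution_c


# Example on a Linux host (the zone index differs between machines):
# t = read_celsius("/sys/class/thermal/thermal_zone0/temp")
```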
The TCC receiver program, which is outside the secure zone, reads the thermal signals from its local thermal sensor, decodes the bitstream back into sensitive data originated from the transmitter, and delivers the data to the hacker through the network.
To fight the TCC programs, a defender runs a program that can monitor all cores' workload traces and access all thermal sensors. The defender is granted root privilege so that it can apply countermeasures intended to neutralize any TCC attack.
Hereinafter, the TCC models described in [5] are adopted as the baseline TCC models in this paper. In particular, both 0-hop and 1-hop TCCs are considered. The transmitter and receiver of a 1-hop channel are two cores that are one hop away from each other. There are two types of 0-hop channels: i) the receiver and transmitter threads run on the same physical core and share the same thermal sensors; and ii) the receiver and the transmitter have access to the same thermal data files, which are exposed by installed temperature-monitoring software or by the sysfs file system in Linux [11].

Threshold-Based Detection Methods and Their Disadvantages
To fight TCCs, threshold-based detection methods [5] have been proposed. The detection threshold, denoted as $r$, ranges from $r_l$ to $r_h$, where $r_l$ is the average noise amplitude (e.g., of thermal signals generated by normal applications) and $r_h$ is the average signal amplitude at the transmission frequencies of TCCs. The detection accuracy is determined by the true positive rate, the false positive rate, and the true negative rate. A true positive is a case in which a TCC thread is correctly detected; a false positive is a case in which an innocent thread is mistakenly recognized as malicious; and a true negative is a case in which an innocent thread is correctly left unflagged. From Figs. 3(a) and 3(b), one can see that even a threshold-based detection method that works well against baseline TCCs is unlikely to detect improved stealthy TCCs whose signal amplitudes lie below the threshold. On average, the true positive rate, true negative rate, and total accuracy are 0%, 47.5%, and 47.5%, respectively. Such a low true positive rate is unacceptable for detection.
Clearly, the error rate, or equivalently the detection accuracy, depends on the choice of detection threshold. As shown in Fig. 4, let $x$ be the signal amplitude, $f_1(x)$ the probability density function of the TCC signal amplitudes, and $f_0(x)$ the probability density function of the noise amplitudes. According to [13], the detection error rate, denoted as $P_e$, can be modeled by Eqn. (1), where $P(1)$ and $P(0)$ are the probabilities that TCC signals and noise are sent, respectively; $P(0|1)$ is the probability that a signal from a TCC is not detected; and $P(1|0)$ is the probability that noise is regarded as a TCC signal. As Eqn. (1) shows, the detection error rate $P_e$, and hence the detection accuracy $1 - P_e$, depends on the value of the detection threshold $r$.
$$P_e = P(1)P(0|1) + P(0)P(1|0) = P(1)\int_{-\infty}^{r} f_1(x)\,dx + P(0)\int_{r}^{+\infty} f_0(x)\,dx. \qquad (1)$$
Fig. 5 shows the distributions $f_1(x)$ and $f_0(x)$, i.e., the probability density functions of the amplitudes of TCC signals and noise, respectively.
The overlap between the two distribution functions $f_1(x)$ and $f_0(x)$ (the shaded area in Fig. 4), denoted as $P_o$, is given by
$$P_o = \int_{-\infty}^{+\infty} \min\{f_0(x), f_1(x)\}\,dx. \qquad (2)$$
The percentages of $P_o$ relative to $f_1(x)$ and $f_0(x)$ are defined in Eqns. (3) and (4), respectively; in our experiments they are 30% and 28%.
To study the impact of the threshold r on the detection accuracy of improved stealthy TCCs, Fig. 6 shows the numerical results of Eqn. (1) with the configurations detailed in Section 5.1.
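To make Eqns. (1) and (2) concrete, the sketch below evaluates $P_e$ over a sweep of thresholds and integrates the overlap $P_o$ numerically. It is an illustration only: the Gaussian shapes of $f_1(x)$ and $f_0(x)$ and their means and spreads (in dB) are hypothetical stand-ins, not the measured distributions of Fig. 5.

```python
import math


def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def gauss_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def error_rate(r, p1, tcc, noise):
    """Eqn. (1): P_e = P(1) P(0|1) + P(0) P(1|0) for threshold r."""
    miss = gauss_cdf(r, *tcc)                  # P(0|1): TCC amplitude below r
    false_alarm = 1.0 - gauss_cdf(r, *noise)   # P(1|0): noise amplitude above r
    return p1 * miss + (1.0 - p1) * false_alarm


def overlap(tcc, noise, lo, hi, steps=20000):
    """Eqn. (2): integral of min(f0, f1), midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += min(gauss_pdf(x, *tcc), gauss_pdf(x, *noise))
    return total * h


TCC, NOISE = (35.0, 6.0), (25.0, 6.0)              # hypothetical (mean dB, std dB)
thresholds = [10.0 + 0.5 * k for k in range(81)]    # 10 dB ... 50 dB
errors = [error_rate(r, 0.5, TCC, NOISE) for r in thresholds]
best_r = thresholds[errors.index(min(errors))]
```

With equal priors and equal spreads, the best threshold falls midway between the two means (30 dB here), and the minimum $P_e$ equals half of $P_o$. Any overlap between $f_1(x)$ and $f_0(x)$ therefore puts a floor under the error of an amplitude-threshold detector, which is why lowering the TCC amplitude defeats it.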
In our experiments, half of the applications generate TCC signals and the other half generate noise, and all rates are expressed as fractions of the whole sample set. Therefore, the ideal true positive rate, true negative rate, false positive rate, and total accuracy are 50%, 50%, 0%, and 100%, respectively. From Fig. 6, one can see that as the threshold $r$ is lowered below 40 dB, the true positive rate increases but the true negative rate decreases. When the threshold drops to 30 dB, the total detection accuracy peaks at 70%, with a true positive rate of 40% and a true negative rate of 30%. The corresponding false negative rate of 10% means that 10% of the TCC attacks go undetected. Moreover, when the threshold drops even further, the false positive rate rises above 20% (i.e., to between 30% and 50%). With such a high false positive rate, applying a DVFS-based countermeasure such as the one described in [5] leads to unacceptable performance loss.
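Because all rates in this setup are fractions of the whole sample set, the total accuracy is simply the true positive rate plus the true negative rate. The hypothetical helper below follows that convention; the counts in the usage lines are constructed to reproduce the 30 dB operating point quoted above, not taken from the experiments.

```python
def rates(labels, predictions):
    """Confusion-matrix rates as fractions of ALL samples (1 = TCC, 0 = noise)."""
    pairs = list(zip(labels, predictions))
    n = len(pairs)
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {"TP": tp / n, "TN": tn / n, "FP": fp / n, "FN": fn / n,
            "accuracy": (tp + tn) / n}


# 50 TCC samples (40 detected) and 50 noise samples (30 correctly passed).
labels = [1] * 50 + [0] * 50
predictions = [1] * 40 + [0] * 10 + [0] * 30 + [1] * 20
r = rates(labels, predictions)   # TP 0.4, TN 0.3, FP 0.2, FN 0.1, accuracy 0.7
```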

Design of Improved Stealthy TCCs
To circumvent threshold-based detection, the TCC signal amplitude must be reduced below the detection threshold. As shown in Fig. 1(b), this can be done by shortening the time spent heating up, as follows.
Based on the return-to-zero (RZ) encoding scheme, the ratio of the heat-up time to the cool-down time when transmitting a bit '1' is hereinafter defined as $\alpha$. Note that $\alpha$ is almost always set to 1 in the baseline TCCs of Section 2.1 (e.g., $\alpha = 0.5t_b/0.5t_b$, as shown in Fig. 1(a)). The value of $\alpha$ is selected iteratively, with the goal of defeating the threshold-based detection method:

1) In each iteration, build a TCC with a specific value of $\alpha$ (the initial value is 1, and in each iteration the value is decremented linearly by $v$ from the previous one).
2) Apply the threshold-based detection to the newly built TCC and obtain the detection accuracy as well as the packet error rate (PER) when no countermeasure is adopted.
3) Stop iterating once the minimum detection accuracy is obtained while the PER of the TCC remains below an acceptable level, say 10%.
Once $\alpha$ is determined, the improved stealthy TCC attack is designed as follows.
The transmitter uses the above method to compute $\alpha$. The bit stream to be transmitted is RZ encoded with a period of $t_b$. For bit '1', the transmitter core runs CPU-intensive code for a duration of $\frac{\alpha}{1+\alpha}t_b$ and cools down for the remaining $\frac{1}{1+\alpha}t_b$ of the same period, so that the heat-up to cool-down ratio equals $\alpha$. For bit '0', the core stays idle for the entire period. On the receiver side, after the thermal sensor is read, a finite impulse response (FIR) band-pass filter with a center frequency of $1/t_b$ extracts the signal. If the filtered signal amplitude exceeds a decision threshold equal to half of the maximum signal amplitude, the bit is decoded as '1'; otherwise, it is decoded as '0'.
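The following sketch illustrates the RZ transmitter and a simplified receiver under stated assumptions. The heat-up time $\frac{\alpha}{1+\alpha}t_b$ follows from the definition of $\alpha$; the 20 ms bit period, the first-order thermal model with a 10 ms time constant, and all function names are hypothetical. In place of the FIR band-pass filter, the receiver measures the $1/t_b$ component of each bit period with a single DFT bin, a deliberately simpler narrowband substitute, before applying the half-of-maximum decision threshold.

```python
import numpy as np


def heat_time_ms(alpha: float, tb_ms: int) -> int:
    # heat / cool = alpha and heat + cool = t_b  =>  heat = alpha / (1 + alpha) * t_b
    return int(round(alpha / (1.0 + alpha) * tb_ms))


def rz_schedule(bits, alpha, tb_ms=20):
    """Per-millisecond workload: 1 = run CPU-intensive code, 0 = idle."""
    on = heat_time_ms(alpha, tb_ms)
    period_one = [1] * on + [0] * (tb_ms - on)
    sched = []
    for b in bits:
        sched.extend(period_one if b else [0] * tb_ms)
    return np.array(sched, dtype=float)


def thermal_response(power, tau_ms=10.0):
    """Toy first-order (RC-like) thermal model with 1 ms steps (hypothetical)."""
    k = 1.0 / tau_ms
    temp = np.empty_like(power)
    t = 0.0
    for n, p in enumerate(power):
        t += k * (p - t)
        temp[n] = t
    return temp


def rz_decode(temp, tb_ms=20):
    """Decide each bit from the 1/t_b component of its period."""
    n_bits = len(temp) // tb_ms
    frames = temp[: n_bits * tb_ms].reshape(n_bits, tb_ms)
    mags = np.abs(np.fft.rfft(frames, axis=1)[:, 1])   # bin 1 = 1/t_b
    return [int(m > 0.5 * mags.max()) for m in mags]
```

For example, rz_decode(thermal_response(rz_schedule(bits, alpha=0.25))) recovers the bit list, with each '1' heating for 4 ms of the 20 ms period.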
The transmitter and receiver use a handshake protocol [5] for communication. The transmitter first sends a request packet (REQ) to the receiver to set up the connection. Upon receiving the REQ packet, the receiver replies with an ACK packet. The transmitter then starts data transmission. Finally, once the data transmission is done, the transmitter sends a TER packet to terminate the connection.
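The three-step selection of $\alpha$ described earlier in this section can be sketched as a simple search loop. The evaluate callback is hypothetical: it stands for building the TCC, running the threshold-based detector, and measuring the PER, none of which is shown. The sketch also assumes that the PER only grows as $\alpha$ shrinks, so the loop stops at the first $\alpha$ whose PER reaches the limit and returns the $\alpha$ with the lowest detection accuracy seen so far.

```python
def select_alpha(evaluate, step, per_limit=0.10, max_iters=1000):
    """Iteratively decrement alpha from 1 by `step` (v in the text).

    evaluate(alpha) -> (detection_accuracy, per), measured with no
    countermeasure applied (hypothetical hook).
    """
    best_alpha, best_acc = None, float("inf")
    for k in range(max_iters):
        alpha = 1.0 - k * step              # linear decrement avoids float drift
        if alpha <= 0.0:
            break
        acc, per = evaluate(alpha)
        if per >= per_limit:                # channel too unreliable: stop here
            break                           # (assumes PER grows as alpha shrinks)
        if acc < best_acc:
            best_alpha, best_acc = alpha, acc
    return best_alpha
```

With a toy evaluator in which accuracy falls with $\alpha$ and the PER is 0.021/$\alpha$, a step of 0.05 stops at $\alpha$ = 0.2 and returns 0.25.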

Spectrum Analysis of TCC Signals
We analyze the improved stealthy TCC proposed above in the frequency domain. A TCC signal is band-limited to the range of 10 Hz to 500 Hz [5]. Below 10 Hz, thermal noise dominates; and since the thermal sensors are refreshed (sampled) 1,000 times per second, the signal transmission frequency must be capped at the corresponding Nyquist frequency of 500 Hz.
From Figs. 7(a) and 7(b), one can see that the signal amplitude of the improved stealthy TCC at 50 Hz is much lower than that of the baseline TCC. In addition, the improved stealthy TCC has high-amplitude side lobes at 100, 150, 200, and 250 Hz, a feature not seen in the baseline TCC. Comparing the spectra of the baseline and improved stealthy TCCs with that of thermal noise [see Fig. 7(c)], one can see that these high-amplitude side lobes can serve as the key feature for distinguishing TCCs from noise.
Accordingly, a pattern-classification-based detection method is proposed in Section 4. Even with different encoding schemes (e.g., on-off keying, RZ, and Manchester code), TCCs with improved stealthiness still exhibit high-amplitude side lobes whenever their PER is lower than 10%. As a result, such a TCC cannot escape detection by the proposed method.
To further study the source of the side lobes, the following analysis is performed.
According to [15], the TCC thermal signal $s(t)$ at the receiver can be written as Eqn. (5), where $a_1$, $a_2$, $a_3$, $a_4$, $b_1$, and $b_2$ are coefficients, $a_4$ is the temperature contribution of the receiver core, and $p$ is the power consumption of the transmitter core. The frequency-domain representation $S(f)$ is obtained by performing a Fourier transform on $s(t)$, as given in Eqn. (6), where $f$ is the signal frequency and $\delta(\cdot)$ is the Dirac delta function.
Fig. 7(c). The spectrum of thermal noise (e.g., generated by running the 'blackscholes' application from PARSEC [14]).
An example of $s(t)$ and $S(f)$ is shown in Figs. 8(a) and 8(b). From Eqn. (6) and Fig. 8(b), one can see the side lobes of $S(f)$, which arise mainly from the two terms involving the Dirac function in Eqn. (6). By exploiting these side lobes as a frequency-domain feature of the TCC signal, we show that the improved TCC attack, which cannot be detected by other known methods, becomes detectable.
The frequency spectrum changes when the bit rate varies. For example, when the TCC bit rate is increased by raising the transmission frequency, more high-frequency components appear.
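The origin of the side lobes can be illustrated with the workload (power) waveform that drives the transmitter. The sketch below assumes a run of '1' bits with $t_b$ = 20 ms (a 50 Hz fundamental) sampled at 1 kHz over a 2 s window, and compares a 50% heat-up duty cycle ($\alpha = 1$, baseline) with a 10% duty cycle ($\alpha = 1/9$, as in Fig. 1(b)). It looks at the square-wave workload rather than the low-pass-filtered temperature, so it shows only the relative harmonic structure, not the amplitudes of Fig. 7.

```python
import numpy as np

FS = 1000      # samples per second (sensor refresh rate in the text)
TB_MS = 20     # assumed bit period t_b = 20 ms, i.e., a 50 Hz fundamental


def ones_train(heat_ms, n_bits=100):
    """Workload for a run of '1' bits: busy for heat_ms, then idle, per period."""
    period = np.r_[np.ones(heat_ms), np.zeros(TB_MS - heat_ms)]
    return np.tile(period, n_bits)


def harmonic_amplitudes(signal, freqs):
    """DFT magnitudes (normalized by signal length) at the given frequencies."""
    spec = np.abs(np.fft.rfft(signal)) / len(signal)
    df = FS / len(signal)                      # 0.5 Hz for a 2 s window
    return {f: spec[int(round(f / df))] for f in freqs}


HARMONICS = [50, 100, 150, 200, 250]
baseline = harmonic_amplitudes(ones_train(10), HARMONICS)   # alpha = 1
stealthy = harmonic_amplitudes(ones_train(2), HARMONICS)    # alpha = 1/9
```

For a rectangular pulse of duty cycle $D$, the $n$th harmonic scales approximately with $|\sin(\pi n D)|/n$. At $D = 0.5$ the even harmonics (100 and 200 Hz) vanish, whereas at $D = 0.1$ every harmonic up to 250 Hz stays above 70% of a fundamental that is itself roughly three times weaker than the baseline's.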

CLASSIFICATION-BASED DETECTION OF THERMAL COVERT CHANNEL ATTACKS
To fight the improved TCC, the TCC features identified in Section 3.2 are exploited by a neural-network-based detector that classifies (detects) TCCs against noise. The neural network model, trained offline, is applied to the system at runtime to classify TCC signals and noise. For a TCC, the pattern recognition algorithm outputs 1; for thermal noise, it outputs 0.

Neural Network Model
The neural-network-based model is experimentally compared against two other models, one based on a classification tree [16] and one based on logistic regression [16]. The validation accuracies of the three models are compared in Table 1.
One can see that the proposed neural network model has the highest validation accuracy (the sum of the true positive rate and the true negative rate on the validation data set). Therefore, we use the proposed neural network model in all subsequent experiments. As shown in Fig. 9, the neural network model has $k$ middle layers and one output layer, and is therefore called a $(k+1)$-layer neural network model.
1) Input data: the input is a vector of 491 elements, each representing the signal amplitude of the TCC or noise at one frequency (from 10 Hz to 500 Hz in increments of 1 Hz). The input vector is denoted as $\mathbf{x} = (x_1, x_2, \ldots, x_{491})$.
2) Middle layers: the $l$th layer contains $n_l$ neural nodes, whose values form the vector $\mathbf{a}^{[l]} = (a_1^{[l]}, a_2^{[l]}, \ldots, a_{n_l}^{[l]})$. The activation function of layer $l$ is denoted as $\delta^{[l]}(\cdot)$.
3) Output layer: this layer has a single node whose output value, denoted as $\hat{y}$, is either '1' or '0', indicating whether or not the input is a TCC signal. The sigmoid activation function is applied in this layer before the final result is generated. When the sigmoid output exceeds a threshold (0.5 in our experiments), the input data is deemed likely to come from a TCC and the final output of the model is 1; otherwise, it is 0.
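A minimal NumPy sketch of the inference path follows. The text does not state the number or width of the middle layers or their activation function, so one ReLU middle layer with 10 nodes is assumed here. With that choice, a single inference uses 491 × 10 + 10 × 1 = 4920 weight multiplications, which happens to equal the multiplication count quoted in the overhead analysis. The weights are random placeholders, not a trained model.

```python
import numpy as np

N_IN, N_HIDDEN = 491, 10   # 491 spectral bins; the hidden width is an assumption


def init_params(rng, sizes=(N_IN, N_HIDDEN, 1)):
    """Random (W, b) pairs, one per layer; W has shape (n_out, n_in)."""
    params = []
    for n_prev, n_cur in zip(sizes[:-1], sizes[1:]):
        W = rng.standard_normal((n_cur, n_prev)) * np.sqrt(1.0 / n_prev)
        b = np.zeros((n_cur, 1))
        params.append((W, b))
    return params


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(params, x):
    """x: (491, c), one column per sample. Returns y_hat with shape (1, c)."""
    a = x
    for l, (W, b) in enumerate(params):
        z = W @ a + b
        a = sigmoid(z) if l == len(params) - 1 else np.maximum(z, 0.0)  # ReLU
    return a


def classify(params, x, threshold=0.5):
    """Final output: 1 (TCC) if the sigmoid output exceeds the threshold."""
    return (forward(params, x) > threshold).astype(int)


def weight_multiplications(params):
    """Multiplications per sample = total number of weights."""
    return sum(W.size for W, _ in params)
```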

Data Preprocessing and Training
To build the training data for the neural network model, we record TCC signals or noise from a logical core over windows of $t$ seconds (e.g., 2 seconds) at a sampling frequency of 1000 Hz. That is, each data sample contains $1000 \times t$ temporal signal values over $t$ seconds. Before the model parameters are trained, each data sample is transformed into its frequency-domain representation: through the discrete Fourier transform (DFT), each data sample becomes a sequence of the amplitudes of 491 signal components. We also provide a supervised label (denoted as $y$) for each data sample for parameter training: if the signals come from a thermal covert channel, the label is '1'; otherwise, it is '0'. The data samples are divided into three datasets: the training set, the validation set, and the test set. The training set is used to train the parameters (i.e., the weights of the network edges) of the neural network model; the validation set is used to choose, among different hyper-parameter settings (e.g., learning rate, number of training iterations, and number of neural layers or nodes), the model with the best generalization ability; and the test set is used to evaluate how well the trained model generalizes, that is, how well it performs on new data samples. The model parameters (edge weights) are randomly initialized before training. Over a fixed number of training iterations, the gradient-descent-based method [16] obtains the neural network model parameters with a given learning rate and the cost function
$$J(y,\hat{y}) = -\sum_{i=1}^{n}\left[y^{(i)}\log\hat{y}^{(i)} + \left(1-y^{(i)}\right)\log\left(1-\hat{y}^{(i)}\right)\right]$$
by the following three-step procedure.
Step 1 - Forward propagation: from the input layer to the output layer, the activations $\mathbf{a}^{[l]}$ of each layer are computed layer by layer as
$$\mathbf{a}^{[l]} = \delta^{[l]}\left(\mathbf{W}^{[l]}\mathbf{a}^{[l-1]} + \mathbf{b}^{[l]}\right), \qquad (7)$$
where $\mathbf{W}^{[l]}$ and $\mathbf{b}^{[l]}$ are the weight matrix and bias vector of layer $l$, $\mathbf{a}^{[0]} = \mathbf{x}$ is the input vector, and $\hat{y} = \mathbf{a}^{[R]}$, with $R$ being the number of layers of the neural network.
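Step 2 - Backpropagation: from the output layer back to the first layer, the gradients of the cost function with respect to the parameters of each layer are calculated layer by layer using the chain rule, following Eqns. (8), (9), (10), and (11):
$$\mathbf{z}^{[l]} = \mathbf{W}^{[l]}\mathbf{a}^{[l-1]} + \mathbf{b}^{[l]}, \qquad (8)$$
$$d\mathbf{z}^{[l]} = \left(\mathbf{W}^{[l+1]T} d\mathbf{z}^{[l+1]}\right) \odot \delta^{[l]\prime}\left(\mathbf{z}^{[l]}\right), \quad d\mathbf{z}^{[R]} = \mathbf{a}^{[R]} - y, \qquad (9)$$
$$\frac{\partial J}{\partial \mathbf{W}^{[l]}} = \frac{1}{c}\, d\mathbf{z}^{[l]}\, \mathbf{a}^{[l-1]T}, \qquad (10)$$
$$\frac{\partial J}{\partial \mathbf{b}^{[l]}} = \frac{1}{c}\sum_{j=1}^{c} d\mathbf{z}^{[l]}[:, j], \qquad (11)$$
where $\partial J/\partial \mathbf{W}^{[l]}$ and $\partial J/\partial \mathbf{b}^{[l]}$ are the partial derivatives of the cost function with respect to the parameters of layer $l$, $\delta^{[l]\prime}(\cdot)$ is the derivative of the activation function $\delta^{[l]}(\cdot)$, $\odot$ denotes element-wise multiplication, $\mathbf{a}^{[l]T}$ and $\mathbf{W}^{[l]T}$ are the transposes of $\mathbf{a}^{[l]}$ and $\mathbf{W}^{[l]}$, respectively, $c$ is the sample size, and $d\mathbf{z}^{[l]}[:, j]$ is the $j$th column vector of matrix $d\mathbf{z}^{[l]}$.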
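Step 3 - Parameter update: the parameters of the neural network model are updated following Eqns. (12) and (13), where $\eta$ denotes the learning rate:
$$\mathbf{W}^{[l]} := \mathbf{W}^{[l]} - \eta\,\frac{\partial J}{\partial \mathbf{W}^{[l]}}, \qquad (12)$$
$$\mathbf{b}^{[l]} := \mathbf{b}^{[l]} - \eta\,\frac{\partial J}{\partial \mathbf{b}^{[l]}}. \qquad (13)$$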
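The three steps can be put together in a compact NumPy training loop. This is a sketch under assumptions not stated in the text: a single tanh middle layer, a cost averaged over the $c$ samples (matching the sample size $c$ used in Step 2), full-batch gradient descent, and arbitrary hyper-parameters. The toy data in the check are synthetic, not TCC traces.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train(X, y, n_hidden=10, lr=0.1, iters=500, seed=0):
    """Full-batch gradient descent for a 2-layer net (tanh middle, sigmoid out).

    X: (n_features, c), one column per sample; y: (1, c), labels in {0, 1}.
    """
    rng = np.random.default_rng(seed)
    n_in, c = X.shape
    W1 = rng.standard_normal((n_hidden, n_in)) / np.sqrt(n_in)
    b1 = np.zeros((n_hidden, 1))
    W2 = rng.standard_normal((1, n_hidden)) / np.sqrt(n_hidden)
    b2 = np.zeros((1, 1))
    eps = 1e-12                                   # guards log(0)
    costs = []
    for _ in range(iters):
        # Step 1: forward propagation, a[l] = g(W[l] a[l-1] + b[l])
        a1 = np.tanh(W1 @ X + b1)
        y_hat = sigmoid(W2 @ a1 + b2)
        costs.append(-np.mean(y * np.log(y_hat + eps)
                              + (1 - y) * np.log(1 - y_hat + eps)))
        # Step 2: backpropagation (sigmoid + cross-entropy gives dz = y_hat - y)
        dz2 = y_hat - y
        dW2 = dz2 @ a1.T / c
        db2 = dz2.sum(axis=1, keepdims=True) / c
        dz1 = (W2.T @ dz2) * (1.0 - a1 ** 2)      # tanh'(z) = 1 - tanh(z)^2
        dW1 = dz1 @ X.T / c
        db1 = dz1.sum(axis=1, keepdims=True) / c
        # Step 3: gradient-descent update with learning rate lr
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), costs


def predict(params, X, threshold=0.5):
    W1, b1, W2, b2 = params
    return (sigmoid(W2 @ np.tanh(W1 @ X + b1) + b2) > threshold).astype(int)
```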

Online Detection
After being trained offline, the neural network model can be used to detect TCC attacks in a many-core system at runtime. Here, a detection cycle is the time unit of each detection, and the global manager is the thread that initiates each detection cycle. Detection is performed at the logical-core level, since hyper-threaded multi-/many-core systems have become commonplace. To reduce runtime overhead, a distributed detection architecture can be adopted: the global manager assigns detection jobs to the individual logical cores, and each logical core performs its detection and reports back to the global manager whether or not there is a possible attack. Note that, because of the strong thermal correlation between neighboring cores, monitoring the thermal signals generated by each core's activities cannot easily distinguish a TCC core from cores running normal applications. For instance, if a transmitter core and a core running normal applications are physically adjacent in the vertical stack of a 3D many-core system, the thermal signals collected from both cores exhibit the same transmission frequency. In essence, a TCC program running in a secure zone has no direct access to the cores' voltage and/or frequency settings; instead, it controls the CPU workload, either running computation-intensive code or keeping the core idle, as an indirect means of generating thermal signals. Therefore, instead of thermal signals, the CPU workloads, measured by the number of instructions per cycle (IPC), are used to pinpoint the TCC cores.
In each detection cycle, the global manager samples each logical core's IPC profile over a time window of $t_1$ seconds (2 seconds in our experiments) and then commands each logical core to execute a pattern-classification-based detection to test whether there is a TCC attack. This detection task preempts any other task the logical core is engaged in.
Algorithm 1: Pattern-classification-based detection
2: Calculate the spectrum of signal IPC_i using the fast Fourier transform (FFT) algorithm;
3: Feed the signal amplitudes (from 10 Hz to 500 Hz in increments of 1 Hz) to the input layer of the neural network model and obtain the model result ŷ;
4: if ŷ = 1 then
5:     Add i to L;
6:     Report logical core i to the global manager;
7:     return;
8: end
9: Send a message to the global manager that no TCC channel is found in core i.
10: end
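Algorithm 1 can be sketched in Python for one logical core as follows. The feature extraction mirrors the offline preprocessing, taking one FFT amplitude per integer frequency from 10 Hz to 500 Hz; these land on exact DFT bins when the window spans a whole number of seconds (e.g., 2 s at 1 kHz). The trained model and the report channel to the global manager are hypothetical callables.

```python
import numpy as np

F_LO, F_HI = 10, 500   # detection band (Hz)


def spectrum_features(trace, fs=1000):
    """Amplitudes at 10, 11, ..., 500 Hz (491 values) via the FFT (line 2)."""
    n = len(trace)
    amps = np.abs(np.fft.rfft(trace))
    idx = np.rint(np.arange(F_LO, F_HI + 1) * n / fs).astype(int)
    return amps[idx]


def detect_on_core(i, ipc_trace, model, suspects, report, fs=1000):
    """One pass of Algorithm 1 on logical core i."""
    features = spectrum_features(ipc_trace, fs)   # line 2
    y_hat = model(features)                       # line 3
    if y_hat == 1:                                # line 4
        suspects.append(i)                        # line 5: add i to L
        report(i, True)                           # line 6
        return                                    # line 7
    report(i, False)                              # line 9
```

In deployment, each core would run this concurrently on its own IPC trace; a toy threshold model is enough to exercise the control flow.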
The pattern-classification-based detection algorithm in Algorithm 1 works as follows.
Step 1. The global manager initiates a detection cycle to check for possible TCC attacks. Upon receiving the command, each logical core extracts the spectrum of its IPC signal (see line 2 in Algorithm 1). Each logical core then feeds the signal amplitudes (from 10 Hz to 500 Hz in linear increments of 1 Hz) to the detection model and obtains the output $\hat{y}$ once the model computation finishes (see line 3 in Algorithm 1).
Step 2. From the output $\hat{y}$ of the detection model, the core decides whether the signals actually come from a covert channel (see line 4 in Algorithm 1). Once a suspicious channel is detected on logical core $i$, core $i$ is added to list $L$ (see line 5 in Algorithm 1). Note that a normal application running on a core may be mistakenly deemed suspicious, which is a standard false positive. The probability of a false positive is low, typically around 5% as demonstrated in [5], which is acceptable at this step.
Step 3. At the end of a detection cycle, a logical core that confirms the presence of a TCC attack reports its findings, including the position of the detected logical core, to the global manager (see line 6 in Algorithm 1). Otherwise, the logical core concludes that no TCC attack has been found in the current detection cycle and reports this to the global manager (see line 9 in Algorithm 1).
Step 4. If the global manager finds that no TCC exists on any of the logical cores, it initiates a new detection cycle, and the process starts over from Step 1. Otherwise, any thread listed in $L$ whose address space can be accessed is removed from $L$, because only a detected thread running in a secure zone is deemed a threat to system security. Note that, with processor reserved memory [17], only self-signed applications can access the secure zone.
If one or more TCC logical cores are detected, i.e., list $L$ is not empty, the global manager begins to block transmission from the cores listed in $L$. Transmission blocking is achieved by applying the DVFS-based countermeasure proposed in [5] to the cores detected with a TCC transmitter or receiver (i.e., the physical CPU cores to which the detected logical cores belong). Since the DVFS-based countermeasure dynamically changes the voltage and frequency level of the detected transmitter core (e.g., scaling down from 2.5 GHz to 500 MHz), the thermal signals generated by the transmitter are severely distorted, leading to a very high transmission error rate that essentially shuts down the TCC.
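The global manager's side of a detection cycle (Steps 1 to 4 and the countermeasure) reduces to a short loop. In this hypothetical sketch, sample_ipc, detect, address_space_accessible, and apply_dvfs are placeholder hooks for platform mechanisms the text does not specify, and the per-core detections run sequentially for clarity even though the text distributes them across cores.

```python
from typing import Callable, Iterable, List


def detection_cycle(
    cores: Iterable[int],
    sample_ipc: Callable[[int], object],
    detect: Callable[[int, object], bool],
    address_space_accessible: Callable[[int], bool],
    apply_dvfs: Callable[[int], None],
) -> List[int]:
    """Run one detection cycle and return the cores that were throttled."""
    # Steps 1 to 3: per-core detection builds the suspect list L.
    suspects = [i for i in cores if detect(i, sample_ipc(i))]
    # Step 4: threads whose address space is accessible are not in a secure
    # zone, so they are dropped from L.
    suspects = [i for i in suspects if not address_space_accessible(i)]
    # Countermeasure: DVFS on every remaining suspect core.
    for i in suspects:
        apply_dvfs(i)
    return suspects
```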

Overhead of the Proposed Detection Method
As with the threshold-based detection method in [5], detection cycles are initiated repeatedly; that is, a new detection cycle is initiated after the global manager applies the DVFS countermeasure to the detected cores. A detection cycle spans two phases, $t_1$ and $t_2$ (see Section 4.3), where $t_1$ (2 s) is the time for the global manager to calculate all logical cores' IPC values, and $t_2$ is the time for each logical core to perform the fast Fourier transform and the neural network inference. Compared with the threshold-based detection method in [5], the proposed method incurs additional runtime and energy overhead for the neural network inference at runtime.
As indicated in [5], a detection cycle lasts 2 s on average (see Section 4.3). During $t_1$, only the global manager is busy, taking $57344 \times n_c$ clock cycles, or about $28 \times n_c$ μs for a core running at 2 GHz, where $n_c$ is the number of cores. During $t_2$, each core takes 901,120 clock cycles, or 0.45 ms, to perform the discrete Fourier transform. The neural network inference requires 4920 real-number multiplications, which corresponds to $9.84 \times 10^4$ clock cycles [18], or a total runtime of about 49.2 μs for a core clocked at 2 GHz.
Although detection runs periodically, the system (except for the global manager) runs normal tasks, as well as the TCC tasks, during most of a detection cycle (i.e., during $t_1$). When $n_c \le 100$, the inference time of the proposed detection accounts for less than 0.17% of the execution time of the normal applications. The energy overhead of the proposed detection is only about 0.039% of the total energy consumption of the whole system. Overall, given its high detection accuracy, the runtime overhead of the proposed detection method, in terms of both cycle count and energy consumption, is fairly low.
In addition, the global manager (a thread with root privilege) needs to broadcast a control packet to the other cores to initiate the detections in parallel. With a two-stage pipelined router architecture and $n_c \le 100$, the communication overhead is less than 200 clock cycles, which is negligible compared with the execution times of most applications.
In terms of storage overhead, each logical core needs to store a copy of the neural network weights, each in double precision (64 bits). Altogether, when $n_c \le 100$, the storage overhead is at most 3.936 MB (i.e., $100 \times 64 \times 4920 \div 8 \div 10^6$), which is fairly low.
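The overhead figures above follow from simple arithmetic, reproduced below. The grouping used for the runtime fraction (global-manager time plus one core's FFT and inference time, divided by the 2 s detection cycle) is an assumption about how the percentage was computed; with $n_c = 100$ it gives roughly 0.168%, in line with the quoted bound of 0.17%.

```python
F_CLK = 2e9          # core clock (Hz)
CYCLE_S = 2.0        # detection cycle length (s)


def runtime_overhead(n_c, manager_cycles_per_core=57_344,
                     fft_cycles=901_120, inference_cycles=9.84e4):
    """Return (manager time, per-core time, fraction of one detection cycle)."""
    manager_s = manager_cycles_per_core * n_c / F_CLK    # about 28.7 us per core
    per_core_s = (fft_cycles + inference_cycles) / F_CLK
    return manager_s, per_core_s, (manager_s + per_core_s) / CYCLE_S


def storage_bytes(n_c, n_weights=4920, bits_per_weight=64):
    """Total bytes when every core keeps its own copy of the weights."""
    return n_c * n_weights * bits_per_weight // 8
```

At 2 GHz, the inference alone takes $9.84 \times 10^4 / (2 \times 10^9) \approx 49.2$ μs, and storage_bytes(100) evaluates to 3,936,000 bytes, i.e., 3.936 MB.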

Experimental Setup
To demonstrate the broad applicability of our approach, we consider two sets of experiments, performed on 2D and 3D many-core systems, respectively. In these many-core systems, each tile is composed of a processor core, memory units (L1 I/D caches and an L2 cache bank), a network interface (NI), and a router, as shown in Fig. 10. Tiles are connected by a network-on-chip. The experiments are run either on the many-core simulator Sniper-v7.2 [19] or directly on two real machines with 2D multi-core chips.
In the simulator, McPAT-v1.0 [21] and HotSpot-v6.0 [22] are adopted as the power and thermal models, respectively, to dynamically generate temperatures for all cores. The temperatures of TCC cores are treated as TCC signals, while the temperatures generated by normal applications drawn from the PARSEC [14] and SPLASH-2 [23] benchmarks are treated as thermal noise. The detailed configurations of Sniper, HotSpot, the TCC programs, and the real machines are listed in Table 2.

Experimental Configurations of the Many-Core Systems
For a 2D 1-hop channel, two physically separated cores form a TCC pair, one as the transmitter and the other as the receiver, while all the other cores run legitimate threads from the selected benchmarks. Specifically, each physical core runs two simultaneous multithreading (SMT) threads, and both the transmitter core and the receiver core run a TCC thread and a thread spawned by the benchmarks. For a 0-hop channel, the two logical cores on the same physical core run the transmitter and receiver programs.
Fig. 10. Examples of (a) 2D and (b) 3D many-core systems.
For a 3D many-core system, whose floorplan follows the one used in [24], the receiver core of a 1-hop channel is placed directly below the transmitter core, while the remaining configurations are the same as in the 2D many-core case. In a 3D many-core system, the vertical layers are connected by through-silicon vias (TSVs).
Of the two real machines with 2D multi-core chips used in this study, one has a quad-core, eight-thread Intel Core i7-7700HQ processor clocked at 2.8 GHz, and the other has a dual-core, four-thread Intel Core i7-6700U processor clocked at 2.7 GHz. As in [2], we fix the fan speed at its maximum, and only the transmitter and receiver cores are active, while all the other cores are set to sleep.
On the real machines, whose sensors have a coarse resolution of 1 °C, a 0-hop channel can transmit at a much higher frequency than a 1-hop channel, since it does not need to transfer heat between two neighboring cores. Experimentally, the upper bounds of the transmission frequencies for 0-hop and 1-hop channels are found to be 100 Hz and 20 Hz, respectively.
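To see what these upper bounds mean for throughput, the sketch below converts a transmission frequency into a bit period and a per-packet transmission time. It assumes one bit per signaling period (encoding schemes that use several periods per bit would scale the times accordingly), and the 32-bit packet length is purely hypothetical.

```python
def bit_period_ms(freq_hz: float) -> float:
    """Duration of one signaling period, assuming one bit per period."""
    return 1000.0 / freq_hz


def packet_time_ms(n_bits: int, freq_hz: float) -> float:
    """Time to send n_bits at the given transmission frequency."""
    return n_bits * bit_period_ms(freq_hz)


for label, f in (("0-hop", 100), ("1-hop", 20)):
    print(label, bit_period_ms(f), packet_time_ms(32, f))
# 0-hop 10.0 320.0
# 1-hop 50.0 1600.0
```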
In the simulations, the temperature precision is set to 1 °C [10] for the 3D many-core systems and 0.12 °C [12] for the 2D many-core systems. Note that the thermal correlation is higher in the vertical direction than in the horizontal direction; thus, vertically placed 3D 1-hop channels are more efficient than 1-hop channels in a 2D many-core system.
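The effect of sensor precision on what a receiver can observe can be illustrated by quantizing temperatures to the sensor resolution. The sketch assumes round-to-nearest quantization (real sensors may truncate instead), and the 0.3 °C swing is a hypothetical example, not a measured value.

```python
import math


def quantize(temp_c: float, resolution_c: float) -> float:
    """Report a temperature rounded to the nearest multiple of the sensor resolution."""
    return math.floor(temp_c / resolution_c + 0.5) * resolution_c


# Hypothetical 0.3 °C thermal swing produced by a transmitter
low, high = 60.1, 60.4
for res in (1.0, 0.12):
    print(res, quantize(low, res) != quantize(high, res))
# 1.0 False  (swing invisible at 1 °C resolution)
# 0.12 True  (swing visible at 0.12 °C resolution)
```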

Experimental Scenarios of the Many-Core Systems
The TCC communications are point to point in nature. In each experiment, packets are randomly transmitted 1,000 times and the results are averaged. The effectiveness of a TCC attack is measured in terms of the packet error rate (PER), defined as
$$\mathrm{PER} = \frac{N_e}{N} \times 100\%,$$
where $N$ is the total number of packets transmitted and $N_e$ is the number of packets that fail to be correctly recognized. Note that when a few bits of a control packet (e.g., the connection request packet REQ) are flipped, say 1 in 5 bits, the bit error rate (BER) is 20%; but since the packet cannot be recovered, the PER is actually 100%. The experiments mainly include the following scenarios:
(1) Measuring the PERs of the improved stealthy TCC communications under different system sizes when the threshold-based detection and the DVFS-based countermeasure [5] are applied.
(2) Measuring the detection accuracy of the proposed pattern-classification-based detection method.
(3) Measuring the PER of the improved stealthy TCC communication when the proposed pattern-classification-based detection and the DVFS-based countermeasure [5] are in place.
(4) Evaluating the performance loss of legitimate applications when the proposed detection and DVFS control are applied to the transmitter core.
(5) Evaluating the performance loss of legitimate applications when DVFS control is applied to the transmitter core.
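The difference between BER and PER in the REQ example can be checked directly. In this sketch a packet counts as failed if any of its bits differs, which is how the text treats an unrecoverable control packet; the 5-bit pattern is hypothetical.

```python
def bit_error_rate(sent: list[int], received: list[int]) -> float:
    """Fraction of flipped bits in one packet."""
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


def packet_error_rate(packets_sent: list[list[int]], packets_received: list[list[int]]) -> float:
    """N_e / N: fraction of packets not received exactly as sent."""
    n = len(packets_sent)
    n_e = sum(1 for s, r in zip(packets_sent, packets_received) if s != r)
    return n_e / n


req_sent = [1, 0, 1, 1, 0]      # hypothetical 5-bit REQ packet
req_recv = [1, 0, 0, 1, 0]      # one bit flipped in transit
print(bit_error_rate(req_sent, req_recv))           # 0.2
print(packet_error_rate([req_sent], [req_recv]))    # 1.0
```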

Finding the Value of a for the Improved Stealthy TCC
The improved stealthy TCC not only needs to circumvent the threshold-based detection, but also needs to ensure a low PER (e.g., < 10%). To rigorously assess our proposed detection method, we first identify the stealthiest TCC for a 1-hop channel. Fig. 11 shows the PERs of a 1-hop TCC under different values of a, with and without the threshold-based detection [5] and the DVFS-based countermeasure [5] applied; when the threshold-based detection is in place, the DVFS-based countermeasure is applied to the detected CPU cores. As a decreases from 1 to 1/9, the PER under the threshold-based detection decreases significantly, dropping from 82% to 8.5%. When a is 1/12 or smaller, the PER of the TCC becomes unacceptably high (i.e., > 88%), since the signal-to-noise ratio (SNR) is too low to sustain TCC communication. Therefore, a is set to 1/9 for the improved stealthy TCC.
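The selection of a can be phrased as a small search over measured points. This sketch reads a as the signal-amplitude scaling factor and assumes the criterion is "lowest PER under threshold-based detection, subject to PER < 10%", which is our reading of the paragraph. It uses the three values reported in the text, where 0.88 is only a lower bound.

```python
from fractions import Fraction


def pick_amplitude(per_under_detection, max_per=0.10):
    """Return the value of a with the lowest PER under detection, or None.

    per_under_detection maps a -> measured PER (as a fraction) while the
    threshold-based detection and DVFS countermeasure are active.
    Only values meeting the PER target (< max_per) are eligible.
    """
    feasible = {a: per for a, per in per_under_detection.items() if per < max_per}
    if not feasible:
        return None
    return min(feasible, key=feasible.get)


# Points reported in the text; 0.88 is only a lower bound (">88%").
measured = {Fraction(1): 0.82, Fraction(1, 9): 0.085, Fraction(1, 12): 0.88}
print(pick_amplitude(measured))  # 1/9
```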

Finding the Parameters of the Neural Network Model
To find the parameters and train the neural network model for TCC detection, we collect 350,000 samples of IPC signals from the TCC programs running with different a values (ranging from 1/12 to 1), encoding schemes, transmission frequencies, and packet bits. We also collect 350,000 samples of IPC noise from legitimate applications. The collected samples are split into training (5/7 of the samples), test (1/7 of the samples), and validation (1/7 of the samples) sets.
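The 5/7 : 1/7 : 1/7 split can be sketched as a shuffle followed by slicing. The seeded, non-stratified shuffle is our assumption; splitting the 350,000 signal samples and the 350,000 noise samples separately would be a stratified variant. For the 700,000 samples described, this yields 500,000 training, 100,000 test, and 100,000 validation samples.

```python
import random


def split_train_test_val(samples, seed=0):
    """Shuffle and split samples 5/7 : 1/7 : 1/7 into train, test, validation."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 5 // 7
    n_test = n // 7
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]  # remainder, so no sample is dropped
    return train, test, val


train, test, val = split_train_test_val(range(7000))
print(len(train), len(test), len(val))  # 5000 1000 1000
```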
The parameters (edge weights) of the neural network are learned automatically by gradient-descent-based training [25], while the hyper-parameters are tuned manually offline to obtain a model with better generalization ability. Besides detection accuracy, another important consideration is network complexity in terms of the number of layers and/or neurons, since the complexity of a neural network should preferably be kept low to reduce runtime overhead. To this end, the various neural network models described in Section 4 are built with varying complexities and compared experimentally for their detection accuracy.
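The training procedure of [25] is not detailed here, so the following is a generic sketch of gradient-descent training for a one-middle-layer classifier, not the authors' code. Sigmoid activations, binary cross-entropy loss, full-batch updates, the learning rate, and the omission of bias terms (chosen to stay consistent with the 4920-weight count used in the storage estimate) are all assumptions; the toy data stands in for real feature vectors.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, n_hidden=10, lr=0.5, epochs=500, seed=0):
    """Full-batch gradient descent on binary cross-entropy.

    One middle layer, sigmoid activations, weights only (no bias terms).
    Returns the two weight matrices and the per-epoch training loss.
    """
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, 1))
    y = y.reshape(-1, 1).astype(float)
    eps = 1e-12  # guards log(0)
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = sigmoid(X @ W1)  # middle-layer activations, shape (n, n_hidden)
        p = sigmoid(h @ W2)  # predicted probability of "TCC", shape (n, 1)
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        losses.append(float(loss))
        # Backward pass: for sigmoid output + BCE, dL/dz2 = (p - y) / n
        dz2 = (p - y) / X.shape[0]
        dW2 = h.T @ dz2
        dz1 = (dz2 @ W2.T) * h * (1.0 - h)
        dW1 = X.T @ dz1
        # Gradient-descent update
        W1 -= lr * dW1
        W2 -= lr * dW2
    return W1, W2, losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 8))    # toy stand-in for feature vectors
    y = (X[:, 0] > 0).astype(float)  # toy labels
    W1, W2, losses = train_mlp(X, y)
    print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```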
The validation accuracies of the different neural network models, defined as the sum of the true positive rate and the true negative rate on the validation set, are reported in Table 3. All the models in Table 3 have 491 inputs and 1 output. Model A, whose inference time is 984 clock cycles, is a two-layer neural network with 2 nodes in its middle layer. The numbers of nodes and layers and the inference times of the other neural network models are listed in Table 3; for example, model E has two middle layers with 10 and 2 nodes, respectively. Model C achieves a high validation accuracy with a moderate inference time; therefore, model C, with 10 nodes in its middle layer, is adopted in the following experiments.
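The size of each candidate model follows from its layer widths. The sketch counts edge weights only (no bias terms, an assumption). Model A's count equals its reported inference time of 984 clock cycles, which suggests, though the text does not say so, roughly one multiply-accumulate per cycle; under that reading, model C would need about 4920 cycles, the same weight count used in the storage estimate.

```python
def count_weights(layer_sizes):
    """Number of edge weights in a fully connected network (no bias terms)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


models = {
    "A": [491, 2, 1],      # one middle layer, 2 nodes
    "C": [491, 10, 1],     # one middle layer, 10 nodes (adopted)
    "E": [491, 10, 2, 1],  # two middle layers, 10 and 2 nodes
}
for name, sizes in models.items():
    print(name, count_weights(sizes))
# A 984
# C 4920
# E 4932
```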

Results of the Improved Stealthy TCC Attacks
To measure how well the improved stealthy TCCs (described in Section 3.1) can be detected by the threshold-based detection, we run another experiment. Once a TCC core is detected by the threshold-based detection [5], the DVFS-based countermeasure [5] is applied to that core to block the communication. The average PERs from the simulations and from the measurements on the two real machines are shown in Fig. 12 and Table 4, respectively.
As for the simulations, the TCCs can work with a transmission frequency of 100 Hz and a CPU running at 2,500 MHz. From Fig. 12, one can see that the average PERs of TCC transmission in the 2D and 3D many-core systems of different sizes are all below 8.5%, which means that the improved stealthy TCC can barely be detected using the threshold-based detection and thus most of the time, the DVFS-based countermeasure is not triggered.
A similar result is obtained for the TCCs running on the two real machines, as shown in Table 4. The average PERs of a 0-hop channel and a 1-hop channel are both lower than 8%, which means that the threshold-based detection method [5] cannot detect the improved stealthy TCC.
In short, both the simulation results and the real-machine measurements indicate that the improved stealthy TCC poses a serious threat to any system that relies on the threshold-based detection method [5].

Evaluation of the Proposed Detection Scheme
Fig. 11. The average PER results of 1-hop TCC communications with different a's with and without the threshold-based detection [5] and the DVFS-based countermeasure [5] applied.
At the end of each detection cycle, if the global manager detects a TCC attack, the CPU core associated with the TCC is located. The detection accuracy is the sum of the true positive rate and the true negative rate of detection.
where $P_{detected}$ are the positions (core IDs) of the detected cores, $P_{transmitter}$ are the actual positions of the transmitter cores, $P_{not}$ are the positions of the cores that are not detected as TCC cores, and $P_{normal}$ are the positions of the cores that are not running TCC transmitter threads.
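A minimal sketch of how such an accuracy could be computed from these position sets follows. The exact equation is not reproduced in this section, so the normalization is an assumption: both rates are divided by the total number of monitored cores, which is consistent with the 50% true positive rate reported in this section when half of the monitored cores host transmitters. The function name and example sets are hypothetical.

```python
def detection_accuracy(all_cores, detected, transmitters):
    """Return (TP rate, TN rate, accuracy), each normalized by the number of cores (assumed)."""
    all_cores = set(all_cores)
    detected = set(detected)              # P_detected
    transmitters = set(transmitters)      # P_transmitter
    not_detected = all_cores - detected   # P_not
    normal = all_cores - transmitters     # P_normal (includes receivers)
    tp = len(detected & transmitters) / len(all_cores)
    tn = len(not_detected & normal) / len(all_cores)
    return tp, tn, tp + tn


# Two monitored cores, core 0 hosts the transmitter
print(detection_accuracy({0, 1}, {0}, {0}))    # (0.5, 0.5, 1.0)
print(detection_accuracy({0, 1}, set(), {0}))  # (0.0, 0.5, 0.5)
```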
To measure the average detection accuracies of the threshold-based detection and the proposed detection, each detection method is evaluated in 100,000 experiments grouped into 1,000 sets. Each experiment involves 4 logical cores: 2 logical cores run the transmitter and receiver programs of the TCC, and the other 2 logical cores run normal threads (threads from PARSEC [14] or SPLASH-2 [23]).
From Fig. 13, one can see that with the proposed detection method, the average detection accuracies for both the baseline TCCs and the improved stealthy TCCs are 99% (i.e., a 50% true positive rate and a 49% true negative rate). In sharp contrast, although the threshold-based detection method works reasonably well for the baseline TCCs, with an accuracy of about 96%, its accuracy drops to only about 45% (i.e., a 0% true positive rate and a 45% true negative rate) when the improved stealthy TCC is present.
From Fig. 14, on the two real machines, our proposed detection method effectively detects both the baseline and the improved stealthy TCCs with an average accuracy higher than 95%. In contrast, the threshold-based detection method [5] fails to detect the improved stealthy TCC, with a detection accuracy lower than 50%.
The false positive rates of the proposed detection method and the threshold-based detection method [5] are shown in Table 5. One can see that when the threshold-based detection method [5] is applied, the false positive rate of the improved stealthy TCC is unacceptably high at 49%. But with our proposed detection method applied, the false positive rates of detecting both the baseline TCC and improved stealthy TCC are very low.
Therefore, by using the proposed pattern-classification-based detection strategy, we can almost always identify a TCC attack, should one exist, and accurately determine the location(s) of the transmitter core(s).
Once a TCC attack (either the baseline TCC or the improved stealthy TCC) is detected, the frequency level of the TCC transmitter core is changed by the DVFS-based countermeasure proposed in [5]. As shown in Table 6 (averaged measurements on the two real machines) and Fig. 15 (simulation results), with our proposed pattern-classification-based detection and the DVFS-based countermeasure, the average PERs of the baseline TCCs and the improved stealthy TCCs are all higher than 75%. Such a high PER (i.e., > 70%) prevents any meaningful communication in practice; that is, our proposed detection combined with the DVFS-based countermeasure can effectively shut down TCC attacks.
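The detect-then-throttle flow can be sketched as one loop iteration of the global manager. Everything here is hypothetical: the `Core` record, the score dictionary, the 0.5 decision threshold, and "drop to the lowest V/F level" as the action. The actual frequency-selection policy of the DVFS-based countermeasure in [5] is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    vf_level: int  # index into a hypothetical V/F table; 0 = lowest level


def detection_cycle(cores, tcc_scores, threshold=0.5, throttled_level=0):
    """One hypothetical detection cycle run by the global manager.

    tcc_scores maps core_id -> classifier output in [0, 1].
    Cores scoring above `threshold` are flagged and their V/F level is
    lowered to `throttled_level`; all other cores are left unchanged.
    Returns the list of flagged core IDs.
    """
    flagged = []
    for core in cores:
        if tcc_scores.get(core.core_id, 0.0) > threshold:
            core.vf_level = throttled_level
            flagged.append(core.core_id)
    return flagged


cores = [Core(0, vf_level=3), Core(1, vf_level=3)]
print(detection_cycle(cores, {0: 0.97, 1: 0.02}))  # [0]
```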

Average Performance Loss
Scaling down the V/F level of a physical core running the TCC transmitter program also degrades the performance of any legitimate logical core sharing that physical core. Fortunately, the DVFS-based countermeasure need not be applied to the TCC transmitter core all the time, since the TCC programs become inactive after finishing a transmission. We denote by t the ratio of the time the TCC is inactive to the time it is active. In practice, t is set much higher than 4 [5].
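If the countermeasure only needs to be active while the TCC is active, the fraction of time it is applied follows directly from the definition of t: an active interval $T_a$ is followed by an inactive interval $t \cdot T_a$, so the active fraction is $1/(1+t)$. The sketch assumes DVFS is applied exactly during the active intervals, which is a simplification.

```python
def dvfs_duty_fraction(t: float) -> float:
    """Fraction of time the TCC is active, given t = inactive time / active time."""
    return 1.0 / (1.0 + t)


print(dvfs_duty_fraction(4))  # 0.2 -> throttled at most 20% of the time
print(dvfs_duty_fraction(9))  # 0.1
```

Larger values of t therefore shrink the window during which a co-located legitimate thread is slowed down.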
Fig. 12. The average PERs of the improved stealthy TCC under different system configurations and scenarios when the threshold-based detection [5] and DVFS-based countermeasure [5] are applied together.
We assume that a TCC thread may share the same physical core with a thread of a legitimate application. The performance loss (PL) of a legitimate application is 0% if DVFS is not applied to block the TCC. When DVFS is applied to the physical cores participating in a TCC, the performance loss of that legitimate application is calculated by
As shown in Fig. 16, we compare the average PLs under the threshold-based detection and the proposed detection, each combined with the DVFS-based countermeasure, with t set to 4. The true positive and false positive rates of the threshold-based detection are 40% and 20%, respectively (Section 3.2), whereas those of the proposed detection are 50% and 1%, respectively (Section 5.4.2). Because the 20% false positive rate of the threshold-based detection causes DVFS to be applied excessively, the PL of the threshold-based detection with DVFS is at least 3× higher than that of the proposed detection. For a large many-core system (e.g., with at least 8 × 8 cores), the PL of the proposed detection with DVFS is lower than 2%, which is very low in any practical sense.
Table 7 compares our proposed detection and defense method with related countermeasures. As Table 7 shows, the task-migration-based method [3] and the pre-heating method [4] can detect neither the baseline TCC nor the improved stealthy TCC, since neither method involves detection. The threshold-based detection method [5] fails to detect the improved stealthy TCC for the reason stated in Section 2.2. In contrast, our proposed detection method can detect both the baseline TCC and the improved stealthy TCC.

CONCLUSION
In this paper, a pattern-classification-based detection method was proposed to counter the improved stealthy TCC, which employs a reduced signal amplitude to evade threshold-based detection methods. The proposed method achieves a detection accuracy of 99% for both the baseline TCC and the improved stealthy TCC. After the DVFS-based countermeasure is applied to the detected CPU cores, the PERs of both the baseline TCC and the improved stealthy TCC exceed 70%, at a low runtime overhead (< 0.187%) and a low energy overhead (< 0.072%). With its low complexity and overhead, the proposed detection and the DVFS-based countermeasure work seamlessly together to thwart known TCC attacks.
Amit Kumar Singh (Member, IEEE) received the BTech degree from IIT, Dhanbad, India, in 2006, and the PhD degree from Nanyang Technological University (NTU), Singapore, in 2013. He is currently an associate professor with the University of Essex, U.K. He has more than five years of post-doctoral research experience at several reputed universities. His current research interests are design and optimisation of multi-core based computing systems with focus on performance, energy, temperature, reliability and security. He has published more than 110 papers in reputed journals/conferences, and received several best paper awards.
Letian Huang received the MS and PhD degrees in communication and information system from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2009 and 2016, respectively. He is an Associate Professor with UESTC. His scientific work comprises more than 40 publications, including book chapters, journal articles and conference papers. His research interests include heterogeneous multicore system-on-chips, network-on-chips, and mixed signal IC design.