DeMAC: Towards Detecting Model Poisoning Attacks in Federated Learning System

Federated learning (FL) is an efficient distributed machine learning paradigm for the collaborative training of neural network models by many clients with the assistance of a central server. Currently, the main challenge is that malicious clients can send poisoned model updates to the central server, making FL vulnerable to model poisoning attacks. In this paper, we propose a new system named DeMAC to improve the detection of and defence against model poisoning attacks by malicious clients. The main idea behind the system is the observation that, as malicious clients need to reduce the learning loss of the poisoning task, there is an obvious increase in the norm of their gradients. We define a metric called GradScore to measure this norm for each client. Experiments show that the GradScores of malicious and benign clients are distinguishable at all training stages, so DeMAC can detect malicious clients by measuring their GradScore. Furthermore, a historical record of the contributed global model updates is utilized to enhance DeMAC, allowing it to detect malicious behaviours spontaneously without requiring manual settings. Experimental results over two benchmark datasets show that DeMAC can reduce the attack success rate under various attack strategies. In addition, DeMAC can eliminate model poisoning attacks in heterogeneous environments.


Introduction
Federated learning is a distributed machine learning paradigm [1][2][3]. Each client trains a model locally, and then all local model updates are aggregated by a central server to derive a global model. This process is repeated multiple times, and the accuracy of the global model on the main task gradually improves. FL offers efficiency and scalability compared to centralised training since many clients execute the training in parallel [4]. In particular, FL preserves clients' privacy, as they can keep their training datasets local [1] rather than sharing them with other participants. Such a mechanism complies with the General Data Protection Regulation (GDPR) [5] and protects clients against privacy leakage attacks. Owing to its privacy-preserving properties, FL has been deployed in the real world. For example, Android Gboard [6] uses FL for next-word prediction. In finance, WeBank [7] has applied FL to credit risk prediction. FL has also been widely used by pharmaceutical companies for drug discovery in the MELLODDY project [8].
A major challenge faced by FL is that it leaves the door open for malicious clients. An FL system is vulnerable to model poisoning, especially backdoor attacks that may insert backdoors into the trained global model [9]. The backdoors make the global model misclassify a small set of samples carrying chosen triggers into targeted labels, while the backdoored global model still shows good performance on both the main and backdoor tasks.
Existing defences such as [10][11][12] propose Byzantine-tolerant aggregation rules and remove statistical outliers by comparing clients' local model updates. However, these previous works make assumptions, such as that the data distribution should be IID (Independent and Identically Distributed), which do not hold in non-IID settings [10]. RFA [12] relies on aggregating updates using the geometric median instead of the standard arithmetic mean. However, our work shows that this method can be bypassed by malicious clients who carefully design their poisoned updates.
In view of the above drawbacks of existing works, we propose a system (called DeMAC) to detect and defend against model poisoning attacks from malicious clients. Our design relies on the key finding that genuine clients train their model updates following the main federated training task on their local benign datasets, while malicious clients craft their local model updates by training on the poisoning task and poisoned datasets. To succeed in a poisoning attack, the adversary must increase the number of poisoned samples to decrease the training loss of the poisoning task. As a result, the L2-norm of the gradients of the poisoned local model updates increases. Although the adversary can reduce the number of poisoned samples to decrease the deviation from benign models' gradient norms, this may cause the poisoning task to fail. Hence, to measure the L2-norm of gradients and capture abnormal changes, we define a new metric, called GradScore, as the L2-norm of the gradients in the last layer of a client's model update after the first local epoch of training. Using GradScore, DeMAC can effectively detect potential malicious clients with abnormally large GradScore values. When global model training converges, the loss and gradient norm of the main federated training task become small. If poisoning attacks occur at this training stage, the difference between the GradScores of genuine and malicious clients becomes even more obvious, so DeMAC can easily detect and mitigate malicious clients. For poisoning attacks starting from an early stage, our experiments show that DeMAC also works effectively.
Furthermore, to improve the defence performance, a historical record of the contributed global model updates is utilised in DeMAC to spontaneously estimate the convergence trend of the global model and determine when to start detecting malicious behaviour. This historical record stores a list of variables within a flexible look-back window. The variables are the absolute differences between the validation accuracies of adjacent global models. Once the maximum value in this historical record falls below the default threshold, DeMAC is triggered and starts to detect.
We evaluate DeMAC on two benchmark datasets against the model-replacement attack [9], the distributed attack [13], the constrain-and-scale attack and the multi-poisoning attack [14]. Experimental results show that DeMAC can effectively mitigate model-replacement, distributed, and constrain-and-scale attacks. When malicious clients participate in every training iteration and insert perturbations, the Attack Success Rate (ASR) of the baseline algorithms increases gradually, while the ASR of DeMAC remains low. Therefore, DeMAC can effectively suppress propagation errors. In brief, our contributions include: 1) we propose DeMAC, a defence system that detects malicious clients and defends against model poisoning attacks by checking for abnormal model updates from potential malicious clients; 2) we utilize a historical record in DeMAC for defence against malicious attacks, which enables spontaneous detection of malicious clients without manual settings; 3) we extensively evaluate DeMAC against multiple model poisoning and backdoor attacks on benchmark datasets, showing high efficiency in defending against malicious attacks in both early and late training stages and significant performance improvement over existing baseline methods.
The remainder of our paper is organized as follows. Section 2 discusses research related to poisoning attacks and defences. In Section 3, we introduce the system and threat models with specific descriptions of adversaries, objectives and requirements for attacks and defences. In Sections 4 and 5, we present our novel DeMAC system for defending against model poisoning attacks. We present the evaluation setup in Section 6. In Section 7, the evaluation results of DeMAC are presented. Section 8 concludes the paper and presents our future work.

Byzantine-robust federated learning methods
The principle of existing Byzantine-robust defences [10][11] is to train a global model with high performance, even if there are some malicious clients.
Krum [10] tries to find a representative model update to serve as the aggregated model update. Suppose there are n local clients in every iteration, and f of these local clients are malicious. The score for the i-th client is calculated as s_i = Σ_{j ∈ Γ_{i, n−f−2}} ‖w_i − w_j‖², where Γ_{i, n−f−2} is the set of the n − f − 2 local clients whose updates have the smallest Euclidean distance to w_i. The representative model update is the one with the smallest score, and it becomes the global model for the next iteration. Krum attempts to limit the deviation between poisoned and clean models within a single iteration. However, it does not consider compound propagation errors [25]: the iterative nature of learning means that small deviations at the start of training can compound exponentially.
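As a rough sketch (not the authors' implementation), Krum's selection rule can be written as follows, where `updates` is an array of flattened client updates and `f` is the assumed number of malicious clients:

```python
import numpy as np

def krum(updates, f):
    """Krum: select the single client update with the smallest score.

    Score of client i = sum of squared L2 distances to its n - f - 2
    nearest neighbouring updates. `updates` is an (n, d) array of
    flattened client model updates.
    """
    n = len(updates)
    k = n - f - 2  # number of nearest neighbours considered
    # pairwise squared Euclidean distances
    dists = np.sum((updates[:, None, :] - updates[None, :, :]) ** 2, axis=-1)
    scores = []
    for i in range(n):
        d = np.delete(dists[i], i)   # exclude the distance to itself
        d.sort()
        scores.append(d[:k].sum())   # sum over the k closest updates
    return updates[int(np.argmin(scores))]
```

With one outlier among several clustered updates, the outlier's score is dominated by its large distances to the cluster, so a clustered update is selected.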
Median [11] is a coordinate-wise aggregation rule.The coordinate-wise median of sorted local models is selected as the aggregated global model update.Instead of using the mean value among local clients, this aggregation rule considers the coordinate median value of the parameters as the corresponding parameter in the global model for the next iteration.
Trimmed-mean [11] is another coordinate-wise aggregation rule. Given a trim parameter k < n/2, the server removes the k maximum and k minimum values of each coordinate across the client model updates and then computes the mean of the remaining n − 2k values as the corresponding parameter of the global model for the next iteration. Trimmed-mean relies on the assumption that the attacker's coordinates are either the minimum or the maximum of the corresponding parameters. However, this assumption does not hold for model poisoning attacks [25]; therefore, even a single attacker can compromise Trimmed-mean.
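A minimal sketch of these two coordinate-wise rules (our own variable names, with `updates` an (n, d) array of client updates):

```python
import numpy as np

def coordinate_median(updates):
    """Coordinate-wise median of client updates: (n, d) -> (d,)."""
    return np.median(updates, axis=0)

def trimmed_mean(updates, k):
    """Per coordinate, drop the k largest and k smallest values, then
    average the remaining n - 2k values. Requires k < n / 2."""
    n = len(updates)
    assert 2 * k < n
    s = np.sort(updates, axis=0)   # sort each coordinate independently
    return s[k:n - k].mean(axis=0)
```

For a single coordinate [1, 2, 3, 100] with k = 1, both rules ignore the outlier 100 and return 2.5.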
RFA [12] is a robust aggregation rule based on similarity metrics. RFA aggregates the model updates and is robust to outliers because it replaces the weighted arithmetic mean in the aggregation process with an approximate geometric median. The model-replacement attack [9] is more easily detected by RFA due to its scaling operation [12]. However, by strictly controlling the total weight of the outliers, with only a few attackers poisoning a small set of samples in every batch, the attackers' model updates can have lower distances and be assigned higher aggregation weights [13]. By doing so, the attackers can bypass RFA and perform a successful backdoor attack.
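RFA's exact smoothing and weighting details differ, but the core geometric-median aggregation can be approximated with Weiszfeld-style iterations, as in this sketch:

```python
import numpy as np

def geometric_median(updates, iters=50, eps=1e-8):
    """Approximate geometric median via smoothed Weiszfeld iterations,
    the aggregation idea underlying RFA-style robust averaging."""
    z = updates.mean(axis=0)              # start from the arithmetic mean
    for _ in range(iters):
        d = np.linalg.norm(updates - z, axis=1)
        w = 1.0 / np.maximum(d, eps)      # far-away points get low weight
        z = (w[:, None] * updates).sum(axis=0) / w.sum()
    return z
```

Unlike the arithmetic mean, the geometric median stays near the benign cluster even when one update is a large outlier.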

Anomaly detection-based methods
Many existing defences [20][21][22][23][24] follow an anomaly-detection-based strategy and exclude anomalous model updates. FoolsGold [20] defines indicative features. By measuring the cosine similarity of the indicative features and checking for Sybil clones, Sybil attacks can be detected even in non-IID data scenarios, as Sybils have highly similar updates. However, FoolsGold performs poorly in the single-Sybil attack scenario. FLAME [21] uses a similar detection strategy, calculating the angular differences between all model updates. Rather than comparing the probabilities of global models, DeepSight [24] compares the local model updates with the previous global model; however, it does not work in non-IID scenarios. Auror [23] defines indicative features and finds that they all come from the final layer. Auror assumes that the indicative features from benign clients have a similar distribution, while those from malicious clients have an anomalous distribution. However, it likewise does not work in non-IID scenarios.

Background
In this section, we give some background knowledge of federated learning and attack strategies against federated learning systems.

Definition of symbols and corresponding descriptions
The overall definitions of symbols and their corresponding descriptions are listed in Tab. 1.

Preliminaries
Here, D = {(x_i, y_i)}_{i=1}^{N} denotes the training set on a local device, with input vectors x_i ∈ ℝ^d and labels y_i ∈ {0, 1}^C in one-hot encoding. Each training sample in D is drawn from an unknown distribution Z. Local clients are assumed to have the same neural network architecture in federated learning. For a chosen neural network model on a client, f(w, x) = σ(g(w, x)) denotes the probability vector of the network with activation function σ and weights w ∈ ℝ^p. For any probability vector f, let L(f, y) denote the loss function.
For any local client, let w_0, w_1, w_2, ..., w_T be the client's weights over the iterations of SGD (Stochastic Gradient Descent), and let B_0, B_1, ..., B_{T−1} ⊆ D of size M be the mini-batches, one per iteration. Here we have

w_{t+1} = w_t − η (1/M) Σ_{(x, y) ∈ B_t} ∇_w L(f(w_t, x), y),

where η is the local learning rate.

System Setting
We assume that K clients train their local models before sending local updates to the central server. The central server combines these updates using FedAvg [1]. In addition, all clients keep their data secret, and no client can intercept another's training or testing data. The optimization problem of FL is min_w F(w), where F(w) = E_{(x, y) ∼ Z}[L(f(w, x), y)] is the expectation of the empirical loss L(f(w, x), y) over the local training data [26].
One iteration of FL training proceeds as follows (see the left part of Fig. 1): the server distributes the current global model G^t to the selected clients; each client trains the model on its local dataset and submits its local update; the server then aggregates the local updates according to the aggregation rule. FedAvg [1] is given as G^{t+1} = G^t + (η/m) Σ_{i=1}^{m} (w_i^{t+1} − G^t), where m is the number of selected clients and η is the global learning rate. To simulate a non-IID distribution, we assign data to clients according to the Dirichlet distribution [27].
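The aggregation step above can be sketched as follows (a minimal illustration with our own names; with `lr = 1` it reduces to plain averaging of client weights):

```python
import numpy as np

def fedavg(global_w, client_ws, lr=1.0):
    """FedAvg-style step: G^{t+1} = G^t + (lr / m) * sum_i (w_i^{t+1} - G^t)."""
    m = len(client_ws)
    avg_delta = sum(w - global_w for w in client_ws) / m
    return global_w + lr * avg_delta
```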

Attack Strategies
In this paper, we focus on targeted model poisoning attacks. The adversary manipulates the local models w to obtain compromised client models w′ before they are aggregated into the global model G^{t+1}. The adversary wants the poisoned global model to misclassify attacker-chosen inputs into target labels while retaining good performance on the main task.

Data poisoning
In this attack strategy (see the right part of Fig. 1), the adversary can only manipulate the training set on local clients, by adding triggers to data samples or by changing the labels of a group of attacker-chosen data samples. By varying the Poisoned Data Rate (PDR), i.e., the fraction of poisoned samples in the local training batch, the attacker can trade off attack impact against attack stealthiness.

Model Poisoning
In this attack strategy (see the right part of Fig. 1), the adversary can fully control a subset of the clients. We denote the fraction of compromised clients as the Poisoned Malicious Clients Rate (PMR), i.e., the ratio of malicious clients to all participating clients. To increase the attack's impact on the aggregated model, the adversary can deliberately modify the model updates before submitting them to the aggregator. This is done by (1) turning up the scaling factor γ to increase the attack impact (e.g., the model-replacement attack [9]) and (2) constraining the training process by setting the scaling-coefficient parameter α to evade anomaly detection (e.g., constrain-and-scale [9]). In the latter strategy, the adversary optimizes a multi-objective loss α L_class(x, y) + (1 − α) L_ano(x, y); by tuning the scaling-coefficient parameter α, the adversary can attack more stealthily. [25] first introduced the propagation error. Suppose clients conduct a protocol A at global iteration t ∈ [T], where A(G^t, D, w) is a gradient oracle that takes the global model G^t of round t and a local dataset D and outputs the updated weights w_t. Malicious clients conduct a poisoned protocol A*. For any round t, any global model G^t and any dataset D, we have A*(G^t, D, w) = A(G^t, D, w) + δ with ‖δ‖₁ ≤ ε. At each iteration t, the upper bound ε on δ gives the additive error introduced by poisoning. Small additive errors introduced at early iterations can build upon each other and create large divergences; this is referred to as propagation error.
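As an illustration of the scaling operation in (1), a model-replacement update [9] boosts the difference between the poisoned weights and the global model. This sketch (our own naming) shows that with gamma equal to the number of aggregated clients, plain averaging recovers the poisoned weights:

```python
import numpy as np

def boosted_update(global_w, poisoned_w, gamma):
    """Attacker's submission in the model-replacement attack:
    G^t + gamma * (w' - G^t)."""
    return global_w + gamma * (poisoned_w - global_w)
```

If the other m − 1 clients submit weights equal to G^t, averaging the m submissions with gamma = m replaces the global model with the poisoned one.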

Propagation Error
In this work, we design a multi-poisoning attack to instantiate such propagation errors. In this attack strategy, the adversarial clients perform model poisoning or data poisoning attacks at every iteration. The adversarial clients can vary the PDR or the PMR, and the upper bound ε varies with different PDR or PMR values.

Characterization of Model Poisoning Attacks
To illustrate the various model poisoning attacks more visually, we use a two-dimensional representation of the models' weight vectors. Each model can then be characterized by two factors: direction and magnitude. The cosine distance between weight vectors measures the difference in direction between two given models, and the L2 norm of the difference between weight vectors measures their difference in magnitude. The first type of poisoned model, w′_1, deviates from genuine models in both direction and magnitude. The second type, w′_2, is achieved by four distributed attacks (Distributed Backdoor Attack, DBA [13]); poisoned models trained by distributed attacks are less detectable in direction and magnitude than centralised attacks. The third type, w′_3, has a smaller direction deviation but a larger magnitude difference; such poisoned client models can be obtained by boosting the poisoned models with a large scaling factor γ (model-replacement attack [9]). The last type, w′_4, has representations similar to genuine models and is more stealthy than the first three types; this poisoned model can be crafted by constrain-and-scale attacks [9].

DeMAC Design Principle and key observation
In this section, we introduce our proposed approach, DeMAC. First, we give a novel scoring method, GradScore, and analyze how the PDR directly impacts the GradScore of a poisoned model. We then describe how to detect malicious clients by evaluating the corresponding GradScore values in federated learning, and we show that, based on GradScore, DeMAC can detect model poisoning attacks regardless of the data distribution among clients. Finally, we give a security analysis showing that the proposed scoring method is unaffected no matter how the adversary scales its model updates.

GradScore and analysis
Now we give the definition of GradScore and analyze why and how this scoring method can detect poisoning attacks in FL. As seen above, the contribution of a training sample (x_i, y_i) to the decrease of the loss on other samples from the same minibatch can be quantified by Eq. (3). The value ‖∇L(x_i, y_i)‖ is the GradScore of a sample (x_i, y_i). Samples with large GradScore have a strong influence on learning; for poisoning training, poisoned samples have a stronger influence. To reduce the poisoning task training loss, malicious clients must increase the PDR. Therefore, the sum of the GradScores of samples is larger with a higher PDR on malicious client datasets. In Fig. 3 (a)(b)(c), we evaluate this inference by running backdoor training on the MNIST dataset with a minibatch of 64 samples. Starting from the same pre-trained model, the model trained with a higher poisoned data rate shows an obvious decrease in the backdoor training loss and a higher GradScore value.
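To make the idea concrete, here is a toy sketch (our own construction, not the paper's code) of a per-sample GradScore for a linear last layer with softmax cross-entropy, where the last-layer gradient of one sample is the outer product of its features and the prediction error p − y:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_score(W, x, y_onehot):
    """L2 norm of the last-layer cross-entropy gradient of one sample:
    dL/dW = outer(x, p - y)."""
    p = softmax(x @ W)
    return np.linalg.norm(np.outer(x, p - y_onehot))

# On a model that already fits the clean data, a label-flipped (poisoned)
# sample yields a much larger GradScore than a correctly labelled one.
W = np.array([[5.0, -5.0]])   # strongly prefers class 0 for positive x
x = np.array([1.0])
clean = grad_score(W, x, np.array([1.0, 0.0]))
flipped = grad_score(W, x, np.array([0.0, 1.0]))
```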
Now we analyze how this scoring method detects model poisoning attacks in federated learning. Regardless of the data distribution among clients, in the benign setting the deviations between local models and the global model start to cancel out as the global model converges, i.e., for every local model w_i, w_i^{t+1} − G^t ≈ 0 [9]. Therefore, the benign local model update Δw ≈ w^{t+1} − w^t is bounded, and ‖∇L(x_i, y_i)‖ of an example from a benign dataset is small. The second observation is that when the global model starts to converge, poisoning behaviour on a malicious client deviates the malicious update from the current global model [21] in order to reduce the training loss on the poisoning task. Thus the GradScore of benign clients is small, while the GradScore of malicious clients is larger. Fig. 3 (d) shows that the GradScore of the last-layer gradients of the malicious client model is larger than that of benign clients. Therefore, malicious clients can be detected by comparing the GradScores of the last layer of local models.
To avoid detection, the adversary can attempt weak model poisoning attacks by limiting the scaling of the poisoned model. However, Theorem 4.2 shows that GradScore is unaffected when the adversary scales its poisoned model, so scaling cannot be used to evade detection.

Overview and Design of DeMAC
In this section, we instantiate DeMAC for deep inspection and analysis of model updates to discover model poisoning attacks. A key problem is deciding when detection should begin. To solve this problem, we design a historical global model update record with a flexible look-back window of size S. This history record tracks the continuous variation of model accuracy on the validation set. When the maximum value in this history record falls below a threshold value β, the defence procedure starts. In the rest of this section, we give a detailed description of every main component of DeMAC. Algorithm 1 outlines the procedure of DeMAC.

Identifying malicious behaviours
In designing DeMAC, the first step is identifying and measuring malicious behaviour in the federated learning system. Let D_k = {(x_i, y_i)}_{i=1}^{N_k} denote the training dataset on client C_k. If the GradScore value of a client is significantly higher than those of the other clients in the same global iteration round, malicious behaviour may be occurring on this client. The step of calculating the GradScore is shown in line 7 of Algorithm 1.
To avoid misidentifying benign clients when the global model has not yet converged, we introduce a validation phase to monitor the convergence trend. This validation phase consists of recording a group of previous global models and measuring the distance between the validation accuracies of neighbouring global models. In the first step, we build a historical global model record h(G_0, ..., G_S), where (G_0, ..., G_S) is a list of previous global models and S is the size of the sliding window. In the second step, we define the distance between neighbouring global models' validation accuracies as d(G_i) = |acc(G_{i+1}) − acc(G_i)|, where acc(G_i) is the accuracy of global model G_i on the validation dataset. We use a list d(G_0), ..., d(G_{S−1}) to hold the neighbouring validation variations of the historical record h(G_0, ..., G_S). The related steps are shown in lines 2-3 of Algorithm 1. If the maximal d(G_i) ∈ [d(G_0), ..., d(G_{S−1})] is below the threshold β, the global model can be regarded as converged.
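The validation phase can be sketched as a sliding-window monitor (a minimal illustration; the class and parameter names are our own):

```python
from collections import deque

class ConvergenceMonitor:
    """Sliding-window record of validation-accuracy changes. DeMAC-style
    detection is triggered once every |acc_{i+1} - acc_i| in the window
    falls below the threshold."""
    def __init__(self, window_size, threshold):
        self.diffs = deque(maxlen=window_size)  # recent |acc deltas|
        self.threshold = threshold
        self.prev_acc = None

    def update(self, acc):
        if self.prev_acc is not None:
            self.diffs.append(abs(acc - self.prev_acc))
        self.prev_acc = acc

    def converged(self):
        return (len(self.diffs) == self.diffs.maxlen
                and max(self.diffs) < self.threshold)
```

While validation accuracy is still climbing steeply the monitor stays off; once the window contains only small changes, detection is triggered.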
When the aggregator detects unusually large GradScore values from some clients while the global model has converged, malicious behaviour can be identified in the federated system.

Pruning and excluding malicious clients
After malicious behaviour is identified, the next step in DeMAC is to identify and exclude anomalous clients based on their GradScore values. First, the GradScores of the clients are sorted, and the k clients with the highest scores are pruned and excluded from the benign client list. The parameter k depends on the number of anomalous clients in one global iteration: only one malicious client needs to be excluded when the adversary uses a model-replacement attack strategy, but malicious clients may collude to strengthen the impact of poisoning within one iteration. Considering real-world federated learning deployments, it is unrealistic to assume that the fraction of malicious clients exceeds one half (0 < PMR < 1/2). In an application scenario like Gboard [6], over 50% malicious clients would mean the adversary controls at least 500 million Android devices, which is implausible [28]. We therefore only consider cases below 50%, and generally this bound is set to 0.5. In this work, we assume the server knows the number of malicious clients in one iteration and can thus decide the value of k. The sorting and pruning step is shown in lines 9-11 of Algorithm 1.
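The sorting-and-pruning step can be sketched as follows (our own function names), returning the presumed-benign client indices after dropping the k highest GradScores:

```python
def prune_clients(grad_scores, k):
    """grad_scores: dict mapping client id -> GradScore for this round.
    Sort by score and exclude the k clients with the highest scores."""
    ranked = sorted(grad_scores, key=grad_scores.get)  # ascending score
    return ranked[:len(ranked) - k]                    # drop the k largest
```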
The aggregator excludes the updates sent by malicious users in the current iteration and trains the global model on the remaining model updates (line 16 of Algorithm 1).The global training algorithm varies based on the underlying training algorithm used in the application.We use FedAvg [1] to train the global model in this proposed work.

Evaluation Setup
In this section, we give the details of the experimental setup and evaluation metrics used in this work for evaluating the effectiveness of DeMAC.

Experimental Setup
Datasets and global-model settings: In this work, two well-known benchmark datasets, MNIST [29] and CIFAR10 [30], are used. Data are assigned to clients according to the Dirichlet distribution with concentration parameter α, so the client datasets are unbalanced across classes. The smaller the value of α, the more concentrated the distribution; conversely, the larger the value, the more uniform the distribution. Without Dirichlet sampling, data are uniformly distributed (IID) to clients. Unless otherwise mentioned, the concentration parameter α is set to 0.5. For the MNIST dataset, a four-layer Convolutional Neural Network (see Tab. 2) is used. For CIFAR10, the ResNet18 [31] architecture is used as the global model.
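The Dirichlet partition can be sketched as follows (a common recipe, with our own naming): for each class, a Dirichlet draw with concentration alpha decides what share of that class each client receives, so a small alpha produces highly skewed client label distributions:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Assign sample indices to clients with a per-class Dirichlet split:
    smaller alpha -> more concentrated (non-IID) client distributions."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients
```

Every sample index ends up with exactly one client, while per-client class proportions vary with alpha.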
Federated Learning settings: FedAvg [1] is used as the FL method. In each global round, 10 of 100 clients are randomly selected. Considering the different characteristics of the datasets, we adopt the following parameter settings for federated training: for MNIST, clients train for 1 local epoch with a local learning rate of 0.1; for CIFAR10, clients train for 2 local epochs with a local learning rate of 0.1.
(1) Model-replacement attacks, constrain-and-scale and DBA. In the case of MNIST, we modify the pixels of the digit images at training time, causing images with the pixel pattern to be classified towards a target class. On the CIFAR10 dataset, we apply the same attack strategy as on MNIST. The attackers can set the scaling parameter γ for single-shot and DBA attacks to control the impact of model poisoning. Unless otherwise mentioned, we set γ to 30, and the PDR is set to 30/64 with a local batch size of 64. For single-shot model-replacement attacks and constrain-and-scale attacks, we assume that attackers perform attacks after 60 rounds for MNIST and 400 rounds for CIFAR10. For DBA, as attackers split the trigger into four equal parts, we assume that malicious clients attack at rounds 62, 64, 66, and 68 for MNIST and at rounds 402, 404, 406, and 408 for CIFAR10.
(2) Multi-poisoning attacks. In the case of MNIST, adversarial clients start the multi-poisoning attack at rounds 10 and 20, respectively. In the case of CIFAR10, adversarial clients start the multi-poisoning attack at rounds 80 and 300, respectively. Unlike the three attack types described above, the multi-poisoning attack executes in every round once started. We set γ to 1, and the PDR is set to 30/64.
Detecting time.
(1) Single-shot model-replacement attack, constrain-and-scale and DBA. Unlike existing works [22][14] that manually set the detection time, DeMAC can spontaneously detect malicious clients according to the information provided by the historical record. We set the sliding window size S to 15 for MNIST and 20 for CIFAR10. DeMAC is triggered when the maximum distance between neighbouring global models' validation accuracies, d(G_i), falls below the predefined threshold β. We choose β = 0.5 for MNIST and β = 2 for CIFAR10.
(2) Multi-poisoning attacks. Our experiments show that the global model accuracy oscillates noticeably rather than increasing monotonically during the early training stage. Hence, the predefined threshold β is set looser than in the setting above.

Evaluation Metrics
We consider two evaluation metrics for assessing the accuracy and efficiency of DeMAC. Main task accuracy (MA) evaluates the accuracy of the global model on the main task; MA is the ratio of testing examples that are correctly classified. Backdoor Accuracy (BA), or Attack Success Rate (ASR), is the ratio of poisoned examples that are classified as the target label by the global model. We further define an evaluation metric for measuring the overhead of DeMAC: Computation Cost per Round (CCR) measures the computation cost of one round in the Byzantine-robust FL system.
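Both accuracy metrics reduce to simple ratios over model predictions, as in this sketch (our own helper names):

```python
import numpy as np

def main_accuracy(preds, labels):
    """MA: fraction of test samples classified correctly."""
    return float(np.mean(preds == labels))

def attack_success_rate(preds_on_poisoned, target_label):
    """ASR (backdoor accuracy): fraction of poisoned samples that the
    global model classifies as the attacker's target label."""
    return float(np.mean(preds_on_poisoned == target_label))
```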

Evaluation Results
Efficiency of the history record: Detection and attack timing are rarely discussed in previous defence works. [22][9] discuss that the impact of model-replacement attacks in early rounds is not durable, as the ASR decreases sharply within several rounds, whereas poisoning in later training rounds tends to persist. The simple approach is to detect from the moment the global model starts training; however, it is not cost-effective to detect from scratch to defend against poisoning attacks such as model-replacement attacks. To solve this problem, we combine DeMAC with a historical record. By applying this historical global model record, DeMAC can track the convergence of global training. Fig. 5(a)(b) shows that DeMAC is triggered when the maximum d(G_i) falls below the predefined threshold (pink area in Fig. 5(a)(b)).
Impact of the degree of non-IID: Fig. 6(a)(b)(c)(d) shows the impact of the non-IID degree on DeMAC. First, from Fig. 6(a)(c), we observe that with the PDR (30/64) and threshold β fixed, the ASR can be reduced to nearly 0% when the concentration parameter α is larger than some threshold. When α is set to 0.1 for CIFAR10 or below 0.3 for MNIST, DeMAC cannot detect malicious clients, which causes a high ASR on the global model. In Fig. 6(b), we postpone the attack to round 600, when the global model has stabilised. We observe that with the same PDR (30/64), threshold value (β = 2) and non-IID setting (α = 0.1), DeMAC can mitigate the poisoned updates and reduce the backdoor accuracy to a low level, in contrast to the failed detection in Fig. 6(a). In Fig. 6(d), we set the threshold β to 0.6 rather than the default value 0.5 with the non-IID setting (α = 0.1), so DeMAC is triggered and starts to detect malicious clients earlier; the BA of the global model can then be reduced to nearly 0% under all PDR settings. From the above analysis, it is clear that the success of model poisoning attacks is highly related to the convergence trend of the global model.
Impact of the Poisoned Data Rate (PDR): Fig. 6(e)(f)(g)(h) shows the impact of the poisoned data rate on DeMAC. In Fig. 6(e)(g), DeMAC mitigates malicious client updates and reduces the ASR to a low level on both datasets. In Fig. 6(f)(h), we evaluate the efficiency of DeMAC against distributed backdoor attacks. Fig. 6(f) shows that DeMAC cannot mitigate the last split of the backdoor attack when the PDR is low; one possible reason is that, compared with a centralised backdoor attack, the distributed nature of DBA makes the attack behaviour more stealthy. In Fig. 6(h), DeMAC decreases the attack impact under all attack strategies.
Defending against the Anomaly-Evasion Attack: As discussed in Section 3, attackers can balance the impact and stealth of an attack by varying the scaling-coefficient parameter α. Here, L_ano(x, y) is calculated as the L2 norm between the current poisoned model and the current-round global model. Fig. 6(i)(j)(k)(l) shows that DeMAC successfully mitigates the attack impact on both datasets for different α values. We also provide tables corresponding to these results in Appendix 9.
Detecting multi-poisoning attacks: Fig. 7 and Fig. 8 show the comparison results on the two datasets for different detection methods, attack timings, and numbers of malicious clients. From these results, it is clear that malicious perturbations injected in every iteration can gradually compromise the baseline Byzantine-robust FL algorithms and cause a high ASR; DeMAC can effectively suppress such error propagation. Several observations follow. First, DeMAC can mitigate the attack impact and reduce the ASR to a low level in most cases, except for the case (MNIST dataset, PMR = 4/10, attack after ten rounds, Fig. 7). The main reason could be that it is hard for the detection methods to distinguish benign clients from so many malicious clients, as the global model is unstable in the first ten rounds. Second, in the cases (MNIST dataset, PMR = 2/10, attack after ten rounds) and (MNIST dataset, PMR = 2/10, attack after 20 rounds), all the defence methods except RFA [12] can mitigate the attack impact. In Subsection 2.1, we discuss that by carefully setting the scaling factor, and thereby controlling the total weight of the outliers, the attacker can bypass RFA. With the scaling factor set to 1 and the PDR set to 30/64, RFA fails to detect the malicious behaviours. This is in line with conclusions from previous work [13]. DeMAC, trimmed-mean [11], and median achieve comparable main accuracy and outperform Krum [10]. The main reason could be that Krum selects a single client update to represent the global model; due to the heterogeneous data distribution, the chosen model update cannot achieve the same performance on the global test dataset. This is in line with conclusions from previous work [25]. Third, in the cases (CIFAR10 dataset, attack after 80 rounds), all the defence methods except DeMAC fail to eliminate the attack impact. In the cases (CIFAR10 dataset, attack after 300 rounds), DeMAC, Krum, and RFA can reduce the ASR to a low level, but median and trimmed-mean still cannot defend against the attack behaviours. As discussed in Subsection 2.1, the assumption underlying trimmed-mean does not hold for model poisoning attacks, so this observation is in line with the discussion in prior sections. Fourth, in the case (CIFAR10 dataset, attack after 300 rounds), the other defence methods achieve comparable main accuracy to DeMAC; however, in the case (CIFAR10 dataset, attack after 80 rounds), the main accuracy of DeMAC outperforms the other defence methods after 400 rounds.

Performance Comparison: In this work, we define the CCR as the time required for one iteration of training in the FL system equipped with the chosen defence method. Tables 3 and 4 compare the effectiveness of DeMAC with other Byzantine-robust methods on the two datasets. In these experiments, we use the multi-poisoning attack strategy, described in Section 6.1 (Experimental Setup). In Table 3, RFA [12] is the most time-consuming method, median [11] and trimmed-mean [11] are the most time-saving, and DeMAC and Krum [10] show similar performance.
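For reference, the coordinate-wise median and trimmed-mean baselines discussed above can be sketched as follows (a generic illustration, not the implementations of [11]):

```python
def coordinate_median(updates):
    """Coordinate-wise median across client updates (each a flat list)."""
    dim = len(updates[0])
    out = []
    for j in range(dim):
        vals = sorted(u[j] for u in updates)
        n = len(vals)
        out.append(vals[n // 2] if n % 2 else (vals[n // 2 - 1] + vals[n // 2]) / 2)
    return out

def trimmed_mean(updates, k):
    """Coordinate-wise mean after dropping the k smallest and k largest values."""
    dim = len(updates[0])
    out = []
    for j in range(dim):
        vals = sorted(u[j] for u in updates)[k:len(updates) - k]
        out.append(sum(vals) / len(vals))
    return out

clients = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [50.0, -50.0]]  # last client is an outlier
print(coordinate_median(clients))   # both coordinates ignore the outlier
print(trimmed_mean(clients, k=1))   # trimming removes the extreme values
```

Both rules assume that extreme coordinates come from attackers; as noted above, this assumption can fail for carefully scaled model poisoning attacks.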

Table 4
The comparison of the effectiveness of DeMAC with other Byzantine-robust methods on the MNIST set

Conclusion
Backdoor attacks and, more specifically, model poisoning attacks are a major challenge for federated learning. To address the shortcomings of existing defence approaches, we proposed a novel defence system called DeMAC, which defends against malicious attacks by measuring the difference between the contributions of benign and malicious clients to the global model. We defined a new metric, GradScore, which computes the L2-norm of the last-layer gradients of contributed model updates and is shown to be effective in detecting updates from malicious clients. Furthermore, we utilized the history record of contributed model updates to enhance the malicious-client detection performance. We evaluated and compared DeMAC with state-of-the-art defence techniques over various attack strategies and datasets. Experimental results show that DeMAC can effectively mitigate model poisoning attacks without sacrificing the performance of the main task, and that it significantly outperforms existing defence approaches. Future research directions include extending the proposed method to defend against adaptive attacks based on well-known non-targeted model poisoning frameworks; the proposed method could also be combined with Byzantine-robust aggregation rules. We leave these for future work.
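A minimal sketch of the GradScore idea described above; the thresholding rule and values here are illustrative, not DeMAC's actual detection procedure, which also relies on the history record:

```python
import math

def grad_score(last_layer_grads):
    """GradScore as described: the L2 norm of the last-layer gradients
    of a client's contributed model update."""
    return math.sqrt(sum(g * g for g in last_layer_grads))

def flag_malicious(scores, threshold):
    """Flag clients whose GradScore exceeds a threshold (illustrative rule)."""
    return [i for i, s in enumerate(scores) if s > threshold]

# Client 2's gradients are much larger, as a poisoning task would induce.
scores = [grad_score(g) for g in ([0.1, 0.2], [0.1, 0.1], [3.0, 4.0])]
print(flag_malicious(scores, threshold=1.0))  # → [2]
```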
For the purpose of open access, the author has applied a CC BY licence to any Author Accepted Manuscript (AAM) arising from this submission.

Tables corresponding to Evaluation Results
Impact of the degree of non-IID: Tables 5 and 6 show the impact of the non-IID degree on DeMAC. From Table 5, with the other hyperparameters fixed, DeMAC can decrease the ASR to a very low level when the concentration parameter is larger than 0.1. When the concentration parameter is set to 0.1 and the attack is postponed to round 600, DeMAC can mitigate the poisoned updates. Table 6 shows similar results: when the threshold is set anywhere from 0.5 up to 0.6, DeMAC is able to reduce the ASR to a low level. From the above analysis, we can see that the success of model poisoning attacks is closely related to the convergence of FL training.

• Step 1: Synchronizing the global model with local clients: The server sends the current global model w^t to the chosen clients.
• Step 2: Training local models: Each client initializes its local model as the global model w^t and trains it using its training set D_i = {(x_j, y_j)}_{j=1}^{n_i}. The optimization problem of each client is minimizing ℓ(f(w, x), y), where w is the local model. Using SGD, the client updates the local model as described in Eq. (1) and then sends its update w_i^{t+1} to the server.
• Step 3: Aggregation: The server computes the updated global model by aggregating the local updates with some aggregation rule.
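The three steps above can be sketched on a toy one-parameter least-squares task (plain FedAvg-style averaging in Step 3; all names are illustrative):

```python
def local_update(global_model, data, lr=0.1):
    """Step 2: one SGD pass over a client's samples on a 1-D least-squares
    toy task, starting from the global model (stand-in for Eq. (1))."""
    w = global_model
    for x, y in data:
        grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
        w -= lr * grad
    return w

def fedavg_round(global_model, client_datasets):
    # Step 1: the server sends the global model to the chosen clients.
    updates = [local_update(global_model, d) for d in client_datasets]  # Step 2
    # Step 3: the server aggregates local updates (plain averaging here).
    return sum(updates) / len(updates)

clients = [[(1.0, 2.0)], [(1.0, 2.2)], [(1.0, 1.8)]]  # all targets near w = 2
w = 0.0
for _ in range(50):
    w = fedavg_round(w, clients)
print(round(w, 2))  # → 2.0, the global model converges toward the shared optimum
```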

Figure 1 :
Figure 1: On the left are the three steps of Federated Learning. On the right is the malicious Federated Learning setting

Figure 2 :
Figure 2: Weight vectors of genuine and poisoned models

Fig. 2 shows several types of poisoned models. The first poisoned client model, w'_1, is trained by adding a large fraction of poisoned data D_p into the genuine dataset D; w'_1 has an obvious directional deviation from the benign client models. The second type, w'_2, consists of four small vectors (…
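The directional deviation of a poisoned update can be illustrated with cosine similarity (a generic sketch, not the paper's detection metric):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two flattened weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

benign = [1.0, 1.0, 0.0]
poisoned = [1.0, -1.0, 0.5]   # points in a clearly different direction
print(cosine_similarity(benign, poisoned))  # → 0.0, orthogonal to the benign direction
```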

Figure 3 :
Figure 3: Impact of the PDR on the loss value (a), backdoor accuracy (b), and GradScore value (c); the GradScore of the last-layer gradients of the malicious and benign updates (d)

Fig. 4 shows the main components and the workflow of DeMAC during a global iteration. DeMAC follows a deterministic algorithm and does not know the attack strategies or data distributions; it is deployed during the training session, before the testing phase. First, it must identify malicious behaviours in the federated learning system, which raises a design challenge: at the beginning of training, benign local models must update their parameters continually so that the global model converges to the global minimum. How, then, can we perceive the convergence trend? To solve this problem, we design a historical record of global model updates with a flexible look-back window size. This history record tracks the continuous variation of the model accuracy on the validation set. When the maximum value in this history record falls below a threshold value, the defence approach starts to operate.
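A minimal sketch of this history-record convergence check (the window size and threshold values are illustrative):

```python
from collections import deque

def converged(history, window, threshold):
    """Activate the defence once the maximum accuracy variation within the
    look-back window drops below the threshold (sketch of DeMAC's check)."""
    recent = list(history)[-window:]
    if len(recent) < window:
        return False                      # not enough rounds observed yet
    variations = [abs(recent[i] - recent[i - 1]) for i in range(1, len(recent))]
    return max(variations) < threshold

history = deque(maxlen=10)                # bounded record of validation accuracy
for acc in [0.3, 0.5, 0.65, 0.8, 0.86, 0.90, 0.905, 0.91, 0.912]:
    history.append(acc)
print(converged(history, window=4, threshold=0.02))  # → True, recent changes are small
```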

Figure 5 :
Figure 5: Measuring the maximum value of the historical accuracy-variation record enables DeMAC to detect convergence of the global model (a)(c). DeMAC with the historical global model record vs. DeMAC without it (b)(d)

Figure 6 :
Figure 6: The ASR for DeMAC against the model-replacement attack under different non-IID settings (a)(c). The ASR for DeMAC against the model-replacement attack, using a concentration parameter of 0.1, a threshold value of 2 for CIFAR10, and a threshold value of 0.6 for MNIST (b)(d). Impact of the poisoned data rate on DeMAC against the model-replacement attack (e)(g) and the distributed backdoor attack (f)(h). MA and BA of the global model under the protection of DeMAC against the constrain-and-scale attack with different scaling values (i)(j)(k)(l).
The DeMAC defence is performed only when the global model starts to stabilize. Fig. 5(b)(d) shows the comparison between DeMAC with a historical global model record and DeMAC without one. It is not difficult to see that enabling DeMAC in early rounds may delay the convergence of the global model, which could be a drawback in federated learning deployments. In Fig. 5(b), for MNIST, DeMAC without a historical global model record shows a higher error rate during the initial 20 rounds than DeMAC with the record; Fig. 5(d) shows the same result.

Figure 7 :
Figure 7: ASR and MA of malicious-client detection for different detection methods. A concentration parameter of 0.5, the MNIST dataset, and a scaling parameter of 1 are used.

First Author et al.: Preprint submitted to Elsevier

∇_w ℓ(f(w^t, x), y) is the gradient of the loss for a training sample (x, y).

Table 3
The comparison of the effectiveness of DeMAC with other Byzantine-robust methods on CIFAR set

Table 5 :
Impact of the degree of non-IID on CIFAR10

Table 6 :
Impact of the degree of non-IID on MNIST

Impact of Poisoned Data Rate (PDR): Tables 7, 8, 9, and 10 show the impact of the PDR on DeMAC. From Tables 7 and 8, DeMAC can effectively mitigate the malicious behaviours. From Table 9, DeMAC cannot work well when the PDR is set to 10/64. The possible reason is that, with so few data samples being poisoned, DBA is too stealthy to be detected.

Table 7 :
Impact of Poisoned data rate (PDR) on CIFAR

Table 8 :
Impact of Poisoned data rate (PDR) on MNIST

Table 9 :
Impact of Poisoned data rate (PDR) on CIFAR for defending DBA

Table 10 :
Impact of Poisoned data rate (PDR) on MNIST for defending DBA

Defending Anomaly-Evasion Attack: Tables 11 and 12 show that DeMAC can successfully mitigate the attack impact for both datasets.