Cross-Session Emotion Recognition by Joint Label-Common and Label-Specific EEG Features Exploration

Since Electroencephalogram (EEG) is resistant to camouflage, it has been a reliable data source for objective emotion recognition. EEG is naturally multi-rhythm and multi-channel, based on which we can extract multiple features for further processing. In EEG-based emotion recognition, it is important to investigate whether there exist some common features shared by different emotional states, as well as the specific features associated with each emotional state. However, such a fundamental problem has been ignored by most of the existing studies. To this end, we propose a Joint label-Common and label-Specific Features Exploration (JCSFE) model for semi-supervised cross-session EEG emotion recognition in this paper. To be specific, JCSFE imposes the $\ell _{\text {2,1}}$ -norm on the projection matrix to explore the label-common EEG features and simultaneously the $\ell _{{1}}$ -norm is used to explore the label-specific EEG features. Besides, a graph regularization term is introduced to enforce the data local invariance property, i.e., similar EEG samples are encouraged to have the same emotional state. Experimental results on the SEED-IV and SEED-V emotional data sets demonstrate that JCSFE not only achieves superior emotion recognition performance in comparison with state-of-the-art models but also provides us with a quantitative method to identify the label-common and label-specific EEG features in emotion recognition.


I. INTRODUCTION
EMOTIONS refer to people's psychological reactions to external or internal stimuli, accompanied by physiological reactions [1]. Emotions have an important impact on the establishment and maintenance of interpersonal relationships [2], cognition [3], decision-making [4], and other interactive activities. Many mental disorders are closely related to emotions [5]; therefore, identifying the emotional state of people with emotional expression disorders is helpful to their treatment and healthcare. In past decades, emotion recognition has been attracting increasing attention from both academia and industry [6]. Compared with traditional data modalities such as facial expressions, text, and speech, EEG can offer us more reliable emotion recognition results because it originates from the neural activities of our central nervous system and is not easily camouflaged [7]. With the development of weak-signal acquisition equipment and processing techniques, EEG has been widely used in multiple scenarios such as drowsiness estimation [8], rehabilitation engineering [9], and disease diagnosis [10]. In the present work, we put the emphasis on EEG emotion recognition [11].
Current studies in EEG emotion recognition mainly focus on two aspects. One is the feature extraction methods to characterize the statistical, frequency, and nonlinear characteristics of EEG data [12], [13]. Generally, the popular EEG features for emotion recognition are extracted from the time, frequency, time-frequency, and spatial domains. The other focus is the feature transformation and recognition models [14]. Roughly, we can categorize the existing models into linear, kernel-based, and neural network-based nonlinear ones. They improve emotion recognition performance with diverse motivations such as enhancing model robustness [15], distinguishing the different discriminative abilities of features [16], and minimizing inter-subject variabilities [17]. Instead of using handcrafted EEG features, sometimes raw EEG data is fed into deep learning models to simultaneously obtain the data representations and emotion recognition results. That is, feature learning and classification are unified to achieve end-to-end EEG decoding [18].
The multi-channel and multi-rhythm properties of EEG provide us with abundant spatial and frequency information, based on which the extracted features are used for emotional state estimation. Based on the consensus that different EEG frequency bands and channels correlate differently to mental states [8], [19], different dimensions of a certain feature type (e.g., power spectral density or differential entropy) should also correlate differently to different types of emotional states. In pattern recognition, different features have different discriminative abilities in classifying the emotional states. Then, a fundamental problem in EEG emotion recognition is whether there exist some common features that are discriminative for all the involved emotional states. Accordingly, we also want to investigate whether there exist label-specific features that are discriminative only to a specific emotional state. However, this problem has not been fully studied yet within the community of EEG emotion recognition.
In this paper, we propose a new Joint label-Common and label-Specific Features Exploration (JCSFE) model for cross-session EEG emotion recognition, which is implemented based on semi-supervised regression. Specifically, we impose the $\ell_{2,1}$-norm on the regression projection matrix to explore the label-common features by achieving row-sparsity; simultaneously, the $\ell_1$-norm is used to explore the label-specific features due to its isotropic sparsity-inducing property. Moreover, a graph regularizer is incorporated into JCSFE to enforce the local invariance property of data. In summary, the present work makes the following contributions.
• We propose a new emotion recognition model by joint label-common and label-specific EEG feature exploration, which is achieved by respectively imposing the $\ell_{2,1}$-norm and the $\ell_1$-norm on the projection matrix in semi-supervised regression.
• As a secondary contribution, JCSFE incorporates a graph regularizer to enforce the local invariance property of data. Besides, an efficient optimization algorithm is proposed for the JCSFE model objective, whose convergence and complexity are analyzed.
• On emotion recognition performance, JCSFE not only obtains improved accuracy but also provides us with a quantitative measurement of the EEG spatial-frequency activation patterns in emotion recognition from two perspectives, i.e., each feature in terms of all emotional states and each emotional state in terms of all features.
We organize the rest of this paper as follows. Section II introduces the JCSFE model formulation and optimization. Comparative studies are conducted and the results analyzed in Section III. Discussions clarifying the connections as well as differences between JCSFE and some related models are provided in Section IV. Section V concludes this paper and describes potential future work.

A. Problem Definition
In this paper, matrices are denoted by boldface uppercase letters and vectors by boldface lowercase letters. For a matrix M, its i-th row and j-th column are denoted as m^i and m_j, respectively. The boldface 1_m represents an all-one column vector of length m.
Generally, in semi-supervised EEG emotion recognition, we are given an EEG data set X = [X_l, X_u] ∈ R^{d×n}, where X_l ∈ R^{d×l} is the labeled subset and X_u ∈ R^{d×u} is the unlabeled subset. Accordingly, Y_l ∈ R^{l×c} is the emotional state indicator matrix of the labeled samples. Here, d is the sample dimensionality, c is the number of emotional states, and l and u are respectively the numbers of labeled and unlabeled samples (i.e., n = l + u). The i-th (i = 1, ..., l) row of Y_l, y_i ∈ R^{1×c}, encodes the label information of sample x_i ∈ R^d: its j-th element is 1 if x_i belongs to the j-th emotional state and 0 otherwise. By defining Y_u ∈ R^{u×c} as the indicator matrix of the unlabeled EEG data and Y = [Y_l; Y_u] ∈ R^{n×c}, our aim is to estimate Y_u as accurately as possible given X and Y_l. Below we use an example to illustrate the label-common and label-specific features in pattern classification. Suppose that we have a data matrix containing two instances, X = [x_1, x_2], with the corresponding label vectors Y = [y_1; y_2]. The two elements in each label vector represent the probabilities of the corresponding instance belonging to the two classes, respectively. By fitting (X, Y) with a projection matrix W, we obtain one possible solution of W, shown in Fig. 1. Through the non-zero values of the two columns of W, i.e., w_1 and w_2, we know the specific features of each class. Specifically, w_1 = [1, 1, 1, 0, 0]^T means that features f_1, f_2, f_3 determine the first class, while w_2 = [0, 0, 1, 1, 1]^T indicates that features f_3, f_4, f_5 determine the second class. Hence, f_3 is the common feature for both classes.
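The toy example above can be checked programmatically. The following sketch (illustrative only; the matrix W is the hand-made example from Fig. 1, not a learned model) reads off the label-specific features from the non-zero entries of each column and the label-common features from their intersection:

```python
import numpy as np

# Hand-made projection matrix from the toy example: 5 features, 2 classes.
# Column w1 = [1,1,1,0,0]^T, column w2 = [0,0,1,1,1]^T.
W = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1],
              [0, 1]], dtype=float)

# features with non-zero weight for each class (label-specific view)
specific = {j: set(np.nonzero(W[:, j])[0]) for j in range(W.shape[1])}
# features non-zero for every class (label-common view)
common = set.intersection(*specific.values())

print(specific)  # {0: {0, 1, 2}, 1: {2, 3, 4}}
print(common)    # {2} -> feature f3 is the label-common feature
```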

B. JCSFE Model Formulation
In Fig. 2, we show the overall framework of applying JCSFE to the semi-supervised EEG emotion recognition task. The second stage is the JCSFE-based model learning, which is implemented under a semi-supervised regression framework due to its simplicity and effectiveness. The three components of JCSFE are the label-common feature mining, the label-specific feature mining, and the graph regularizer that enforces the local invariance property of data. Given a centered data matrix X and the label indicator matrix Y_l, semi-supervised regression can be expressed as

min_{W, Y_u} ||X^T W − Y||_F^2 + C(W),  s.t. Y_u ≥ 0, Y_u 1_c = 1_u,

where Y = [Y_l; Y_u] and C(W) defines some constraints on W to be described below. The non-negativity and row-normalization constraints make Y_u essentially define the probabilities of assigning a certain EEG sample to the different emotional states, based on which we can directly determine the emotional state of each unlabeled EEG sample. For example, if the j-th (j = 1, ..., u) row of the learned Y_u is [0.12, 0.78, 0.04, 0.06], we accordingly annotate the emotional state of this sample as the second one.
The label-common EEG features are those having common discriminative ability for all emotional states. Based on feature selection (ranking) theory [20], [21], for the i-th (i = 1, ..., d) feature, we can use the normalized $\ell_2$-norm of the i-th row of W (i.e., θ_i) to measure the extent to which it is a label-common feature. Mathematically, a larger value of θ_i indicates that the i-th feature is more discriminative in classifying the emotional states. To this end, we impose the $\ell_{2,1}$-norm on W to achieve the label-common feature exploration, which essentially enforces W to be row-sparse. Besides the label-common features, we consider that each emotional state might additionally be determined by several specific features of its own. Therefore, we use the $\ell_1$-norm regularization to select the label-specific features, which enforces the projection matrix W to be element-wisely sparse. Accordingly, we obtain the following objective function

min_{W, Y_u} ||X^T W − Y||_F^2 + α||W||_1 + β||W||_{2,1},  s.t. Y_u ≥ 0, Y_u 1_c = 1_u,   (3)

where w_ij expresses the discrimination of the i-th feature in terms of the j-th emotional state. That is, w_ij ≠ 0 means that the i-th feature is discriminative for recognizing the j-th emotional state and is then considered a label-specific feature of that state. On the contrary, w_ij = 0 means that it is useless for recognizing the j-th emotional state. In objective function (3), the non-negative regularization parameters α and β balance the impacts of the three terms; they respectively control the element-sparsity and row-sparsity of the projection matrix W in exploring the label-specific and label-common features.
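As a quick numerical illustration of the two regularizers (this is not part of the model itself; W here is an arbitrary toy matrix), the $\ell_{2,1}$-norm sums the $\ell_2$-norms of the rows, while the $\ell_1$-norm sums all absolute entries:

```python
import numpy as np

def l21_norm(W):
    # sum of the l2-norms of the rows: penalizing it encourages row-sparsity
    # (whole features switched off), used for label-common feature exploration
    return np.sum(np.linalg.norm(W, axis=1))

def l1_norm(W):
    # sum of absolute values: penalizing it encourages element-wise sparsity,
    # used for label-specific feature exploration
    return np.sum(np.abs(W))

W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
print(l21_norm(W))  # 5.0 + 0.0 + 1.0 = 6.0
print(l1_norm(W))   # 3 + 4 + 1 = 8.0
```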
Besides the label-common and label-specific features exploration, we additionally take the data connections into consideration, inspired by the consensus that learning performance can be greatly improved if the data manifold is explored and utilized. Specifically, a k-nearest neighbor (KNN) graph is adopted to measure the pairwise correlations between EEG samples. Correspondingly, a similarity matrix S ∈ R^{n×n} is built in which s_ij characterizes the similarity between samples x_i and x_j. For simplicity, the '0-1' weighting scheme is used in this paper, based on which we define

s_ij = 1 if x_i ∈ N(x_j) or x_j ∈ N(x_i), and s_ij = 0 otherwise,   (4)

where N(x_i) contains the k-nearest neighbors of sample x_i under the Euclidean distance metric. The data local invariance property requires that if two samples x_i and x_j are similar in the original data space, their representations in the projected space should also be similar. This can be achieved by minimizing

(1/2) Σ_{i,j=1}^n s_ij ||W^T x_i − W^T x_j||_2^2 = Tr(W^T X L X^T W),   (5)

where the Laplacian matrix L is calculated as D − S, and D is a diagonal matrix whose i-th diagonal element is d_ii = Σ_{j=1}^n s_ij. By incorporating (5) into (3) as a regularizer, we finally obtain the JCSFE objective function

min_{W, Y_u} ||F − Y||_F^2 + α||W||_1 + β||W||_{2,1} + γ Tr(W^T X L X^T W),  s.t. Y_u ≥ 0, Y_u 1_c = 1_u,   (6)

where γ is a newly introduced regularization parameter and F = X^T W is an intermediate variable introduced to simplify the notation. Once the variables in objective function (6) are fitted to the given EEG data, we can directly obtain the emotional state information of the unlabeled samples from Y_u. Moreover, based on the learned W, we can explore the label-common and label-specific features by analyzing the respective EEG spatial-frequency patterns in emotion recognition.
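The construction of the '0-1' KNN graph in (4) and the identity underlying the graph regularizer (5) can be sketched as follows. This is an illustrative implementation on random data; `knn_graph` is a hypothetical helper name, and F stands in for the projected data X^T W:

```python
import numpy as np

def knn_graph(X, k):
    # X: d x n data matrix; returns the symmetric 0-1 similarity matrix S of eq. (4)
    n = X.shape[1]
    D2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)  # pairwise squared distances
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:k + 1]  # k nearest neighbors, skipping the sample itself
        S[i, nbrs] = 1.0
    return np.maximum(S, S.T)  # symmetrize: neighbors in either direction

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 20))           # d=4 features, n=20 samples
S = knn_graph(X, k=3)
L = np.diag(S.sum(axis=1)) - S             # Laplacian L = D - S

# Check the local-invariance identity of eq. (5):
# Tr(F^T L F) = 1/2 * sum_ij s_ij ||f_i - f_j||^2
F = rng.standard_normal((20, 2))           # stand-in for F = X^T W
lhs = np.trace(F.T @ L @ F)
rhs = 0.5 * sum(S[i, j] * np.sum((F[i] - F[j]) ** 2)
                for i in range(20) for j in range(20))
assert np.isclose(lhs, rhs)
```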

C. JCSFE Model Optimization
On the two variables W and Y u in the JCSFE objective function, we propose to optimize them in alternating manner. That is, we update one variable by fixing the other.
Y-step. With W fixed, objective function (6) reduces to

min_{Y_u} ||X^T W − Y||_F^2,  s.t. Y_u ≥ 0, Y_u 1_c = 1_u,

where Y = [Y_l; Y_u]. The above problem can be decoupled for each i ∈ {l + 1, l + 2, · · · , l + u}; therefore, we can optimize Y_u in a row-wise manner. That is, for the i-th subproblem, we need to solve

min_{y_i} ||y_i − f_i||_2^2,  s.t. y_i ≥ 0, y_i 1_c = 1,

where f_i is the i-th row of F = X^T W. This is a Euclidean projection onto the simplex [22], which can be solved by the Lagrange multiplier method together with the Karush-Kuhn-Tucker (KKT) conditions. Detailed derivations can be found in the supplementary material.
W-step. Though objective function (6) is convex, it is not smooth due to the existence of the $\ell_{2,1}$-norm and $\ell_1$-norm regularization terms. Therefore, we first relax ||W||_{2,1} as Tr(W^T A W) to simplify the derivations [20], and then employ the accelerated proximal gradient (APG) method to deal with the $\ell_1$-norm regularizer. The derivation of the updating rule of W is provided in the supplementary material.
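The row-wise Y-subproblem is the classic Euclidean projection onto the probability simplex, which admits a closed-form solution via the KKT conditions (a standard sorting-based routine; `project_simplex` is a hypothetical helper name):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {y : y >= 0, sum(y) = 1},
    i.e., the closed-form KKT solution of the row-wise Y-subproblem."""
    u = np.sort(v)[::-1]                   # sort in descending order
    css = np.cumsum(u)
    # largest index rho such that u[rho] - (css[rho] - 1) / (rho + 1) > 0
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css - 1)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)   # the Lagrange multiplier
    return np.maximum(v - theta, 0.0)

f_i = np.array([0.4, 1.1, -0.3, 0.2])      # a row of F = X^T W
y_i = project_simplex(f_i)
print(y_i, y_i.sum())                      # [0.15 0.85 0. 0.], sums to 1
```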
The pseudo-code of the optimization procedure for the JCSFE objective function is provided in Algorithm 1. The notation S_ε denotes the soft-shrinkage operator used to solve the $\ell_1$-norm regularized problem, defined element-wise as

S_ε[x] = sign(x) · max(|x| − ε, 0),

where ε usually takes a small positive value.
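The soft-shrinkage operator is a one-liner in code; the sketch below applies it element-wise to a sample vector:

```python
import numpy as np

def soft_shrink(x, eps):
    # S_eps[x] = sign(x) * max(|x| - eps, 0), applied element-wise;
    # entries with |x| <= eps are set exactly to zero, inducing sparsity
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

x = np.array([-1.5, -0.05, 0.0, 0.2, 2.0])
print(soft_shrink(x, 0.1))  # [-1.4  0.   0.   0.1  1.9]
```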

D. Complexity and Convergence Analysis
We analyze the time complexity of Algorithm 1 below. In the initialization step, the complexity of initializing W is O(nd^2 + d^3 + ndc + d^2c), and the complexity of calculating the sample similarity matrix by k-nearest neighbors is O(n^2d).
In the main loop, the time cost is primarily dominated by calculating the gradient of f(W). Updating Y_u occupies a complexity of O(uc). Considering that the usual case of semi-supervised EEG emotion recognition is n ≈ u > d > c, we conclude that the overall complexity of optimizing the JCSFE model objective function by Algorithm 1 is O(tn^2d), where t is the number of iterations.
On the convergence property of JCSFE, we provide the analysis below. When row-wisely updating the label indicator matrix Y u by the Lagrange multiplier method, the involved multipliers are analytically determined, leading to its analytical solution. When updating the projection matrix W, the APG method is used whose convergence property has been extensively studied [23]. Therefore, we declare that the convergence of Algorithm 1 can be guaranteed.

Algorithm 1 The Optimization of JCSFE Objective Function
Input: Labeled EEG samples X_l ∈ R^{d×l} and the corresponding label indicator matrix Y_l ∈ R^{l×c}, unlabeled EEG samples X_u ∈ R^{d×u}, model parameters α, β and γ;
Output: The estimated label indicator matrix Y_u ∈ R^{u×c}.
1: Initialize W and Y_u;
2: Calculate the diagonal matrix A;
3: Calculate the similarity matrix S via (4);
4: Calculate L_f = 3(||XX^T||_2^2 + γ||XLX^T||_2^2 + β||A||_2^2);
5: while not converged do
6:   Update W by the APG method with the soft-shrinkage operator S_ε;
7:   Update the diagonal matrix A based on the current W;
8:   Update Y_u row by row via the simplex projection;
9: end while
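For concreteness, a self-contained toy implementation of the alternating scheme is sketched below. This is only an illustration under simplifying assumptions: it uses a plain (non-accelerated) proximal-gradient W-step instead of full APG, a ridge-style initialization of W, and synthetic random data; all function names are hypothetical and not from the paper's code:

```python
import numpy as np

def _shrink(x, eps):
    # element-wise soft-shrinkage: sign(x) * max(|x| - eps, 0)
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def _simplex(v):
    # Euclidean projection of v onto {y : y >= 0, sum(y) = 1}
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css - 1)[0][-1]
    return np.maximum(v - (css[rho] - 1.0) / (rho + 1), 0.0)

def jcsfe_sketch(Xl, Yl, Xu, alpha=0.1, beta=0.1, gamma=0.1, k=3, iters=100):
    """Illustrative proximal-gradient sketch of Algorithm 1 (not the official code)."""
    X = np.hstack([Xl, Xu])
    d, n = X.shape
    l, c = Yl.shape
    Yu = np.full((n - l, c), 1.0 / c)          # uniform initialization of Y_u
    # 0-1 kNN similarity graph (eq. (4)) and its Laplacian
    D2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)
    S = np.zeros((n, n))
    for i in range(n):
        S[i, np.argsort(D2[i])[1:k + 1]] = 1.0
    S = np.maximum(S, S.T)
    L = np.diag(S.sum(axis=1)) - S
    XLXt = X @ L @ X.T
    # ridge-style initialization of W (cf. the O(nd^2 + d^3 + ...) init cost)
    W = np.linalg.solve(X @ X.T + np.eye(d), X @ np.vstack([Yl, Yu]))
    for _ in range(iters):
        Y = np.vstack([Yl, Yu])
        a = 1.0 / (2.0 * np.linalg.norm(W, axis=1) + 1e-4)     # diag of A (l21 relaxation)
        Lf = 2.0 * (np.linalg.norm(X @ X.T, 2)
                    + gamma * np.linalg.norm(XLXt, 2) + beta * a.max())
        grad = 2.0 * (X @ (X.T @ W - Y) + gamma * XLXt @ W + beta * a[:, None] * W)
        W = _shrink(W - grad / Lf, alpha / Lf)                 # prox step for the l1 term
        Yu = np.apply_along_axis(_simplex, 1, (X.T @ W)[l:])   # row-wise Y-step
    return W, Yu

# tiny synthetic demo: d=8 features, l=30 labeled, u=20 unlabeled, c=3 states
rng = np.random.default_rng(0)
Xl, Xu = rng.standard_normal((8, 30)), rng.standard_normal((8, 20))
Yl = np.eye(3)[rng.integers(0, 3, 30)]
W, Yu = jcsfe_sketch(Xl, Yl, Xu, iters=50)
```

By construction, every row of the returned Yu is non-negative and sums to one, so it can be read directly as a probability distribution over the emotional states.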

E. Label-Common and Label-Specific Features Exploration
This section illustrates how to quantitatively measure a certain feature to be a common one in terms of all the emotional states, and a specific one to each of the emotional states, by the learned JCSFE model.
As shown in Fig. 3, each row of the projection matrix characterizes the discriminative ability of the corresponding feature in classifying all the involved emotional states. We use θ_i as the quantitative importance measure of the i-th feature being a label-common one. However, θ_i (i = 1, ..., d) is not explicitly learned by JCSFE; only the $\ell_{2,1}$-norm is used to enforce the row-sparsity of the projection matrix. Inspired by the underlying rationality of the $\ell_{2,1}$-norm based feature auto-weighting [24], for each feature dimension we propose to use the normalized $\ell_2$-norm of the corresponding row of the projection matrix as its quantitative importance. Specifically, the importance of the i-th (i = 1, ..., d) EEG feature can be calculated by

θ_i = ||w^i||_2 / Σ_{j=1}^d ||w^j||_2,   (10)

where w^i is the i-th row of W and ||w^i||_2 is its $\ell_2$-norm. Obviously, the θ_i satisfy the non-negativity and normalization constraints, i.e., θ_i ≥ 0 and Σ_{i=1}^d θ_i = 1. The larger the value of θ_i, the more discriminative the i-th EEG feature is considered to be in distinguishing the emotional states. In other words, it should be regarded more as a label-common feature.
Intuitively, the $\ell_{2,1}$-norm based label-common feature exploration is performed by investigating the elements of the projection matrix along the horizontal direction. Somewhat differently, the $\ell_1$-norm pursues the isotropic sparsity of the projection matrix by shrinking its elements, in order to identify features that might be specific to a certain emotional state. Since the emotional label indicator matrix is arranged by one-hot encoding, the quantitative importance measure of a feature being a label-specific one can be obtained by investigating the elements in each column of the projection matrix, i.e., along the vertical direction, as depicted in Fig. 3. For example, the importance of the i-th feature in identifying the j-th emotional state can be calculated as the normalized absolute value of the corresponding element in the j-th column; namely,

ψ_j(i) = |w_ij| / Σ_{k=1}^d |w_kj|,   (11)

where | · | is the absolute value operator. Essentially, equations (10) and (11) are equivalent in this label-specific case because each considered row contains only one element.
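Both importance descriptors are simple normalizations of a learned projection matrix. The sketch below computes the row-wise (label-common) and column-wise (label-specific) measures on a hand-made sparse matrix; W here is a toy example, not a learned model:

```python
import numpy as np

# a small hand-made sparse projection matrix: 6 features x 4 emotional states
W = np.array([[0.9, 0.0, 0.0, 0.0],
              [0.0, 0.8, 0.0, 0.0],
              [0.5, 0.5, 0.5, 0.5],   # a row active for every state: label-common
              [0.0, 0.0, 0.7, 0.0],
              [0.0, 0.0, 0.0, 0.6],
              [0.1, 0.0, 0.0, 0.0]])

# eq. (10): label-common importance = normalized row l2-norms (horizontal direction)
row_norms = np.linalg.norm(W, axis=1)
theta = row_norms / row_norms.sum()

# eq. (11): label-specific importance of feature i for state j (vertical direction)
psi = np.abs(W) / np.abs(W).sum(axis=0, keepdims=True)

print(theta.argmax())   # row 2 has the largest theta: the most label-common feature
print(psi[:, 2])        # per-feature importance profile of the third emotional state
```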

A. Data Description
Two benchmark emotional EEG data sets, SEED-IV and SEED-V, are used in the following experiments. We first describe the main properties of the SEED-IV data set and then point out the differences in SEED-V.
In SEED-IV, EEG data was collected from 15 subjects while they watched movie clips. 72 movie clips were carefully selected to evoke the four discrete emotional states, i.e., sad, fear, happy, and neutral. In each of the three sessions, each subject was asked to watch 24 movie clips, among which six clips correspond to each emotional state. The EEG acquisition devices include the ESI Neuroscan system and a 62-electrode cap in compliance with the international 10-20 placement. The raw EEG data was recorded at a sampling frequency of 1000 Hz, then down-sampled to 200 Hz and band-pass filtered to 1-50 Hz. In the following experiments, we use the differential entropy features extracted from five frequency bands, i.e., Delta (1-3 Hz), Theta (4-7 Hz), Alpha (8-13 Hz), Beta (14-30 Hz), and Gamma (31-50 Hz). Each sample vector is formed by concatenating the 62 channel values from each of the five frequency bands, leading to a dimensionality of 310. There are 851, 832, and 822 EEG samples in the three sessions, respectively.
SEED-V is also a video-evoked emotional EEG data set, which consists of five different types of emotional states. Specifically, the SEED-V data set has one more state, disgust, in comparison with SEED-IV. 20 subjects participated in the data collection experiments, and the EEG data from 16 subjects was made public. In each session, three of the total 15 trials correspond to each emotional state. There are 681, 541, and 601 samples in the three sessions, respectively.

B. Experimental Setup
In the following experiments, we compare JCSFE with several semi-supervised learning models, including
• Semi-supervised Support Vector Machine (ssSVM) with linear kernel.
• Rescaled Linear Square Regression (RLSR) [21], which explicitly defines a feature importance descriptor in semi-supervised regression to characterize the different contributions of features in classification.
• Semi-supervised Linear Square Regression (ssLSR) and graph regularized ssLSR (LSRG). ssLSR is modified from RLSR and has no feature auto-weighting ability; LSRG introduces a graph regularization into ssLSR.
• Semi-supervised Feature Selection with Redundancy Minimization (SFSRM) [25], which penalizes the redundancy in feature selection by enforcing strongly correlated features to be far apart in feature ranking.
• Robust Discriminative Sparse Regression (RDSR) [26], in which the $\ell_{2,1}$-norm based sparse regression is used to enhance robustness and the projection matrix is enforced to be row-sparse for feature selection.
• Semi-supervised Structured Manifold Learning (SSML) [27], which learns a structured graph to exploit the submanifold of both labeled and unlabeled data, addressing the multimodality problem that samples in some classes lie in several separated clusters.
• Sparse Discriminative Semi-Supervised Feature Selection (SDSSFS) [28], which improves RLSR by introducing the label dragging technique to maximize the margin between different classes.
In terms of parameter setting, the relevant parameters in each model are uniformly tuned over the candidate values {2^{-10}, 2^{-9}, · · · , 2^{10}}. The initialization of Y_u in Algorithm 1 means that each sample has the same probability of belonging to each of the emotional states. As for the experimental paradigm, subject-dependent cross-session EEG emotion recognition is employed.
Since each subject has three different sessions in both SEED-IV and SEED-V, for each subject we consider only the three cross-session emotion recognition tasks in chronological order, i.e., session1-session2, session1-session3, and session2-session3. Taking the 'session1-session2' task as an example, EEG samples from the first session serve as the labeled ones but those from the second session are unlabeled. Accordingly, we should estimate the emotional states of these unlabeled EEG samples as accurately as possible.

C. Results and Analysis
In Tables I and II, we present the recognition accuracies of these compared models, where the bold number indicates the best result of that case. s1, s2, · · · , are the indices of subjects. These results provide us with the following insights.
• Obviously, JCSFE obtained the best average performance among the nine compared models. The average accuracies of JCSFE in the three cross-session recognition tasks of SEED-IV are 80.78%, 78.55%, and 83.89%, which respectively outperform the runner-up model by 6.57%, 5.97%, and 5.78%. Similarly, the average accuracies of JCSFE on the SEED-V data set are 81.90%, 81.65%, and 81.33%, which also show 2%-5% improvements in comparison with the second-best model. According to the obtained results, we generally conclude that jointly exploring the label-common and label-specific EEG features is beneficial for improving the emotion recognition accuracy. Additionally, the local invariance property of data is also useful in JCSFE.
• The performance of ssSVM is generally worse than that of the remaining models. To be specific, its average accuracies on the SEED-IV data set are 57.06%, 58.08%, and 63.77%, and they are 62.04%, 58.94%, and 62.29% on the SEED-V data set. From our point of view, the linear kernel in ssSVM is not effective enough in capturing the essence of emotional information in EEG. Similarly, the performance of SFSRM is also unsatisfactory. First, SFSRM performs semi-supervised feature selection by considering the $\ell_{2,1}$-norm based label-common features only. Second, the label indicator matrix in SFSRM is real-valued, which cannot explicitly characterize the label information and therefore cannot effectively guide the feature selection process.
• As stated in the experimental setting, RLSR takes adaptive feature weighting into account while ssLSR does not. This single difference enables RLSR to obtain superior performance to ssLSR. Taking SEED-IV as an example, we believe that the improvements of 2.62%, 2.47%, and 3.01% achieved by RLSR come from adaptively learning the different contributions of different EEG feature dimensions to emotion recognition. Therefore, RLSR is endowed with the ability to automatically identify the discriminative features while suppressing the redundant and noisy ones. Besides, due to the introduction of graph regularization, LSRG generally outperforms ssLSR in terms of average performance.
• The three recently proposed models, RDSR, SDSSFS, and SSML, generally show good performance in emotion recognition. For example, RDSR improves the performance by 1.39%, 1.20%, and 0.86% in the three tasks of SEED-IV in comparison with RLSR. In RLSR, a direct mapping between the data matrix and the label indicator matrix is built by a row-sparse projection matrix, while RDSR additionally takes the local label consistency into consideration to constrain the projection matrix. Similarly, taking RLSR as a baseline, SDSSFS additionally includes the label-dragging strategy to maximize the margin between classes, leading to superior performance. As for SSML, the graph learning technique is used to more effectively characterize the underlying connections of samples.
In addition, we rearranged the emotion recognition accuracies in the form of confusion matrices. In Fig. 4, we show the confusion matrices of JCSFE on the two data sets, from which we can easily obtain the average recognition accuracy on each state. Taking SEED-IV for example, JCSFE achieves its best recognition accuracy, 82.33%, on the neutral state. Only 7.58%, 4.1%, and 6% of the neutral EEG samples are incorrectly recognized as sad, fear, and happy, respectively.
Moreover, we performed the Friedman test on the emotion recognition results to conduct a statistical analysis among the compared models. The null hypothesis is that all these models share the same performance in emotion recognition. If this hypothesis is rejected, we use the Nemenyi post-hoc test to tell whether any two among the nine models have significantly different performance. In this work, we have nine models and 45 cases in SEED-IV (i.e., K = 9, N = 45). We rank the accuracies in each case in descending order, marking the highest one as 1 and the lowest one as 9. In case of ties, the related models share the average rank. Accordingly, the average ranks of ssSVM, ssLSR, RLSR, LSRG, SFSRM, RDSR, SDSSFS, SSML, and JCSFE are 8.00, 5.94, 4.67, 5.17, 6.42, 4.60, 3.92, 4.57, and 1.69, respectively, as shown in Fig. 5(a). The length of these vertical bars is termed the critical distance, which is calculated as CD = q_α √(K(K + 1)/(6N)), where q_α is the critical value of the Tukey (Studentized range) distribution (q_α is 3.102 when K = 9). We set the significance level to 0.05. Since the average ranks of JCSFE and SDSSFS are 1.69 and 3.92, their difference of 2.23 is larger than the CD value of 1.7909. Therefore, we conclude that there exists a significant difference between their results. Intuitively, there is no overlap between the red and purple bars in Fig. 5(a). For SEED-V, the CD value is 1.7341 since K = 9 and N = 48; accordingly, we have the statistical analysis results in Fig. 5(b).
Fig. 6. The correspondence between feature dimensions and EEG frequency bands (channels) [29].
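The critical distance values quoted above can be reproduced directly from the Nemenyi formula; a minimal sketch (`nemenyi_cd` is a hypothetical helper name):

```python
import math

def nemenyi_cd(q_alpha, K, N):
    # Nemenyi critical distance: CD = q_alpha * sqrt(K * (K + 1) / (6 * N))
    return q_alpha * math.sqrt(K * (K + 1) / (6.0 * N))

print(round(nemenyi_cd(3.102, 9, 45), 4))  # 1.7909 (SEED-IV: 9 models, 45 cases)
print(round(nemenyi_cd(3.102, 9, 48), 4))  # 1.7341 (SEED-V: 9 models, 48 cases)
```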

D. Label-Common EEG Spatial-Frequency Patterns
In Section II-E, we explained how to quantitatively measure whether a feature is a label-common one. In this section, we first show the correspondence between an EEG feature dimension and its frequency band (channel), based on which we then investigate the label-common EEG spatial-frequency patterns in cross-session emotion recognition. Recall that θ_i is the importance descriptor defining the contribution of the i-th feature in classifying all the involved emotional states, and suppose that we are given an EEG data set with p frequency bands and q channels. According to the correspondence between EEG frequency bands and feature dimensions [29], the importance of the i-th (i = 1, ..., p) frequency band can be calculated by

ω(i) = θ_{(i−1)q+1} + θ_{(i−1)q+2} + · · · + θ_{iq},   (12)

and, similarly, the importance of the j-th (j = 1, ..., q) channel by

ν(j) = θ_j + θ_{q+j} + · · · + θ_{(p−1)q+j}.   (13)
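The band- and channel-wise aggregations in (12) and (13) amount to row and column sums after reshaping the importance vector; a small sketch (with a random stand-in for the learned θ, and the SEED-style layout p = 5 bands, q = 62 channels assumed from the correspondence above):

```python
import numpy as np

p, q = 5, 62                      # frequency bands, channels (SEED-IV/SEED-V layout)
rng = np.random.default_rng(2)
theta = rng.random(p * q)
theta /= theta.sum()              # random stand-in for the learned importances (sums to 1)

T = theta.reshape(p, q)           # row i = band i, column j = channel j
omega = T.sum(axis=1)             # eq. (12): importance of each frequency band
nu = T.sum(axis=0)                # eq. (13): importance of each channel

# both aggregated descriptors inherit the normalization of theta
assert np.isclose(omega.sum(), 1.0) and np.isclose(nu.sum(), 1.0)
```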
In both SEED-IV and SEED-V, we have five frequency bands and 62 EEG channels. Therefore, by respectively setting p to 5 and q to 62 in both rules (12) and (13), we automatically identify the critical EEG frequency bands and channels in classifying the emotional states, as illustrated in Fig. 6. In Fig. 7, bar charts are used to show the importance of different EEG frequency bands on the SEED-IV and SEED-V data sets, and the corresponding values are marked on the top of bars. It can be seen that the Gamma frequency band holds the largest value; that is, the Gamma band generates more discriminative features than the others on average, which is undoubtedly identified as the most critical frequency band in EEG emotion recognition.
Similarly, according to equation (13), it is easy to obtain the quantitative importance measure of different EEG channels. To more intuitively present the importance of different brain regions rather than listing the contributions of all the EEG channels, we use brain topologies to show how the EEG channel importance values distribute on the scalp in Fig. 8, from which we find that the spatial patterns of both data sets are generally consistent. Based on the obtained results, we roughly conclude that the four regions of the prefrontal, the left/right temporal, and the (central) parietal lobes appear to be more correlated with emotion recognition. The above EEG spatial-frequency activation pattern identification results are generally consistent with some existing studies [19], [29], [30].

E. Label-Specific EEG Spatial-Frequency Patterns
Based on the normalized $\ell_1$-norm label-specific feature exploration described in Section II-E, below we analyze the specific EEG activation patterns associated with each of the emotional states, according to the rules established in the above subsection. Taking the SEED-IV data set as an example, we respectively annotated the four emotional states of sad, fear, happy, and neutral as the first, second, third, and fourth classes. Using one-hot encoding, the label indicator vectors of these four emotional states are [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], and [0, 0, 0, 1], respectively. Therefore, the four columns of the projection matrix can be viewed as the feature importance descriptors respectively corresponding to these four emotional states.
Then, according to equations (12) and (13), the spatial-frequency activation patterns associated with each emotional state are obtained, as shown in Fig. 9. From these results, we find that though some common patterns are shared across different emotional states, their activation patterns are not exactly the same and each state exhibits some unique patterns of its own. For example, in Fig. 9 the activated occipital region is common for all states, especially the fear and happy states; however, the states have differently distributed importance values across frequency bands. Generally, the importance values of frequency bands distribute similarly across the sad, fear, and neutral states, which all have the Gamma band as the most important one, whereas the average contributions of the Theta and Gamma bands look similar for the happy state. Similarly, the label-specific EEG spatial-frequency patterns on the SEED-V data set are provided in Fig. 10. Based on the above analysis, we generally conclude that it is insufficient to emphasize only the label-common EEG features in emotion recognition and that it is beneficial to additionally take the label-specific features into consideration.

IV. DISCUSSION
This section discusses the connections and differences between JCSFE and some existing models such as LLSF [31], JLCLS [32], LFCMLL [33], and CLML [34]. The main common ground among these models is the utilization of the $\ell_{2,1}$-norm and the $\ell_1$-norm to respectively learn label-common and label-specific features. From this point of view, our JCSFE model formulation is inspired by the existing ones. On model optimization, most of these models use the APG method to solve their objective functions.
The differences between JCSFE and the above-mentioned models consist of at least the following three aspects.
• JCSFE is a semi-supervised model by utilizing both labeled and unlabeled EEG samples in model learning, which is more effective in capturing the underlying data properties [35]. Moreover, jointly estimating the emotional states of unlabeled EEG samples and optimizing the remaining model variables can better guide the discriminative feature exploration.
• JCSFE is particularly designed for EEG emotion recognition. In the above experiments, we not only obtained improved emotion recognition performance by JCSFE, but also investigated the EEG spatial-frequency patterns from two aspects, i.e., each feature in terms of all the emotional states and each emotional state in terms of all the features. However, the other models focused only on evaluating their performance on benchmark data sets by standard metrics (e.g., accuracy) but paid less investigation on the problem itself.
• In the present work, video-evoked EEG emotion recognition is a single-label pattern classification problem and each sample should be uniquely categorized into a specific emotional state. Therefore, we did not take label correlations into consideration, which is different from these multi-label or label distribution learning models.

V. CONCLUSION AND FUTURE WORK
In this paper, we proposed a new model termed JCSFE for semi-supervised cross-session EEG emotion recognition, which jointly explores the label-common and label-specific EEG features by respectively introducing the $\ell_{2,1}$-norm and $\ell_1$-norm based regularization terms. Moreover, a similarity graph was used to characterize the data manifold, based on which the local invariance property of data was preserved. Comparative studies were performed on two emotional EEG data sets and the results demonstrated that 1) JCSFE obtained improved emotion recognition performance in comparison with state-of-the-art models, and 2) the EEG spatial-frequency patterns in emotion recognition were extensively analyzed from two aspects, i.e., the patterns across all the emotional states and those associated with each emotional state. It is worth mentioning that the analysis of EEG spatial-frequency patterns in this work is completely data-driven. Though our results are to some extent consistent with some existing studies, further research from both cognitive neuroscience and information science is still necessary to validate whether they are related to the neural mechanisms of affective information processing.
In the present work, we considered cross-session emotion recognition only, which is much easier than the cross-subject setting because of the large inter-subject variabilities in the latter. As future work, we will consider extending the current JCSFE model to deal with cross-subject EEG emotion recognition. That is, possible transfer learning strategies will be investigated and integrated into JCSFE to suppress the inter-subject variabilities.