Maximum Neighborhood Margin Discriminant Projection for Classification
Abstract
1. Introduction
Dimensionality reduction (DR) plays an important role in many fields such as pattern classification, machine learning, and computer vision. Its purpose is to solve the “curse of dimensionality” [1] and map the high-dimensional points to a subspace that reveals the intrinsic structure of the original data. Among the DR techniques, principal component analysis (PCA) [2, 3], linear discriminant analysis (LDA) [3, 4], locality preserving projection (LPP) [5, 6], and their kernelized and tensorized variants [7–18] are the most representative and well-known algorithms. Although these methods have different suppositions, they can be put into a unified graph embedding framework with different constraints [19].
Generally, PCA aims to preserve the global geometric structure of the data by maximizing the trace of the feature covariance matrix, producing a compact representation of the original space in a low-dimensional space. However, it does not take the class label information into account. The goal of LDA is to find globally discriminative information for classification by maximizing the ratio between the interclass and intraclass scatters. In contrast to PCA, LDA makes full use of the class information and thus strengthens its pattern discrimination ability. Since both PCA and LDA consider only the global structure of the data, they have little to do with its essential manifold; consequently, it is difficult for them to discover the hidden submanifold that truly reflects the essential structure of the data.
In contrast to PCA and LDA, locality preserving projection (LPP) was introduced to detect the intrinsic geometry of the manifold structure of data [5, 6]. LPP is a classical linear graph embedding [19] derived from Laplacian Eigenmap [20]. It attempts to find an embedding that preserves the local neighborhood information and reflects the inherent submanifold structure. In recent years, many variants of LPP have been developed for dimensionality reduction [21–34]. These LPP-based DR algorithms fall mainly into two categories: supervised and unsupervised. The supervised LPP-based methods generally employ the class information of the data in the process of subspace learning, as in [23–27, 29–32]. On the contrary, the unsupervised LPP-based ones do not consider class information [21, 22, 33, 34]. Among them, unsupervised discriminant projection (UDP) [21], a simplified version of LPP [35], is a very popular method aimed at resolving the “overlearning of locality” present in LPP. UDP seeks a projection that maximizes the nonlocal scatter and minimizes the local scatter simultaneously. Since LPP and UDP model the local structure of the data by a nearest neighbor graph, they may not be effective on noisy data. Moreover, they are completely unsupervised with regard to the class labels and hence cannot perform well for classification. In addition, some recent DR techniques integrate sparse representation (SR) and subspace learning [36–41]; based on the theory of sparse representation, these DR methods have more discriminating power for classification.
In this paper, we propose a novel dimensionality reduction method, named maximum neighborhood margin discriminant projection (MNMDP). It is based on the idea of LPP. However, unlike LPP, MNMDP is a supervised learning technique that fully utilizes the class label information to discover the inherent manifold structure. The proposed MNMDP constructs a weighted k-nearest neighbor (k-NN) graph that models the data topology; the affinity weights of the graph edges are then built to fully capture the local geometry of the interclass and intraclass neighborhoods of each point in the manifold learning phase. Although many supervised variants of LPP exist, our MNMDP differs from them in how the adjacency similarities and the objective functions are built. In MNMDP, the affinity weight can be viewed as the combination of a local weight and a discriminating weight, defined differently for the intraclass and interclass neighbors of a given point. The local weight represents the local neighborhood structure well, while the discriminating weight further differentiates between classes by using the class label information. Note that this definition of the affinity weight first appeared in [23] and was later adopted with good face-recognition performance in [26, 42]. After establishing the affinity weights of the k-NN graph, we compute the intraclass neighborhood scatter and the interclass neighborhood scatter, respectively. A linear mapping for MNMDP is then obtained by maximizing the margin between them. Hence, our MNMDP can not only preserve the intrinsic submanifold structures of the data well but also enhance the discrimination among different classes, so as to improve classification performance. The experimental results on four high-dimensional databases show that our proposed method performs well in pattern classification compared to the competing methods: PCA, LDA, LPP, and UDP.
The rest of this paper is organized as follows. Section 2 briefly reviews LPP. In Section 3, we introduce the proposed MNMDP method. Section 4 presents in detail the classification performance of the competing methods through comprehensive experiments. Section 5 discusses the characteristics of MNMDP. Finally, we conclude this paper in Section 6.
2. Locality Preserving Projection
For a general classification problem, denote by X = {xi ∈ ℝm}i=1N the set of N m-dimensional samples, each with a class label ci ∈ {1, 2, …, C}, where C is the total number of classes. In general, the aim of dimensionality reduction is to transform the original m-dimensional space into a new d-dimensional subspace by a linear mapping; that is, Y = {yi ∈ ℝd}i=1N, where d ≪ m. The linear mapping for DR is usually formulated as

(1) yi = ΦTxi for i = 1, …, N,

where Φ ∈ ℝm×d is a transformation matrix and Φ = [ϕ1, ϕ2, …, ϕd].
Nowadays, locality preserving projection (LPP) [5, 6] is a promising dimensionality reduction technique and a classical graph embedding [19]. Through a linear transformation, LPP finds an embedding that best reveals the intrinsic manifold structure of the data. To do so, it first constructs a weighted affinity graph 𝒢 = (𝒱, ℰ), where 𝒱 is the set of all data points and ℰ is the set of edges between pairs of points. The affinity graph 𝒢 is usually established by k-neighborhood: nodes i and j are connected by an edge if xi is among the k nearest neighbors of xj or xj is among the k nearest neighbors of xi. The weight Wij of an edge between xi and xj is then often defined as

(2) Wij = exp(−||xi − xj||²/δ), if xj ∈ 𝒩k(xi) or xi ∈ 𝒩k(xj); Wij = 0, otherwise,

where 𝒩k(xi) (resp. 𝒩k(xj)) denotes the set of the k nearest neighbors of the sample xi (resp. xj) and the parameter δ is a positive constant.
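As an illustration, the k-neighborhood graph and heat-kernel weights of (2) can be sketched in a few lines of NumPy. The function name `lpp_weights` and the dense pairwise-distance computation are our own illustrative choices, not part of the original formulation.

```python
import numpy as np

def lpp_weights(X, k=5, delta=1.0):
    """Symmetric heat-kernel weight matrix over a k-nearest-neighbor graph.

    X has shape (N, m): one sample per row.
    """
    N = X.shape[0]
    # Pairwise squared Euclidean distances.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.zeros((N, N))
    for i in range(N):
        # Indices of the k nearest neighbors of x_i (excluding x_i itself).
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / delta)
    # Connect i and j if either point is in the other's k-neighborhood.
    return np.maximum(W, W.T)
```

The final symmetrization implements the "or" condition in (2): an edge exists if either endpoint lies in the other's neighborhood.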
To obtain a linear graph embedding, LPP minimizes the following objective function:

(3) min Σij ||yi − yj||² Wij.

This criterion incurs a heavy penalty if neighboring points xi and xj are mapped far apart in the embedded subspace. By simple algebra, the objective criterion for LPP under the appropriate constraint can be rewritten as

(4) min ΦT X L XT Φ subject to ΦT X D XT Φ = I,

where D is a diagonal matrix whose entries are the column sums of W, that is, Dii = Σj Wij, and L = D − W is the Laplacian matrix.
Finally, the transformation matrix Φ that minimizes (4) is obtained by solving the generalized eigenvalue problem

(5) X L XT Φ = λ X D XT Φ,

where Φ contains the d eigenvectors corresponding to the d smallest eigenvalues, that is, λ1 ≤ λ2 ≤ ⋯ ≤ λd.
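A minimal sketch of solving (5) with SciPy follows. The samples-per-column layout, the small ridge added to keep X D XT invertible, and the function name are our assumptions, not prescribed by the paper.

```python
import numpy as np
from scipy.linalg import eigh

def lpp_projection(X, W, d):
    """Solve X L X^T phi = lambda X D X^T phi; X is m x N (samples as columns)."""
    D = np.diag(W.sum(axis=1))
    L = D - W                                    # graph Laplacian
    A = X @ L @ X.T
    B = X @ D @ X.T + 1e-8 * np.eye(X.shape[0])  # ridge guards against singularity
    vals, vecs = eigh(A, B)                      # generalized problem, ascending eigenvalues
    return vecs[:, :d]                           # d eigenvectors of the smallest eigenvalues
```

New points are then projected by `Phi.T @ x`, matching Eq. (1).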
3. Maximum Neighborhood Margin Discriminant Projection
In pattern recognition, the class label information of the data plays an important role for classification. Moreover, the local structures of the training samples are also very useful for it. Inspired by these two facts, a novel dimensionality reduction method, called maximum neighborhood margin discriminant projection (MNMDP), is introduced in this section. The proposed MNMDP takes both the class information and the local geometry of the data fully into account for classification. Its major focus is to seek a linear graph embedding that not only detects the underlying submanifold structure of data from the same class but also enhances the discrimination among submanifolds from different classes during learning. To obtain this graph embedding of the given data, MNMDP first constructs the k-nearest neighbor graph of all points so that it can be retained in the projected subspace, then assigns completely distinct weights to the interclass and intraclass neighbors of each point by fully considering the class information, and finally finds a linear mapping by maximizing the margin between the interclass and intraclass neighbors of all points, so as to improve the classification performance in the new subspace.
3.1. Intraclass Neighborhood Scatter
Given a data set X = {xi ∈ ℝm}i=1N, MNMDP first builds an adjacency graph 𝒢 by k-neighborhood over all points. For a data point xi, let 𝒩k(xi) be the set of its k nearest neighbors. Further, let 𝒩k+(xi) denote the intraclass neighbors in the k-neighborhood 𝒩k(xi) (i.e., neighbors from the same class as xi) and 𝒩k−(xi) denote the interclass neighbors of xi in 𝒩k(xi) (i.e., neighbors from different classes). To represent the neighborhood relationship of each data point, we build an affinity matrix for the intraclass neighborhood and the interclass neighborhood, respectively. The intraclass and interclass neighborhood scatters are then computed accordingly to preserve the local neighborhood structures of the data. Finally, by maximizing the margin between the intraclass and interclass neighborhood scatters, MNMDP gains important pattern discrimination for classification in the projected subspace.
The affinity weights for the intraclass neighborhoods of all points are defined as follows:

(6) Wij+ = exp(−||xi − xj||²/δ)(1 + exp(−||xi − xj||²/δ)), if xj ∈ 𝒩k+(xi) or xi ∈ 𝒩k+(xj); Wij+ = 0, otherwise,

where the parameter δ is a positive regulator. According to (6), the affinity weight integrates the local weight exp(−||xi − xj||²/δ), which preserves the local intraclass neighborhood structure, and the intraclass discriminating weight 1 + exp(−||xi − xj||²/δ), which represents the class information within the same class. From (6), the affinity weights of the intraclass points are larger than those in LPP, which is very advantageous for classification.
From the viewpoint of pattern recognition, it is reasonable to presume that different samples contribute differently to classification. Generally speaking, how much a sample contributes to classification is naturally related to its neighborhood location in the feature space. Here, we adopt a local scaling regulator to dynamically adjust the adjacency weights between pairs of neighbors, so as to reasonably reflect the classification contribution of each sample. Based on the k-neighborhood of a sample xi, the local scaling regulator δ in (6) is set as

(7) δ = (1/k²) Σj=1k ||xi − xj||²,

where xj ranges over the k nearest neighbors of xi. This provides a reasonable way to estimate the value of δ, and the affinity weights Wij+ between nodes i and j are allowed to self-tune according to the k-neighborhood.
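Equations (6) and (7) together can be sketched as follows; the function name and the dense pairwise-distance computation are illustrative choices of ours.

```python
import numpy as np

def intraclass_weights(X, labels, k):
    """W+ of Eq. (6) with the self-tuning delta of Eq. (7); X is N x m."""
    N = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(d2[i])[1:k + 1]     # k nearest neighbors of x_i
        delta = d2[i, nbrs].sum() / k ** 2    # Eq. (7): local scaling regulator
        for j in nbrs:
            if labels[j] == labels[i]:        # intraclass neighbors only
                h = np.exp(-d2[i, j] / delta)
                W[i, j] = h * (1.0 + h)       # local weight times discriminating weight
    return np.maximum(W, W.T)                 # symmetrize over the "or" in Eq. (6)
```

Since the local weight h lies in (0, 1], every intraclass affinity h(1 + h) stays in (0, 2], larger than the corresponding LPP weight h.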
To retain the intraclass neighborhood relations through a linear mapping yi = ΦTxi, the intraclass neighborhood scatter along a projection ϕ is defined as

(8) J+(ϕ) = (1/2) Σij ||yi − yj||² Wij+.
It follows from (8) that

(9) J+(ϕ) = (1/2) Σij (ϕTxi − ϕTxj)² Wij+ = ϕT S+ ϕ,

where the intraclass neighborhood scatter matrix S+ is

(10) S+ = (1/2) Σij Wij+ (xi − xj)(xi − xj)T.
To gain more insight into (10) in terms of the affinity matrix W+ from (6), S+ is rewritten as

(11) S+ = X L+ XT,

where D+ is a diagonal matrix whose elements are the column sums of W+, that is, Dii+ = Σj Wij+, and L+ = D+ − W+. Note that L+, D+, and S+ are symmetric matrices.
In order to preserve the intraclass neighborhood well and keep the intraclass neighborhood scatter compact in the embedded subspace, the optimal projections can be obtained by minimizing the intraclass neighborhood scatter:

(12) min J+(ϕ) = ϕT S+ ϕ.
3.2. Interclass Neighborhood Scatter
In contrast to the intraclass case, the affinity weights for the interclass neighborhoods of all points are defined as follows:

(13) Wij− = exp(−||xi − xj||²/δ)(1 − exp(−||xi − xj||²/δ)), if xj ∈ 𝒩k−(xi) or xi ∈ 𝒩k−(xj); Wij− = 0, otherwise,

where the parameter δ is a positive regulator set as in (7). In (13), the affinity weight simultaneously reflects the local interclass neighborhood structure through the local weight exp(−||xi − xj||²/δ) and the class information of the different classes through the interclass discriminating weight 1 − exp(−||xi − xj||²/δ). From (13), the affinity weights of the interclass points are smaller than those in LPP, which is also very helpful for classification.
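Equation (13) can be sketched analogously to (6), with the discriminating factor flipped to 1 − exp(·); the function name and dense-distance approach are again our illustrative choices.

```python
import numpy as np

def interclass_weights(X, labels, k):
    """W- of Eq. (13) with the self-tuning delta of Eq. (7); X is N x m."""
    N = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(d2[i])[1:k + 1]     # k nearest neighbors of x_i
        delta = d2[i, nbrs].sum() / k ** 2    # Eq. (7)
        for j in nbrs:
            if labels[j] != labels[i]:        # interclass neighbors only
                h = np.exp(-d2[i, j] / delta)
                W[i, j] = h * (1.0 - h)       # always smaller than the LPP weight h
    return np.maximum(W, W.T)
```

Note that h(1 − h) never exceeds 1/4, so interclass edges are always weighted far below intraclass ones.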
Then, the interclass neighborhood scatter along a projection ϕ is defined as

(14) J−(ϕ) = (1/2) Σij ||yi − yj||² Wij−.

It follows from (14) that

(15) J−(ϕ) = (1/2) Σij (ϕTxi − ϕTxj)² Wij− = ϕT S− ϕ,

where the interclass neighborhood scatter matrix S− is

(16) S− = (1/2) Σij Wij− (xi − xj)(xi − xj)T.
By the same algebra as for S+, S− in (16) is rewritten as

(17) S− = X L− XT,

where D− is a diagonal matrix whose elements are the column sums of W−, that is, Dii− = Σj Wij−, and L− = D− − W−. Note that L−, D−, and S− are symmetric matrices.
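Both scatter matrices share the Laplacian form of (11) and (17), so a single generic helper suffices; the naming and the samples-as-columns convention are ours.

```python
import numpy as np

def neighborhood_scatter(X, W):
    """S = X (D - W) X^T for a symmetric affinity W; X is m x N (samples as columns)."""
    D = np.diag(W.sum(axis=1))  # D_ii is the column (= row, by symmetry) sum of W
    L = D - W                   # graph Laplacian: symmetric, positive semidefinite
    return X @ L @ X.T          # symmetric m x m scatter matrix
```

Calling it with W+ gives S+ and with W− gives S−; since L is positive semidefinite, both scatters are as well.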
To gain more discrimination between different classes through a linear mapping, the interclass neighborhood scatter in the projected subspace should be kept as separable as possible by maximizing the following criterion:

(18) max J−(ϕ) = ϕT S− ϕ.
3.3. Optimal Linear Embedding
Combining (12) and (18) with the orthonormal constraint (i.e., ΦTΦ = I), we get the following objective function:

(19) max J(Φ) = J−(Φ) − J+(Φ) = tr(ΦT S− Φ) − tr(ΦT S+ Φ) subject to ΦTΦ = I,

which can be rewritten as

(20) max tr(ΦT (S− − S+) Φ) subject to ΦTΦ = I,

or equivalently, column by column,

(21) max Σi=1d ϕiT (S− − S+) ϕi subject to ϕiTϕi = 1, i = 1, …, d.
According to (20) or (21), two aspects are favorable for classification. On one hand, the optimal projections pull the intraclass samples closer together (minimizing the intraclass neighborhood scatter) while simultaneously pushing the interclass samples farther apart (maximizing the interclass neighborhood scatter). This keeps the margin between the intraclass and interclass neighborhoods maximal in the new subspace, so as to clearly enhance pattern discrimination. On the other hand, the graph embedding obtained with orthogonal projections has both more locality preserving power and more discriminating power [33].
To maximize the above objective function, we use the Lagrange multiplier method to first build the function

(22) L(ϕi, λi) = Σi=1d (ϕiT (S− − S+) ϕi − λi (ϕiTϕi − 1)),

where λi (i = 1, …, d) is a Lagrange multiplier. Then, the optimization is carried out via the partial derivative of L(ϕi, λi) with respect to ϕi:

(23) ∂L(ϕi, λi)/∂ϕi = (S− − S+) ϕi − λi ϕi.
Setting (23) to zero yields

(24) (S− − S+) ϕi = λi ϕi.
Thus, the optimal matrix Φ that maximizes the objective criterion in (20) can be achieved by solving the standard eigenvalue problem

(25) (S− − S+) Φ = λ Φ,

where Φ contains the d eigenvectors corresponding to the d largest positive eigenvalues, that is, λ1 ≥ λ2 ≥ ⋯ ≥ λd ≥ 0. Note that, since the symmetric matrix (S− − S+) is not guaranteed to be positive semidefinite, its eigenvalues may be positive, negative, or zero. To maximize (20), we only need to select the d largest positive eigenvalues.
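Because S− − S+ is symmetric, (25) is an ordinary symmetric eigenproblem. A sketch of selecting the d largest positive eigenvalues (the function name is ours):

```python
import numpy as np
from scipy.linalg import eigh

def mnmdp_projection(S_plus, S_minus, d):
    """Eigenvectors of S- - S+ for the (at most) d largest positive eigenvalues."""
    vals, vecs = eigh(S_minus - S_plus)    # symmetric solver, ascending eigenvalues
    order = np.argsort(vals)[::-1]         # reorder to descending
    keep = [i for i in order[:d] if vals[i] > 0]
    return vecs[:, keep]
```

If fewer than d eigenvalues are positive, the returned Φ simply has fewer columns, matching the selection rule stated above.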
3.4. The MNMDP Algorithm
Based on the above description, the algorithmic procedure of the proposed MNMDP is summarized in Algorithm 1.
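As a concrete reading of the procedure, the whole pipeline can be condensed into one function; all names are ours, and the dense computations favor clarity over efficiency.

```python
import numpy as np
from scipy.linalg import eigh

def mnmdp_fit(X, labels, k, d):
    """Sketch of MNMDP: X is N x m (row samples); returns projection matrix Phi."""
    N = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    Wp = np.zeros((N, N))
    Wm = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(d2[i])[1:k + 1]
        delta = d2[i, nbrs].sum() / k ** 2                 # Eq. (7)
        for j in nbrs:
            h = np.exp(-d2[i, j] / delta)
            if labels[j] == labels[i]:
                Wp[i, j] = h * (1.0 + h)                   # Eq. (6)
            else:
                Wm[i, j] = h * (1.0 - h)                   # Eq. (13)
    Wp = np.maximum(Wp, Wp.T)
    Wm = np.maximum(Wm, Wm.T)
    Xc = X.T                                               # m x N, samples as columns
    Sp = Xc @ (np.diag(Wp.sum(1)) - Wp) @ Xc.T             # S+ of Eq. (11)
    Sm = Xc @ (np.diag(Wm.sum(1)) - Wm) @ Xc.T             # S- of Eq. (17)
    vals, vecs = eigh(Sm - Sp)                             # Eq. (25)
    order = np.argsort(vals)[::-1]
    keep = [i for i in order[:d] if vals[i] > 0]
    return vecs[:, keep]                                   # project new x via Phi.T @ x
```

This is a sketch under the stated assumptions, not the authors' implementation; in practice PCA preprocessing would precede it, as described in Section 4.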
4. Experimental Results
In this section, we evaluate the classification performance of the proposed MNMDP in comparison with the state-of-the-art DR methods: PCA, LDA, LPP, and UDP. The experiments are conducted on four databases: HRF, FKP, AR, and Musk. To measure recognition performance, l images per class of each database are randomly selected as training samples, while the remaining images per class are used for testing. To ensure that the performance is not biased by a particular random realization of the training/testing split, 10 independent runs are performed for each l. Since training and testing samples are chosen randomly, these 10 runs give 10 different training and testing sets for performance evaluation. Note that the training and testing sets of each run do not overlap. The average recognition rates across these 10 runs with 95% confidence are taken as the final classification results. In our experiments, to overcome the small sample size problem encountered in LDA, LPP, and UDP, PCA preprocessing is employed to avoid singular matrices by keeping nearly 98% of the image energy. If singularities still exist, the pseudoinverse is adopted. To ensure a fair comparison, the regulator δ in LPP is set according to (7). The k-neighborhood parameter, denoted by Wk, for building the weight matrix in LPP and UDP is set to Wk = l − 1 [21]. The nearest neighbor (NN) classifier with Euclidean distance is used for classification in the experiments.
4.1. Experiments on HRF
The PolyU HRF (High-Resolution-Fingerprint) database [44] was collected in two separate sessions. Here, we use DBII of HRF. The database contains 148 fingers, each with five sample images per session. Each image is captured at 1200 dpi with a size of 640 × 480 pixels and 256 grey levels per pixel. For computational efficiency, each image is resized to 32 × 32 pixels in our experiments. As an example, Figure 1 shows ten images of one finger in the HRF database. We form the training set with a random subset of l images per class and use the rest as the testing set. In the following experiments, the number of training images per class is chosen as l = 5, 6, 7, 8.
In the experiments on HRF, we first explore the performance of MNMDP in terms of recognition rates with a varying k-neighborhood parameter Wk. The value of Wk is varied from 1 to 21 in steps of 2. The maximal average classification results versus Wk for each l are plotted in Figure 2. It can be clearly observed that the proposed MNMDP classifies better with more training samples. As can be seen in Figure 2, the performance of MNMDP for each l first increases slowly as Wk goes from 1 to 5, then increases rapidly as Wk goes from 5 to 9, and finally drops as Wk increases further. A possible reason for this behavior is that the affinity graph cannot capture the geometry of the data effectively when Wk is small, and more geometrical information is preserved as Wk increases. However, when Wk exceeds a reasonable value, the k-neighborhood of a given point may include more interclass points [45], which degrades pattern discrimination. Consequently, the experimental results reveal that the k-neighborhood parameter Wk in MNMDP plays an important role in preserving the geometrical and discriminant information available for classification, and that a suitable value can be easily selected to achieve good performance.
Furthermore, the experimental comparison of the competing methods on HRF is studied by varying the reduced dimensionality. We vary the dimension of the reduced space from 5 to 100 in steps of 5. Based on the results shown in Figure 2, the best k-neighborhood parameter for MNMDP is set to Wk = 9 for l = 5, 6, 7, 8. For each l on HRF, the comparative average recognition performance of each method is given in Figure 3. As shown in Figure 3, the classification performance of each method ascends quickly until the dimensionality reaches about 30 and then stays almost stable or decreases slowly as the dimensionality increases. It is clear that the proposed MNMDP consistently outperforms the other methods at every dimensionality, making its superiority evident. Observe also that LDA performs better than PCA, LPP, and UDP; PCA is preferable to LPP at small dimensions while the two perform quite similarly at large dimensions; and UDP is the weakest among them. The fact that MNMDP and LDA outperform PCA, LPP, and UDP may be because MNMDP and LDA are supervised learning methods using class information. Thereby, the experimental results in Figure 3 demonstrate that MNMDP is robust over a large range of dimensionality with satisfactory performance.
The best performance of the competing methods, given as the highest average recognition rates with the corresponding standard deviations (stds) and dimensions in parentheses, is reported in Table 1 for each l on HRF. Note that the best recognition rate for each l is marked in bold face. We can clearly see that the performance of each method increases with the number of training samples. As expected, the proposed MNMDP achieves the best performance, and its corresponding dimensionality is the smallest among the competing methods. It can also be observed from Table 1 that LDA is better than PCA, LPP, and UDP, and LPP is superior to PCA. In addition, UDP is better than LPP and PCA when l = 5 and worse than them when l = 6, 7, 8. As a consequence, the promising performance of our MNMDP is confirmed on HRF.
4.2. Experiments on FKP
The PolyU FKP (Finger-Knuckle-Print) database [46] contains 165 subjects, each with 48 samples taken in two separate sessions. In each session, every subject has 6 images for each of the left index finger, the left middle finger, the right index finger, and the right middle finger. Here, we use the FKP ROI database obtained by the ROI extraction algorithm in [47]; the size of each image is 220 × 110 pixels, with 256 grey levels per pixel. To save computation time, we use a subset of the FKP ROI database: 100 subjects with 12 images per individual, taking the first 3 sample images of each finger. In our experiments, each image is resized to 32 × 32 pixels for computational efficiency. As an example, Figure 4 shows twelve images of one subject in the FKP database. We form the training set with a random subset of l images per class and use the rest as the testing set. In the following experiments on FKP, the number of training images per class is chosen as l = 5, 7, 9, 11.
In the experiments on FKP, we first investigate the recognition performance of MNMDP by varying the k-neighborhood parameter Wk. The value of Wk is varied from 1 to 21 in steps of 2. The maximal average recognition rate at each value of Wk for each l is illustrated in Figure 5. It is obvious that the classification performance of MNMDP increases as the number of training samples increases. As can be noted in Figure 5, the performance of MNMDP generally increases as Wk goes from 1 to 7 and then decreases as Wk becomes large. From Figure 5, it can be concluded that an appropriate value of Wk in MNMDP is significant for capturing the geometrical structure and pattern discrimination of the data on FKP, just as on HRF, and that it can be easily selected to obtain good performance.
To further verify the performance of MNMDP, the comparative classification results of the competing methods on FKP are reported in Figure 6, with the reduced dimensionality varying from 5 to 100 in steps of 5. According to the experimental results in Figure 5, the best k-neighborhood parameters for MNMDP are set to Wk = 7 for l = 5, 7, 9 and Wk = 5 for l = 11. We can see that the classification performance of each method ascends almost monotonically with increasing dimensionality: it first increases quickly and finally increases very slowly or stays stable. It is noticeable that the proposed MNMDP significantly outperforms the other methods across all dimensions for each l, and UDP almost always performs worst. We can also observe that LDA is always superior to LPP and UDP over a large range of dimensionality, and PCA performs better than LPP when the dimension is small but worse when the dimension is large. Based on the experimental comparisons in Figure 6, we conclude that our method consistently obtains better classification performance, irrespective of the dimension.
For each l on FKP, the comparisons of the competing methods in terms of maximal average recognition rates, with the corresponding standard deviations (stds) and dimensions in parentheses, are tabulated in Table 2. The best performance for each l is indicated in bold face. It can be seen from Table 2 that the proposed MNMDP is clearly superior to the other methods, and its optimal reduced dimensionality is far smaller than theirs. Moreover, the performance of each method increases as the number of training samples increases. We can also observe that the best performance of LDA exceeds that of PCA, LPP, and UDP, and that UDP and LPP are preferable to PCA. From this comparative study of the best performance of the competing methods, we conclude that MNMDP has more discriminating power and achieves satisfactory classification.
4.3. Experiments on AR
The AR face database [48] contains over 4,000 color images of 126 people's faces (70 men and 56 women). The image samples of each person were taken in two sessions separated by two weeks. Here, we select a subset of AR including 50 men and 50 women; each person has 14 image samples, collected across the two sessions under neutral expression, smile, anger, scream, left light on, right light on, and all side lights on. Each image is manually cropped and then normalized to 32 × 32 pixels, with 256 grey levels per pixel. As an example, Figure 7 shows the images of one person in the AR database. We form the training set with a random subset of l images per class and use the rest as the testing set. In the following experiments on AR, the number of training images per class is chosen as l = 7, 9, 11, 13.
In the experiments on AR, the classification performance of the proposed MNMDP versus the k-neighborhood parameter Wk is first evaluated for each l, as shown in Figure 8. The value of Wk is varied from 1 to 21 in steps of 2. It is obvious that the performance of MNMDP increases with the number of training samples. As can be noted in Figure 8, the recognition rates of MNMDP first ascend quickly as Wk increases and then stay almost stable when Wk becomes large. Thus, we can conclude that an appropriate value of Wk plays an important role in MNMDP for preserving the geometry of data and enhancing pattern discrimination, and that it can be easily set to obtain good performance.
Moreover, the classification performance of the proposed MNMDP is further evaluated on AR by varying the reduced dimensionality, in comparison with the competing methods. The dimensionality varies from 5 to 100 in steps of 5. From the results in Figure 8, the best k-neighborhood parameters for MNMDP are set to Wk = 17 for l = 7, Wk = 15 for l = 9, and Wk = 13 for l = 11, 13. The performance of each method in terms of average recognition rates is illustrated in Figure 9. It can be seen that the performance of each method for each l first increases rapidly as the dimensionality grows and then approximately tends to be stable. Compared to PCA, LDA, LPP, and UDP, the proposed MNMDP almost always has the best performance across dimensions, especially at large dimensions. Meanwhile, LDA is superior to PCA, LPP, and UDP as the dimensionality increases. In addition, in most cases LPP is better than PCA and UDP, and PCA is better than UDP. From the comparative performance in Figure 9, we conclude that our method always gives better classification results over a large range of dimensionality.
The maximal average recognition rates of each competing method on AR for each l, with the corresponding standard deviations (stds) and dimensions in parentheses, are reported in Table 3. The best performance is marked in bold face. We can see that the performance of each method improves as the number of training samples increases. As Table 3 shows, MNMDP has the best performance for each l. It can also be observed that the best classification performance of LDA is better than that of LPP, PCA, and UDP. Consequently, the experimental results in Figure 9 and Table 3 on the AR face database clearly demonstrate the good performance of the proposed MNMDP.
4.4. Experiments on Musk
The Musk (version 1) database [49] is a two-class classification task that predicts whether new molecules are musks or nonmusks. It contains 476 samples in total, each with 166 attributes that depend on the exact shape or conformation of the molecules. In the experiments, we set the number of training samples per class to l = 50, 80, 110, 140, respectively, and the remaining samples are used to test the competing methods.
In the experiments on Musk, we first investigate the classification performance of the proposed MNMDP versus the k-neighborhood parameter Wk for each l. The experimental results are shown in Figure 10. The value of Wk is varied from 1 to 21 in steps of 2. As can be seen in Figure 10, the recognition rates of MNMDP increase as Wk goes from 1 to 5 and then drop as Wk increases further. Hence, the classification results reveal that the k-neighborhood parameter Wk is very important for MNMDP to preserve the geometrical structures of the data and strengthen pattern discrimination, and that an appropriate value of Wk for good performance can be easily determined.
To further verify the classification performance of our MNMDP on Musk, it is compared to the competing methods by varying the reduced dimensionality. The dimensionality increases from 1 to 30 in steps of 1. From the results in Figure 10, the best k-neighborhood parameter for MNMDP is set to Wk = 5 for l = 50, 80, 110, 140. Figure 11 shows the performance of each method in terms of average recognition rates. The performance of each method for each l first ascends at small dimensions and then approximately tends to be stable or increases slowly as the dimensionality grows. As shown in Figure 11, the proposed MNMDP almost always has the best performance among all the methods across dimensions. It can also be observed that PCA, LPP, and UDP achieve similar performance once the dimensionality exceeds about 10, and LDA performs worse as the dimensionality varies from 6 to 30. Therefore, the classification results in Figure 11 indicate that our method is consistently better than PCA, LDA, LPP, and UDP as the dimensionality changes.
The comparative experiments on Musk for each l in terms of maximal average recognition rates, with the corresponding standard deviations (stds) and dimensions in parentheses, are finally shown in Table 4. The best performance for each l is shown in bold face. It is clear that the classification performance of the proposed MNMDP is better than that of PCA, LDA, LPP, and UDP. Meanwhile, the optimal reduced dimensionality of our MNMDP for each l is smaller than theirs. Therefore, we conclude that the proposed MNMDP performs dimensionality reduction well while achieving good classification.
In summary, the proposed MNMDP almost always yields the best classification performance in all the experiments, compared to PCA, LDA, LPP, and UDP. This implies that both pattern discrimination and geometrical information of the data are very important for classification, and MNMDP fully captures them in the learning process.
5. Discussions
In this section, some characteristics of the proposed MNMDP that are available for classification are discussed. We first analyze the affinity weight for intraclass and interclass neighborhoods, then discuss the MNMDP from the viewpoint of distance metric learning [50, 51], and finally explore the eigenvalues of the generalized eigenvalue problem in MNMDP.
According to (6) and (13), the affinity weight can be regarded as an integration of the local weight exp(−||xi−xj||^2/δ) and a discriminating weight that takes two forms: an intraclass one, 1 + exp(−||xi−xj||^2/δ), for intraclass neighbors, and an interclass one, 1 − exp(−||xi−xj||^2/δ), for interclass neighbors. This means that the affinity weight not only preserves the local structure of the data but also distinguishes between different classes. Through the analysis of the affinity weight in MNMDP, three of its properties can be summarized as follows [23].
Property 1. For a given point, the affinity weight gives more similarity to the intraclass neighbors than to the interclass ones when the Euclidean distances are equal. This is favorable for classification.
Property 2. The affinity weight keeps the intraclass and interclass similarities within fixed ranges no matter how strong the noise is, since 1 ≤ 1 + exp(−||xi−xj||^2/δ) ≤ 2 and 0 ≤ 1 − exp(−||xi−xj||^2/δ) ≤ 1. This largely preserves the local geometric structures of the data for different classes.
Property 3. The affinity weight strengthens margin augmentation and noise suppression. This can be explained from two aspects. First, close points from different classes receive smaller similarity values, so the margin between different classes is augmented, because 1 − exp(−||xi−xj||^2/δ) tends towards 0 as the Euclidean distance decreases. Second, the affinity weight inhibits noise to some degree; that is, more distant points from the same class become less similar to each other, because exp(−||xi−xj||^2/δ) tends towards 0 as the Euclidean distance increases.
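A small numerical sketch of the affinity weights in (6) and (13) makes the three properties concrete. The function and variable names here are ours, and δ is taken as 1 for illustration.

```python
import numpy as np

def affinity(xi, xj, same_class, delta=1.0):
    """Intraclass neighbors: 1 + exp(-||xi - xj||^2 / delta);
    interclass neighbors: 1 - exp(-||xi - xj||^2 / delta)."""
    h = np.exp(-np.sum((xi - xj) ** 2) / delta)
    return 1.0 + h if same_class else 1.0 - h

xi = np.zeros(3)
xj = np.array([0.5, 0.0, 0.0])      # a nearby point
far = np.array([10.0, 0.0, 0.0])    # a distant point

# Property 1: at equal distance, intraclass weight exceeds interclass weight.
w_intra = affinity(xi, xj, True)
w_inter = affinity(xi, xj, False)

# Property 2: weights stay bounded (intraclass in [1, 2], interclass in [0, 1]).
# Property 3: close interclass pairs get weight near 0 (margin augmentation),
# while distant intraclass pairs get weight near 1 (noise suppression).
w_close_inter = affinity(xi, xi + 1e-3, False)
w_far_intra = affinity(xi, far, True)
```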
The above properties lead to a well-constructed supervised affinity graph, and they are maintained throughout MNMDP subspace learning. They overcome the issue in LPP that, for a given point, the interclass neighbors might receive the same representation as the intraclass neighbors in the reduced subspace [52]. Building on these properties, MNMDP achieves a projection that carries not only local geometric structures but also discriminant information, which makes it robust and effective for classification tasks.
In regard to classification, MNMDP can be viewed as a method of distance metric learning [50, 51]. Once the transformation matrix Φ is obtained by MNMDP, the squared distance between xi and xj in the embedded subspace can be formulated as

d^2(xi, xj) = ||Φ^T xi − Φ^T xj||^2 = (xi − xj)^T ΦΦ^T (xi − xj). (26)
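The metric-learning view can be checked numerically: the squared distance in the embedded subspace equals a Mahalanobis-like distance with matrix M = ΦΦ^T. In this sketch Φ is a random stand-in for the learned MNMDP projection.

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((10, 3))   # D x d transformation (random stand-in)
xi, xj = rng.standard_normal(10), rng.standard_normal(10)

# Distance measured after projecting into the embedded subspace.
d2_embedded = np.sum((Phi.T @ xi - Phi.T @ xj) ** 2)

# The same distance written as a Mahalanobis-like metric with M = Phi Phi^T.
M = Phi @ Phi.T
d2_metric = (xi - xj) @ M @ (xi - xj)
```

Because M = ΦΦ^T is positive semidefinite, this defines a valid (pseudo-)metric, which is why MNMDP can be read as learning a distance metric tailored to classification.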
Given that data points are projected along an eigenvector ϕi corresponding to eigenvalue λi, it follows from (24) with the orthogonal constraint ϕiTϕi = 1 that(27)
Based on the discussions above, the proposed MNMDP has stronger discrimination for classification. From the perspective of classification, the key point is to enhance the pattern discrimination between samples from different classes. For high-dimensional data, a good way to achieve this is to find a projection that makes samples from the same class compact and samples from different classes separable in the low-dimensional space. To visually verify the superiority of MNMDP for classification, we project the sample images of the HRF database (described in Section 4.1) onto a two-dimensional subspace with each of the competing methods; the samples from the first three subjects of HRF in the projected spaces are illustrated in Figure 12. Note that, since the samples of each subject were collected in two separate sessions with different variations, each subject forms two subclasses under each method. It can be clearly observed that the class clusters of MNMDP are much better separated than those of PCA, LDA, LPP, and UDP, and that the subclasses in MNMDP are more compact. Moreover, unlike the other methods, there is no overlap between the three classes under MNMDP. Figure 12 thus intuitively validates the strong discriminating capability of MNMDP for classification, consistent with the experiments in Section 4 and the analysis in this section.
6. Conclusions
In this paper, we introduced MNMDP, a linear supervised dimensionality reduction technique that preserves the local geometric structures of the data and fully exploits class information for classification. The proposed MNMDP employs distinct affinity weights for the intraclass and interclass neighbors of every point and then maximizes the margin between intraclass and interclass neighborhoods through a linear mapping. To thoroughly investigate the classification performance of MNMDP, we conducted experiments on four high-dimensional databases, in comparison with the competing methods LPP, UDP, PCA, and LDA. The comprehensive experiments demonstrate the effectiveness and robustness of the proposed MNMDP, with satisfactory performance in pattern classification. In future work, we plan to extend MNMDP with kernel tricks.
The average recognition rates of MNMDP versus Wk on HRF.
The average recognition rates of MNMDP versus Wk on FKP.
The average recognition rates of MNMDP versus Wk on AR.
The average recognition rates of MNMDP versus Wk on Musk.
Table 1: The maximal average recognition rates (%, mean ± std) with the corresponding dimensions (in parentheses) on HRF.

| Methods | l = 5 | l = 6 | l = 7 | l = 8 |
|---|---|---|---|---|
| PCA | 66.96 ± 1.49 (60) | 71.37 ± 1.27 (55) | 73.87 ± 1.44 (65) | 76.99 ± 2.43 (50) |
| LDA | 68.64 ± 1.31 (55) | 73.00 ± 1.06 (45) | 75.43 ± 1.29 (40) | 78.55 ± 2.31 (55) |
| LPP | 67.21 ± 1.41 (135) | 71.58 ± 1.25 (80) | 74.29 ± 1.09 (85) | 77.55 ± 2.01 (65) |
| UDP | 67.41 ± 1.58 (135) | 71.21 ± 1.26 (145) | 73.36 ± 1.52 (150) | 75.57 ± 2.28 (150) |
| MNMDP | **74.31 ± 1.62 (30)** | **78.47 ± 1.61 (30)** | **81.21 ± 1.64 (30)** | **83.78 ± 1.58 (40)** |
Table 2: The maximal average recognition rates (%, mean ± std) with the corresponding dimensions (in parentheses) on FKP.

| Methods | l = 5 | l = 7 | l = 9 | l = 11 |
|---|---|---|---|---|
| PCA | 63.31 ± 2.52 (110) | 76.56 ± 2.80 (125) | 85.18 ± 1.98 (130) | 90.75 ± 2.40 (110) |
| LDA | 65.80 ± 2.35 (105) | 79.44 ± 2.88 (115) | 88.02 ± 2.01 (125) | 92.80 ± 2.07 (125) |
| LPP | 64.30 ± 2.43 (110) | 78.36 ± 2.52 (110) | 86.80 ± 1.80 (120) | 91.50 ± 2.31 (105) |
| UDP | 64.88 ± 2.32 (110) | 78.16 ± 2.84 (120) | 86.37 ± 1.83 (125) | 91.30 ± 2.45 (135) |
| MNMDP | **69.49 ± 1.98 (70)** | **82.96 ± 2.54 (50)** | **90.98 ± 1.59 (55)** | **94.45 ± 1.90 (65)** |
Table 3: The maximal average recognition rates (%, mean ± std) with the corresponding dimensions (in parentheses) on AR.

| Methods | l = 7 | l = 9 | l = 11 | l = 13 |
|---|---|---|---|---|
| PCA | 64.17 ± 2.05 (100) | 71.60 ± 1.40 (100) | 76.70 ± 3.28 (110) | 81.80 ± 3.46 (100) |
| LDA | 70.94 ± 1.74 (60) | 78.24 ± 1.09 (55) | 83.80 ± 2.82 (55) | 88.80 ± 2.04 (45) |
| LPP | 65.23 ± 1.79 (85) | 72.60 ± 1.02 (95) | 78.20 ± 2.67 (80) | 83.10 ± 3.31 (95) |
| UDP | 65.27 ± 1.68 (100) | 72.22 ± 0.96 (105) | 77.43 ± 3.17 (105) | 82.80 ± 4.18 (105) |
| MNMDP | **81.63 ± 3.24 (120)** | **89.52 ± 2.99 (125)** | **94.37 ± 2.26 (140)** | **99.00 ± 0.94 (120)** |
Table 4: The maximal average recognition rates (%, mean ± std) with the corresponding dimensions (in parentheses) on Musk.

| Methods | l = 50 | l = 80 | l = 110 | l = 140 |
|---|---|---|---|---|
| PCA | 77.82 ± 2.85 (33) | 79.21 ± 2.36 (40) | 81.05 ± 2.47 (42) | 83.72 ± 3.20 (19) |
| LDA | 77.87 ± 2.72 (35) | 79.59 ± 3.08 (38) | 81.25 ± 2.25 (40) | 83.47 ± 3.45 (43) |
| LPP | 77.66 ± 2.88 (33) | 80.28 ± 2.45 (24) | 81.45 ± 2.72 (32) | 85.56 ± 3.17 (25) |
| UDP | 77.63 ± 3.37 (33) | 79.81 ± 2.63 (38) | 80.94 ± 2.31 (42) | 84.18 ± 3.83 (24) |
| MNMDP | **78.03 ± 3.50 (11)** | **81.46 ± 2.87 (18)** | **85.47 ± 2.25 (8)** | **87.91 ± 1.74 (7)** |
Acknowledgments
This work was supported by the National Science Foundation of China (Grant nos. 61170126, 61202045, 61005017, 61202110, and 61300228). The authors wish to thank the anonymous reviewers for their valuable suggestions.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
- 1. Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(1): 4–37.
- 2. Face recognition using eigenfaces. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 1991: 586–591.
- 3. Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(7): 711–720.
- 4. PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(2): 228–233.
- 5. Locality preserving projections. Proceedings of the Advances in Neural Information Processing Systems, 2003.
- 6. Face recognition using laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(3): 328–340.
- 7. Kernel eigenfaces vs. kernel fisherfaces: face recognition using kernel methods. Proceedings of the IEEE 5th International Conference on Automatic Face and Gesture Recognition, 2002: 215–220.
- 8. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 1998, 10(5): 1299–1319.
- 9. Generalized discriminant analysis using a kernel approach. Neural Computation, 2000, 12(10): 2385–2404.
- 10. KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(2): 230–244.
- 11. An alternative formulation of kernel LPP with application to image recognition. Neurocomputing, 2006, 69(13–15): 1733–1738.
- 12. Kernel class-wise locality preserving projection. Information Sciences, 2008, 178(7): 1825–1835.
- 13. Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(1): 131–137.
- 14. MPCA: multilinear principal component analysis of tensor objects. IEEE Transactions on Neural Networks, 2008, 19(1): 18–39.
- 15. Two-dimensional linear discriminant analysis. Proceedings of the Advances in Neural Information Processing Systems, 2004.
- 16. Multilinear discriminant analysis for face recognition. IEEE Transactions on Image Processing, 2007, 16(1): 212–220.
- 17. Tensor subspace analysis. Proceedings of the Advances in Neural Information Processing Systems, 2005.
- 18. Two-dimensional locality preserving projections (2DLPP) with its application to palmprint recognition. Pattern Recognition, 2007, 40(1): 339–342.
- 19. Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(1): 40–51.
- 20. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 2003, 15(6): 1373–1396.
- 21. Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(4): 650–664.
- 22. Locality-preserved maximum information projection. IEEE Transactions on Neural Networks, 2008, 19(4): 571–585.
- 23. Local structure based supervised feature extraction. Pattern Recognition, 2006, 39(8): 1546–1550.
- 24. Local discriminant embedding and its variants. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), June 2005: 846–853.
- 25. Face recognition using discriminant locality preserving projections. Image and Vision Computing, 2006, 24(3): 239–248.
- 26. Supervised optimal locality preserving projection. Pattern Recognition, 2012, 45(1): 186–197.
- 27. Neighborhood discriminant embedding in face recognition. Optical Engineering, 2010, 49(7).
- 28. Locality-based discriminant neighborhood embedding. The Computer Journal, 2013, 56(9): 1063–1082.
- 29. Face recognition using discriminant locality preserving projections based on maximum margin criterion. Pattern Recognition, 2010, 43(10): 3572–3579.
- 30. Discriminant sparse neighborhood preserving embedding for face recognition. Pattern Recognition, 2012, 45(8): 2884–2893.
- 31. Null space discriminant locality preserving projections for face recognition. Neurocomputing, 2008, 71(16–18): 3644–3649.
- 32. Locality preserving discriminant projections for face and palmprint recognition. Neurocomputing, 2010, 73(13–15): 2696–2707.
- 33. Orthogonal laplacianfaces for face recognition. IEEE Transactions on Image Processing, 2006, 15(11): 3608–3614.
- 34. Optimal locality preserving projection for face recognition. Neurocomputing, 2011, 74(18): 3941–3945.
- 35. Comments on “Globally maximizing, locally minimizing: unsupervised discriminant projection with application to face and palm biometrics”. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(8): 1503–1504.
- 36. Sparsity preserving projections with applications to face recognition. Pattern Recognition, 2010, 43(1): 331–341.
- 37. Face recognition using discriminant sparsity neighborhood preserving embedding. Knowledge-Based Systems, 2012, 31: 119–127.
- 38. Fast Fisher sparsity preserving projections. Neural Computing and Applications, 2013, 23: 691–705.
- 39. Feature extraction based on maximum nearest subspace margin criterion. Neural Processing Letters, 2013, 37(3): 355–375.
- 40. Locality and similarity preserving embedding for feature selection. Neurocomputing, 2013.
- 41. Sparse representation classifier steered discriminative projection with applications to face recognition. IEEE Transactions on Neural Networks and Learning Systems, 2013, 24(7): 1023–1035.
- 42. Supervised discriminant projection with its application to face recognition. Neural Processing Letters, 2011, 34(1): 1–12.
- 43. Efficient and robust feature extraction by maximum margin criterion. IEEE Transactions on Neural Networks, 2006, 17(1): 157–165.
- 44. The PolyU HRF database. http://www4.comp.polyu.edu.hk/~biometrics/HRF/HRF_old.htm
- 45. Top 10 algorithms in data mining. Knowledge and Information Systems, 2008, 14(1): 1–37.
- 46. The PolyU FKP database. http://www.comp.polyu.edu.hk/~biometrics/FKP.htm
- 47. Ensemble of local and global information for finger-knuckle-print recognition. Pattern Recognition, 2011, 44(9): 1990–1998.
- 48. The AR face database. http://www2.ece.ohio-state.edu/~aleix/ARdatabase.html
- 49. UCI Machine Learning Repository. Irvine, Calif, USA: University of California, School of Information and Computer Science, 2013. http://archive.ics.uci.edu/ml
- 50. Distance Metric Learning: A Comprehensive Survey. http://www.cs.cmu.edu/~liuy/
- 51. A unified framework of subspace and distance metric learning for face recognition. Analysis and Modeling of Faces and Gestures, Lecture Notes in Computer Science, 2007, 4778: 250–260.
- 52. LPP solution schemes for use with face recognition. Pattern Recognition, 2010, 43(12): 4165–4176.
- 53. Orthogonal neighborhood preserving projections: a projection-based dimensionality reduction technique. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(12): 2143–2156.
- 54. Neighbourhood components analysis. Proceedings of the Advances in Neural Information Processing Systems, 2004.
- 55. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 2009, 10: 207–244.