(1) Confirm whether the original variables are suitable for factor analysis;
(2) construct the factor variables;
(3) rotate the factors so that they can be interpreted;
(4) calculate each sample's factor scores.
2. Mathematical model of factor analysis
3. The main methods of factor analysis
Centered on the core goal of extracting factors from the original variables, factor analysis involves the following basic steps:
1. The premise of factor analysis
One of the main tasks of factor analysis is to condense the original variables: to extract the information they share and synthesize it into a small number of factors, thereby reducing the number of variables. This requires strong correlations among the original variables. If the original variables are essentially independent of one another, with little correlation, there is no overlapping information and no shared factor, so no condensation is possible and factor analysis is unnecessary. This step therefore checks, by several methods, whether the original variables are correlated enough to be suitable for factor analysis.
SPSS provides four statistics to help judge whether the observed data are suitable for factor analysis:
(1) Calculate the correlation matrix.
Before extracting factors, the correlation matrix should be examined. If most of the correlation coefficients in the matrix are below 0.3, the data are not suitable for factor analysis. When there are many original variables, however, the correlation matrix becomes very large and inconvenient to inspect, so this method is rarely adopted, and even when it is adopted the matrix is usually not reproduced in the analysis report.
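As a minimal sketch of this check, the fragment below builds a correlation matrix for some synthetic data (the variable names and data are invented for illustration) and reports what share of the off-diagonal correlations exceed 0.3 in absolute value:

```python
import numpy as np

# Hypothetical sample: 100 observations of 4 variables that share one source.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([base + 0.5 * rng.normal(size=(100, 1)) for _ in range(4)])

R = np.corrcoef(X, rowvar=False)                 # correlation matrix of the variables
off_diag = R[~np.eye(R.shape[0], dtype=bool)]    # drop the 1s on the diagonal
share_strong = np.mean(np.abs(off_diag) > 0.3)
print(f"share of |r| > 0.3 off the diagonal: {share_strong:.2f}")
```

If most off-diagonal correlations clear the 0.3 threshold, the condensation that factor analysis performs has something to work with.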
(2) Calculate the anti-image correlation matrix.
The anti-image matrices contain the negatives of the partial covariances and partial correlation coefficients. A partial correlation coefficient is the net correlation between two variables after controlling for the influence of the other variables. If the original variables really do share overlapping information, that is, if common factors can be extracted from them, then the partial correlations that remain after controlling for these effects will be very small.
The diagonal elements of the anti-image correlation matrix are the MSA (Measure of Sampling Adequacy) statistics of the individual variables, defined mathematically as:
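The formula itself appears to have been lost in extraction; the standard definition of the per-variable MSA, with \(r_{ij}\) the simple correlations and \(p_{ij}\) the partial correlations, is:

```latex
\mathrm{MSA}_i = \frac{\sum_{j \neq i} r_{ij}^2}{\sum_{j \neq i} r_{ij}^2 + \sum_{j \neq i} p_{ij}^2}
```

When the partial correlations are small relative to the simple correlations, the denominator is dominated by the first sum and MSA approaches 1.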
Inspecting the anti-image correlation matrix: if most of its elements are small in absolute value while the main diagonal elements are close to 1, the variables are strongly correlated and suitable for factor analysis. For the same reason given at the end of (1), this method is rarely used.
(3) Bartlett's test of sphericity.
Bartlett's test of sphericity tests whether the correlation matrix is an identity matrix; if it is, the factor model is not applicable. The null hypothesis of the test is that the correlation matrix is an identity matrix. If this hypothesis cannot be rejected, the data are not suitable for factor analysis. Generally speaking, the smaller the significance value (e.g., below 0.05), the stronger the evidence against the null hypothesis and the more suitable the data are for factor analysis.
(4) KMO (Kaiser-Meyer-Olkin measure of sampling adequacy).
KMO is the Kaiser-Meyer-Olkin measure of sampling adequacy. The higher the KMO value (close to 1.0), the more common factors the variables share, and the more suitable the data are for factor analysis. The value is generally interpreted against the following criteria: above 0.9 is very good, 0.8 ~ 0.9 is good, 0.7 ~ 0.8 is average, 0.6 ~ 0.7 is poor, and 0.5 ~ 0.6 is very poor. A KMO value below 0.5 indicates that the data are not suitable for factor analysis.
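The overall KMO can be sketched directly from its definition: simple correlations squared over simple plus partial correlations squared, with the partial correlations obtained from the inverse of the correlation matrix. The data below are synthetic, invented purely for illustration:

```python
import numpy as np

# Hypothetical sample: 200 observations of 5 variables sharing one source.
rng = np.random.default_rng(2)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.5 * rng.normal(size=(200, 1)) for _ in range(5)])

R = np.corrcoef(X, rowvar=False)
R_inv = np.linalg.inv(R)

# Partial correlations from the inverse correlation matrix.
d = np.sqrt(np.diag(R_inv))
P = -R_inv / np.outer(d, d)

mask = ~np.eye(R.shape[0], dtype=bool)       # off-diagonal entries only
r2 = (R[mask] ** 2).sum()
p2 = (P[mask] ** 2).sum()
kmo = r2 / (r2 + p2)
print(f"overall KMO = {kmo:.3f}")
```

Because these variables all load on one common source, the partial correlations are small and the KMO lands well above the 0.5 threshold.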
In practice, the most commonly used of these methods are Bartlett's test of sphericity and the KMO (Kaiser-Meyer-Olkin) measure of sampling adequacy.
2. Extracting the common factors, determining the number of factors, and finding the factor solution.
Condensing the original variables into a few factors is the core of factor analysis; this step studies how to extract and synthesize factors from the sample data. Available methods include principal component analysis, the principal axis method, generalized least squares, unweighted least squares, maximum likelihood, alpha factor extraction, and image factor extraction. Principal component analysis and the principal axis method are the most widely used, with principal component analysis the most common of all; the SPSS user manual likewise advises researchers to use principal component analysis to estimate factor loadings.

Principal component analysis explains a large share of the variance of the original variables with a smaller number of components. The values of each variable are first converted to standard scores. The variables then span a multidimensional space, and a line is projected through that space so as to explain the maximum variance; this line is the common factor that best represents the properties of the variables, and the variable formed by the values on it is the common factor, or first factor. Since residual variance remains in the space, a second line is projected to explain it, subject to a second criterion: the second line must be orthogonal to the first, meaning it represents a different aspect of the data. The variable formed by the values on this second line is the second factor. Following the same principle, a third, fourth, or further factors can be found.
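The extraction just described can be sketched as an eigendecomposition of the correlation matrix of the standardized variables, retaining components by the common eigenvalue-greater-than-1 rule (a frequent default; the data and cutoff here are illustrative assumptions, not prescribed by the text):

```python
import numpy as np

# Hypothetical sample: 200 observations of 5 variables driven by one source.
rng = np.random.default_rng(3)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.7 * rng.normal(size=(200, 1)) for _ in range(5)])

# Standardize, then eigendecompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # largest explained variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Retain factors with eigenvalue > 1; loadings = eigenvector * sqrt(eigenvalue).
k = int((eigvals > 1).sum())
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
explained = eigvals[:k] / eigvals.sum()
print("factors retained:", k)
print("variance explained by each:", np.round(explained, 3))
```

Each successive eigenvector is the next orthogonal "line" of the description above, and the loadings give each variable's correlation with that factor.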