Theory: the principle of factor analysis
Overview of factor analysis:

Factor analysis comes in two types, Q-type and R-type. The following discusses the R type.

1. Factor analysis steps:

1. Confirm whether the data are suitable for factor analysis.

2. Construct the factor variables.

3. Rotate the factors to make them easier to interpret.

4. Compute the factor variable scores.

2. The calculation process of factor analysis:

1. Standardize raw data

Objective: to eliminate differences in magnitude and in units (dimension) among the variables.

2. Find the correlation matrix of standardized data

3. Find the eigenvalues and eigenvectors of the correlation matrix

4. Calculate the variance contribution rate and cumulative variance contribution rate.

5. Determine the factor

F1, F2, F3, ... Choose the smallest m such that the cumulative variance contribution rate of the first m factors is at least 80%; these first m factors can then be used to represent the original indicators.

6. Factor rotation

Apply this step when the extracted factors are unclear or hard to interpret.

7. Calculate the score of each factor through the linear combination of the original indicators.

Two methods: regression estimation and Bartlett estimation.

8. Comprehensive score: taking the variance contribution rate of each factor as its weight, obtain the comprehensive evaluation index as a linear combination of the factors.

F = (λ1F1 + … + λmFm) / (λ1 + … + λm)

  = w1F1 + … + wmFm,  where wj = λj / (λ1 + … + λm)

9. Rank the comprehensive scores.
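The calculation process above (standardize, correlate, eigendecompose, pick m by the 80% rule, score, combine) can be sketched with NumPy on hypothetical random data. The data, the 80% threshold, and the regression-method scoring formula F = Z R⁻¹A follow the steps in the text; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # hypothetical raw data: 100 samples, 5 indicators

# Step 1: standardize (zero mean, unit variance per column)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: correlation matrix of the standardized data
R = np.corrcoef(Z, rowvar=False)

# Step 3: eigenvalues and eigenvectors (eigh, since R is symmetric), sorted descending
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: variance contribution rate and cumulative contribution rate
contrib = eigvals / eigvals.sum()
cum_contrib = np.cumsum(contrib)

# Step 5: smallest m whose cumulative contribution rate reaches 80%
m = int(np.searchsorted(cum_contrib, 0.80) + 1)

# Loadings by the principal component method: column j is sqrt(lambda_j) * mu_j
A = eigvecs[:, :m] * np.sqrt(eigvals[:m])

# Step 7: factor scores by the regression method: F = Z R^{-1} A
F = Z @ np.linalg.solve(R, A)

# Step 8: comprehensive score, weighting each factor by its contribution rate
w = eigvals[:m] / eigvals[:m].sum()
score = F @ w

# Step 9: rank the samples by comprehensive score
ranking = np.argsort(score)[::-1]
```

This is only a sketch of the computation flow; statistical packages (SPSS, R) add suitability tests and rotation on top of it.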

Detailed description of factor analysis:

The factor analysis model is also known as the orthogonal factor model:

X = AF + ε

where:

X = [X1, X2, …, Xp]'

A = (aij), the p×m factor loading matrix

F = [F1, F2, …, Fm]'

ε = [ε1, ε2, …, εp]'

The model satisfies:

(1) m ≤ p;

(2) Cov(F, ε) = 0;

(3) Var(F) = Im;

(4) D(ε) = Var(ε) = diag(σ1², σ2², …, σp²), i.e. ε1, ε2, …, εp are mutually uncorrelated and their variances may differ.

We call F the common factors of X, A the loading matrix, and ε the special (unique) factors of X.

Mathematically it can be shown that aij is the correlation coefficient between the i-th variable and the j-th factor (compare the definition of aij in AHP).

<1> The loading matrix

For estimating and interpreting the loading matrix, the main approaches are the principal factor method and the maximum likelihood method. Here we take the principal factor method (note: principal factors are not principal components).

Let Σ be the covariance matrix of the random vector X, with eigenvalues λ1 ≥ λ2 ≥ … ≥ λp > 0

and corresponding orthonormal eigenvectors μ1, μ2, …, μp.

From first-year linear algebra (or advanced algebra), there is a result called the spectral decomposition:

Σ = λ1μ1μ1' + … + λpμpμp'

When there are as many factors as variables, the variance of every special factor is 0.

The model then reduces to X = AF, where Var(F) = Ip.

Therefore Var(X) = Var(AF) = A Var(F) A' = AA'.

Comparing with the spectral decomposition of Σ, the j-th column of A should be √λj · μj.

That is, apart from the scalar √λj in front of μj, the j-th column of A is exactly the coefficient vector of the j-th principal component; hence the name principal component method.

If the special factors ε are retained, the original covariance matrix can be decomposed as:

Σ = AA' + D, where D = diag(σ1², σ2², …, σp²)
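The decomposition above can be checked numerically. The sketch below uses a hypothetical 3×3 covariance matrix Sigma: with as many factors as variables, Σ = AA' holds exactly; keeping only m < p columns leaves a remainder that the principal factor method absorbs into the diagonal matrix D.

```python
import numpy as np

# Hypothetical covariance matrix Sigma (symmetric positive definite)
Sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 3.0, 0.4],
                  [0.6, 0.4, 2.0]])

# Spectral decomposition: Sigma = sum_j lambda_j * mu_j mu_j'
lam, mu = np.linalg.eigh(Sigma)
lam, mu = lam[::-1], mu[:, ::-1]        # reorder eigenvalues descending

# With as many factors as variables, column j of A is sqrt(lambda_j) * mu_j
A = mu * np.sqrt(lam)

# Then Sigma = A A' exactly (all special-factor variances are zero)
assert np.allclose(Sigma, A @ A.T)

# Keeping only m < p factors: Sigma ~= Am Am' + D with D diagonal
m = 2
Am = A[:, :m]
D = np.diag(np.diag(Sigma - Am @ Am.T))
```

The off-diagonal residue of Sigma − Am Am' is what the approximation with m factors cannot explain; only its diagonal goes into D.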

What the above analysis shows:

1. The factor analysis model is a model of the covariance matrix Σ of the original variables X.

2. The coefficients of each principal component in principal component analysis are unique, while the coefficients of each factor in factor analysis are not, so the factor loading matrix is not unique.

(Principal component analysis is much like a special case of factor analysis; the two are easily confused. Interested readers can compare them.)

<2> Communality and variance contribution

Whether in SPSS or in R, the output of factor analysis revolves around these contributions. Let's see what they mean.

According to the factor analysis model, when there is only one common factor F,

Var(Xi) = Var(aiF) + Var(εi) = ai² + σi²

Since the data are standardized, the left-hand side equals 1, and the two terms on the right are the communality and the specific variance, respectively.

The larger the communality, the larger the role played by the common factor.

The sum of the squares of the elements in the i-th row of the loading matrix A is denoted hi²

and is called the communality of the variable Xi.

It is the contribution of the common factors to Var(Xi) and reflects the influence of all common factors on the variable Xi.

A large hi² indicates that the i-th variable depends strongly on the components F1, F2, …, Fm of F.

The sum of the squares of the elements in the j-th column of the loading matrix A is denoted gj²

and is called the variance contribution of the common factor Fj to X.

gj² is the sum over i of the variance that Fj contributes to each component Xi, and it is an index of the relative importance of the common factor. The larger gj², the greater the contribution of Fj to X, that is, the greater its influence on X.

If gj² is computed for every column of A and the values are sorted by size, the most influential common factors can be identified.
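Both quantities are simple row and column operations on the loading matrix. A minimal sketch, using a hypothetical 4×2 loading matrix:

```python
import numpy as np

# Hypothetical loading matrix A: 4 variables, 2 common factors
A = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.1, 0.8]])

# Communality h_i^2: sum of squared loadings in row i
# (influence of all common factors on variable X_i)
h2 = (A**2).sum(axis=1)

# Variance contribution g_j^2: sum of squared loadings in column j
# (total variance that factor F_j provides to X)
g2 = (A**2).sum(axis=0)

# Sorting g_j^2 in decreasing order ranks the factors by influence
order = np.argsort(g2)[::-1]
```

Note that the total is shared: sum(h2) = sum(g2), since both sum every squared element of A once.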

<3> Factor rotation

This part is relatively simple, so only a brief note.

Objective: building a factor analysis model is not only about extracting the principal factors, but also about interpreting them in practical terms. Factor rotation makes the conclusions clearer.

Methods: there are two categories, orthogonal rotation and oblique rotation; orthogonal rotation is the more commonly used.

The idea of rotation is easy to see in the plane rectangular coordinate system. Suppose the data points lie only on the lines y = x and y = −x, while our axes are x = 0 and y = 0. Rotating the coordinate system (here by 45°) aligns the axes with the data without affecting the results.
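A common orthogonal rotation is varimax. Below is a minimal sketch of Kaiser's varimax criterion via the standard iterative SVD scheme; the loading matrix is hypothetical. A key property, checked at the end, is that an orthogonal rotation leaves every communality hi² unchanged, so rotation only redistributes loadings among factors.

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Varimax rotation of a p x m loading matrix L (iterative SVD scheme)."""
    p, m = L.shape
    R = np.eye(m)                       # accumulated orthogonal rotation
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient-like matrix of the varimax criterion
        B = L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(B)
        R = u @ vt                      # nearest orthogonal matrix to B
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
        d_old = d
    return L @ R

# Hypothetical unrotated loadings: both factors load on every variable
A = np.array([[0.7,  0.5],
              [0.7,  0.4],
              [0.6, -0.5],
              [0.6, -0.6]])
A_rot = varimax(A)

# Communalities are invariant under any orthogonal rotation
assert np.allclose((A**2).sum(axis=1), (A_rot**2).sum(axis=1))
```

After rotation, each variable tends to load heavily on one factor and lightly on the others, which is exactly what makes the factors easier to name and interpret.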