How to do statistical analysis of differences?
Difference analysis covers three topics:

1. Descriptive means (the Means procedure)

2. t-test

3. Analysis of variance

Descriptive means: the Means procedure

Definition: The Means procedure in SPSS calculates basic descriptive statistics for subgroups of a sample. In practice, it computes the mean and standard deviation of each group under a condition specified by the user, for example the mean and standard deviation of a score for each gender.

The Means procedure uses the following formulas:
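For reference, and assuming the usual definitions (the symbols n for the size of a group and x_i for the i-th score in that group are not named in the original), the mean and standard deviation of one group can be written as:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i-\bar{x}\right)^2}
```

The n - 1 divisor gives the sample standard deviation, which is assumed here to be the quantity reported for each group.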

Research question: compare the mean and variance of the math scores of male and female students. The data are shown in the table.

Table: Math scores by gender

Gender    Math scores
Male      99, 79, 59, 89, 79, 89, 99
Female    88, 54, 56, 23
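What the Means procedure computes for this table can be sketched in a few lines of Python. This is only an illustration using the standard library, not SPSS output; the grouping of the scores (7 male, 4 female) is read from the table and is an assumption about its original layout.

```python
from statistics import mean, stdev, variance

# Scores read from the table (assumed grouping: 7 male, 4 female).
scores = {
    "male": [99, 79, 59, 89, 79, 89, 99],
    "female": [88, 54, 56, 23],
}

# One row per group: size, mean, sample standard deviation, sample variance.
summary = {
    group: {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),           # divides by n - 1
        "variance": variance(values),  # divides by n - 1
    }
    for group, values in scores.items()
}

for group, row in summary.items():
    print(f"{group:6s} n={row['n']} mean={row['mean']:.2f} "
          f"sd={row['sd']:.2f} variance={row['variance']:.2f}")
```

Grouping by another variable only changes the keys of the `scores` dictionary.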

t-test

The t-test is a hypothesis test whose test statistic is t. It is used to test the difference between two means.

General steps of hypothesis testing:

1. State the null hypothesis H0 and the alternative hypothesis H1 according to the practical problem.

2. Choose a test statistic T and determine its distribution under the condition that H0 holds.

3. Choose a significance level and look up the distribution table of T to determine the critical value and the rejection region of H0.

4. Calculate the value of the statistic from the sample and compare it with the critical value.

5. Draw a conclusion: if the value of the statistic falls in the rejection region, reject H0; otherwise, do not reject H0.

Significance levels: 0.05 is significant; 0.001 and 0.0001 are extremely significant.

Types of t-test

1. Single-sample t-test: compares a sample mean with a population mean.

2. Independent two-sample t-test: compares the means of two independent samples.

3. Paired-samples t-test: compares the mean of the differences in a paired design with a population mean of 0.

Statistical definition and calculation formula of single sample t-test

Definition: The SPSS single-sample t-test tests whether the population mean of a variable differs significantly from a specified test value. It assumes that the population from which the sample comes follows a normal distribution. In other words, a single sample is not compared with another sample; its mean is compared with a known population mean.

The null hypothesis H0 of the single-sample t-test is that the population mean does not differ significantly from the specified test value. The t statistic is calculated as t = (x̄ - μ0) / (s / √n), where x̄ is the sample mean, μ0 is the test value, s is the sample standard deviation, and n is the sample size.

Implementation process in SPSS

Analyze > Compare Means > One-Sample T Test

Research question: does the mean math score of a class in the college entrance examination differ significantly from the national average of 70 points? The data are shown in the table.

Table: Math scores by gender

Gender    Math scores
Male      99, 79, 59, 89, 79, 89, 99
Female    88, 54, 56, 23
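The general steps of hypothesis testing can be followed by hand for this question. The sketch below is a minimal illustration in Python, not SPSS output. It assumes that all 11 scores in the table form the class, uses the standard one-sample statistic (sample mean minus test value, divided by s / sqrt(n)), and compares it with 2.228, the two-tailed critical value from a standard t table for 10 degrees of freedom at the 0.05 level.

```python
from math import sqrt
from statistics import mean, stdev

# All 11 scores from the table, treated as one class (assumption).
scores = [99, 79, 59, 89, 79, 89, 99, 88, 54, 56, 23]
test_value = 70          # national average stated in the research question

n = len(scores)
sample_mean = mean(scores)
standard_error = stdev(scores) / sqrt(n)    # s / sqrt(n), s uses n - 1
t_stat = (sample_mean - test_value) / standard_error
df = n - 1

# Two-tailed critical value of t for alpha = 0.05 and df = 10,
# taken from a standard t table.
T_CRITICAL = 2.228
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"mean={sample_mean:.2f} t={t_stat:.3f} df={df} reject H0: {reject_h0}")
```

Under these assumptions the statistic stays well inside the critical value, so H0 is not rejected: the class mean does not differ significantly from 70.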

One-tailed test and two-tailed test

In tests of means, researchers usually want to compare different means, and the research hypothesis may state that one mean is greater than, less than, or not equal to another. This gives two modes of testing: directional and non-directional. When the researcher cares about the comparison in only one direction (for example, that boys' mean math score X1 is higher than girls' mean math score X2), the test has a single rejection region and a one-tailed test is used. For example, if the population mean math scores of boys and girls are μ1 and μ2 respectively, the hypotheses are H0: μ1 ≤ μ2 and H1: μ1 > μ2.

When the researcher specifies no direction (for example, that the IQ of boys and girls differs), the difference may fall at either extreme, so two rejection regions must be set and a two-tailed test is used. For example: H0: μ1 = μ2 and H1: μ1 ≠ μ2.
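To illustrate the two modes, the following minimal Python sketch runs an independent two-sample t-test on the boys' and girls' math scores from the table in this section. It assumes equal population variances (a pooled-variance test, which the text does not specify). The critical values 1.833 and 2.262 are the standard t table entries for 9 degrees of freedom at the 0.05 level, one-tailed and two-tailed respectively.

```python
from math import sqrt
from statistics import mean, variance

boys = [99, 79, 59, 89, 79, 89, 99]
girls = [88, 54, 56, 23]

n1, n2 = len(boys), len(girls)
df = n1 + n2 - 2

# Pooled variance: assumes the two populations share one variance.
pooled_var = ((n1 - 1) * variance(boys) + (n2 - 1) * variance(girls)) / df
standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean(boys) - mean(girls)) / standard_error

# Critical values for df = 9 from a standard t table.
ONE_TAILED_05 = 1.833    # all of alpha = 0.05 in the upper tail
TWO_TAILED_05 = 2.262    # alpha / 2 = 0.025 in each tail

reject_one_tailed = t_stat > ONE_TAILED_05        # H1: boys' mean is higher
reject_two_tailed = abs(t_stat) > TWO_TAILED_05   # H1: the means differ

print(f"t={t_stat:.3f} df={df} one-tailed: {reject_one_tailed} "
      f"two-tailed: {reject_two_tailed}")
```

The one-tailed test has a lower critical value because the whole rejection region lies in one tail, so a difference in the hypothesized direction is easier to detect.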

One-tailed test

Learning 54 50 89 67 79 78 56 89 89 56

Figure 4-6 "Independent-Samples T Test" dialog box

Figure 4-7 "Define Groups" dialog box

Results and discussion

Statistical definition and calculation formula of t-test for two paired samples

Definition: The paired-samples t-test uses sample data to infer whether the means of two paired populations differ significantly. It is generally used to compare the effects of two different treatments on the same subjects (or on two matched subjects), or the state of the same subjects (or two matched subjects) before and after a treatment. The former infers whether the two treatments differ in effect; the latter infers whether a treatment is effective.

The paired-samples t-test has the following prerequisites:

1. The two samples must be paired. In applications, pairing mainly means that non-treatment factors such as age, gender, weight, and disease condition are the same or similar. The two samples must contain the same number of observations, and the order of the observations in the two samples cannot be changed arbitrarily.

2. The two populations from which the samples come should follow a normal distribution.

Principle:

1. The paired-samples t-test compares the mean of the sample differences in a paired design with a population mean of 0.

2. The test first computes the difference within each pair and then compares the mean of these differences with 0. If the two sets of data do not differ, the mean of the sample differences should fluctuate around 0; otherwise the two sets of data differ. In essence, the method is a single-sample t-test of the pairwise differences against a population mean of 0.

The null hypothesis H0 of the paired-samples t-test is that there is no significant difference between the means of the two populations.

◆ Note: In the single-sample t-test and the independent two-sample t-test, the order of the data within a sample can be changed freely. In the paired-samples t-test, however, the observations in the two samples must correspond one to one, and the order of the data within a sample cannot be changed arbitrarily.

SPSS calculates the value of t automatically. Because this statistic follows a t distribution with n - 1 degrees of freedom, SPSS reports the associated probability (p-value) for the t value according to the t distribution. If the p-value is less than or equal to the significance level α set by the user, H0 is rejected and the two population means are considered significantly different. Conversely, if the p-value is greater than α, H0 is not rejected and the two population means are considered not significantly different.
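The principle that a paired test is a single-sample test on the differences can be sketched as follows. The before and after scores are hypothetical values invented for illustration, and the decision uses the critical value 2.776 (two-tailed, 0.05 level, 4 degrees of freedom) from a standard t table rather than the p-value that SPSS reports.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores of the same five students before and after a treatment.
before = [70, 65, 80, 75, 60]
after = [75, 70, 82, 80, 68]

# Pairing matters: each difference uses the two scores of one student.
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
df = n - 1
# Single-sample t-test of the differences against a population mean of 0.
t_stat = mean(differences) / (stdev(differences) / sqrt(n))

T_CRITICAL = 2.776   # two-tailed, alpha = 0.05, df = 4
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"mean difference={mean(differences):.2f} t={t_stat:.3f} "
      f"df={df} reject H0: {reject_h0}")
```

Shuffling one of the two lists would change the differences and therefore the result, which is why the order of paired data cannot be changed.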

Implementation process in SPSS

Analyze > Compare Means > Paired-Samples T Test

Analysis of variance

Analysis of variance (ANOVA) is typically used to test the significance of differences among the means of multiple independent samples.

Examples: differences among rapeseed varieties (p. 164); whether different teaching methods have a significant effect on students' grades; whether the scores of candidates from different regions differ significantly.

Basic concepts of variance analysis

Analysis of variance was invented by R. A. Fisher to test the significance of differences among the means of two or more samples. It is widely used in analytical studies across many fields, and research based on the analysis of variance helps reveal the inherent regularities of things. Because of the influence of various factors, the data obtained in a study fluctuate. The causes of the fluctuation fall into two categories: one is uncontrollable random factors, called random variables; the other is controllable factors imposed by the researcher, called control variables.

Variance analysis can be used to determine which of the above factors leads to the difference between sample data.

Variable type       Type of error       Property
Random variable     Random error        Uncontrollable
Control variable    Systematic error    Has a fixed magnitude and direction (positive or negative); can be corrected or eliminated when measurements are repeated

The main purposes of the analysis of variance are:

1. To find, through data analysis, the factors that have a significant effect on the thing under study.

2. To study whether the interaction between factors has an effect on it.

◆ Note: ANOVA is applicable under the following conditions:

1. The populations from which the samples come follow a normal distribution.

2. The population variances are homogeneous (equal across groups).

3. The samples are independent of each other.

Types of variance analysis

One-way analysis of variance

One-way ANOVA considers the influence of a single factor A on an index X while all other factors are held constant or controlled within a certain range. Factor A has k levels, and n_i observations are taken at the i-th level.

In the analysis of variance, variation is measured, and decomposed, by the sum of squared deviations from the mean. The total sum of squares is denoted SST and is decomposed into two terms. The first is the sum of squared deviations of the observations from their own group means, which represents the variation within groups (the variation caused by random variables) and is called the within-group sum of squares, SSW. The second is the sum of squared differences between each group mean and the overall mean, weighted by the group sample sizes, which represents the variation between groups (the variation caused by the control variable) and is called the between-group sum of squares, SSB.

Total variation = random variation + variation caused by the treatment factor

Total variation = within-group variation + between-group variation

In this way, the within-group and between-group variation can be compared. If the latter is much larger than the former, the treatment factor does have an effect; if the two are about the same, it does not. This is the basic idea of the analysis of variance.

Calculation formula

SST=SSW+SSB
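Written out under the definitions given in this section, the three sums of squares take the form below. The symbols x_{ij} (the j-th observation at level i), \bar{x}_i (the mean of level i), and \bar{x} (the overall mean) are notation assumed here, not taken from the original.

```latex
\begin{aligned}
SST &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(x_{ij}-\bar{x}\right)^2 \\
SSW &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(x_{ij}-\bar{x}_i\right)^2 \\
SSB &= \sum_{i=1}^{k} n_i \left(\bar{x}_i-\bar{x}\right)^2
\end{aligned}
```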

Here k is the number of levels and n_i is the sample size at the i-th level. The between-group sum of squares is the sum of squared deviations of each level's group mean from the overall mean, and it reflects the influence of the control variable. The within-group sum of squares is the sum of squared deviations of each observation from the mean of its own level, and it reflects the size of the sampling error.

The F statistic is the ratio of the mean square between groups to the mean square within groups (the ratio of the between-group variance to the error variance).

The formula for F shows that if the different levels of the control variable have a significant effect on the observed variable, the between-group sum of squares is large and F is relatively large. Conversely, if the different levels of the control variable have no significant effect on the observed variable, the within-group sum of squares dominates and F is small.
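The decomposition and the F ratio can be checked numerically. The sketch below is a minimal Python illustration; the three groups of scores are hypothetical values chosen for easy arithmetic, and 5.14 is the critical value of F with (2, 6) degrees of freedom at the 0.05 level from a standard F table.

```python
from statistics import mean

# Hypothetical math scores for three groups (illustrative values only).
groups = {
    "group0": [90, 94, 92],
    "group1": [80, 84, 82],
    "group2": [70, 74, 72],
}

all_scores = [x for values in groups.values() for x in values]
grand_mean = mean(all_scores)
k = len(groups)            # number of levels
n = len(all_scores)        # total sample size

# Between-group sum of squares: group means vs. overall mean, weighted by n_i.
ssb = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
# Within-group sum of squares: each score vs. its own group mean.
ssw = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
# Total sum of squares: each score vs. the overall mean.
sst = sum((x - grand_mean) ** 2 for x in all_scores)

msb = ssb / (k - 1)        # mean square between groups
msw = ssw / (n - k)        # mean square within groups
f_stat = msb / msw

# Critical value of F(2, 6) for alpha = 0.05 from a standard F table.
F_CRITICAL = 5.14
reject_h0 = f_stat > F_CRITICAL

print(f"SST={sst} SSB={ssb} SSW={ssw} F={f_stat:.2f} reject H0: {reject_h0}")
```

In this made-up data the group means are far apart relative to the spread inside each group, so F is large and H0 is rejected.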

Implementation process in SPSS

Analyze > Compare Means > One-Way ANOVA

Research question: analyze the math scores of three groups of students. The data are shown in the table.

Table: Math scores of three groups of students

Math:  99.00 88.00 99.00 89.00 94.00 90.00 79.00 56.00 89.00 99.00 70.00 89.00 55.00 67.00 56.00
Group: 0 0 0 0 0 2 2 2 2 2 1 1 ...

Implementation steps

Select the One-Way ANOVA command from the menu.

"One-Way ANOVA" dialog box

"One-Way ANOVA: Contrasts" dialog box

"One-Way ANOVA: Options" dialog box

"One-Way ANOVA: Post Hoc Multiple Comparisons" dialog box

Results and discussion

(1) The first table gives the result of the prerequisite check for one-way ANOVA, that is, the test of homogeneity of variances.

(2) The second table in the output result file is shown below.

(3) The third table in the output result file is shown below.

(4) The fourth table in the output result file is shown below.

(5) The last part of the output result is a line chart of the mean value of each group of observed variables, as shown in Figure 5-6.

In fact, the LSD method for post hoc comparison is a variant of the t-test, except that the variance and degrees of freedom are computed from the information in the whole sample rather than only from the two groups being compared. It is therefore the most sensitive method, but it still has the problem of inflating the overall α level (Type I error); put another way, its overall Type II error is very small. If LSD cannot detect a difference, there probably is none. The SNK method is the most widely used; it uses the studentized range distribution to compare the means of all groups pairwise. It ensures that when H0 is true the overall α level equals the nominal value, that is, the Type I error is controlled (Zhang Wentong, p. 268).
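The way LSD borrows the variance and degrees of freedom from the whole sample can be sketched as follows. The three groups are hypothetical values, and the form used here (critical t times the square root of MSW times the sum of reciprocal group sizes) is the usual textbook definition of the least significant difference, assumed rather than taken from the text. The critical value 2.447 is the two-tailed 0.05 entry for 6 degrees of freedom in a standard t table.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Hypothetical math scores for three groups (illustrative values only).
groups = {
    "group0": [90, 94, 92],
    "group1": [80, 84, 82],
    "group2": [70, 74, 72],
}

n = sum(len(v) for v in groups.values())
k = len(groups)

# Within-group mean square from ALL groups, not just the pair being compared.
ssw = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
msw = ssw / (n - k)

T_CRITICAL = 2.447   # two-tailed, alpha = 0.05, df = n - k = 6

results = {}
for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
    lsd = T_CRITICAL * sqrt(msw * (1 / len(a) + 1 / len(b)))
    diff = abs(mean(a) - mean(b))
    results[(name_a, name_b)] = (diff, lsd, diff > lsd)

for pair, (diff, lsd, significant) in results.items():
    print(f"{pair}: |difference|={diff:.2f} LSD={lsd:.3f} "
          f"significant: {significant}")
```

Each pair is tested at the full 0.05 level, which is why the overall Type I error grows with the number of comparisons.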

Statistical definition and calculation formula of multi-factor analysis of variance

Multi-factor analysis of variance is used to study whether two or more control variables have significant effects on an observed variable. It can analyze not only the independent effect of each factor on the observed variable but also whether the interaction of several control factors significantly affects the distribution of the observed variable, and thereby find the combination of factor levels that is best for the observed variable.

Multi-factor analysis of variance therefore has to analyze the independent effects of the control variables, the effects of their interactions, and the effect of other random variables on the results. To do so, it decomposes the sum of squared deviations of the observed variable into three parts:

1. The sum of squares caused by the independent action of the control variables.

2. The sum of squares caused by the interaction of the control variables.

3. The sum of squares caused by other random factors.
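For two control variables, the three-part decomposition can be sketched as below. The data are hypothetical and form a balanced design (the same number of students in every cell), which is an assumption that keeps the formulas simple; SPSS handles unbalanced designs differently. The names `ss_a`, `ss_b`, `ss_ab`, and `ss_error` are illustrative.

```python
from statistics import mean

# Hypothetical balanced design: 3 groups x 2 genders x 2 students per cell.
cells = {
    ("group0", "male"): [90, 94], ("group0", "female"): [88, 92],
    ("group1", "male"): [80, 84], ("group1", "female"): [70, 74],
    ("group2", "male"): [64, 68], ("group2", "female"): [58, 62],
}
levels_a = sorted({a for a, _ in cells})   # levels of the factor "group"
levels_b = sorted({b for _, b in cells})   # levels of the factor "gender"
r = 2                                      # students per cell

all_scores = [x for v in cells.values() for x in v]
grand = mean(all_scores)
mean_a = {a: mean([x for (ca, _), v in cells.items() if ca == a for x in v])
          for a in levels_a}
mean_b = {b: mean([x for (_, cb), v in cells.items() if cb == b for x in v])
          for b in levels_b}
mean_cell = {key: mean(v) for key, v in cells.items()}

# Part 1: independent (main) effects of each control variable.
ss_a = len(levels_b) * r * sum((mean_a[a] - grand) ** 2 for a in levels_a)
ss_b = len(levels_a) * r * sum((mean_b[b] - grand) ** 2 for b in levels_b)
# Part 2: interaction of the two control variables.
ss_ab = r * sum((mean_cell[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
                for a in levels_a for b in levels_b)
# Part 3: random variation inside each cell.
ss_error = sum((x - mean_cell[key]) ** 2
               for key, v in cells.items() for x in v)
ss_total = sum((x - grand) ** 2 for x in all_scores)

df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
df_ab = df_a * df_b
df_error = len(levels_a) * len(levels_b) * (r - 1)
ms_error = ss_error / df_error

f_a = (ss_a / df_a) / ms_error
f_b = (ss_b / df_b) / ms_error
f_ab = (ss_ab / df_ab) / ms_error

print(f"F(group)={f_a:.2f} F(gender)={f_b:.2f} F(interaction)={f_ab:.2f}")
```

Each F value is compared with the F distribution for its own degrees of freedom; for example, the 0.05 critical values from a standard F table are 5.14 for (2, 6) and 5.99 for (1, 6) degrees of freedom.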

The F statistics follow F distributions. SPSS calculates each F value automatically and reports the associated probability (p-value) according to the F distribution.

Implementation process in SPSS

Analyze > General Linear Model > Univariate

Research question: analyze the math scores of three groups of students of different genders. The data are shown in Table 5-2.

Table 5-2 Math scores of three groups of students of different genders

Math:   99.00 88.00 99.00 89.00 94.00 90.00 79.00 56.00 89.00 99.00 70.00 89.00 55.00 50.00 67.00 56.00
Group:  0 0 0 0 2 2 2 2 2 1 1 1 ...
Gender: male, female, male, female, male, female, male, male, female, male, male, male

Implementation steps

Figure 5-7 Select the Univariate command in the menu.

Figure 5-8 "Univariate" dialog box (1)

Figure 5-9 "Univariate: Options" dialog box (1)

Figure 5-10 "Univariate: Post Hoc Multiple Comparisons for Observed Means" dialog box

Figure 5-11 "Univariate: Model" dialog box

Figure 5-12 "Univariate: Profile Plots" dialog box

Figure 5-13 "Univariate: Contrasts" dialog box

Results and discussion

(1) The first part of the SPSS output result file is shown in the following table.

(2) The second part of the output result file is shown in the following table.

(3) The third part of the output result file is shown in the following table.

(4) The fourth part of the output result file is shown in the following table.

(5) The fifth part of the output result file is shown in the following table.

(6) The sixth part of the output result file is shown in the following table.

(7) The last part of the output is a plot showing whether the control variables interact.

Statistical definition and calculation formula of analysis of covariance

Definition: Analysis of covariance treats factors that are difficult to control as covariates and analyzes the influence of the control variables on the observed variable after removing the influence of the covariates, so that the control factors can be evaluated more accurately. Analysis of covariance requires the covariates to be continuous variables; multiple covariates should be independent of each other and should not interact with the control variables.

In the one-way and multi-factor ANOVA described earlier, the control variables are all qualitative variables. Analysis of covariance includes both qualitative variables (control variables) and quantitative variables (covariates).

The F statistics follow F distributions. SPSS calculates each F value automatically and reports the associated probability (p-value) according to the F distribution. If the p-value of F for a control variable is less than or equal to the significance level, the different levels of the control variable have a significant effect on the observed variable; if the p-value of F for a covariate is less than or equal to the significance level, the covariate has a significant effect on the observed variable.
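The idea of removing the covariate's influence before testing the control variable can be sketched with the classical one-factor, one-covariate calculation. The data below are hypothetical (admission score as covariate x, math score as observed variable y), the helper `sums_of_products` is an illustrative name, and the sketch assumes a common regression slope in all groups; it is not a reproduction of the SPSS output.

```python
# Hypothetical data: (admission score x, math score y) for three groups.
groups = {
    "group0": [(60, 62), (70, 70), (80, 81)],
    "group1": [(60, 68), (70, 77), (80, 83)],
    "group2": [(60, 75), (70, 82), (80, 92)],
}

def sums_of_products(pairs):
    """Return (Sxx, Sxy, Syy) about the means of the given (x, y) pairs."""
    count = len(pairs)
    mx = sum(x for x, _ in pairs) / count
    my = sum(y for _, y in pairs) / count
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxx, sxy, syy

all_pairs = [p for pairs in groups.values() for p in pairs]
n, k = len(all_pairs), len(groups)

# Total sums about the overall means, and within-group sums pooled over groups.
sxx_t, sxy_t, syy_t = sums_of_products(all_pairs)
within = [sums_of_products(pairs) for pairs in groups.values()]
sxx_w = sum(s[0] for s in within)
sxy_w = sum(s[1] for s in within)
syy_w = sum(s[2] for s in within)

# Remove the part of the variation in y that the covariate explains.
ss_covariate = sxy_w ** 2 / sxx_w
ss_error_adj = syy_w - ss_covariate
ss_total_adj = syy_t - sxy_t ** 2 / sxx_t
ss_group_adj = ss_total_adj - ss_error_adj

df_group, df_error = k - 1, n - k - 1   # one error df is spent on the slope
ms_error = ss_error_adj / df_error
f_group = (ss_group_adj / df_group) / ms_error
f_covariate = ss_covariate / ms_error

print(f"F(group)={f_group:.2f} F(covariate)={f_covariate:.2f}")
```

With 0.05 critical values of 5.79 for (2, 5) and 6.61 for (1, 5) degrees of freedom from a standard F table, both the group effect and the covariate would be judged significant for this made-up data.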

5.4.2 Implementation process in SPSS

Analyze > General Linear Model > Univariate

Research question: analyze the math scores of three groups of students. The data, including admission scores, are shown in Table 5-3.

Table 5-3 Math scores of three groups of students

Math:            99.00 88.00 99.00 89.00 94.00 90.00 79.00 56.00 89.00 99.00 70.00 89.00 55.00 50.00 67.00 56.00 56.00
Admission score: 98.00 89.00 78.00 78.00 78.00 88.00 89.00 87.00 76.00 56.00 76.00 89.00 89.00 99.00 89.00 78.00 89.00
Group:           0 0 0 2 2 2 2 2 1 1 ...

Implementation steps

Figure 5-15 Select the Univariate command in the menu.

Figure 5-16 "Univariate" dialog box (2)

Results and discussion

Summary

Analysis of variance is used to test the significance of differences among the means of two or more samples. Its basic idea is to determine the influence of the control variables on the variable under study by analyzing how much the variation from different sources contributes to the total variation. The analysis shows whether different levels of a control variable have a significant effect on the results: if they do, then, acting together with the random variables, they produce significant changes in the observed variable. One-way ANOVA tests the differences among the levels of a single factor. Multi-factor analysis of variance has two or more control variables and is mainly used to analyze whether the independent effects of the control variables, their interactions, and other random variables significantly affect the results. Analysis of covariance treats factors that are difficult to control as covariates and analyzes the influence of the control variables on the observed variable after removing the influence of the covariates, so that the control factors can be evaluated more accurately. In SPSS, one-way ANOVA is run from "One-Way ANOVA" under "Compare Means" in the "Analyze" menu; multi-factor analysis of variance and analysis of covariance are run from "Univariate" under "General Linear Model" in the "Analyze" menu.