Popular-science lecture notes: how to calculate statistical power
Statistical power describes the probability that your statistical test correctly rejects the null hypothesis (H0) in an experimental study. (The original English word for this probability is "likelihood", often translated as "possibility", but I find it more intuitive to describe statistical power in terms of probability.) This definition uses several terms, such as "null hypothesis" and "statistical test", which are explained one by one below.

The null hypothesis is the conservative conclusion that has already been verified, or at least accepted, by most people. It assumes that most interventions in the world are useless: the new drug is useless; the new process is useless. Stated more formally: the new drug is assumed to be no better than the known old drug; the new process is assumed to be no more effective than the current one. In short, every null hypothesis says "nothing happened". There is a shadow of the law of inertia here: unless your influence is large enough, even if you believe you changed something through your own effort, you may not actually have affected the overall picture. Whether you really did affect it must be proven by a statistical test, that is, by a rigorous comparison.

A statistical test, simply put, applies the appropriate statistical theory (for example, a t-test to compare means of normally distributed values; a chi-square test to compare percentages or counts; and so on. Only mathematicians handle the deeper theory with real ease; the rest of us just follow the methods given in the literature and reference books.) to compare your experimental group with the control group, and asks whether the probability of the two groups differing this much by chance is small enough to be recognized by the "world": "Well, with this much difference, they really are different!" The "world" here means the traditional, classic knowledge of the field. How much difference is "big enough"? It depends on the field and the scale of the study; many fields take P < 0.05 as the threshold.
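To make "statistical test" concrete, here is a minimal sketch of one such comparison: Welch's t statistic for two groups, using only Python's standard library. The sample data below are invented for illustration (they are not from the text), and a real analysis would still need the t distribution to turn the statistic into a p-value.

```python
from statistics import mean, stdev

def welch_t(sample_a, sample_b):
    """Welch's t statistic for comparing the means of two samples."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = stdev(sample_a) ** 2, stdev(sample_b) ** 2
    return (mean(sample_a) - mean(sample_b)) / (va / na + vb / nb) ** 0.5

# made-up measurements for an experimental and a control group
treated = [9.1, 9.8, 10.4, 9.9, 10.6]
control = [8.6, 9.0, 8.8, 9.3, 8.5]
t = welch_t(treated, control)  # larger |t| means a bigger standardized difference
print(round(t, 2))
```

The statistic alone does not decide anything; comparing it against the t distribution (the "world's" yardstick) is what turns it into a verdict.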

Go back to how to calculate statistical power.

Compared against the null hypothesis, the result of a statistical test falls into four situations, namely:

the null hypothesis is true and the test accepts it (a correct decision);

the null hypothesis is true but the test rejects it (the so-called type I error, or alpha error);

the null hypothesis is false but the test accepts it (the so-called type II error, or beta error);

the null hypothesis is false and the test rejects it (a correct decision).

The conservative attitude built into statistical conclusions makes us lean toward believing the null hypothesis: after all, it was accepted by the broad masses before anyone proved it wrong. So even if the null hypothesis is in fact false, until it is overthrown we provisionally believe it and distrust the new idea. Between the two kinds of error, in other words, we would rather commit a type II error than a type I error.

Note that although this is often laid out in a 2x2 table, which looks like the format for calculating sensitivity/specificity or PPV/NPV, the beta error probability is the probability that you accept the null hypothesis given that the null hypothesis is false. Likewise, the alpha error probability is the probability that your test rejects the null hypothesis given that the null hypothesis is true.

Hence alpha + beta does not have to equal 1 (outside of special cases it basically never does, because the two are conditioned on completely different premises), and it can be greater than or less than 1. This once puzzled me; I kept feeling that alpha + beta should sum to 1.
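That alpha and beta live on different premises can be checked by simulation: estimate alpha from samples drawn with the null hypothesis true, and beta from samples drawn with it false. A hedged sketch (the shifted true mean 0.63, sigma 1 and n = 4 are made-up illustration values, not from the text):

```python
import random

random.seed(0)

def z_stat(sample, mu0, sigma):
    """Z statistic of a sample mean against the null mean mu0."""
    n = len(sample)
    return (sum(sample) / n - mu0) / (sigma / n ** 0.5)

MU0, SIGMA, N, Z_CRIT, SIMS = 0.0, 1.0, 4, 1.96, 20000

# alpha: rejecting H0 when H0 is true (samples really drawn with mean MU0)
alpha_hat = sum(
    abs(z_stat([random.gauss(MU0, SIGMA) for _ in range(N)], MU0, SIGMA)) > Z_CRIT
    for _ in range(SIMS)
) / SIMS

# beta: accepting H0 when the true mean is shifted (H0 is false)
TRUE_MU = 0.63
beta_hat = sum(
    abs(z_stat([random.gauss(TRUE_MU, SIGMA) for _ in range(N)], MU0, SIGMA)) <= Z_CRIT
    for _ in range(SIMS)
) / SIMS

print(alpha_hat, beta_hat, alpha_hat + beta_hat)
```

With these numbers alpha_hat comes out near 0.05 and beta_hat near 0.76, so their sum is nowhere near 1; each rate is conditioned on its own state of the world.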

With the above concepts in place, the essence of statistical power is simply: 1 − beta!

It is the probability that you correctly reject the null hypothesis when it is false. Equivalently: with probability 1 − beta, your test "correctly" rejects the null hypothesis and correctly reaches the conclusion of a "statistically significant difference". It is that simple and crude! Crudely put, because even knowing all the concepts above, you still may not be able to compute it in a concrete case. Here is a simple example of how to calculate power:

In this problem (the numbers used below: null mean 8.72, standard deviation 1.3825, and a sample of n = 4 people with sample mean 9.59), the "mean of normal blood copper concentration" plays the role of the null hypothesis you compare against: it assumes the mean blood copper of normal people across all humanity is that value, distributed within the corresponding number of standard deviations, even though no one can actually measure the mean blood copper concentration of all humanity. In addition, the calculation involves:

3.1 The Z-test (please treat this kind of problem as a Z-test for now; the underlying principle will not be explained here);

3.2 One-tailed versus two-tailed test (this depends on whether you care about deviations of blood copper concentration in both directions, in which case the rejection region opens at both tails, or in only one direction, in which case it opens at one tail: a symmetric question is two-tailed, an asymmetric one is one-tailed);

3.3 Choice of α: α is the probability, under the null distribution, of a value far out in a tail that your test nonetheless declares abnormal. The null hypothesis considers such a deviant value a reasonable (if rare) observation, and the null hypothesis is in fact correct, yet your test classifies the value as unreasonable. For example, you observe a blood copper concentration of 1.0; the null hypothesis says this lies within the normal range, but you declare it abnormal. The null hypothesis is right, yet your test attributes the value 1.0 to abnormality. The probability of this situation is α, and it is a mistake we want to avoid, so α is usually set very small, such as 0.05: a 5% probability within which we allow ourselves to make this mistake. If the test is one-tailed, the entire 5% sits at one tail; if two-tailed, the 5% is split equally between the two tails, that is, each tail allows a 2.5% probability of this error. (This is the often-quoted p < 0.05.)
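The one-tail-versus-two-tails choice in 3.2 and 3.3 is exactly what fixes the critical Z value. Python's `statistics.NormalDist` can look these values up directly (a small illustration, not part of the original problem):

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1

# two-tailed alpha = 0.05 puts 2.5% in each tail
z_two_tailed = std_normal.inv_cdf(1 - 0.05 / 2)   # about 1.96

# one-tailed alpha = 0.05 puts the whole 5% in one tail
z_one_tailed = std_normal.inv_cdf(1 - 0.05)       # about 1.64

print(round(z_two_tailed, 2), round(z_one_tailed, 2))
```

This is why the familiar 1.96 appears in the two-tailed calculation below, while a one-tailed test at the same alpha would use the smaller 1.64.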

Therefore, for the problem above, under a two-tailed test, the power is calculated as:

Power = P[Z > 1.96 − (9.59 − 8.72)/(1.3825/√4)] + 1 − P[Z > −1.96 − (9.59 − 8.72)/(1.3825/√4)]

Here P stands for probability: look up, in the Z table, the probability that Z is greater than 1.96 − (9.59 − 8.72)/(1.3825/√4), then add 1 minus the probability that Z is greater than −1.96 − (9.59 − 8.72)/(1.3825/√4). That sum is the power of this problem under a two-tailed test.

The 1.96 is the Z value corresponding to allowing alpha = 0.05 under a normal distribution with a two-tailed test (2.5% at each tail). (It acts as a constant: it changes with a different alpha or with a one- versus two-tailed test, but the common values are easy to memorize.)
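The two-tailed power formula above can be evaluated without a Z table. A minimal sketch using the standard normal CDF from Python's standard library, with this problem's numbers:

```python
from statistics import NormalDist

phi = NormalDist().cdf        # standard normal CDF (the "Z table")

mu0, mu1 = 8.72, 9.59         # null mean and observed sample mean
sigma, n = 1.3825, 4          # standard deviation and sample size
z_crit = 1.96                 # two-tailed alpha = 0.05

delta = (mu1 - mu0) / (sigma / n ** 0.5)          # standardized shift, about 1.26
power = (1 - phi(z_crit - delta)) + phi(-z_crit - delta)
print(round(power, 4))        # about 0.242
```

The two terms mirror the formula: the chance of landing beyond the upper critical value plus the (here negligible) chance of landing beyond the lower one.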

So, after calculation and table lookup, the power above is 0.2415 + 1 − 0.999356 = 0.2421. That is, only with a probability of 24.21% can we "correctly" judge that the blood copper concentration of these four people differs from that of ordinary people. This power is very low, so the sample size should be increased. By how much? If you understand the principle above, then fix alpha, the null-hypothesis mean and standard deviation, your current sample mean, and your desired power (say 80%), and solve for the sample size!
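That solve-for-the-sample-size step can be sketched with the classic approximation n = ((z_{α/2} + z_β) · σ / Δ)², which ignores the negligible far tail of the power formula; using this problem's numbers:

```python
from math import ceil
from statistics import NormalDist

inv = NormalDist().inv_cdf    # standard normal quantile function

alpha, target_power = 0.05, 0.80
mu0, mu1, sigma = 8.72, 9.59, 1.3825

z_alpha = inv(1 - alpha / 2)  # 1.96 for a two-tailed test
z_beta = inv(target_power)    # about 0.84
effect = abs(mu1 - mu0)       # the difference we want to detect

# required n, rounded up to a whole person
n = ceil(((z_alpha + z_beta) * sigma / effect) ** 2)
print(n)
```

Under these assumptions the answer lands around 20 people, roughly five times the original sample of 4, which matches the intuition that a power of 24% is far short of 80%.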

Given limited ability and limited time (it took me 3 hours to write this!), I will stop here for now; corrections are welcome. This is also part of my own process of learning and improving. Many thanks!