First, difficulty refers to how hard the test paper is, and it is an important index of test-paper quality. Together with discrimination, it determines how well the paper distinguishes between candidates. It is generally held that the difficulty index of individual questions should lie between 0.3 and 0.7 and that the average difficulty of the whole paper should be around 0.5; there should not be too many questions above 0.7 or below 0.3.
1. Two definitions of difficulty:
(1) P = 1 - X/W, where X is the average score on a question and W is the full score for that question. Under this definition, a small value means the question is easy and a large value means it is difficult. The minimum value is 0 and the maximum value is 1.
(2) P = X/W. Under this definition, a small value means the question is difficult and a large value means it is easy. Again the minimum value is 0 and the maximum value is 1.
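As a rough illustration of how the two definitions relate, the sketch below computes both values for one question. The function name `difficulty` and its `convention` argument are hypothetical, chosen only for this example.

```python
def difficulty(mean_score, full_score, convention=1):
    """Difficulty of one question under either definition.

    convention=1: P = 1 - X/W (larger value = harder question)
    convention=2: P = X/W     (larger value = easier question)
    """
    if full_score <= 0:
        raise ValueError("full_score must be positive")
    if not 0 <= mean_score <= full_score:
        raise ValueError("mean_score must lie between 0 and full_score")
    ratio = mean_score / full_score
    if convention == 1:
        return 1 - ratio
    if convention == 2:
        return ratio
    raise ValueError("convention must be 1 or 2")


# Example: candidates average 6 points on a 10-point question.
p1 = difficulty(6, 10, convention=1)  # about 0.4 under definition (1)
p2 = difficulty(6, 10, convention=2)  # about 0.6 under definition (2)
print(p1, p2)
```

The two values always sum to 1, so a figure quoted under one definition converts to the other by subtracting it from 1.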
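2. Calculating difficulty:
(1) Difficulty of subjective questions
A. Basic formula: P = 1 - X/W, where X is the average score on the question and W is its full score.
B. Extreme grouping method: P = 1 - (XH + XL) / (2W), where XH is the average score of the high group (top 27% of candidates), XL is the average score of the low group (bottom 27%), and W is the full score of the question.
(2) Difficulty of objective questions
A. Basic formula: P = 1 - R/N, where R is the number of candidates who answered correctly and N is the total number of candidates.
B. Extreme grouping method: P = 1 - (PH + PL) / 2.
PH = RH/n is the pass rate of the high group, where RH is the number of people in the high group who answered correctly and n is the number of people in the high group (the top 27% of all candidates). PL = RL/n is the pass rate of the low group, where RL is the number of people in the low group who answered correctly.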
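The extreme grouping method can be sketched in a few lines. This is a minimal illustration, not a standard implementation: it assumes candidates are ranked by their total paper score, that the group size is 27% of the candidates rounded to the nearest whole number, and that ties are broken by input order. The function names are hypothetical. For an objective question scored 0 or 1 with a full score of 1, the group averages XH and XL are exactly the pass rates PH and PL, so the same function covers both cases.

```python
def extreme_groups(total_scores, item_scores, fraction=0.27):
    """Return the item scores of the high and low groups.

    Candidates are ranked by total paper score (an assumption of this
    sketch); each group holds `fraction` of the candidates, rounded.
    """
    if len(total_scores) != len(item_scores):
        raise ValueError("total_scores and item_scores must have equal length")
    if not total_scores:
        raise ValueError("at least one candidate is required")
    n = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    high = [item_scores[i] for i in order[:n]]
    low = [item_scores[i] for i in order[-n:]]
    return high, low


def extreme_group_difficulty(total_scores, item_scores, full_score, fraction=0.27):
    """P = 1 - (XH + XL) / (2W)."""
    high, low = extreme_groups(total_scores, item_scores, fraction)
    xh = sum(high) / len(high)  # average score of the high group
    xl = sum(low) / len(low)    # average score of the low group
    return 1 - (xh + xl) / (2 * full_score)


# Ten candidates: total paper scores and their result on one objective
# question (1 = correct, 0 = wrong). The data are made up for illustration.
totals = [95, 88, 82, 75, 70, 64, 58, 50, 41, 30]
objective_item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
print(extreme_group_difficulty(totals, objective_item, full_score=1))
```

With ten candidates each group holds three people, so only the top three and bottom three results enter the calculation.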
Second, discrimination is an index of how well a question distinguishes between candidates of different ability. High discrimination widens the score gap between candidates at different levels, so that strong candidates get high scores and weak candidates get low scores; low discrimination fails to reflect the differences between candidates. The discrimination of a question is directly related to its difficulty: generally, questions of moderate difficulty discriminate best. Discrimination also depends closely on the level of the candidates. Only when the difficulty of a question is equal to or slightly lower than the candidates' actual ability can its discriminating power be fully displayed.
1. Evaluating the discrimination index: -1.00 ≤ D ≤ +1.00. The higher the index, the more strongly the question discriminates. Generally, a question with a discrimination index above 0.3 is acceptable.
2. Calculating discrimination:
Basic formula: D = (H - L) / N, where D is the discrimination index, H is the number of people in the high group who answered the question correctly, L is the number of people in the low group who answered it correctly, and N is the number of people in one group (the high group and the low group are the same size, which is what allows D to range from -1 to +1).
Extreme grouping method:
(1) Subjective questions: D = (SH - SL) / (N (WH - WL)), where SH is the total score of the high group on this question, SL is the total score of the low group on this question, WH is the highest score obtained on this question, WL is the lowest score obtained on this question, and N is the number of people in the high group (or the low group), which is 27% of all candidates.
(2) Objective questions: D = PH - PL, or equivalently D = (RH - RL) / n, where PH and PL are the pass rates of the high and low groups, RH and RL are the numbers of people in each group who answered correctly, and n is the number of people in one group.
(3) More generally, D = (XH - XL) / W, where XH is the average score of the high group on the question, XL is the average score of the low group on the question, and W is the full score of the question.
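The general formula D = (XH - XL) / W can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: candidates are ranked by total paper score, each extreme group is 27% of the candidates rounded to the nearest whole number, and the function name is hypothetical. For an objective question scored 0 or 1 with W = 1, the result equals PH - PL.

```python
def discrimination_index(total_scores, item_scores, full_score, fraction=0.27):
    """D = (XH - XL) / W using the top and bottom `fraction` of candidates."""
    if len(total_scores) != len(item_scores):
        raise ValueError("total_scores and item_scores must have equal length")
    if not total_scores:
        raise ValueError("at least one candidate is required")
    if full_score <= 0:
        raise ValueError("full_score must be positive")
    n = max(1, round(len(total_scores) * fraction))
    # Rank candidates by total paper score, best first (sketch assumption).
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    xh = sum(item_scores[i] for i in order[:n]) / n   # high-group average
    xl = sum(item_scores[i] for i in order[-n:]) / n  # low-group average
    return (xh - xl) / full_score


# Made-up data: ten candidates and their result on one objective question.
totals = [95, 88, 82, 75, 70, 64, 58, 50, 41, 30]
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
d = discrimination_index(totals, item, full_score=1)
print(d, "acceptable" if d > 0.3 else "needs review")
```

The final comparison applies the 0.3 threshold quoted in the text; a negative result would mean the low group outperformed the high group on that question.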
Thirdly, reliability refers to the consistency or stability of measurement results. The more stable the results, the more reliable the evaluation. Conversely, if the same candidate takes the same set of questions twice and scores 80 points the first time and 50 points the second time, the reliability of the result is doubtful. Reliability is usually expressed as the correlation coefficient between two sets of evaluation results. A correlation coefficient of 1 indicates that the evaluation tool (such as a test paper) is completely reliable; a correlation coefficient of 0 indicates that it is completely unreliable. Generally, a reliability above 0.7 is required.
1. Methods of evaluating reliability: (1) the test-retest method, (2) the parallel-forms method, (3) the split-half method. In other words, reliability is evaluated as test-retest reliability, parallel-forms reliability and internal consistency reliability. Test-retest reliability is the correlation coefficient between the results of the same group of candidates taking the same paper twice under the same conditions. Parallel-forms reliability is the correlation coefficient between the results of two or more papers that are parallel in conception, content, difficulty, question type and number of questions. Internal consistency reliability refers to the consistency among the questions within a paper; usually the paper is divided into two halves and the correlation coefficient between the scores on the two halves is calculated.
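All three methods come down to a correlation coefficient between two sets of scores. The sketch below computes a Pearson correlation by hand and applies it to a test-retest pair and to a split-half comparison. The odd/even split of the questions is an assumption of this sketch (the text only says the paper is divided into two parts), and the function names and data are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / math.sqrt(vx * vy)


def split_half_correlation(item_matrix):
    """Correlate each candidate's score on odd-numbered questions with
    the score on even-numbered questions (one row per candidate)."""
    odd_half = [sum(row[0::2]) for row in item_matrix]   # questions 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_matrix]  # questions 2, 4, 6, ...
    return pearson(odd_half, even_half)


# Test-retest: the same four candidates sit the same paper twice.
first_sitting = [80, 70, 60, 90]
second_sitting = [78, 72, 58, 91]
print(pearson(first_sitting, second_sitting))

# Split-half: one row per candidate, one 0/1 entry per question.
answers = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
]
print(split_half_correlation(answers))
```

Parallel-forms reliability would use the same `pearson` call, with the two lists holding each candidate's score on the two parallel papers.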
2. The reliability coefficient: γxx = ST² / SX², where ST² is the true score variance and SX² is the variance of the obtained scores. The coefficient ranges from 0 to 1: the closer it is to 1, the higher the reliability of the test, and the closer it is to 0, the lower the reliability. When γxx ≥ 0.70, the test can be used for comparisons between groups. When γxx ≥ 0.85, the test can be used for comparisons between individuals.
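A small sketch of the ratio and the two thresholds follows. The true score variance cannot be observed directly, so the figures here are invented purely to show the arithmetic; the function names are hypothetical.

```python
def reliability_coefficient(true_variance, observed_variance):
    """Reliability as true score variance divided by obtained score variance."""
    if observed_variance <= 0:
        raise ValueError("observed_variance must be positive")
    if not 0 <= true_variance <= observed_variance:
        raise ValueError("true_variance must lie between 0 and observed_variance")
    return true_variance / observed_variance


def suitable_use(r_xx):
    """Most demanding use supported by the thresholds quoted in the text."""
    if r_xx >= 0.85:
        # Also clears the 0.70 bar, so group comparison is supported too.
        return "comparison between individuals"
    if r_xx >= 0.70:
        return "comparison between groups"
    return "below the thresholds given in the text"


# Invented figures: true score variance 72, obtained score variance 90.
r = reliability_coefficient(72, 90)
print(r, suitable_use(r))
```

In practice the coefficient is estimated from observed data, for example by one of the correlation-based methods described in point 1.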
Fourth, validity is the degree to which a test measures what it is intended to measure, that is, the degree to which the test results meet the test objectives. Any testing tool, however good it is in other respects, is worthless if its validity is too low and its results do not reflect what it set out to measure (for example, testing students' mathematical ability with an English paper). Because of the nature of psychological phenomena, validity is particularly important in evaluation. Psychological traits are mental and cannot at present be observed directly; they can only be inferred from a person's behavior patterns or responses to test items. Intelligence, for example, is inferred mainly from an individual's responses to a set of problems and from whether those responses are right or wrong. Validity is a relative concept: a test has higher or lower validity, and there is no such thing as a completely valid or completely invalid test. Validity can be divided into face validity, content validity, construct validity, predictive validity and concurrent validity.