(A) The shortcomings of the traditional theory of examination statistics
The universal existence of individual differences makes "teaching students in accordance with their aptitude" an ideal principle of pedagogy. In practice, it is easy for us to "... distinguish the genius who expresses thoughts unclearly from the idiot who expresses them clearly: the former shows a deep understanding of science through calculation and conclusions but cannot 'say what it looks like'; the latter seems to be full of appropriate words but has no corresponding ability to use the ideas those words represent." In other words, a good educator, drawing on years of teaching experience, can quickly judge the talent and potential of students. However, it is often difficult to achieve the same result through formal testing. Current academic achievement tests and intelligence measurements mainly measure the explicit, memorized knowledge that students can consciously retrieve, and their ability to use this knowledge through recognition or reproduction. They can hardly measure the unconscious processing involved in implicit learning and implicit memory, even though this implicit processing ability does exist and has a great influence on students' learning and on the formation of their basic psychological qualities. Measuring this processing ability would help teachers understand students fully and teach them in accordance with their aptitude, because the purpose of evaluation is not to label students as good or bad but to place them in a suitable educational environment, help them build on their strengths and compensate for their weaknesses as far as possible, develop their implicit psychological potential while strengthening explicit memory training, and provide comprehensive training in many directions.
Classical test theory analyzes test results at the level of the whole test. It ignores individual differences and differing item response patterns, and it conflates the different characteristics contained in the same test score. Research shows that the same number of correct responses may well result from different response patterns, and that the differences among these patterns reflect real psychological characteristics or a particular mental set. Latent trait theory and its development in modern measurement theory, namely item response theory, attempt to overcome this shortcoming and to determine, to some extent, the relationship between measurement results and psychological traits that cannot be directly observed or measured.
(B) The lack of specialized statistical analysis tools
Examination statistics is an interdisciplinary subject that integrates pedagogy, mathematical statistics and computer science, whereas the statistical analysis software currently on the market is general-purpose and aimed at all fields. When such software is used only for educational statistics, many functions go unused while others are missing, and the analysis results are too abstract to be explained to users in simple terms. Designing a dedicated examination statistics analysis tool is therefore an urgent task.
(C) The rise of item response theory provides a new tool for data analysis
In the 1970s and 1980s, the most remarkable progress in measurement theory was the application of item response theory, an important milestone in measurement after classical test theory. Item response theory is superior to classical test theory because it overcomes the latter's limitation of treating "test score = ability" when analyzing data: it regards ability as a latent variable, and it regards important parameters such as item difficulty and discrimination as inherent characteristics of the item itself, independent of the group tested. At present, the theory is used in many measurement fields, such as objective testing, the construction of item banks, the equating of ability estimates for different groups across different tests, and cross-cultural comparison. In the analysis of talent evaluation data in developed countries, item response theory has become a routine analysis tool.
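To make the idea of item-level parameters concrete, the sketch below evaluates a two-parameter logistic (2PL) item characteristic curve. The text does not say which item response model it has in mind, so the 2PL form, the function name `prob_correct` and the parameter values are all illustrative assumptions.

```python
import math


def prob_correct(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under an assumed 2PL model.

    theta: latent ability of the examinee
    a:     item discrimination (how steeply the curve rises around b)
    b:     item difficulty (the ability at which the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters, chosen only for illustration.
a, b = 1.2, 0.5
for theta in (-2.0, 0.0, 0.5, 2.0):
    print(f"theta={theta:+.1f}  P(correct)={prob_correct(theta, a, b):.3f}")
```

In this formulation `a` and `b` belong to the item, so the same curve applies whichever group of examinees takes it; only the distribution of `theta` differs between groups.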
Second, the research objectives and significance
First, this paper introduces the use of traditional educational statistical methods to analyze test papers at the macro level, providing teaching managers with feedback on the quality of the test paper and the overall level of students to help them improve their teaching work and decision-making. Second, to address the shortcomings of traditional educational statistical methods, this paper analyzes test papers at the micro level. Using item response theory, it attaches importance to students' implicit learning and implicit memory, breaks through the limitation of "test score = ability" by examining differences in item response patterns, reveals students' real psychological characteristics or a particular mental set, and makes a formative evaluation based on the analysis results. The purpose of formative assessment (which, compared with summative assessment, provides more information about day-to-day teaching) is not only to diagnose and evaluate students' learning but also to review and evaluate teaching content and teaching methods.
Macro analysis of test paper
First, requirements analysis for macro test paper analysis
Testing yields a great deal of information about teaching in a short time, saving time and effort. It helps teaching managers make decisions that improve teaching, obtain feedback through the analysis of test papers, and understand the problems that teachers and students encounter in teaching. Studying teaching measures systematically through examinations is an important basis for managers to improve teaching management, and also one of the important bases for them to understand teachers' teaching and to give specific help, guidance and control.
Teaching administrators and subject teachers can grasp important information such as students' collective knowledge level and collective trend through macro analysis of test papers, and adjust teaching strategies and methods in time.
Second, a macro test paper analysis case
The main contents of statistical analysis at this level are: the overall distribution of test scores, the mean, overall difficulty, coefficient of variation, skewness, standard deviation, frequencies and frequency distribution, the distribution of item difficulty and discrimination, overall differences in test paper composition, test paper reliability, and the construct validity and content validity of the test paper.
This case uses SPSS (Statistical Package for the Social Sciences) to analyze the full range, standard deviation, median, frequency distribution, test paper difficulty, test paper reliability and test paper discrimination for the second-semester examinations of Class Two at Fudan Middle School in Shanghai. The analysis results are as follows.
(1) Full range
The full range is the difference between the maximum and minimum values in a set of data, that is, the total spread between the two extremes. It is usually denoted by the symbol R:
$R = X_{\max} - X_{\min}$    (2.1)
The full range indicates the dispersion of the data. If R is relatively large, candidates' test scores differ widely; if R is relatively small, the scores are relatively concentrated. If the full range is also compared with the average score, it is easy to see how well candidates as a whole have mastered the material.
According to the data in the table, the full range of the mathematics test paper is 77, which shows that students' scores in this subject differ widely. The average mathematics score is 70.2708, so the overall level is good, but the weakest students score very poorly and deserve attention. The full ranges of Chinese, history and politics are small and their average scores are high, which shows that the overall level is good and there is little difference between students. This also reflects the difference between science and liberal arts subjects.
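As a minimal sketch of the calculation described above (the maximum score minus the minimum score, read alongside the mean), the following uses invented scores rather than the class data from the case. The helper name `full_range` is hypothetical.

```python
from statistics import mean


def full_range(scores):
    """Return R = max - min for a non-empty sequence of scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    return max(scores) - min(scores)


# Invented scores for illustration; not the data from the case study.
sample_scores = [20, 55, 60, 60, 68, 71, 74, 80, 88, 95]

print("R    =", full_range(sample_scores))
print("mean =", round(mean(sample_scores), 2))
```

A large R next to a reasonably high mean, as in this invented sample, is the pattern the text interprets as a generally good class with a few very weak students.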
(2) Standard deviation
The standard deviation represents the degree of dispersion of the values of a variable around their mean; it is a numerical index of how much, on average, the values vary. In examinations, it can be used to measure the degree of difference in students' grades [3], giving a general picture of how well the examination discriminates. The calculation formula is:
$s = \sqrt{\dfrac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}}$    (2.2)
where $s$ is the standard deviation, $X_i$ is an observed value, $\bar{X}$ is the mean, and $N$ is the number of observations. In general, it is appropriate to keep the standard deviation of each exam between 9 and 15. If the standard deviation is less than 8 points, the scores are relatively concentrated, the discrimination of the test paper is too low, and there are many moderately difficult questions; if the standard deviation is greater than 16, the results are too scattered.
Since a standard deviation between 9 and 15 is appropriate, the score distributions for mathematics and foreign languages are reasonable. However, the scores for politics, Chinese, physics, chemistry and history are too concentrated, which shows that the test questions do not discriminate well enough.
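The check described above can be sketched as follows. This is an illustration under stated assumptions: it uses the population form of the standard deviation (dividing by N, the number of observations), whereas statistical packages such as SPSS typically report the sample form (dividing by N - 1), so results may differ slightly. The function names and the scores are hypothetical; the thresholds are the ones quoted in the text.

```python
import math


def std_dev(scores):
    """Population standard deviation: sqrt(sum((x - mean)^2) / N)."""
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / n)


def describe_spread(s):
    """Label a standard deviation using the thresholds quoted in the text."""
    if s < 8:
        return "too concentrated"
    if s > 16:
        return "too scattered"
    if 9 <= s <= 15:
        return "appropriate"
    return "borderline"  # the text leaves 8-9 and 15-16 unspecified


# Invented scores for illustration only.
scores = [52, 60, 63, 68, 71, 75, 79, 84, 90, 96]
s = std_dev(scores)
print(f"s = {s:.2f} -> {describe_spread(s)}")
```

The "borderline" label is an addition of this sketch, needed only because the quoted thresholds leave small gaps between the bands.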
(3) Median
Middle school scores are usually on a 100-point scale, so the score distribution often has no pronounced single peak; the median is therefore used rather than the mode as the measure of central tendency. The formula is:
Median position = $\dfrac{N + 1}{2}$    (2.3)
Take the mathematics test paper as an example: the mode is 60 and the median is 71. That is, the most common score is 60 and the middle score is 71, which shows that the difficulty of the test paper is moderate, leaning slightly easy. The median of the foreign language test paper is 56.5, which shows that the paper is difficult and students' scores are generally low. The median of the history paper is 90, which shows that the paper is relatively easy and students generally score high.
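A minimal sketch of locating the median from its position follows. It assumes the common rule that the median sits at position (N + 1) / 2 in the sorted scores, with the two neighbouring scores averaged when that position is not a whole number. The scores are invented, and `median_position` and `median_score` are hypothetical helper names.

```python
import math
from statistics import median, multimode


def median_position(n):
    """1-based position of the median among n sorted scores: (n + 1) / 2."""
    return (n + 1) / 2


def median_score(scores):
    """Median located via its position; averages the two middle scores when n is even."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    pos = median_position(len(ordered))
    lower = ordered[math.floor(pos) - 1]
    upper = ordered[math.ceil(pos) - 1]
    return (lower + upper) / 2


# Invented scores for illustration only.
scores = [45, 60, 60, 66, 70, 72, 78, 85]

print("median position:", median_position(len(scores)))
print("median:", median_score(scores))
print("mode(s):", multimode(scores))
```

For a class of 48 students the position is 24.5, so the median is the average of the 24th and 25th scores, which is how a median can take a half-point value.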
(4) Frequency distribution
In general, test scores are close to a normal distribution, but in actual tests the scores show the following four distribution patterns (see Figure 2.1), which reflect different information about the quality of the test questions.
Figure 2.1 Frequency distribution diagrams
Among them, Figure A shows a normal distribution, reflecting a normal distribution of question difficulty. In Figure B, the positively skewed distribution shows that there are more low scorers and a low average score, indicating a large proportion of difficult questions; the negatively skewed distribution shows that there are many high scorers, a high average score and a large proportion of easy questions. In Figure C, the sharply peaked frequency distribution shows that scores are concentrated near the average and that medium-difficulty questions account for a large proportion; the flat-topped frequency distribution shows that students' scores differ widely and that easy, medium and difficult questions appear in similar proportions. Figure D shows scores concentrated at both the high and low ends, with a steep difficulty gradient and a small proportion of medium-difficulty questions.
Taking the math test paper as an example, the score frequency distribution of 48 students in the class is as follows:
As can be seen from the figure, the frequency distribution of candidates' scores on the mathematics test paper is negatively skewed: there are many high scorers, the average score is high, and the questions are relatively easy. Most candidates scored between 60 and 80. There is a gap between the 10~20 and 40~50 intervals, that is, no candidate scored between 20 and 40, which indicates that the weakest students are far behind and need special attention.
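A frequency table of the kind discussed above can be built by counting scores in 10-point intervals, and the direction of skew can be checked with a skewness coefficient. The sketch below is illustrative only: the scores are invented, the 10-point interval width is an assumption, and the moment-based skewness formula is one common choice (the text does not state which measure it uses). The function names are hypothetical.

```python
from collections import Counter


def frequency_table(scores, width=10, top=100):
    """Count scores per interval [lo, lo + width); the top score joins the last interval."""
    last_bin = top // width - 1
    counts = Counter(min(int(x // width), last_bin) for x in scores)
    # Keep empty intervals so that gaps in the distribution stay visible.
    return {(k * width, (k + 1) * width): counts.get(k, 0)
            for k in range(last_bin + 1)}


def skewness(scores):
    """Moment-based skewness m3 / m2**1.5; negative means a longer low-score tail."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all scores are equal")
    return m3 / m2 ** 1.5


# Invented scores for illustration only.
scores = [15, 45, 58, 62, 64, 67, 70, 71, 73, 75, 78, 82, 88, 93]

table = frequency_table(scores)
for (lo, hi), count in table.items():
    print(f"{lo:>3}~{hi:<3} {'#' * count}")
print("skewness:", round(skewness(scores), 3))
```

Keeping the empty intervals in the table is a deliberate choice here: a run of zero counts between a low interval and the main body of scores is exactly the kind of gap the text flags for attention.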