Market investigation, evaluation and prediction methods of teaching material publishing industry

Abstract

This is a design problem characterized by a large volume of data and information.

First, we analyzed the data in the database, ranked the universities by the number of questionnaires they returned, applied systematic random sampling with a regional correction, and narrowed the scope of the market survey from 205 universities to 26. The questionnaire survey is conducted in these 26 schools. We analyze the survey data and measure the market share of a publishing house by the ratio of the number of questionnaires that name the publishing house to the total number of questionnaires. This gives a table of the market share of each publishing house over three years, from which a prediction is made by curve fitting.

Second, we consider a special situation that may occur during a questionnaire sampling survey, namely non-response. We use two forms of two-stage sampling survey to deal with it, the traditional method and the Bayesian method, and briefly summarize and compare their similarities and differences. Both methods help to reduce non-sampling error and improve data quality.

Third, we put forward a new interpretation of the theory of printing on demand, which can substantially reduce the costs of publishing houses. To verify the data, we compare the market share obtained from the sampling survey with the market share obtained from the full database and compute the relative error. The error of most entries is less than 0.1, so the method works well. A new idea for improving the model is also put forward.

Finally, the topic is discussed and explained more completely, and opinions and suggestions are put forward to the publishing house on the basis of this topic.

Key words: systematic random sampling; regional correction; non-response; two-stage sampling survey; printing on demand

I. Background of the problem

With the release of the 11th Five-Year development plan by the CPC Central Committee and the State Council, China's cultural industry has received unprecedented attention. At the same time, the Eleventh Five-Year Plan period brings unprecedented challenges to the publishing industry, which will face the impact of the Internet, SMS and digital publishing on the publishing environment. Many publishing houses and distribution units have begun to think about and plan their future development, which is an important sign that the publishing industry is returning to rationality. For publishing units, the greatest value of strategic planning lies in the process itself, in cultivating systematic thinking and adaptability in a market economy, not merely in the resulting plan. According to China's WTO commitments, 2006 is the final year of the transition to fully opening China's publishing industry. Deepening system reform and coping with WTO entry have become top priorities for the industry. The industry attaches more importance to competitiveness than ever before, and any research report, market survey or industry ranking touches the sensitive nerves of publishing houses. Educational publishing has a great influence on the competitiveness of publishing houses, and management has become the most important means of improving competitiveness and building a relatively stable competitive advantage. Therefore, the textbook publishing industry, which occupies a dominant position in publishing, needs to pay more attention to market investigation and to make scientific evaluations and predictions of the market. This requires scientific methods of investigation, evaluation and prediction.

II. Restatement of the problem

A publishing house publishes a variety of textbooks for higher education and vocational education. Considering the strategic development, investment strategy, production arrangement, sales model and product planning of publishing houses, it is necessary to examine the market share and its changes year by year. Please design an effective and feasible survey method, establish an analysis model of survey data, and make a scientific evaluation and forecast of the market.

The basic data given in the appendix of this problem are questionnaire survey data. Because of the limit on sampling cost, a full census is not advisable, and the number of samples should be as small as possible while still achieving the purpose of the investigation.

Notes on the problem

1. Because of the sampling cost, a full census is not desirable, but there is a trade-off between the number of samples and the benefit of the survey. You should consider this trade-off when determining the number of samples.

2. Describe your investigation method completely and give your simulation data clearly. If questionnaire sampling survey (not limited to questionnaire survey) is adopted, please give the questionnaire format.

3. Give the market evaluation and prediction model based on the survey data, and use the data to illustrate the effectiveness and scientificity of your method.

4. The format of the reference questionnaire is given in Appendix 1, and the database related to this questionnaire is given in Appendix 4. The database contains simulated answer sheets, covering three years, for all students in ten provinces, and serves as the background data of this competition problem. To reduce the amount of data, it is assumed that all students in a class fill in the same form, so only one student in each major fills in the questionnaire.

5. Appendix 2 gives the names and classification numbers of 29 kinds of teaching materials in the database provided by this subject, and Appendix 3 gives the three-year sales of various teaching materials of a publishing house for inquiry.

6. Appendix 4 also gives the names of all schools in the ten provinces and the names of their majors. You can use these as search terms to define your sampling queries against the database.

7. If you choose your own data, please give the test method and data source of the reliability and rationality of the survey data.

III. Basic assumptions

To make the problem tractable as a whole, this paper examines the market share of publishing houses and its year-by-year changes from the perspectives of strategic development, investment strategy, production arrangement, sales methods and product planning, in order to design an effective and feasible survey method, establish an analysis model of the survey data, and make a scientific evaluation and forecast of the market. We make the following basic assumptions:

(1) The market of textbook publishing industry is basically stable for a long time;

(2) The social environment is stable, and there is no major adjustment to the publishing industry in social policies;

(3) There is no big change in the application of teaching materials by the education department;

(4) There are no major changes in the school, and the number of students in the school will not increase or decrease greatly;

(5) The number of textbook publishers is basically stable, with little fluctuation in operating conditions;

(6) The publishing house is in good operating condition and has no accidents;

(7) The quality of the textbooks published by the textbook publishing houses is guaranteed, and there are no quality problems.

IV. Problem analysis and model preparation

(1) Problem analysis

According to the data in the appendix to this question, there are:

(1) Appendix 1 gives the format of the reference questionnaire;

(2) Appendix 2 gives the names and classification numbers of the 29 kinds of textbooks in the database provided for this problem;

(3) Appendix 3 gives the three-year sales of the various teaching materials of a publishing house for reference;

(4) Appendix 4 gives the original data of ten provinces for three years and the corresponding school professional catalogue;

(5) You can also choose your own data.

Moreover, the basic data required by the problem are questionnaire survey data. Because of the limit on sampling cost, a full census is not advisable, and the number of samples should be as small as possible while still achieving the purpose of the investigation. We therefore process the data and solve the problem along the following lines:

First, we analyzed the data in the database, ranked the universities by the number of questionnaires they returned, applied systematic random sampling with a regional correction, and narrowed the scope of the market survey from 205 universities to 26. The questionnaire survey is conducted in these 26 schools. We analyze the survey data and measure the market share of a publishing house by the ratio of the number of questionnaires that name the publishing house to the total number of questionnaires. This gives a table of the market share of each publishing house over three years, from which a prediction is made by curve fitting.

Second, we consider a special situation that may occur during a questionnaire sampling survey, namely non-response. We use two forms of two-stage sampling survey to deal with it, the traditional method and the Bayesian method, and briefly summarize and compare their similarities and differences. Both methods help to reduce non-sampling error and improve data quality. Including a sound remedial scheme makes our model more complete, effective and scientific.

Third, we put forward a new interpretation of the theory of printing on demand, which can substantially reduce the costs of publishing houses. In the sensitivity analysis, we use the data of one publishing house to verify the model and then extend the model to the general case.

Finally, the topic is discussed and explained more completely, and opinions and suggestions are put forward to the publishing house on the basis of this topic.

(2) Model preparation (definitions of terms)

1. Data compression

Data compression is a technique that simplifies or compresses data, with minimal loss of information, to improve the efficiency of its transmission, storage and processing. Given the huge amount of data that must be obtained and processed here, data compression is an effective way to reduce the workload and save computer time. It is achieved by removing gaps, blank segments, redundant items and unnecessary data and retaining only the data that reflect the essential characteristics, so that more data fit in a given space. Common data compression methods include: ① Restoration: remove data that can be restored by extrapolation or interpolation; ② Parameter extraction: retain only the characteristic data and parameters; ③ Isochronous sampling: sample continuously arriving data at equal time intervals; ④ Code conversion: convert the data into simplified codes or encode each data block, with efficiency measured by the number of bits per pixel; ⑤ Function fitting: from the necessary sampling points obtained by equal or unequal interval sampling, compute the reduced data with a function algorithm.

2. No answer

Non-response means that, for some reason, the required information cannot be obtained from all the sample units or for all the questions in the questionnaire. A sample unit may not provide the required information, may provide it only in part, or may provide information that cannot be used. Non-respondents usually have different characteristics from respondents, so if non-response is not corrected, the validity and representativeness of the sample are reduced and the survey estimates are biased, which lowers the accuracy of the survey and may even cause the whole survey to fail.

3. Two-stage sampling method

Two-stage sampling is a common way to deal with non-response. The basic idea is to draw a second random sample from the initial non-respondents and then estimate the population from the answers of the initial sample together with the sub-sample data, thereby removing the bias caused by non-response and improving the accuracy of the estimator. The method is often used in mail surveys. Below we introduce two forms of two-stage sampling survey for remedying non-response, the traditional method and the Bayesian method, and briefly summarize their similarities and differences.

4. Printing on demand

The original idea of printing on demand (POD) is to transform the whole publishing process through digital and ultra-high-speed printing technology, producing according to the required time, place, quantity and content, so as to meet the modern market demand for personalization, short runs and high efficiency. It is especially suitable for printing jobs with a narrow audience, strong specialization, high variability and small print runs. Printing on demand is the product of combining advanced database technology with digital printing technology. The workflow is to digitize the contents of the book first, then print the pages from the electronic files at high speed on a special laser printer, and complete the folding, collating and binding. It is characterized by immediate printing and by variable, personalized print quantity and content.

Here we use the term in an extended sense: printing according to the number of textbooks the market needs. Because colleges and universities differ in the types of teaching materials they require, such a model is needed to meet customers' needs. We therefore propose the print-on-demand model here.

At the same time, printing on demand uses immediate supply and immediate settlement, which saves book storage space for the publishing house, achieves "zero inventory", and solves the problems of out-of-print titles and print-run decisions. By printing on demand, publishers can avoid the capital risk and turnover pressure brought by printing, inventory, transportation and investment, and so save costs.

V. Establishment of the model and prediction

(1) Description of sampling survey method

Our sampling survey can be divided into three stages, namely "sample-data-analysis". The first stage is the sampling design: obtaining the list of units to be surveyed, which settles from whom the statistical data are collected (the sample). The second stage is the statistical investigation of the selected units: obtaining the data and arranging them as necessary, which provides accurate input for statistical analysis and settles how the data are acquired and in what format. The third stage is the statistical analysis of the collected data with statistical software, which yields the analysis conclusions and achieves the ultimate goal of the statistical work (the analysis). The three stages complement each other and none can be omitted.

In an actual sampling survey, three issues must be considered: (1) the accuracy of the estimates of the survey indicators; (2) the level of survey cost; (3) the sample size. These three pull against each other in the design of a sampling scheme, so they should be ranked by importance according to the actual situation. Generally speaking, estimation accuracy is the most important, followed by survey cost, and finally sample size.

Therefore, when designing the sampling survey scheme, the three issues are prioritized in this order: estimation accuracy, then survey cost, then sample size.

The following are the steps we summarized in the sample survey:

5.1 Purpose of sampling

The purpose of sampling is to select representative data from the existing census database, that is, data compression. Through data compression, the data within the allowable error range can be obtained, so as to scientifically evaluate and predict the market.

5.2 Basic principles of sampling

In order to grasp the market situation, the data collected should be comprehensive and representative, which is the basic principle of sampling.

5.3 Commonly used sampling methods (in brief)

Sampling methods can be divided into probabilistic sampling and non-probabilistic sampling. Because the possibility of non-probabilistic sampling results cannot be accurately measured, probabilistic sampling method is generally used. Simple random sampling, stratified random sampling and systematic random sampling are all probability sampling methods.

1. Simple random sampling

Simple random sampling means "extracting n individuals from a population containing N individuals, so that all possible combinations of n individuals are equally likely to be extracted". With this method every record in the database has an equal chance of being sampled, and no subjective restriction is involved. It is the basic random sampling method and the foundation of the other random sampling methods.

2. Stratified random sampling

Sometimes products can be divided into several layers, and the quality of each layer is obviously different. In order to obtain representative samples, the whole batch of products is divided into several layers, so that the quality of products in the same layer is as uniform and tidy as possible, and some products are randomly selected from each layer to form a sample. This sampling method is called stratified random sampling. On the premise of correct stratification, stratified sampling is better than simple random sampling, but if the distribution of batch quality is not understood or stratified incorrectly, the effect of stratified sampling will be counterproductive.

3. Systematic random sampling

If a batch of products can be arranged in a certain order and divided into n equal parts, a simple random sampling method is used to extract a unit product from each part at the same position to form a sample. This sampling method is called systematic random sampling. Its representativeness is generally better than that of simple random sampling, but when the fluctuation period of product quality is equivalent to the sampling interval, the sampled units may all be products with good quality or poor quality, and the representativeness is poor at this time.
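The three probability sampling methods described above can be sketched in a few lines of Python. This is only an illustration under assumed data: the 40 numbered units, the two strata "A" and "B", the per-stratum sample sizes and the interval of 8 are all hypothetical and are not taken from the survey database.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n units so that every subset of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of the requested size from each stratum."""
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata.items()}

def systematic_sample(ordered_units, interval, start):
    """Take the unit at position `start` within every block of `interval` units."""
    return ordered_units[start::interval]

rng = random.Random(0)            # fixed seed only so the sketch is repeatable
units = list(range(1, 41))        # 40 hypothetical units, already in order

srs = simple_random_sample(units, 5, rng)
strat = stratified_sample({"A": units[:10], "B": units[10:]},
                          {"A": 1, "B": 3}, rng)
start = rng.randrange(8)          # random position inside the first block
sys_sample = systematic_sample(units, 8, start)

print(srs)
print(strat)
print(sys_sample)
```

Note that the systematic sample needs only one random draw (the starting position), which is why its quality depends on how the units are ordered.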

5.4 Choosing the right sampling method

In order to better evaluate and forecast the market, it is very important to choose an appropriate sampling method.

1. When the data is relatively stable and there are few data, choose simple random sampling. If the overall quality is unknown, choose simple random sampling.

2. When different data come from different regions, stratified random sampling can be used to obtain representative samples.

3. When the data is unstable and there is a big difference, systematic random sampling should be adopted.

According to the actual situation, it is very important to choose the correct and reasonable sampling method to improve the representativeness and randomness of the samples, so as to improve the effectiveness of sampling. Only by sampling scientifically, reasonably and effectively can we achieve our requirements.

Therefore, given the characteristics of this problem (large data volume, wide geographical coverage and large differences in demand across books), we consider all three sampling methods together and solve the sampling problem by data compression, with systematic sampling as the main method and the other methods as supplements.

We also used Excel. Although Excel is weaker in statistical analysis than professional statistical software (such as SPSS or SAS), its data management and sorting functions are powerful, flexible and easy to use, which suits the kind of data handling needed here. We therefore relied mainly on Excel for data processing, which greatly sped up the work.

The sampling questionnaire adopts the format given in the question.

(2) Establishment of the model

First, we extract some basic figures from the database:

1. There are 29 disciplines.

2. There are 25 publishing houses.

3. There are 205 universities; the number of universities in each province and city is as follows (Table 1):

Province     Number of universities
Beijing      49
Guangdong    30
Hebei        26
Anhui        24
Henan        24
Fujian       14
Guangxi      13
Gansu        12
Guizhou      9
Hainan       4

(Table 1)

4. Based on all the data in the database, the number of returned questionnaires that name each publishing house in each year is shown in Table 2:

Publishing house   Year 1   Year 2   Year 3
p196               325      323      327
p559               328      336      346
p106               353      352      351
p199               380      379      393
p307               406      411      418
p102               444      451      452
p131               476      475      472
p511               490      495      503
p030               497      503      512
p063               506      508      515
p416               640      637      635
p304               654      661      666
p110               747      754      764
p246               773      778      781
p432               871      870      868
p091               910      913      913
p118               1002     1015     1031
p210               1308     1311     1308
p044               1606     1604     1602
p390               2041     2035     2025
p405               3098     3162     3227
p534               4021     4001     3983
p293               5095     4947     4767
p115               18267    18116    17967
p357               20490    20646    20812

(Table 2)

Analysis: note 4 of the problem explains what a questionnaire represents. When the name of a publishing house appears on a questionnaire, it means that the whole major of the student who filled it in uses the textbooks of that publishing house. We assume that, on the whole, the average number of students per major is roughly the same. The more questionnaires name a publishing house in a given year, the more people buy that publishing house's books, and hence the larger its market share.

Using all the data in the database, we can examine the book purchasing of each university in each year by counting the number of questionnaires filled in at each university. Under our assumption, the more questionnaires a school has, the greater its demand for books.

We rank the universities in descending order of their number of questionnaires. The number of questionnaires for each university does not change over the three years, so the ranking does not change either. The first year can therefore stand for all three years.

Because there are 205 universities, the full ranking is long and is omitted here.

The number of universities in each range of questionnaire counts is shown in Table 3.

Number of questionnaires   Number of universities
700~799                    1
600~699                    7
500~599                    34
400~499                    34
300~399                    39
200~299                    38
100~199                    20
0~99                       32

(Table 3)

Accordingly, we apply systematic random sampling: the 205 universities, sorted in descending order of number of questionnaires, are divided into 26 parts (every 8 consecutive schools in the ranking form one part; the first 25 parts are full and the 26th contains the remaining 5 schools). The same position is then taken from every part. This position is to be determined by simple random sampling; in this problem the first position of each part is selected.
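The selection rule just described can be checked with a short Python sketch. The function name `systematic_positions` and its `offset` parameter are illustrative choices, not part of the original work; the numbers 205 and 8 come from the text.

```python
def systematic_positions(n_units, interval, offset=0):
    """Return the 1-based ranks obtained by taking position `offset`
    (0 = first) inside every block of `interval` consecutive ranks."""
    return list(range(1 + offset, n_units + 1, interval))

# 205 ranked universities, blocks of 8, first position of each block.
ranks = systematic_positions(205, 8)
print(len(ranks))            # number of universities selected
print(ranks[:3], ranks[-1])  # first few selected ranks and the last one
```

Because the last block holds only 5 schools, an offset of 5 or more would skip it and return 25 schools instead of 26; choosing the first position avoids that edge case.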

We now check whether this scheme satisfies the requirements of "comprehensiveness, regionality and representativeness" (Table 4).

Province     Total universities   Universities selected
Beijing      49                   7
Guangdong    30                   4
Hebei        26                   3
Anhui        24                   5
Henan        24                   1
Fujian       14                   3
Guangxi      13                   0
Gansu        12                   1
Guizhou      9                    1
Hainan       4                    1

(Table 4)

As the table shows, some entries do not reflect regional coverage and representativeness well. We therefore fine-tune the result of the systematic random sampling. For example, a school in Anhui province can be replaced by a lower-ranked school in Guangxi.

The final number of schools per province is as follows (Table 5):

Province     Total universities   Universities selected after adjustment
Beijing      49                   6
Guangdong    30                   4
Hebei        26                   3
Anhui        24                   3
Henan        24                   3
Fujian       14                   2
Guangxi      13                   2
Gansu        12                   1
Guizhou      9                    1
Hainan       4                    1

(Table 5)
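One way to judge the effect of the regional correction is to compare the selected counts with a strictly proportional allocation of the 26 schools. The sketch below does this in Python. Proportional allocation is used here only as a reference yardstick; it is an assumption of this illustration, not a rule the text says was applied. The "before" counts are the values as read from Table 4.

```python
# Number of universities per province (Table 1).
universities = {"Beijing": 49, "Guangdong": 30, "Hebei": 26, "Anhui": 24,
                "Henan": 24, "Fujian": 14, "Guangxi": 13, "Gansu": 12,
                "Guizhou": 9, "Hainan": 4}
# Selected counts before adjustment (as read from Table 4) and after (Table 5).
before = {"Beijing": 7, "Guangdong": 4, "Hebei": 3, "Anhui": 5, "Henan": 1,
          "Fujian": 3, "Guangxi": 0, "Gansu": 1, "Guizhou": 1, "Hainan": 1}
after = {"Beijing": 6, "Guangdong": 4, "Hebei": 3, "Anhui": 3, "Henan": 3,
         "Fujian": 2, "Guangxi": 2, "Gansu": 1, "Guizhou": 1, "Hainan": 1}

SAMPLE_SIZE = 26
total = sum(universities.values())  # 205

def proportional_share(province):
    """Schools a province would get under strictly proportional allocation."""
    return SAMPLE_SIZE * universities[province] / total

def largest_gap(selected):
    """Largest absolute gap between a selected count and the proportional share."""
    return max(abs(selected[p] - proportional_share(p)) for p in universities)

gap_before = largest_gap(before)
gap_after = largest_gap(after)
print(f"largest gap before adjustment: {gap_before:.2f}")
print(f"largest gap after adjustment:  {gap_after:.2f}")
```

After the adjustment no province is more than one school away from its proportional share, whereas before it the worst case is about two schools.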

The specific universities selected are as follows (Table 6):

University                                               Province    Number of questionnaires
Zhengzhou University                                     Henan       744
Fuzhou University                                        Fujian      597
Guangxi Institute for Nationalities                      Guangxi     552
Beijing Institute of Technology                          Beijing     542
China Agricultural University                            Beijing     519
Guizhou Institute of Technology                          Guizhou     507
Hebei Agricultural University                            Hebei       470
Beijing Union University                                 Beijing     451
Hainan University                                        Hainan      414
Guangxi Normal University                                Guangxi     405
Maoming College                                          Guangdong   385
Zhengzhou Institute of Aeronautical Industry Management  Henan       369
Beijing Broadcasting Institute                           Beijing     342
Shangqiu Normal College                                  Henan       305
Zhanjiang Normal College                                 Guangdong   301
Beijing Institute of Mechanical Industry                 Beijing     276
Tangshan Normal University                               Hebei       253
Hexi College                                             Gansu       239
Zhong Kai Agricultural Technology College                Guangdong   208
Chaohu College                                           Anhui       204
Fujian Medical University                                Fujian      138
Beijing Institute of Electronic Technology               Beijing     103
Anhui Medical University                                 Anhui       77
Anhui College of Traditional Chinese Medicine            Anhui       68
Guangzhou Institute of Physical Education                Guangdong   35
Chinese People's Armed Police Force College              Hebei       32

(Table 6)

To summarize our survey method: some schools are selected from the 205 as representatives to receive the questionnaire survey. Schools are selected by systematic random sampling over the whole ranked list, and the sample is then adjusted to ensure "comprehensiveness, regionality and representativeness". The 26 colleges and universities above are the ones to which questionnaires are issued.

Next, following this survey method, we extract from the database the first-year number of questionnaires from the 26 universities that name each publishing house. We process the data with Excel to obtain, for each publishing house, the total over the 26 universities in the first year.

The resulting table is as follows (Table 7):

Publishing house   Number of questionnaires
p559               38
p199               45
p102               47
p106               47
p196               49
p307               55
p304               59
p030               64
p131               67
p511               68
p063               72
p416               85
p110               88
p246               99
p091               118
p118               119
p432               119
p210               160
p044               204
p390               266
p405               378
p534               515
p293               631
p115               2507
p357               2657

(Table 7)

Using Matlab, we also compute the ratio of the number of questionnaires naming each publishing house to the total number of questionnaires in the first year, as follows (Table 8):

Publishing house   Ratio of questionnaires
p559               0.0044
p199               0.0053
p102               0.0055
p106               0.0055
p196               0.0057
p307               0.0064
p304               0.0069
p030               0.0075
p131               0.0078
p511               0.0079
p063               0.0084
p416               0.0099
p110               0.0103
p246               0.0116
p091               0.0138
p118               0.0139
p432               0.0139
p210               0.0187
p044               0.0238
p390               0.0311
p405               0.0442
p534               0.0602
p293               0.0737
p115               0.2930
p357               0.3105

(Table 8)

Under the assumptions and explanations given earlier, this ratio serves as our measure of market share. The ratio of the number of questionnaires naming each publishing house to the total number of questionnaires can be obtained in the same way for the second and third years.
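The market-share measure used here is simply each publishing house's questionnaire count divided by the total. A minimal Python sketch, using the first-year sample counts from Table 7 in the table's order (smallest to largest, ending with P357), reproduces the ratios:

```python
# First-year questionnaire counts from the 26 sampled universities (Table 7),
# listed in the table's order; the last entry is publishing house P357.
counts = [38, 45, 47, 47, 49, 55, 59, 64, 67, 68, 72, 85,
          88, 99, 118, 119, 119, 160, 204, 266, 378, 515, 631, 2507, 2657]

total = sum(counts)                       # all questionnaires in the sample
shares = [round(c / total, 4) for c in counts]

print(total)
print(shares[0], shares[-1])              # smallest and largest market share
```

Rounding to four decimals matches the precision used in the tables; the unrounded shares sum to exactly 1.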

The results for the second year are shown in Table 9.

Publishing house   Ratio of questionnaires
p559               0.0044
p199               0.0051
p106               0.0055
p102               0.0058
p196               0.0061
p307               0.0065
p304               0.0069
p131               0.0072
p030               0.0076
p511               0.0083
p416               0.0098
p110               0.0105
p063               0.0107
p246               0.0118
p432               0.0138
p118               0.0141
p091               0.0139
p210               0.0185
p044               0.0259
p390               0.031
p405               0.0447
p534               0.0595
p293               0.0711
p115               0.2886
p357               0.3123

(Table 9)
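The paper states that predictions are made by fitting the three yearly values, but does not say which curve is fitted. As a minimal sketch, assuming a straight-line least-squares fit, the Python code below extrapolates one year ahead. The input series is the full-database questionnaire count of publishing house p405 from Table 2 (3098, 3162, 3227); the helper name `linear_fit` is hypothetical, and the same function could be applied to a series of market-share ratios instead.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

years = [1, 2, 3]
p405_counts = [3098, 3162, 3227]   # Table 2, full database

slope, intercept = linear_fit(years, p405_counts)
year4_prediction = slope * 4 + intercept
print(f"slope per year: {slope:.1f}")
print(f"predicted year-4 count: {year4_prediction:.1f}")
```

With only three points any fit is fragile, so a prediction of this kind should be read as a trend indication rather than a forecast with known accuracy.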

VI. Error analysis of the model

Using Matlab software, based on all the data in the database, we can get the ratio of the number of questionnaires corresponding to each publishing house to the total number of questionnaires.

For our sampling method, taking the first year as an example, we can calculate the relative error between the two ratios to test our method.

See Table 10. "Standard" is the ratio computed from the full database, "Measured" is the ratio computed from our sample, and "Error" is the relative error between them.

Publishing house   Standard   Measured   Error
p196               0.0049     0.0057     16.3%
p559               0.0050     0.0044     12.0%
p106               0.0054     0.0055     1.8%
p199               0.0058     0.0053     8.6%
p307               0.0062     0.0064     3.2%
p102               0.0068     0.0055     19.1%
p131               0.0072     0.0078     8.3%
p511               0.0075     0.0079     5.3%
p030               0.0076     0.0075     1.3%
p063               0.0077     0.0084     9.1%
p416               0.0097     0.0099     2.1%
p304               0.0100     0.0069     31.0%
p110               0.0114     0.0103     9.6%
p246               0.0118     0.0116     1.7%
p432               0.0133     0.0139     4.5%
p091               0.0138     0.0138     0
p118               0.0152     0.0139     8.6%
p210               0.0199     0.0187     6%
p044               0.0244     0.0238     2.5%
p390               0.0311     0.0311     0
p405               0.0471     0.0442     6.2%
p534               0.0612     0.0602     1.6%
p293               0.0775     0.0737     4.9%
p115               0.2779     0.2930     5.4%
p357               0.3117     0.3105     0.4%

(Table 10)

As the table shows, except for a few publishing houses, the relative errors are less than 0.1 (10%). Our method therefore performs well within the accuracy requirement.
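The relative error used in this comparison is the absolute difference between the sample ratio and the full-database ratio, divided by the full-database ratio. A short Python sketch with four of the (standard, measured) pairs from Table 10:

```python
def relative_error(standard, measured):
    """Relative error of the sample-based value against the full-database value."""
    return abs(measured - standard) / standard

# (standard, measured) market-share ratios for four publishing houses, Table 10.
pairs = {
    "p196": (0.0049, 0.0057),
    "p559": (0.0050, 0.0044),
    "p115": (0.2779, 0.2930),
    "p357": (0.3117, 0.3105),
}

errors = {press: round(100 * relative_error(std, meas), 1)
          for press, (std, meas) in pairs.items()}
print(errors)  # relative errors in percent
```

The small publishing houses show the largest relative errors because a difference of a few questionnaires is a large fraction of a small share, while the two dominant publishing houses are estimated to within a few percent.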

VII. Verification and improvement of the model

From the publishing house's point of view, we have proposed the theory of printing on demand. To verify the validity and soundness of the model, we tabulated three years of book sales in the 10 provinces and cities by region, processed the data with Excel, and drew charts to check the choice of the 26 universities.

Three-year book sales of a publishing house in the 10 provinces and cities:

1. The data are shown in Table 11:

Rank   Province    Year 1   Year 2   Year 3   Total   Average
1      Beijing     3934     3902     3870     11706   3902
2      Guangdong   2837     2809     2773     8419    2807
3      Henan       2544     2533     2504     7581    2527
4      Hebei       2374     2354     2315     7043    2348
5      Anhui       1998     1980     1963     5941    1981
6      Fujian      1400     1396     1378     4174    1392
7      Gansu       1136     1133     1122     3391    1131
8      Guangxi     1017     1017     1009     3043    1015
9      Guizhou     844      838      830      2512    838
10     Hainan      290      282      283      855     285
       Total       18374    18244    18047    54665   18222

(Table 11)

From the table we can see that:

(1) In each of the 10 provinces and cities, book sales are basically the same over the three years, with little difference from year to year;

(2) Over the three years, the total book sales across the 10 provinces and cities are basically the same, with an average annual sales volume of 18222;

(3) Comparing the three years gives the annual sales and the ranking of the 10 provinces and cities.

2. The charts are as follows:

(Figure 1) Bar chart of book sales in each province and city over the three years.

(Figure 2) Pie chart of three-year book sales across the 10 provinces and cities.

From the charts we can see that:

(1) Beijing has the largest share of book sales over the three years, at 21.4%, followed by Guangdong at 15.4% and Henan at 13.9%. Together these three account for 50.7% of sales, more than half.

(2) Beijing, Guangdong, Henan, Hebei and Anhui each account for more than 10% of book sales, and together these five reach 74.5%.
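The percentages read from the pie chart can be recomputed from the three-year totals in Table 11. The Python sketch below does so; the totals are the per-province sums of the three yearly figures.

```python
# Three-year book sales totals per province (Table 11).
totals = {"Beijing": 11706, "Guangdong": 8419, "Henan": 7581, "Hebei": 7043,
          "Anhui": 5941, "Fujian": 4174, "Gansu": 3391, "Guangxi": 3043,
          "Guizhou": 2512, "Hainan": 855}

grand_total = sum(totals.values())
share = {p: 100 * v / grand_total for p, v in totals.items()}  # percent

above_10 = [p for p, s in share.items() if s > 10]
top3 = sum(sorted(share.values(), reverse=True)[:3])

for province, s in share.items():
    print(f"{province:10s} {s:5.1f}%")
print("above 10%:", above_10)
print(f"top three together: {top3:.1f}%")
```

Summing the unrounded shares of the five largest provinces gives about 74.4%; the 74.5% quoted in the text corresponds to adding the five percentages after each has been rounded to one decimal.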

Using the appendix data and our model, we can compute the percentage attributed to each province for publishing house P115, as shown in Table 12.

Province     Percentage
Beijing      25.4%
Guangdong    10.2%
Hebei        10.2%
Anhui        4.7%
Henan        18.1%
Fujian       7.9%
Guangxi      10.4%
Gansu        3.3%
Guizhou      4.5%
Hainan       5.4%

(Table 12)

Compared with the pie chart, the figures for some provinces differ.

This points to a direction for improving the model. When choosing universities, we should not only apply systematic random sampling and regional correction, but also keep the distribution of selected schools across provinces relatively balanced. The results obtained in this way should be better.

VIII. Remedial scheme for the model

Here we introduce two methods of remedying non-response, the traditional method and the Bayesian method, and briefly summarize their similarities and differences.

(1) The traditional two-stage sampling method

The traditional two-stage sampling method was first proposed by Hansen and Hurwitz. It is based on classical statistical inference. A first sample is drawn from the population by simple random sampling, which gives the observed values and the weight of the responding units. A sub-sample is then drawn at random from the non-responding units and surveyed to obtain information about the non-respondents. Finally, the population is estimated by combining the results of the two parts. The method is in effect a stratified two-stage sampling with two strata, the response stratum and the non-response stratum.

Let the population size be $N$, consisting of $N_1$ respondents and $N_2$ non-respondents. An initial sample of size $n$ is drawn at random; it contains $n_1$ respondents, with sample mean $\bar{y}_1$, and $n_2$ non-respondents. A sub-sample of size $r = n_2/k$ is then drawn at random from the $n_2$ non-respondents, so its sampling ratio is $1/k$; its sample mean is $\bar{y}_{2r}$. The population response rate is $W_1 = N_1/N$ and the population non-response rate is $W_2 = N_2/N$. The two-stage estimator of the population mean is then:

$$\bar{y} = \frac{n_1 \bar{y}_1 + n_2 \bar{y}_{2r}}{n} \qquad (1)$$

According to the sampling variance formula for two-stage sampling, we get:

$$V(\bar{y}) = \left(\frac{1}{n} - \frac{1}{N}\right) S^2 + \frac{(k-1)\, W_2}{n} S_2^2 \qquad (2)$$

where $S^2$ is the population variance and $S_2^2$ is the variance of the non-response stratum.

The first term of the variance depends on the initial sample size $n$, while the second term depends not only on $n$ but also on the non-response rate $W_2$ and the sub-sampling ratio $1/k$. When $k = 1$ the second term is zero: every non-respondent is followed up and all the data are collected, which is equivalent to a simple random sample of size $n$.

Since the term $S^2/N$ in the variance formula above does not depend on how the sample is allocated, the formula can be rewritten as:

$$V(\bar{y}) + \frac{S^2}{N} = \frac{S^2 + (k-1)\, W_2\, S_2^2}{n} \qquad (3)$$
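As a numerical illustration of the traditional two-stage approach, the Python sketch below implements the standard Hansen-Hurwitz estimator of the population mean and its variance: the respondent mean and the follow-up sub-sample mean are weighted by the numbers of respondents and non-respondents in the initial sample. All input numbers (100 sampled units, 70 respondents, the means, the variances, a population of 1000) are hypothetical and chosen only to show the mechanics; the function and parameter names are likewise illustrative.

```python
def hansen_hurwitz_mean(n1, mean_resp, n2, mean_sub):
    """Weight the respondent mean by n1 and the follow-up sub-sample mean by n2."""
    return (n1 * mean_resp + n2 * mean_sub) / (n1 + n2)

def hansen_hurwitz_variance(s2_pop, s2_nonresp, n, N, w2, k):
    """Variance of the estimator when 1 in k non-respondents is followed up.

    s2_pop     : population variance
    s2_nonresp : variance within the non-response stratum
    n, N       : initial sample size and population size
    w2         : population non-response rate
    """
    return (1 / n - 1 / N) * s2_pop + (k - 1) * w2 * s2_nonresp / n

# Hypothetical survey: 100 units sampled, 70 answer, 30 do not.
estimate = hansen_hurwitz_mean(n1=70, mean_resp=0.30, n2=30, mean_sub=0.20)

# Following up every non-respondent (k = 1) versus every second one (k = 2).
var_full = hansen_hurwitz_variance(1.0, 1.0, n=100, N=1000, w2=0.3, k=1)
var_half = hansen_hurwitz_variance(1.0, 1.0, n=100, N=1000, w2=0.3, k=2)

print(f"estimate: {estimate:.3f}")
print(f"variance with k=1: {var_full:.4f}, with k=2: {var_half:.4f}")
```

With k = 1 the variance reduces to that of a simple random sample of size n, and it grows as fewer non-respondents are followed up, which is the trade-off against survey cost that the sub-sampling ratio controls.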

If the cost difference between the two surveys in the second sampling is considered, the total cost function can be written as:

(4)

Where is the unit survey cost of the initial sample, which is