The Silk Road was a channel for cultural exchange between China and the West in ancient times, and glass is valuable material evidence of that early trade. Early glass was often made into bead-shaped ornaments in West Asia and Egypt and was then introduced to China. After absorbing the technology, ancient Chinese craftsmen made glass from local raw materials, so ancient Chinese glass resembles foreign glass products in appearance but differs in chemical composition.
The main raw material of glass is quartz sand, whose main chemical component is silicon dioxide (SiO2). Because pure quartz sand has a high melting point, a flux is added during production to lower the melting temperature. Fluxes commonly used in ancient times were plant ash, trona, saltpeter and lead ore. Limestone was added as a stabilizer and is converted into calcium oxide (CaO) on calcination.
Different fluxes give the glass a different main chemical composition. For example, lead-barium glass is fired with lead ore as the flux, so its lead oxide (PbO) and barium oxide (BaO) contents are high. It is usually considered a glass variety invented in China. The glass of the Chu culture is mainly lead-barium glass.
Potassium glass is made with substances high in potassium, such as plant ash, as the flux, and was mainly popular in Lingnan, Southeast Asia and India.
A large amount of data is available on ancient glass products in China. Archaeologists divide these cultural relics into high-potassium glass and lead-barium glass according to their chemical composition and other detection methods. The classification information of these relics is given, together with the proportions of their main components (a blank cell indicates that the component was not detected).
These data are compositional: the proportions of all components should sum to 100%, but because of the detection method and other reasons the sum may deviate from 100%. In this problem, samples whose component proportions sum to between 85% and 105% are regarded as valid data. Teams are asked to analyze and model the relevant data in the attachment to solve the following problems.
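The validity rule above can be sketched in a few lines of Python. This is only an illustration: the record layout (a dictionary per sample, with `None` for an undetected component), the function names, and the sample values are assumptions, and the 85% and 105% bounds are treated as inclusive, which the problem text does not state explicitly.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: component name -> percentage,
# with None standing for a blank (undetected) cell.
Sample = Dict[str, Optional[float]]


def component_sum(sample: Sample) -> float:
    """Sum the detected percentages; undetected components count as 0."""
    return sum(v for v in sample.values() if v is not None)


def is_valid(sample: Sample, low: float = 85.0, high: float = 105.0) -> bool:
    """Apply the 85%-105% rule (bounds assumed inclusive)."""
    return low <= component_sum(sample) <= high


def filter_valid(samples: List[Sample]) -> List[Sample]:
    """Keep only the samples that pass the validity rule."""
    return [s for s in samples if is_valid(s)]


# Made-up illustrative values, not taken from the attachment.
samples: List[Sample] = [
    {"SiO2": 65.0, "K2O": 12.0, "CaO": 8.0, "Al2O3": 5.0, "PbO": None},  # sums to 90
    {"SiO2": 40.0, "PbO": 25.0, "BaO": 10.0, "K2O": None},               # sums to 75
]
valid_samples = filter_valid(samples)
```

Treating a blank as zero follows the statement that a blank means the component was not detected. Whether to rescale the valid samples so that they sum to exactly 100% is a separate modeling choice.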
Question 1: Analyze the relationship between the surface weathering of these glass relics and their glass type, decoration pattern and color. Taking the glass type into account, analyze the statistical regularities in the chemical composition content of weathered and unweathered sample surfaces, and predict the chemical composition content before weathering from the detection data at the weathered points.
Solution: (1) This is a typical statistical analysis problem. Surface weathering, glass type, decoration and color are all categorical variables (factors). To analyze the relationships among them, use the chi-square test, contingency table analysis, mosaic plots and other intuitive methods. Correspondence analysis is not well suited to factors with only a few levels.
(2) Analyze, by glass type, the statistical regularities in the chemical composition content of weathered sample surfaces. Analysis of variance can be used here. For example, in the following analysis results the proportion in each case is clear at a glance and can be analyzed accordingly. A multi-factor analysis of variance can then be carried out for further quantitative analysis.
To predict the chemical composition content before weathering, a corresponding regression prediction model or parameter estimation model can be established.
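For step (1) above, the chi-square test of independence on a contingency table can be sketched as follows. The function name `chi_square_independence` and the counts in `table` are assumptions made for illustration and are not taken from the attachment. The sketch uses only the standard library and assumes every row and column total is positive.

```python
import math
from typing import List, Tuple


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for an r x c table.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Made-up counts: rows = two glass types, columns = weathered / unweathered.
table = [[6, 12], [28, 12]]
stat, dof = chi_square_independence(table)

# With one degree of freedom the p-value has a closed form via erfc.
p_value = math.erfc(math.sqrt(stat / 2)) if dof == 1 else None
```

With one degree of freedom, a statistic above about 3.841 rejects independence at the 5% level. When expected counts are small, which is likely with a few dozen relics, an exact test or a continuity correction may be more appropriate. Library routines such as `scipy.stats.chi2_contingency` cover larger tables and return the p-value directly.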
Question 2: Analyze the classification rules for high-potassium glass and lead-barium glass according to the attached data. For each category, select appropriate chemical components to divide it into subclasses, give the specific classification method and results, and analyze the rationality and sensitivity of the classification results.
Solution: (1) Analyze the classification rules. This part is relatively simple: data visualization methods help reveal the patterns in the data.
(2) For each category, select appropriate chemical components and divide the category into subclasses. Subclassification is essentially a data clustering problem. Feature selection or feature extraction methods (there are many) can be used to process and transform the features so that the data can be clustered more easily.
There are also many clustering methods, and combining different methods may give different subclassification results. Rationality and sensitivity can be expressed more intuitively through numerical measures and visualization. Cluster quality is usually evaluated with the silhouette coefficient, the gap statistic and similar indices. The figure below shows an example of such analysis results.
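As an illustration of the cluster evaluation mentioned above, here is a minimal sketch of the mean silhouette coefficient for a given labeling. It assumes Euclidean distance on already-prepared feature vectors, and the points and labels are made up. A point that is alone in its cluster is given a score of 0, which is a common convention rather than something specified in the text.

```python
import math
from typing import List, Sequence


def silhouette_score(points: List[Sequence[float]], labels: List[int]) -> float:
    """Mean silhouette coefficient of a labeling, using Euclidean distance."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    n = len(points)
    scores = []
    for i in range(n):
        # Accumulate distances from point i to every other point, per cluster.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j in range(n):
            if i != j:
                sums[labels[j]] += math.dist(points[i], points[j])
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            scores.append(0.0)  # singleton cluster: score 0 by convention
            continue
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Made-up 2-D feature vectors forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
```

Values near 1 indicate compact, well-separated subclasses, while values near or below 0 suggest a poor partition. Comparing the score across different numbers of clusters, or after small perturbations of the data, is one way to quantify rationality and sensitivity.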
Question 3: Analyze the chemical composition of the unknown glass relics in Table 3 of the attachment, identify their types, and analyze the sensitivity of the classification results.
Solution: Predict the categories of the samples in Table 3. This is a classification problem, for which a corresponding classification model can be built. Many methods are available, from simple to complex, and they are demonstrated in the provided programs. Decision trees, random forests, support vector machines, neural networks and so on are all suitable, although classification accuracy differs between algorithms.
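The classification and sensitivity steps can be illustrated with a deliberately simple stand-in. The sketch below uses a nearest-centroid classifier, not one of the algorithms named above, and a perturbation check that measures how often the predicted label survives small random changes to the input. The feature order (K2O, PbO, BaO), the training values, the 5% noise level and all function names are assumptions for illustration only.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def fit_centroids(X: List[Vector], y: List[str]) -> Dict[str, Tuple[float, ...]]:
    """Average the training vectors of each class."""
    grouped: Dict[str, List[Vector]] = {}
    for x, label in zip(X, y):
        grouped.setdefault(label, []).append(x)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids: Dict[str, Tuple[float, ...]], x: Vector) -> str:
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def stability(centroids: Dict[str, Tuple[float, ...]], x: Vector,
              rel_noise: float = 0.05, trials: int = 200, seed: int = 0) -> float:
    """Fraction of randomly perturbed copies of x that keep the original label."""
    rng = random.Random(seed)
    base = predict(centroids, x)
    same = 0
    for _ in range(trials):
        # Scale each component independently by up to +/- rel_noise.
        noisy = [v * (1 + rng.uniform(-rel_noise, rel_noise)) for v in x]
        same += predict(centroids, noisy) == base
    return same / trials


# Made-up training data; assumed feature order: (K2O, PbO, BaO) in percent.
X_train = [(10.0, 0.5, 0.2), (12.0, 0.0, 0.0), (9.0, 1.0, 0.5),
           (0.2, 30.0, 10.0), (0.0, 40.0, 8.0), (0.5, 25.0, 12.0)]
y_train = ["high-potassium"] * 3 + ["lead-barium"] * 3

centroids = fit_centroids(X_train, y_train)
unknown = (0.3, 35.0, 9.0)  # hypothetical unknown sample
label = predict(centroids, unknown)
score = stability(centroids, unknown)
```

A stability close to 1 means the prediction is insensitive to measurement noise of the assumed size. The same perturbation check can be wrapped around any other classifier, such as a decision tree or a support vector machine.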
Question 4: For the glass relic samples of each type, analyze the correlations among their chemical components, and compare how these correlations differ between types.
Solution: Analyze the correlations among the chemical components. This is an association analysis problem: correlation analysis, regression analysis, machine learning regression and similar algorithms can be used, and they can also make corresponding predictions. They can be applied to a single chemical component or to the association among several components. The differences can be compared and analyzed together with the earlier results.
According to the analysis results, the differences can be summarized and tallied.
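The per-type correlation analysis can be sketched as below, using the Pearson coefficient computed separately for each glass type. The function names and the data are assumptions for illustration, and the values are made up so that the relationships are exactly linear.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_by_group(
    samples: List[Tuple[str, Dict[str, float]]], components: List[str]
) -> Dict[str, Dict[Tuple[str, str], float]]:
    """Compute pairwise Pearson correlations separately for each glass type."""
    grouped: Dict[str, List[Dict[str, float]]] = {}
    for glass_type, comp in samples:
        grouped.setdefault(glass_type, []).append(comp)
    result: Dict[str, Dict[Tuple[str, str], float]] = {}
    for glass_type, rows in grouped.items():
        result[glass_type] = {
            (a, b): pearson([r[a] for r in rows], [r[b] for r in rows])
            for a, b in combinations(components, 2)
        }
    return result


# Made-up values chosen so that some pairs are exactly linearly related.
samples = [
    ("high-potassium", {"SiO2": 60.0, "K2O": 14.0, "PbO": 0.5}),
    ("high-potassium", {"SiO2": 65.0, "K2O": 12.0, "PbO": 0.2}),
    ("high-potassium", {"SiO2": 70.0, "K2O": 10.0, "PbO": 0.3}),
    ("lead-barium", {"SiO2": 50.0, "K2O": 0.2, "PbO": 20.0}),
    ("lead-barium", {"SiO2": 40.0, "K2O": 0.1, "PbO": 30.0}),
    ("lead-barium", {"SiO2": 30.0, "K2O": 0.3, "PbO": 40.0}),
]
corr = correlation_by_group(samples, ["SiO2", "K2O", "PbO"])
```

Comparing `corr["high-potassium"]` with `corr["lead-barium"]` pair by pair shows which component relationships differ between the two types. Because the components of a sample sum to a roughly constant total, raw correlations can be biased toward negative values, so a log-ratio transform before computing correlations is worth considering.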