For example, 100 evenly spaced points suffice to sample a unit interval so that no point is more than 0.01 from its neighbor; but to sample a 10-dimensional unit hypercube with the same spacing of 0.01 between adjacent points, 10^20 sample points are needed. In this sense the 10-dimensional hypercube can be said to be 10^18 times "larger" than the unit interval. (This example is due to Richard Bellman.)
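The arithmetic behind Bellman's example can be sketched as follows; the function name and the grid-based sampling assumption are illustrative, not part of the original text.

```python
# Sketch of Bellman's sampling example: to keep a grid spacing of 0.01 along
# each axis of a d-dimensional unit hypercube, roughly (1 / 0.01) ** d = 100 ** d
# grid points are needed in total.

def grid_points_needed(spacing: float, dimensions: int) -> int:
    """Number of points in a regular grid with the given spacing per axis."""
    points_per_axis = round(1 / spacing)   # 100 points per axis for spacing 0.01
    return points_per_axis ** dimensions

print(grid_points_needed(0.01, 1))    # 100      (unit interval)
print(grid_points_needed(0.01, 10))   # 10**20   (10-dimensional hypercube)
print(grid_points_needed(0.01, 10) // grid_points_needed(0.01, 1))  # 10**18 times larger
```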
The curse of dimensionality arises in many fields, such as sampling, combinatorics, machine learning, and data mining. The common feature of these problems is that as the dimension increases, the volume of the space grows so quickly that the available data become sparse. Sparsity is a problem for any method that requires statistical significance: to obtain a statistically sound and reliable result, the amount of data needed to support it typically grows exponentially with the dimension. Moreover, organizing and searching data often relies on detecting regions where objects form groups through similar attributes. In high-dimensional space, however, all data appear sparse and dissimilar in many respects, so the commonly used data-organization strategies become extremely inefficient.
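A small numerical illustration of this sparsity, assuming synthetic random data (not taken from the original text): for points drawn uniformly from a unit hypercube, the nearest and farthest neighbors of a query point become almost equally distant as the dimension grows, which is one concrete sense in which "all data are dissimilar."

```python
# Illustration of distance concentration in high dimensions with random data.
import numpy as np

rng = np.random.default_rng(0)
for dim in (2, 10, 100, 1000):
    points = rng.random((500, dim))                          # 500 random points in [0, 1]^dim
    dists = np.linalg.norm(points[1:] - points[0], axis=1)   # distances to one query point
    print(f"dim={dim:5d}  min/max distance ratio = {dists.min() / dists.max():.3f}")
# The ratio climbs toward 1 as dim increases, so "near" and "far" lose meaning.
```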
"Dimension disaster" is usually used as a powerless excuse not to deal with high-dimensional data. However, academic circles have always been interested in it and continue to study it. On the other hand, due to the existence of inherent dimensions, this concept means that any low-dimensional data space can be transformed into a higher-dimensional space simply by adding spare (such as copying) or random dimensions. On the contrary, many data sets in high-dimensional space can be reduced to low-dimensional data without losing important information. This is also reflected in the effectiveness of many dimensionality reduction methods, such as the widely used principal component analysis. For distance function and nearest neighbor search, the current research also shows that unless there are too many unrelated dimensions, data sets with dimension disaster characteristics can still be processed, because related dimensions can actually make many problems (such as cluster analysis) easier.