1960, mathematicians Erdos and Renyi established stochastic graph theory, which provided a new method for constructing networks. In this method, whether there is an edge connection between two nodes is no longer a definite thing, but is determined according to a probability, and the generated network is called a random network. For forty years, the concept of random graph has dominated the research of complex networks. However, it is not until recent years that scientists have obtained many results after calculating and studying a large number of actual data of real networks. Most actual networks are not completely random, neither regular nor random, but networks with different statistical characteristics. Such a one? Some networks are called complex networks, and the study of complex networks marks the arrival of the third stage of network research.
1998, Watts and his tutor Strogatz wrote an article "Collective Dynamics of Small-World Networks" on Nature, which described the small-world characteristics of real-world networks, with large cohesion coefficient and short average path length. Subsequently, in 1999, Barabasi and his doctoral student Albert put forward a scale-free network model (the degree distribution is power-law distribution) in the article "The emergence of scale in random networks", which portrayed the phenomenon that "the rich get richer" which is common in actual networks and opened a new era of complex network research.
With the deepening of research, more and more properties of complex networks have been discovered. One of the most important studies is an article "Community Structure in Social and Biological Networks" written by Gwen and Newman on PNAS in 2002. The article points out that there are clustering characteristics in complex networks, and each class is called a community, and puts forward an algorithm to discover these communities. Since then, people have done a lot of research on community discovery in complex networks and produced a large number of algorithms.
Many complex systems can be modeled as complex networks for analysis, such as common power networks, aviation networks, transportation networks, computer networks and social networks. Complex network is not only a form of data expression, but also a means of scientific research.
The definition of complex network
Qian Xuesen gave a strict definition of complex network:
Complex networks have the same characteristics, such as smaller average path length, larger aggregation coefficient and power law distribution of node degree.
The implication is that a complex network refers to a network with high complexity, and its characteristics are mainly reflected in the following aspects:
Small world theory is also called six-dimensional space theory or six degrees of separation. Small-world characteristics point out that there will be no more than six people between any member and any stranger in social networks.
When considering the characteristics of the network, two characteristics are usually used to measure the network:
For regular networks, the feature path length between any two points (individuals) is long (how many individuals are linked together), but the aggregation coefficient is high (the probability that you are friends of friends is high). For random networks, the characteristic path length between any two points is short, but the aggregation coefficient is low. However, in the small-world network, the characteristic path length between points is small, which is close to random network, while the aggregation coefficient is still quite high, which is close to regular network.
The small-world characteristics of complex networks are closely related to the information dissemination in the networks. The actual social and ecological networks are all small-world networks. In such a system, the speed of information transmission is very fast, and changing a few connections can greatly change the performance of the network. For example, the existing network can be adjusted, such as cellular telephone network, and the performance can be significantly improved by changing several lines.
Most networks in the real world are not random networks, and a few nodes often have a large number of connections, while most nodes are few. The degree distribution of nodes conforms to the power distribution, which is called the scale-free characteristic of the network. A complex network whose degree distribution conforms to the power law distribution is called scale-free network.
For example, the distribution of the number of co-channel users in Zhihu:
Scale-free characteristics reflect the serious heterogeneity of complex networks, and the connection state (degree) between nodes is seriously uneven: a few nodes called hubs in the network have extremely many connections, while most nodes have only a few connections. A few hubs play a leading role in the operation of scale-free networks. Broadly speaking, the scale-free nature of scale-free networks is an inherent property to describe the serious uneven distribution of a large number of complex systems as a whole.
In fact, the scale-free characteristics of complex networks are closely related to the robustness analysis of networks. The existence of power law distribution in scale-free network greatly improves the possibility of the existence of high nodes, so scale-free network not only shows robustness to random faults, but also shows vulnerability to deliberate attacks. This robustness and vulnerability have a great influence on the network fault tolerance and anti-attack ability.
The research shows that the scale-free network has strong fault tolerance, but its anti-attack ability is quite poor for selective attacks based on node degrees. The existence of advanced nodes greatly weakens the robustness of the network. Malicious attackers can attack the network by selecting several advanced nodes, thus quickly paralyzing the network.
Birds of a feather flock together. Nodes in complex networks often show cluster characteristics. For example, there is always a circle of acquaintances or friends in a social network, in which each member knows other members. The meaning of clustering degree is the degree of network collectivization; This is a cohesive tendency of the network. The concept of connected group reflects the distribution and interconnection of small networks gathered in a large network. For example, it can reflect the relationship between this circle of friends and another circle of friends.
The following figure is a description of the phenomenon of network aggregation:
The small-world characteristics, scale-free power law distribution or high aggregation of real networks urge people to construct various network models in theory to explain these statistical characteristics and explore the evolution mechanism of these networks. This part introduces the principles and construction methods of several classical network models, including ER random network model, BA scale-free network model and small-world model.
ErdOs-Renyi stochastic network model (ER stochastic network model for short) is a network model proposed by Hungarian mathematicians ErdOs and Renyi. In 1959, in order to describe the network in communication and life sciences, Erdos and Renyi proposed that this kind of system can be effectively simulated by randomly arranging the connections between network nodes. The simplicity of this method and related theorems led to the revival of graph theory research, and a new field of studying stochastic networks appeared in mathematics. ER stochastic network model has been widely used in computer science, statistical physics, life science, communication engineering and other fields.
ER stochastic network model is an equal opportunity network model. In this network model, given a certain number of individuals (nodes), its association (contact) probability with any other individuals (nodes) is the same, and it is recorded as households. Because the probability of one node connecting K other nodes will decrease exponentially with the increase of K value. In this way, if it is defined as the number of other individuals connected to each individual, we can know that the connection probability p(k) obeys a bell-shaped Poisson distribution, and sometimes random networks are also called exponential networks.
There is an important prediction in random network theory: although the connections are randomly arranged, the resulting network is highly democratic, that is, the number of connections of most nodes will be roughly the same. In fact, it is very rare that the number of nodes in a random network is much higher or lower than the average.
In the past 40 years, scientists have become accustomed to treating all complex networks as random networks. 1998 when studying the project describing the world wide web (with web pages as nodes and hyperlinks as edges), scholars think that a random network will be found: people will decide which websites to link network files to according to their own interests, and their personal interests are diverse, and the number of web pages to choose from is extremely large, so the final linking method will present quite random results.
However, this is not the case. Because on the World Wide Web, not all nodes are equal. When choosing the location of linked web pages, people can choose from billions of websites. However, most of us are only familiar with a small part of the whole World Wide Web, which often includes more linked websites, because such websites are more easily known. As long as you link to these websites, you create or strengthen your preference for them. This process of "preferential attachment" also occurs in other networks. On the Internet, the more routers are connected, the wider the bandwidth is, so new users are more inclined to connect to these routers. In the biotechnology industry in the United States, some well-known companies are more likely to attract allies, which further strengthens their attractiveness in future cooperation. Similarly, in the paper citation network (the paper is a node, and the citation relationship is an edge), scientific documents with more citations will attract more researchers to read and quote. Aiming at the new characteristics of "priority connection" of these networks, scholars put forward the BA scale-free network model.
The discovery of scale-free networks has brought human understanding of complex networks into a new world. The most important feature of scale-free network is that the degree distribution of nodes obeys power law. BA model is the first abstract model of scale-free network. Considering the growth and preferential connectivity of the system, BA model has brought us a lot of inspiration and can be applied to many practical networks. However, the two basic assumptions of BA model are too simple to explain many realistic phenomena, and there is still a long way to go.
Some scholars try to expand the BA model, that is, add some assumptions according to the real network, so as to further explore the laws of complex network systems. The expansion of BA model can consider three factors: the cost of optimal selection, the reconnection of edges and the initial state of the network. The extended BA model can better simulate the network phenomenon in the real world.
During the period of 1999, Marubbarabasi and his brother Albert discovered scale-free networks in their research on the Internet, which gave mankind a new understanding of complex network systems. In the past, people used to regard all complex networks as random networks, but Barabasi and Albert found that the Internet is actually composed of several highly connected pages, and more than 80% of the pages have fewer than four links. There are only a few nodes, accounting for less than one tenth of the total number of nodes, and there are more than 1000 links. The link distribution of this web page follows the so-called "power law": the probability of any node having a link is proportional to 1/k, and it does not have a high concentration peak like a bell curve, but a continuous decreasing curve. If we take the double logarithmic coordinate system to describe the power law, we get a straight line.
Scale-free network refers to a network in which the degree distribution of nodes conforms to the power law distribution, and it is called scale-free network because it lacks a characteristic scale to describe the problem. In the next few years, researchers discovered scale-free networks in many different fields. From ecosystem to interpersonal relationship, from food chain to metabolic system, scale-free networks can be seen everywhere.
Why is the stochastic model inconsistent with reality? After in-depth analysis of er model, Barabasi and Albert found that the problem is that the network discussed in ER model is a fixed-scale network and will not continue to expand. It is precisely because the real network often has the characteristics of continuous growth, so it is easier to connect early nodes (old nodes). When the network expands to a certain scale, these old nodes can easily become distributed nodes with a large number of connections. This is the "growth" of the network.
Secondly, when each node is connected with other nodes in the er model, the probability of establishing the connection is the same. In other words, all nodes in the network are equal. This situation is also inconsistent with reality. For example, when a newly established website chooses to link with other websites, it will naturally choose one of the well-known websites to link, and the hypertext links on the new personal homepage are more likely to point to well-known websites such as Sina and Yahoo. In this way, those well-known websites will get more links, which is the so-called "preferred connection". This phenomenon is also called "Matthew effect" or "the rich are getting richer".
"Growth" and "priority connection" explain the existence of distributed nodes in the network.
The key of BA scale-free model lies in that it reduces the scale-free characteristics of complex networks to two very simple mechanisms: growth and preferential connection. Of course, this inevitably makes the BA scale-free network model have some obvious limitations compared with the real network. For example, the influence of local characteristics of some real networks on network evolution results, and the influence of the outside world on the deletion of network nodes and their connection edges.
There is node exchange between natural or artificial real networks and the outside world, and the connections between nodes are constantly changing. The network itself has a certain self-organizing ability and will respond to changes in itself or the outside world. Therefore, based on the BA model, the dynamic process of the model can be summarized, including the random deletion of existing nodes or connections in the network and the corresponding connection compensation mechanism.
For each time step, consider the following three assumptions:
An important discovery in the study of complex networks is that the average path length of most large-scale real networks is much less than expected, which is called "small world phenomenon" or "six-degree separation".
The so-called small-world phenomenon is a basic phenomenon from social networks, that is, everyone only needs a few intermediaries (an average of six) to establish contact with people all over the world. In this theory, everyone can be regarded as a node of the network, and there are a lot of paths to connect them. The connected nodes represent people who know each other.
In 1998, Watts and Strogatz introduced a single-parameter small-world network model between regular networks and completely random networks, which is called WS small-world model. The model well reflects the characteristics of small average path length and large clustering coefficient of social networks.
The construction method of WS small world model is as follows:
In WS small-world model, P = 0 corresponds to regular network, and P = 1 corresponds to completely random network. By adjusting the sound value, the transition from regular network to completely random graph can be controlled. Therefore, WS small-world network is a kind of network between regular network and random network.
The randomization process in WS small-world model construction algorithm may destroy the connectivity of the network. So Newman and Watt later put forward the NW small world model. The construction method of NW small world model is as follows:
NW model only changes "randomized reconnection" into "randomized edge" in the construction of WS small-world model.
The difference between NW model and WS model is that it does not cut off the original edge in the regular network, but reconnects a pair of nodes with probability p ... The network thus constructed not only has a large number of clusters, but also has a small average distance. The advantage of NW model is that it simplifies the theoretical analysis, because WS model may have isolated nodes, while NW model does not. When the family is small enough and n is large enough, the NW small-world model is essentially equivalent to the WS small-world model.
The small-world network model reflects some characteristics of the actual network, such as the friend network. Most people's friends are colleagues and classmates who live in the same place with them. The geographical location is not far, or they just work or study in the same unit. On the other hand, there are also some people who live far away, even friends who are far away in foreign countries. This situation is just like the remote connection generated by reconnecting in WS small-world model or adding a connection in NW small-world model.
One of the main characteristics of the small-world network model is that the average distance between nodes decreases exponentially with the number of remote connections. For regular networks, the average distance l can be estimated as l is proportional to n; For the small-world network model, L is proportional to ln (n)/ 1n (k). For example, in a city with a population of10 million, the average contact distance between people is about 6, which greatly shortens the distance between the living. The model consists of a regular ring, usually a one-dimensional ring, with almost periodic boundary conditions (that is, almost every node in the ring is connected with a fixed number of adjacent nodes) and a "shortcut" formed by a small number of randomly selected nodes (reconnecting existing edges). Small-world network has the characteristics of "high network aggregation" and "low average path".
From the small-world network model, we can see that the performance of the network will change greatly as long as some connections are changed. This property can also be applied to other networks, especially to the adjustment of existing networks. For example, a mobile phone network can significantly improve its performance (low cost and low workload) by changing the connection of several lines. It can also be applied to the backbone router of the Internet to change the traffic and improve the transmission speed. The same idea can also be applied to the rapid delivery of e-mail, the positioning of specific websites and so on.
If you study complex networks, at present, the best video tutorial is:
Network analysis of social computing and social network analysis
Overview of clustering algorithms in complex networks.
2) Network Analysis Summary of Complex Network Analysis
3) Complex networks and social networks