Most regularization methods limit a model's capacity, and thereby improve its generalization, by adding a structural-risk term on top of the empirical loss. L1 regularization adds the L1 norm of the parameters as that structural term. The mathematical expression is as follows, with $J$ the empirical loss, $w$ the parameters, and $\alpha$ the regularization strength:

$$\tilde{J}(w; X, y) = J(w; X, y) + \alpha \|w\|_1 = J(w; X, y) + \alpha \sum_i |w_i|$$
It is well known that adding an L1 penalty to the model loss yields sparse parameter solutions. Below we explain why from both a geometric and a mathematical perspective, with the emphasis on the mathematical derivation.
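To make the sparsity claim concrete before turning to the geometry, here is a minimal NumPy sketch of proximal gradient descent (ISTA) for an L1-regularized least-squares problem. The synthetic data and the choice of alpha are illustrative assumptions, not values from any particular problem; the key point is the soft-thresholding step, which sets small coordinates to exactly zero.

```python
import numpy as np

# Illustrative synthetic data: only 3 of the 20 true weights are nonzero.
rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -3.0, 1.5]
y = X @ w_true + 0.1 * rng.normal(size=n)

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink toward 0 and
    # clip anything inside [-t, t] to exactly 0.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

alpha = 0.1                            # L1 regularization strength (assumed)
L = np.linalg.norm(X, 2) ** 2 / n      # Lipschitz constant of the gradient
step = 1.0 / L
w = np.zeros(d)
for _ in range(2000):
    grad = X.T @ (X @ w - y) / n       # gradient of (1/2n) * ||Xw - y||^2
    w = soft_threshold(w - step * grad, step * alpha)

print("nonzero weights:", np.count_nonzero(w), "of", d)
```

Because the thresholding step snaps small coordinates to exactly zero, the fit typically leaves most of the 20 weights at zero; plain gradient descent on the same loss (no thresholding) would leave every coordinate slightly nonzero.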
This figure appears in almost every article that explains L1 regularization. The contour lines are the level sets of the loss $L$, and the black square is the L1 ball of the regularizer, i.e. the set of points where $|w_1| + |w_2|$ equals a constant. The optimal solution is the first point at which a contour of $L$ touches the L1 ball. In the figure, that first contact happens at a vertex of the square, so the vertex is the optimum; note that its value is $(w_1, w_2) = (0, w)$, meaning one weight is exactly zero. Intuitively, because the L1 ball is angular and has protruding corners (four in two dimensions, far more in higher dimensions), a contour of $L$ is much more likely to first touch one of these corners than any other part of the ball, and at a corner many weights equal zero. This is why L1 regularization yields sparse solutions.
High energy ahead, non-combatants please evacuate quickly!!!
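The geometric picture only gives intuition, so here is a short analytic sketch of where the zeros come from. It follows the standard textbook treatment (e.g. Goodfellow et al., Deep Learning, §7.1.2); the local quadratic approximation and the diagonal Hessian $H$ are simplifying assumptions made for tractability.

```latex
% Approximate the empirical loss near its unregularized optimum w^*
% with a quadratic whose Hessian H is diagonal, so the problem
% separates into independent one-dimensional problems per coordinate:
\hat{J}(w) = J(w^*) + \sum_i \tfrac{1}{2} H_{ii} \, (w_i - w_i^*)^2

% Add the L1 penalty and minimize each coordinate independently:
\tilde{J}(w) = \hat{J}(w) + \alpha \sum_i |w_i|

% The coordinate-wise minimizer is the soft-thresholding formula:
w_i = \operatorname{sign}(w_i^*) \, \max\!\left( |w_i^*| - \frac{\alpha}{H_{ii}},\; 0 \right)
```

Whenever $|w_i^*| \le \alpha / H_{ii}$, the regularized optimum is exactly $w_i = 0$: the penalty overwhelms the pull of the loss on that coordinate. This is the analytic counterpart of the corners of the L1 ball in the figure above.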