Why is the least squares method the best method for linear regression?
First of all, what does "best" even mean? You can minimize the sum of squared errors (the 2-norm) or the sum of absolute errors (the 1-norm), and there are many other optimality criteria as well, for example those obtained by adding various regularization terms; there is no single, universal "optimal". The point is that you must first define a reasonable objective function (such as the 2-norm or 1-norm mentioned above), and the solution that minimizes that objective is optimal only in that particular sense. No objective function is better than all the others; every reasonable objective has its own domain of applicability, and its properties can be established mathematically.

A key fact is that when the sum of squared errors is taken as the objective, Gauss showed more than 200 years ago that a unique explicit solution can be derived; this is what we call the least squares method. Further study revealed some elegant results under Gaussian noise, for example that the least squares solution coincides with the maximum likelihood estimate. With other objective functions, by contrast, an explicit optimal solution is generally hard to obtain, although with the development of convex optimization in recent years, objectives such as the 1-norm can also be minimized reliably by mature algorithms.
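A minimal sketch of the contrast described above, assuming NumPy/SciPy and using made-up toy data: the 2-norm objective has the explicit solution b = (XᵀX)⁻¹Xᵀy, while a 1-norm objective here is minimized numerically (a dedicated convex solver or a linear-programming formulation would be the more standard choice).

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: intercept + one feature, with Gaussian noise (values are illustrative).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([1.0, 2.0])
y = X @ true_beta + rng.normal(scale=0.5, size=n)

# Least squares: minimizing ||y - X b||_2^2 has the closed-form solution of the
# normal equations, b = (X^T X)^{-1} X^T y; under Gaussian noise this is also
# the maximum likelihood estimate.
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# 1-norm objective (least absolute deviations): no closed form, so minimize
# numerically. Nelder-Mead is used here only because the objective is
# non-smooth and the problem is tiny; it is not a production-grade approach.
beta_l1 = minimize(lambda b: np.sum(np.abs(y - X @ b)),
                   x0=beta_ls, method="Nelder-Mead").x

print("least squares solution:          ", beta_ls)
print("least absolute deviations (1-norm):", beta_l1)
```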