Undoubtedly, becoming a top data scientist takes a mix of strengths: programming ability, a degree of business acumen, and a distinctive analytical mindset. But it is always good to understand how things work under the hood. A deep understanding of the mathematics behind the algorithms will give you an edge over your peers.
For newcomers who enter data science from other fields (hardware engineering, retail, the chemical process industry, medicine and health care, business management, and so on), this basic mathematical knowledge is particularly important. Although those fields may require experience with spreadsheets, numerical calculation and projection, the mathematical skills required for data science can be quite different.
Consider a web developer or a business analyst. They may deal with a lot of data and information every day, yet handling data is not the same as doing science with it. Data science should be about the science, not the data. Following that idea, certain tools and techniques become indispensable:
Modeling a process by probing its underlying dynamics
Forming hypotheses
Rigorously evaluating the quality of data sources
Quantifying the uncertainty in data and predictions
Identifying hidden patterns in a stream of information
Understanding the limitations of a model
Understanding mathematical proofs and the abstract logic behind them
By its nature, data science is not tied to a specific subject area and can deal with phenomena as varied as cancer diagnosis and social behavior analysis. This gives rise to a dazzling array of n-dimensional mathematical objects, statistical distributions, optimization objective functions and so on.
Functions, variables, equations and graphs
The mathematics in this area covers the basics, from simple equations to the binomial theorem and everything in between:
Logarithm, exponent, polynomial function, rational number
Basic geometry and theorem, trigonometric identity
Real and complex numbers, basic properties
Series, sums, inequalities
Graphing and plotting, Cartesian and polar coordinates, conic sections
Where you might use it
If you want to understand why a search runs faster after a database with millions of entries has been sorted, you will come across the concept of "binary search". To understand how it works, you need to understand logarithms and recurrence relations. Or, if you want to analyze a time series, you may come across concepts such as "periodic functions" and "exponential decay".
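The link between binary search and logarithms can be sketched in a few lines of Python. This is a minimal illustration, not a production routine: the function name `binary_search`, the step counter and the sample data are all assumptions made for this example. Each iteration halves the remaining range, which is the recurrence T(n) = T(n/2) + 1, so the number of steps grows like log2(n).

```python
import math


def binary_search(sorted_items, target):
    """Return (index, steps); index is -1 when target is absent."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2  # midpoint of the remaining range
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            low = mid + 1   # discard the lower half
        else:
            high = mid - 1  # discard the upper half
    return -1, steps


# Hypothetical data: one million sorted even numbers.
entries = list(range(0, 2_000_000, 2))
index, steps = binary_search(entries, 1_337_000)
bound = math.ceil(math.log2(len(entries)))  # about 20 for a million entries
print(index, steps, bound)
```

A linear scan over the same list could need up to a million comparisons, while the halving strategy never needs more than about twenty.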
Statistics
The importance of a solid grasp of the basic concepts of statistics and probability cannot be overstated. Many practitioners in the field actually consider classical (non-neural-network) machine learning to be nothing but statistical learning. Focused planning is essential to cover the most basic concepts:
Data summaries and descriptive statistics, central tendency, variance, covariance, correlation
Basic probability: expectation, probability calculus, Bayes' theorem, conditional probability
Probability distribution functions: uniform, normal, binomial, chi-square; the central limit theorem
Sampling, measurement, error, random number generation
Hypothesis testing, A/B testing, confidence intervals, p-values
Analysis of variance (ANOVA), t-test
Linear regression, regularization
If you master these concepts, you will make a strong impression quickly. As a data scientist, you will use them almost every day.
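As a small taste of how these concepts show up in practice, here is a sketch of Bayes' theorem applied to a diagnostic test. The function name `posterior_probability` and all of the numbers (a 1% prior, 90% sensitivity, 5% false positive rate) are hypothetical values chosen for illustration, not figures from any real study.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) computed with Bayes' theorem."""
    true_positive = sensitivity * prior                    # P(positive and condition)
    false_positive = false_positive_rate * (1.0 - prior)   # P(positive and no condition)
    return true_positive / (true_positive + false_positive)


# Hypothetical inputs: 1% of people have the condition, the test catches
# 90% of true cases and wrongly flags 5% of healthy people.
posterior = posterior_probability(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(f"P(condition | positive) = {posterior:.3f}")
```

With these assumed inputs the posterior is only about 15%, because the many false positives among healthy people outweigh the few true positives. Conditional probability makes that kind of counterintuitive result explicit.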
Linear algebra
This is an essential branch of mathematics for understanding how machine learning algorithms work on a stream of data. From friend suggestions on QQ and song recommendations on Kugou to deep transfer learning that turns your selfie into a portrait in the style of Salvador Dalí, all of these involve matrices and matrix algebra. The following are the basics to study:
Basic properties of matrices and vectors: scalar multiplication, linear transformation, transpose, conjugate, rank, determinant
Inner product and outer product, matrix multiplication rules and various algorithms, matrix inversion
Special matrices: square matrix, identity matrix, triangular matrix, unit vectors, symmetric matrix, Hermitian matrix, skew-Hermitian matrix and unitary matrix
Matrix factorization concepts and LU decomposition, Gaussian and Gauss-Jordan elimination, solving the linear system Ax = b
Vector spaces, basis, span, orthogonality, orthonormality, linear least squares
Eigenvalue, eigenvector, diagonalization, singular value decomposition
If you have used the dimensionality reduction technique known as principal component analysis (PCA), you have likely relied on singular value decomposition to obtain a compact representation of the data set with fewer parameters. All neural network algorithms use linear algebra techniques to represent and process network structures and learning operations.
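The connection between principal component analysis and singular value decomposition can be written down compactly. The notation below is an assumption made for this sketch: X is an n-by-d data matrix whose columns have been centered, C is its sample covariance matrix, and the subscript k keeps only the first k columns (or the top-left k-by-k block).

```latex
X = U \Sigma V^{\top}, \qquad
C = \frac{1}{n-1} X^{\top} X = V \, \frac{\Sigma^{2}}{n-1} \, V^{\top}, \qquad
Z_k = X V_k = U_k \Sigma_k
```

Read this way, the columns of V are the principal directions, the squared singular values divided by n - 1 are the variances along them, and Z_k is the reduced representation: n rows described by k numbers each instead of d.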
Calculus
Whether you loved it or hated it in college, calculus has many applications in data science and machine learning. It is a valuable skill to have. The topics to learn:
Functions of a single variable, limits, continuity, differentiability
Mean value theorems, indeterminate forms, L'Hôpital's rule
Maxima and minima
Product rule and chain rule
Taylor series, concepts of summation and integration of infinite series
Fundamental and mean value theorems of integral calculus, evaluation of definite and improper integrals
function
Multivariate function, limit, continuity, partial derivative
Fundamentals of ordinary differential equations and partial differential equations
Ever wondered how a logistic regression algorithm is implemented? It most likely uses a method called "gradient descent" to find the minimum of the loss function. To understand how it works, you need the concepts of calculus: gradient, derivative, limit and chain rule.
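To make the role of the derivative and the chain rule concrete, here is a minimal sketch of batch gradient descent for logistic regression with a single feature. It is an illustration under stated assumptions, not how any particular library implements it: the names `sigmoid` and `fit_logistic`, the learning rate, the step count and the toy data are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.5, n_steps=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Chain rule applied to the log loss gives:
        #   d(loss)/dw = mean((p - y) * x),  d(loss)/db = mean(p - y)
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Step against the gradient, i.e. downhill on the loss surface.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data: hours studied versus pass (1) or fail (0).
# The classes overlap slightly, so the minimum of the loss is finite.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(hours, passed)
print(f"w = {w:.2f}, b = {b:.2f}, P(pass | 4 hours) = {sigmoid(w * 4.0 + b):.2f}")
```

The update rule is the whole algorithm: compute the gradient, take a small step in the opposite direction, repeat. The learning rate controls the step size, and a value that is too large can make the iteration diverge.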
Discrete mathematics
This area is not discussed as often in data science, but all modern data science is done with the help of computing systems, and discrete mathematics is at the core of those systems.
Set, subset
Counting function, combinatorics, countability
Basic proof techniques: induction and proof by contradiction
Basics of inductive, deductive and propositional logic
Basic data structures: stack, queue, graph, array, hash table, tree.
Properties of graphs: connected components, degree, maximum flow/minimum cut concepts, graph coloring
Recurrence relations and equations
In any social network analysis, you need to know the properties of a graph and fast algorithms for searching and traversing the network. In choosing any algorithm, you need to understand its time and space complexity.
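Graph traversal can be sketched with breadth-first search (BFS), used here to find connected components in a small undirected friendship graph. The function name `connected_components` and the adjacency data are hypothetical, invented for this example. The traversal visits every node and edge once, so it runs in O(V + E) time and uses O(V) extra space.

```python
from collections import deque


def connected_components(graph):
    """Group the nodes of an undirected graph into connected components via BFS."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()  # FIFO order gives breadth-first traversal
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical friendship graph stored as adjacency lists.
friends = {
    "ana": ["ben", "cho"],
    "ben": ["ana"],
    "cho": ["ana"],
    "dev": ["eli"],
    "eli": ["dev"],
    "fay": [],
}
degrees = {person: len(neighbors) for person, neighbors in friends.items()}
print(connected_components(friends))
print(degrees)
```

A `deque` is used for the queue because removing from the front of a plain list costs O(n) per operation, which would quietly change the complexity of the whole traversal.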
Optimization and operations research topics
These topics are most relevant to theoretical computer science, control theory or operations research. But an understanding of these powerful techniques can also pay off in machine learning practice. In fact, the goal of virtually every machine learning algorithm is to minimize some estimation error subject to various constraints, which is an optimization problem. Here is the math you need to learn:
Basics of optimization, how to formulate the problem
Maxima, minima, convex functions, global solution
Linear programming, simplex algorithm
Integer programming
Constraint programming, knapsack problem
Simple linear regression problems with a least-squares loss function usually have an exact analytical solution, while logistic regression problems do not. To understand why, you need to be familiar with the concept of convexity in optimization. This line of study will also clarify why we must settle for "approximate" solutions to most machine learning problems.
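The exact analytical solution for simple linear regression can be sketched directly. Setting the derivative of the squared-error loss to zero gives equations that are linear in the slope and intercept, so they can be solved in closed form. The function name `fit_line` and the sample points are assumptions made for this illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points lying exactly on y = 2x + 1.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope, intercept)
```

No iteration is needed here. For logistic regression, the same "set the gradient to zero" step produces equations that are nonlinear in the parameters, so in practice one falls back on iterative numerical methods.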
Although there is a lot to learn, there are excellent resources online. After reviewing these topics and learning new concepts, you will be able to hear the hidden "music" in your everyday data analysis and machine learning projects. That is a big leap toward becoming a great data scientist.