This textbook for undergraduates and postgraduates can also serve as a reference for researchers and teachers in related fields. Machine learning addresses the question of how computer programs can automatically improve their performance with experience. Machine learning has been applied successfully in many domains, from data-mining programs that learn to detect fraudulent credit card transactions, to information-filtering systems that learn users' reading preferences, to vehicles that learn to drive autonomously on highways. At the same time, the basic theory and algorithms of the field have advanced significantly.
The goal of this textbook is to present the core algorithms and theory of machine learning. Machine learning draws on results and concepts from many disciplines, including statistics, artificial intelligence, philosophy, information theory, biology, cognitive science, computational complexity, and control theory. The author believes that the best way to learn machine learning is to view it from the perspectives of all these disciplines and to understand the problem settings, the algorithms, and their underlying assumptions. In the past this was difficult because the field lacked a broad, inclusive introduction. The primary purpose of this book is to provide one.
Because of the multidisciplinary nature of the material, this book does not require readers to have a background in these disciplines; instead it introduces the basic concepts of statistics, artificial intelligence, information theory, and other fields as the need arises, focusing on those concepts most closely related to machine learning. The book can serve as a textbook for undergraduate or graduate students in computer science and engineering, statistics, and the social sciences, and as a reference for software researchers and practitioners.
The writing of this book followed two principles: first, it should be accessible to undergraduate students; second, it should contain the material I would want my own doctoral students to master before beginning their research in machine learning.
A third principle guiding the writing of this book is that it should reflect a balance between theory and practice. Machine learning theory attempts to answer questions such as "How does learning performance change with the number of training examples?" and "Which learning algorithms are most appropriate for which types of learning tasks?" Drawing on theoretical results from statistics, computational complexity, and Bayesian analysis, the book discusses these theoretical questions. At the same time, it covers many practical aspects: it presents the major algorithms in the field and illustrates how they operate.
Implementations and data for several of these algorithms are available on the Internet at http://www.cs.cmu.edu/~tom/mlbook.html, including neural network source code and data for face recognition, decision tree learning source code and data for credit analysis, and Bayes classifier source code and data for analyzing text documents. I am grateful to the colleagues who helped me create these online resources: Jason Rennie, Paul Hsiung, Jeff Shufelt, Matt Glickman, Scott Davies, Joseph O'Sullivan, Ken Lang, Andrew McCallum, and Thorsten Joachims.
Chapter 1 Introduction
1.1 Well-posed learning problems
1.2 Designing a learning system
1.2.1 Choosing the training experience
1.2.2 Choosing the target function
1.2.3 Choosing a representation for the target function
1.2.4 Choosing a function approximation algorithm
1.2.5 The final design
1.3 Perspectives and issues in machine learning
1.4 How to read this book
1.5 Summary and further reading
Exercises
Chapter 2 Concept Learning and the General-to-Specific Ordering
2.1 Introduction
2.2 A concept learning task
2.2.1 Notation
2.2.2 The inductive learning hypothesis
2.3 Concept learning as search
2.4 FIND-S: finding a maximally specific hypothesis
2.5 Version spaces and the candidate-elimination algorithm
2.5.1 Representation
2.5.2 The LIST-THEN-ELIMINATE algorithm
2.5.3 A more compact representation for version spaces
2.5.4 The candidate-elimination learning algorithm
2.5.5 An illustrative example
2.6 Remarks on version spaces and candidate-elimination
2.6.1 Will the candidate-elimination algorithm converge to the correct hypothesis?
2.6.2 What training example should the learner request next?
2.6.3 How can partially learned concepts be used?
2.7 Inductive bias
2.7.1 A biased hypothesis space
2.7.2 An unbiased learner
2.7.3 The futility of bias-free learning
2.8 Summary and further reading
Exercises
Chapter 3 Decision Tree Learning
3.1 Introduction
3.2 Decision tree representation
3.3 Appropriate problems for decision tree learning
3.4 The basic decision tree learning algorithm
3.4.1 Which attribute is the best classifier?
3.4.2 An illustrative example
3.5 Hypothesis space search in decision tree learning
3.6 Inductive bias in decision tree learning
3.6.1 Restriction biases and preference biases
3.6.2 Why prefer short hypotheses?
3.7 Issues in decision tree learning
3.7.1 Avoiding overfitting the data
3.7.2 Incorporating continuous-valued attributes
3.7.3 Alternative measures for selecting attributes
3.7.4 Handling training examples with missing attribute values
3.7.5 Handling attributes with differing costs
3.8 Summary and further reading
Exercises
Chapter 4 Artificial Neural Networks
4.1 Introduction
4.2 Neural network representations
4.3 Appropriate problems for neural network learning
4.4 Perceptrons
4.4.1 Representational power of perceptrons
4.4.2 The perceptron training rule
4.4.3 Gradient descent and the delta rule
4.4.4 Remarks
4.5 Multilayer networks and the backpropagation algorithm
4.5.1 A differentiable threshold unit
4.5.2 The backpropagation algorithm
4.5.3 Derivation of the backpropagation rule
4.6 Remarks on the backpropagation algorithm
4.6.1 Convergence and local minima
4.6.2 Representational power of feedforward networks
4.6.3 Hypothesis space search and inductive bias
4.6.4 Hidden layer representations
4.6.5 Generalization, overfitting, and stopping criteria
4.7 An illustrative example: face recognition
4.7.1 The task
4.7.2 Design choices
4.7.3 Learned hidden representations
4.8 Advanced topics in artificial neural networks
4.8.1 Alternative error functions
4.8.2 Alternative error minimization procedures
4.8.3 Recurrent networks
4.8.4 Dynamically modifying network structure
4.9 Summary and further reading
Exercises
Chapter 5 Evaluating Hypotheses
5.1 Motivation
5.2 Estimating hypothesis accuracy
5.2.1 Sample error and true error
5.2.2 Confidence intervals for discrete-valued hypotheses
5.3 Basics of sampling theory
5.3.1 Error estimation and estimating binomial proportions
5.3.2 The binomial distribution
5.3.3 Mean and variance
5.3.4 Estimators, bias, and variance
5.3.5 Confidence intervals
5.3.6 Two-sided and one-sided bounds
5.4 A general approach for deriving confidence intervals
5.5 Difference in error of two hypotheses
5.6 Comparing learning algorithms
5.6.1 Paired t tests
5.6.2 Practical considerations
5.7 Summary and further reading
Exercises
Chapter 6 Bayesian Learning
6.1 Introduction
6.2 Bayes theorem
6.3 Bayes theorem and concept learning
6.3.1 Brute-force Bayes concept learning
6.3.2 MAP hypotheses and consistent learners
6.4 Maximum likelihood and least-squared error hypotheses
6.5 Maximum likelihood hypotheses for predicting probabilities
6.6 Minimum description length principle
6.7 Bayes optimal classifier
6.8 Gibbs algorithm
6.9 Naive Bayes classifier
6.10 An example: learning to classify text
6.11 Bayesian belief networks
6.11.1 Conditional independence
6.11.2 Representation
6.11.3 Inference
6.11.4 Learning Bayesian belief networks
6.11.5 Gradient ascent training of Bayesian networks
6.11.6 Learning the structure of Bayesian networks
6.12 The EM algorithm
6.12.1 Estimating the means of k Gaussians
6.12.2 General statement of the EM algorithm
6.12.3 Derivation of the k-means algorithm
6.13 Summary and further reading
Exercises
Chapter 7 Computational Learning Theory
7.1 Introduction
7.2 Probably learning an approximately correct hypothesis
7.2.1 The problem setting
7.2.2 Error of a hypothesis
7.2.3 PAC learnability
7.3 Sample complexity for finite hypothesis spaces
7.3.1 Agnostic learning and inconsistent hypotheses
7.3.2 Conjunctions of Boolean literals are PAC-learnable
7.3.3 PAC learnability of other concept classes
7.4 Sample complexity for infinite hypothesis spaces
7.4.1 Shattering a set of instances
7.4.2 The Vapnik-Chervonenkis dimension
7.4.3 Sample complexity and the VC dimension
7.4.4 VC dimension for neural networks
7.5 The mistake bound model of learning
7.5.1 Mistake bound for the FIND-S algorithm
7.5.2 Mistake bound for the HALVING algorithm
7.5.3 Optimal mistake bounds
7.5.4 The WEIGHTED-MAJORITY algorithm
7.6 Summary and further reading
Exercises
Chapter 8 Instance-Based Learning
8.1 Introduction
8.2 k-nearest neighbor learning
8.2.1 Distance-weighted nearest neighbor algorithm
8.2.2 Remarks on the k-nearest neighbor algorithm
8.2.3 A note on terminology
8.3 Locally weighted regression
8.3.1 Locally weighted linear regression
8.3.2 Remarks on locally weighted regression
8.4 Radial basis functions
8.5 Case-based reasoning
8.6 Remarks on lazy and eager learning
8.7 Summary and further reading
Exercises
Chapter 9 Genetic Algorithms
9.1 Motivation
9.2 Genetic algorithms
9.2.1 Representing hypotheses
9.2.2 Genetic operators
9.2.3 Fitness function and selection
9.3 An illustrative example
9.4 Hypothesis space search
9.5 Genetic programming
9.5.1 Representing programs
9.5.2 Illustrative example
9.5.3 Remarks on genetic programming
9.6 Models of evolution and learning
9.6.1 Lamarckian evolution
9.6.2 Baldwin effect
9.7 Parallelizing genetic algorithms
9.8 Summary and further reading
Exercises
Chapter 10 Learning Sets of Rules
10.1 Introduction
10.2 Sequential covering algorithms
10.2.1 General to specific beam search
10.2.2 Variations
10.3 Learning rule sets: summary
10.4 Learning first-order rules
10.4.1 First-order Horn clauses
10.4.2 Terminology
10.5 Learning sets of first-order rules: FOIL
10.5.1 Generating candidate specializations in FOIL
10.5.2 Guiding the search in FOIL
10.5.3 Learning recursive rule sets
10.5.4 Summary of FOIL
10.6 Induction as inverted deduction
10.7 Inverting resolution
10.7.1 First-order resolution
10.7.2 Inverting resolution: first-order case
10.7.3 Summary of inverse resolution
10.7.4 Generalization, θ-subsumption, and entailment
10.7.5 PROGOL
10.8 Summary and further reading
Exercises
Chapter 11 Analytical Learning
11.1 Introduction
11.2 Learning with perfect domain theories: PROLOG-EBG
11.3 Remarks on explanation-based learning
11.3.1 Discovering new features
11.3.2 Deductive learning
11.3.3 Inductive bias in explanation-based learning
11.3.4 Knowledge level learning
11.4 Explanation-based learning of search control knowledge
11.5 Summary and further reading
Exercises
Chapter 12 Combining Inductive and Analytical Learning
12.1 Motivation
12.2 Inductive-analytical approaches to learning
12.2.1 The learning problem
12.2.2 Hypothesis space search
12.3 Using prior knowledge to initialize the hypothesis
12.3.1 The KBANN algorithm
12.3.2 An illustrative example
12.3.3 Remarks
12.4 Using prior knowledge to alter the search objective
12.4.1 The TANGENTPROP algorithm
12.4.2 An illustrative example
12.4.3 Remarks
12.4.4 The EBNN algorithm
12.4.5 Remarks
12.6 The state of the art
12.7 Summary and further reading
Exercises
Chapter 13 Reinforcement Learning
13.1 Introduction
13.2 The learning task
13.3 Q learning
13.3.1 The Q function
13.3.2 An algorithm for learning Q
13.3.3 An illustrative example
13.3.4 Convergence
13.3.5 Experimentation strategies
13.3.6 Updating sequence
13.4 Nondeterministic rewards and actions
13.5 Temporal difference learning
13.6 Generalizing from examples
13.7 Relationship to dynamic programming
13.8 Summary and further reading
Exercises
Appendix Notation Conventions