Table of contents: Hadoop Application Development Technology in Detail
Introduction

Chapter 1  Hadoop Overview
  1.1 The origin of Hadoop: 1.1.1 Google and the Hadoop modules; 1.1.2 Why introduce Hadoop
  1.2 The Hadoop ecosystem
  1.3 Introduction to common Hadoop projects
  1.4 Hadoop applications in China
  1.5 Chapter summary

Chapter 2  Hadoop Installation
  2.1 Environment installation and configuration: 2.1.1 Installing VMware; 2.1.2 Installing Ubuntu; 2.1.3 Installing VMware Tools; 2.1.4 Installing the JDK
  2.2 Hadoop installation modes: 2.2.1 Standalone installation; 2.2.2 Pseudo-distributed installation; 2.2.3 Fully distributed installation
  2.3 Using Hadoop: 2.3.1 Starting and stopping Hadoop; 2.3.2 Hadoop configuration files
  2.4 Chapter summary

Chapter 3  MapReduce Quick Start
  3.1 Preparing the development environment for the WordCount example: 3.1.1 Creating a Java project with Eclipse; 3.1.2 Importing the Hadoop JAR files
  3.2 Implementing the MapReduce code: 3.2.1 Writing the WordMapper class; 3.2.2 Writing the WordReducer class; 3.2.3 Writing the WordMain driver class
  3.3 Packaging, deploying, and running: 3.3.2 Deploying and running; 3.3.3 Test results
  3.4 Chapter summary
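To give a flavor of the Chapter 3 material, the sketch below shows the classic WordCount pattern behind the WordMapper, WordReducer, and WordMain classes named above. It is a minimal illustration written against Hadoop's org.apache.hadoop.mapreduce API (Job.getInstance is the Hadoop 2.x form), not the book's actual source code.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordMain {

    // Mapper: emit (word, 1) for every token in the input line.
    public static class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts collected for each word.
    public static class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable total = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            total.set(sum);
            context.write(key, total);
        }
    }

    // Driver: wire the job together; input and output paths come from the command line.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordMain.class);
        job.setMapperClass(WordMapper.class);
        job.setCombinerClass(WordReducer.class); // the combiner reuses the reducer logic
        job.setReducerClass(WordReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Once packaged as a JAR (Section 3.3), a job of this shape is typically launched with something like "hadoop jar wordcount.jar WordMain /input /output", where the JAR name and paths are placeholders.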
Chapter 4  The Hadoop Distributed File System in Detail
  4.1 Understanding HDFS: 4.1.1 Characteristics of HDFS; 4.1.2 The unified file system interface; 4.1.3 The HDFS web service
  4.2 HDFS architecture: 4.2.1 Rack; 4.2.2 Block; 4.2.3 The metadata node (NameNode); 4.2.4 The data node (DataNode); 4.2.5 The secondary metadata node (Secondary NameNode); 4.2.6 Namespace; 4.2.7 Data replication; 4.2.8 Block backup principles; 4.2.9 Rack awareness
  4.3 The Hadoop RPC mechanism: 4.3.1 The RPC implementation flow; 4.3.2 The RPC entity model; 4.3.3 File reading
  4.4 The HA mechanism of HDFS: why there is an HA mechanism
  4.5 The HDFS federation mechanism: 4.5.1 Limitations of the single-NameNode HDFS architecture; 4.5.2 Why there is a federation mechanism; 4.5.3 The federation architecture; 4.5.4 Managing multiple namespaces
  4.6 Accessing the Hadoop file system: 4.6.1 Safe mode; 4.6.2 Shell commands for accessing HDFS; 4.6.3 HDFS file-processing commands
  4.7 The Java API interface: 4.7.1 Reading data from a Hadoop URL
  4.8.4 Adding a node; 4.8.5 Removing a node
  4.9 HDFS permission management: 4.9.1 User identity; 4.9.2 Principles of permission management; 4.9.3 Shell commands for setting permissions; 4.9.4 The superuser; 4.9.5 HDFS permission configuration parameters
  4.10 Chapter summary

Chapter 5  Hadoop File I/O in Detail
  5.1 Hadoop file data structures: 5.1.1 SequenceFile storage; 5.1.2 MapFile storage; 5.1.3 Converting a SequenceFile to a MapFile
  5.2 HDFS data integrity: 5.2.1 Checksum verification; 5.2.2 The block detection program
  5.3 File serialization: 5.3.1 Serialization requirements for interprocess communication; 5.3.2 Hadoop file serialization; 5.3.3 The Writable interface; 5.3.4 The WritableComparable interface; 5.3.5 Custom Writable interfaces; 5.3.6 Serialization frameworks; 5.3.7 The Avro data serialization system
  5.4 Hadoop Writable types: 5.4.1 The Writable class hierarchy; 5.4.2 The Text type; 5.4.3 The NullWritable type; 5.4.4 The ObjectWritable type; 5.4.5 The GenericWritable type
  5.5 File compression: 5.5.1 Compression formats supported by Hadoop; 5.5.2 Encoders and decoders in Hadoop; 5.5.3 Native libraries; 5.5.4 Splittable LZO compression; 5.5.5 Compression performance comparison: LZO and Snappy
  5.6 Chapter summary

Chapter 6  How MapReduce Works
  6.1 Functional programming concepts of MapReduce: 6.1.1 List processing; 6.1.2 Mapping lists of data; 6.1.3 Reducing lists of data; 6.1.4 How Mapper and Reducer work; 6.1.5 Application example: word frequency statistics
  6.2 The MapReduce framework structure
  6.3.7 Task completion
  6.4 MapReduce fault tolerance: 6.4.1 Task failure; 6.4.2 TaskTracker failure; 6.4.3 JobTracker failure; 6.4.4 Subtask failure; 6.4.5 Handling repeated task failures
  6.5 The shuffle and sort phases: 6.5.1 Shuffle; 6.5.2 Shuffle; 6.5.3 Tuning the shuffle parameters
  6.6 Task execution: 6.6.1 Speculative execution
  6.8 Schedulers: 6.8.1 The Hadoop scheduler framework; 6.8.2 Writing a Hadoop scheduler
  6.9 Introduction to YARN: 6.9.1 The asynchronous programming model; 6.9.2 Computing frameworks supported by YARN; 6.9.3 The YARN architecture; 6.9.4 The YARN workflow
  6.10 Chapter summary

Chapter 7  Application of the Eclipse Plugin
  7.1 Compiling the Hadoop source code: 7.1.1 Downloading the Hadoop source code
  7.2 Creating a new Hadoop Location: 7.2.3 Operating HDFS with the Hadoop plugin; 7.2.4 Running a MapReduce driver class
  7.3 Debugging MapReduce: 7.3.1 Entering debug mode; 7.3.2 Specific debugging operations
  7.4 The MRUnit unit-testing framework: 7.4.1 Understanding the MRUnit framework; 7.4.2 Preparing test cases; 7.4.3 Mapper unit tests; 7.4.4 Reducer unit tests; 7.4.5 MapReduce unit tests
  7.5 Chapter summary
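Section 7.4 above covers Mapper-level unit testing with MRUnit. As an illustrative sketch only (not the book's test case), the JUnit test below drives the hypothetical WordMapper from the Chapter 3 sketch through MRUnit 1.x's MapDriver (package org.apache.hadoop.mrunit.mapreduce, the new-API variant).

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class WordMapperTest {

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        // WordMapper is the mapper sketched for Chapter 3 above (an assumption, not the book's class).
        mapDriver = MapDriver.newMapDriver(new WordMain.WordMapper());
    }

    @Test
    public void mapperEmitsOneCountPerWord() throws IOException {
        // One input line should produce one (word, 1) pair per token, in order.
        mapDriver.withInput(new LongWritable(0), new Text("hadoop hive hadoop"))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .withOutput(new Text("hive"), new IntWritable(1))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .runTest();
    }
}

runTest() runs the mapper in-memory and checks the emitted key/value pairs against the expected ones, so no cluster or HDFS access is needed for this style of test.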
Chapter 8  MapReduce Development
  8.1.1 The MapReduce workflow; 8.1.2 The map process; 8.1.3 The reduce process; 8.1.4 The results produced by each stage; 8.1.5 The Mapper abstract class; 8.1.6 The Reducer abstract class; 8.1.7 The MapReduce driver; 8.1.8 The minimal MapReduce driver
  8.2 Input formats: 8.2.1 The InputFormat interface; 8.2.2 The InputSplit class; 8.2.3 The RecordReader class; 8.2.4 Application example: generating 100 random decimals and finding the maximum
  8.3 Output formats: 8.3.1 The OutputFormat interface; 8.3.2 The RecordWriter class; 8.3.3 Application example: writing words with the same initial letter to the same file
  8.4 Compression formats: 8.4.1 How to use compression in MapReduce; 8.4.2 Compressing the map task output
  8.5 MapReduce optimization: 8.5.1 The Combiner class; 8.5.2 The Partitioner class; 8.5.3 The distributed cache
  8.6 Auxiliary classes: 8.6.1 Reading Hadoop configuration files; 8.6.2 Setting Hadoop configuration file properties; 8.6.3 GenericOptionsParser options
  8.7 The Streaming interface: 8.7.1 How Streaming works; 8.7.2 Streaming programming interface parameters; 8.7.3 Job configuration properties; 8.7.4 Application example: capturing the title of a web page
  8.8 Chapter summary

Chapter 9  Advanced MapReduce Applications
  9.1 Counters: 9.1.1 Default counters; 9.1.2 Custom counters; 9.1.3 Retrieving counters
  9.5 Hadoop system tuning: 9.5.1 Small-file optimization; 9.5.2 Setting the number of map and reduce tasks
  9.6 Chapter summary

Chapter 10  The Data Warehouse Tool Hive
  10.1.2 Hive data types; 10.1.3 Hive characteristics; 10.1.4 Downloading and installing Hive
  10.2 Hive architecture: 10.2.3 Hive data storage; 10.2.4 The Hive interpreter
  10.3 Hive file formats: 10.3.1 The TextFile format; 10.3.2 The SequenceFile format; 10.3.4 Custom file formats
  10.4 Hive operations: 10.4.1 Table operations; 10.4.2 View operations; 10.4.3 Index operations
  10.5 Hive composite types: 10.5.1 The struct type; 10.5.2 The array type; 10.5.3 The map type
  10.6 Hive join syntax: 10.6.2 JOIN principles; 10.6.3 Outer joins; 10.6.4 Map-side joins; 10.6.5 Handling the semantics of NULL values
  10.7 Hive optimization strategies: 10.7.1 Column pruning; 10.7.3 GROUP BY operations; 10.7.4 Merging small files
  10.8 Hive built-in operators and functions: 10.8.1 String functions; 10.8.2 Collection statistics functions
  10.9 User-defined functions (UDF)
  10.10.2 Granting and revoking roles; 10.10.3 Super administrator privileges
  10.11 Application example: developing a Hive program with JDBC (10.11.2 Code implementation)
  10.12 Chapter summary

Chapter 11  The Open-Source Database HBase
  11.1 Understanding HBase: 11.1.1 HBase features; 11.1.2 HBase access interfaces
  11.2.2 Framework structure and flow; 11.2.3 The relationship between tables and regions; 11.2.4 The .META. table
  11.3 Key algorithms and location: 11.3.2 The read and write processes; 11.3.3 Region allocation
  11.4 HBase installation: 11.4.1 HBase standalone installation; 11.4.2 HBase distributed installation
  11.5 HBase Shell operations: 11.5.2 DDL operations; 11.5.3 DML operations; 11.5.4 HBase Shell scripts
  11.6 HBase clients: 11.6.1 Java API interaction; 11.6.2 Operating HBase with MapReduce; 11.6.3 Writing data into HBase; 11.6.4 Reading data from HBase; 11.6.5 The Avro, REST, and Thrift interfaces
  11.7 Chapter summary

Chapter 12  Mahout Algorithms
  12.1 Using Taste
  12.2 Mahout data representation: 12.2.1 The Preference class; the data model and MySQL data model classes
  12.4 Mahout recommenders: 12.4.1 User-based recommenders; 12.4.2 Item-based recommenders; 12.4.3 The Slope One recommendation strategy
  12.5 Recommendation systems: a recommendation system case study
  12.6 Chapter summary

Appendix A  Hive built-in operators and functions
Appendix B  HBase default configuration description
Appendix C  Definitions of the parameters in Hadoop's three configuration files
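Finally, as a flavor of the application example listed in Section 10.11 (developing a Hive program with JDBC), here is a minimal sketch assuming a HiveServer2 endpoint on localhost:10000 and the org.apache.hive.jdbc.HiveDriver driver. The word_counts table, its columns, and the credentials are hypothetical, and an older HiveServer1 setup would use a different driver class and URL scheme; this is not the book's code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {

    // HiveServer2 JDBC URL; adjust host, port, and database for the actual cluster.
    private static final String URL = "jdbc:hive2://localhost:10000/default";

    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver shipped with Hive.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection conn = DriverManager.getConnection(URL, "hive", "");
             Statement stmt = conn.createStatement()) {

            // "word_counts" and its columns are placeholders used only for illustration.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT word, cnt FROM word_counts ORDER BY cnt DESC LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}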