What are the core technologies of big data?
The big data technology stack is large and complex. Its basic technologies include data acquisition, data preprocessing, distributed storage, databases, data warehouses, machine learning, parallel computing, and visualization.

1. Data acquisition and preprocessing: Flume NG is a real-time log collection system that supports customizing various data senders in the logging system to collect data. ZooKeeper is an open source coordination service for distributed applications, and it provides data synchronization services.

2. Data storage: Hadoop is an open source framework designed for offline, large-scale data analysis, and HDFS, its core storage engine, is widely used for data storage. HBase is a distributed, column-oriented open source NoSQL database that runs on top of HDFS and can be thought of as a layer over it; in essence it is a data store.

3. Data cleaning: MapReduce, the computation engine of Hadoop, is used for parallel processing of large-scale data sets.
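
To make the MapReduce model concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are assumptions chosen for illustration, and a real job would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # In a cluster, each line (or block of lines) would be mapped on a different node.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Hypothetical sample input, standing in for a large log or text file.
sample_lines = ["big data needs storage", "big data needs compute"]
counts = word_count(sample_lines)
```

Because each call to map_phase only looks at its own line and each call to reduce_phase only looks at one key, both steps can be distributed without coordination, which is what makes the model suitable for large data sets.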

4. Data query and analysis: The core job of Hive is to translate SQL statements into MapReduce (MR) programs. It maps structured data files to database tables and provides an HQL (Hive SQL) query capability. Spark supports in-memory distributed data sets, which lets it serve interactive queries and also speed up iterative workloads.
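
The idea of turning a SQL statement into a map step and a reduce step can be sketched in plain Python. The example below hand-translates a query such as SELECT dept, COUNT(*) FROM employees GROUP BY dept. The employees table, its rows, and the function names are hypothetical, and the planner that Hive actually uses is far more involved than this.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (name, dept).
rows = [
    ("ann", "sales"),
    ("bob", "ops"),
    ("cho", "sales"),
    ("dee", "ops"),
    ("eli", "sales"),
]

def map_row(row):
    """Map: the GROUP BY column becomes the key, and each row contributes 1."""
    _name, dept = row
    return dept, 1

def group_by_count(table_rows):
    """Shuffle rows by key, then reduce each group with a sum (COUNT(*))."""
    grouped = defaultdict(list)
    for row in table_rows:
        key, value = map_row(row)
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

result = group_by_count(rows)
```

The GROUP BY column plays the role of the map output key, and the aggregate function plays the role of the reducer, which is why this kind of query fits the MR model naturally.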

5. Data visualization: BI platforms visualize the results of the analysis to support decision-making.