Data mining is a step in KDD (knowledge discovery in databases): the process of mining and analyzing large amounts of data to extract information from them.
Its applications include market segmentation, such as identifying the characteristics of customers who buy specific products from specific brands; fraud detection, such as identifying transaction patterns that may indicate online fraud; and so on.
In this article, Tianjin Computer has selected and compiled eight of the best open source data mining tools.
1. Weka
WEKA is an open data mining platform that gathers a large number of machine learning algorithms for data mining tasks, including data preprocessing, classification, regression, clustering, association rules, and visualization in an interactive interface.
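To make the term "association rules" concrete, the following minimal sketch computes the two basic measures of a rule, support and confidence, in plain Python. It does not use Weka's API; the function names `support` and `confidence` and the sample shopping baskets are assumptions made purely for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        raise ValueError("transactions must not be empty")
    itemset = set(itemset)
    matches = sum(1 for t in transactions if itemset <= set(t))
    return matches / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


if __name__ == "__main__":
    # Hypothetical shopping baskets, invented for this example.
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print(support(baskets, {"bread", "milk"}))
    print(confidence(baskets, {"bread"}, {"milk"}))
```

A rule such as "bread -> milk" is usually kept only if both measures exceed thresholds chosen by the analyst.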
2. RapidMiner
RapidMiner is a leading data mining solution that makes extensive use of advanced technology.
It covers a wide range of data mining tasks, including various data arts, and simplifies the design and evaluation of data mining processes.
3. Orange
Orange is a component-based data mining and machine learning software suite. It offers a friendly, powerful, fast, and versatile visual programming front end for exploratory data analysis and visualization, along with Python bindings for scripting.
It contains a complete set of data preprocessing components and provides feature scoring and filtering, modeling, model evaluation, and exploration functions.
It is written in C++ and Python, and its graphical user interface is built on the cross-platform Qt framework.
4. KNIME
KNIME (Konstanz Information Miner) is a user-friendly, intelligent, and well-developed open source platform for data integration, data processing, data analysis, and data exploration.
5. jHepWork
jHepWork is a complete object-oriented scientific data analysis framework.
Jython macros are used to display data in one-dimensional and two-dimensional histograms.
The program includes many tools for interacting with 2D and 3D scientific graphics.
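As a rough illustration of what building one-dimensional and two-dimensional histograms involves, here is a minimal sketch in plain Python. It is not jHepWork's Jython API; the function names `bin_index`, `histogram_1d`, and `histogram_2d`, the bin counts, and the sample values are all assumptions chosen for this example.

```python
def bin_index(value, low, high, n_bins):
    """Return the bin index for value, or None if it lies outside [low, high]."""
    if value < low or value > high:
        return None
    if value == high:
        return n_bins - 1  # place the upper edge in the last bin
    return int((value - low) / (high - low) * n_bins)


def histogram_1d(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    counts = [0] * n_bins
    for v in values:
        i = bin_index(v, low, high, n_bins)
        if i is not None:
            counts[i] += 1
    return counts


def histogram_2d(points, x_range, y_range, n_bins):
    """Count (x, y) points on an n_bins by n_bins grid; counts[i][j] is x-bin i, y-bin j."""
    counts = [[0] * n_bins for _ in range(n_bins)]
    for x, y in points:
        i = bin_index(x, x_range[0], x_range[1], n_bins)
        j = bin_index(y, y_range[0], y_range[1], n_bins)
        if i is not None and j is not None:
            counts[i][j] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical measurements; 3.5 lies outside the range and is ignored.
    data = [0.1, 0.4, 0.5, 1.2, 1.9, 2.0, 3.5]
    print(histogram_1d(data, 0.0, 2.0, 4))
```

Values outside the chosen range are silently dropped in this sketch; a real analysis framework would typically track them as underflow and overflow.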
6. Apache Mahout
Apache Mahout is a new open source project developed by the Apache Software Foundation (ASF). Its main goal is to create scalable machine learning algorithms that developers can use freely under the Apache license.
The project is in its second year, and only one public release exists so far.
Mahout includes many implementations, covering clustering, classification, collaborative filtering (CF), and evolutionary programming.
In addition, by using the Apache Hadoop library, Mahout can scale effectively to the cloud.
7. ELKI
ELKI (Environment for Developing KDD-Applications Supported by Index-Structures) is mainly used for clustering and finding outliers.
ELKI is a data mining platform similar to Weka; it is written in Java and has a graphical user interface.
It can be used to find outliers.
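As an illustration of what "finding outliers" can mean in practice, the following minimal sketch scores each point by the distance to its k-th nearest neighbor, one common distance-based approach. It is plain Python and does not use ELKI's API; the function names `knn_outlier_scores` and `top_outliers`, the value of `k`, and the sample points are assumptions made for this example.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, sorted ascending.
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(distances[k - 1])
    return scores


def top_outliers(points, k, n):
    """Return the indices of the n points with the largest outlier scores."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical data: four points in a tight square and one far away.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_outliers(pts, k=2, n=1))
```

This brute-force version compares every pair of points, so it is only suitable for small data sets; index structures of the kind ELKI's name refers to exist to speed up such neighbor searches.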