Usage Instructions
K-means clustering (k-means) is an iterative cluster analysis algorithm. To divide the data into K groups, K objects are first selected as the initial cluster centers, and each object is then assigned to its nearest cluster center. A cluster center together with the objects assigned to it represents a cluster. The cluster centers are recalculated after each sample assignment. This process repeats until a termination condition is met: no objects are reassigned to a different cluster, no cluster center changes, or the maximum number of iterations is reached (default: 300).
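The loop described above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the function name `kmeans`, the `seed` parameter, and the sample points are assumptions, and the sketch uses the common batch (Lloyd-style) variant that recomputes centers once per pass rather than after every single assignment.

```python
import math
import random

def kmeans(points, k, max_iter=300, seed=0):
    """Batch k-means on a list of coordinate tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    # Select K objects as the initial cluster centers.
    centers = rng.sample(points, k)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Terminate when no object is reassigned to a different cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, labels

# Hypothetical sample data: two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

The `max_iter` default of 300 mirrors the default iteration limit mentioned above. The result depends on which initial centers are drawn, so different seeds may give different clusters on less cleanly separated data.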
The k-means method is sensitive to outliers, which can distort the clustering results, so consider removing them before the analysis.
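One possible way to screen outliers before clustering is a robust distance rule, sketched below. This is an assumption-laden heuristic, not something the tool itself performs: the function name `drop_outliers`, the cutoff factor `k_mad`, and the sample points are all hypothetical. A point is dropped when its distance from the coordinate-wise median lies far above the typical distance, measured in units of the median absolute deviation (MAD).

```python
import math
import statistics

def drop_outliers(points, k_mad=5.0):
    """Keep 2D points whose distance to the median center is not extreme."""
    # Robust center: coordinate-wise median of the points.
    cx = statistics.median(p[0] for p in points)
    cy = statistics.median(p[1] for p in points)
    dists = [math.dist(p, (cx, cy)) for p in points]
    med = statistics.median(dists)
    # Median absolute deviation of the distances.
    mad = statistics.median(abs(d - med) for d in dists)
    if mad == 0:
        # Distances are too uniform to define a cutoff; keep everything.
        return list(points)
    cutoff = med + k_mad * mad
    return [p for p, d in zip(points, dists) if d <= cutoff]

# Hypothetical sample: seven nearby points and one far-away point.
points = [(0, 0), (2, 0), (0, 3), (4, 4), (5, 1), (1, 5), (3, 2), (100, 100)]
cleaned = drop_outliers(points)
print(len(points), len(cleaned))
```

The cutoff factor is a judgment call. A smaller value removes more points, so it is worth inspecting what gets dropped before running the clustering.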
Application Scenarios:
Distribution center site selection: for example, when a supply chain needs additional distribution centers to meet business needs, the number of centers is determined by cost, and their locations are chosen so that each supply point can be assigned to its nearest distribution center.
Multi-point task division: for example, when direct sales staff need to visit the supermarkets that stock their products, k-means clustering can allocate the supermarkets among N sales staff by grouping geographically close supermarkets into one cluster, with each staff member responsible for one cluster.
Cluster analysis is exploratory in nature: its results may not solve the problem directly, but they can provide useful guidance.
Parameter Description
Parameter Name | Default Value | Parameter Definition | Parameter Type |
---|---|---|---|
Source Datasets | | The vector dataset to be analyzed. Point datasets are supported. | DatasetVector |
Target data source | | The data source in which the result dataset is stored. | Datasource |
Resulting dataset name | | The name of the result dataset. | String |
Number of clusters | 1 | The number of groups (k) expected from the clustering. A suitable value of k is generally chosen from prior experience with the data, or by trying several values and selecting the best one. | int |
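
When trying several values for the number of clusters, one common way to compare the attempts is the within-cluster sum of squared distances (WCSS). The sketch below is an illustration under stated assumptions: the function name `wcss` and the sample data are hypothetical, and the labels stand in for the cluster assignments produced by separate runs with different k values.

```python
from collections import defaultdict

def wcss(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    total = 0.0
    for members in groups.values():
        # Centroid: coordinate-wise mean of the cluster members.
        centroid = [sum(v) / len(members) for v in zip(*members)]
        total += sum(
            sum((a - b) ** 2 for a, b in zip(p, centroid)) for p in members
        )
    return total

# Hypothetical points with labels from a k=1 run and a k=2 run.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(wcss(points, [0, 0, 0, 0]))  # k = 1
print(wcss(points, [0, 0, 1, 1]))  # k = 2
```

WCSS typically keeps shrinking as k grows, so the lowest value alone is not a good criterion. A usual heuristic is to pick the k beyond which the improvement flattens out.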
Output Result
- A "Cluster_ID" field is added to the source dataset to indicate the cluster to which each object belongs;
- The result vector dataset contains the K centroids of the final clusters.