Feature Description
K-means is an iterative clustering algorithm that divides the data into K groups. K objects are first selected as the initial cluster centers, and every other object is assigned to its nearest cluster center. A cluster center together with the objects assigned to it forms a cluster. Each time a sample is assigned, the cluster centers are recalculated from the objects currently in each cluster. The process repeats until one of the following termination conditions is met: no object is reassigned to a different cluster, no cluster center changes, or the maximum number of iterations is reached (default: 300).
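The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the function name `kmeans` and its parameters `max_iter` and `seed` are assumptions made for the example. For simplicity it also uses the common batch variant, where centers are recalculated once per pass after all points have been assigned, rather than after each individual assignment.

```python
import math
import random


def kmeans(points, k, max_iter=300, seed=0):
    """Batch k-means on a list of coordinate tuples.

    Illustrative sketch only; returns (labels, centers).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick K objects as the initial centers
    labels = [None] * len(points)

    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]

        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                mean = tuple(sum(v) / len(members) for v in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(centers[c])  # empty cluster keeps its center

        # Terminate when no point changed cluster or no center moved.
        done = new_labels == labels or new_centers == centers
        labels, centers = new_labels, new_centers
        if done:
            break

    return labels, centers
```

Calling `kmeans(points, k=2)` on a list of `(x, y)` tuples returns one cluster index per input point plus the K final centers, which mirrors the two outputs listed at the end of this page. Because the initial centers are chosen at random, different seeds can produce different results on data that is not cleanly separated.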
The k-means method is sensitive to outliers, because a single distant point can pull a cluster center away from the rest of its cluster. Remove anomalies and outliers before the analysis so that they do not distort the clustering results.
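One possible way to screen out obvious outliers before clustering is sketched below. It is an assumption made for illustration, not a step performed by this tool: the function name `drop_outliers`, the `max_score` threshold, and the choice of a median-based distance test are all hypothetical, and a suitable rule depends on the data.

```python
import math
import statistics


def drop_outliers(points, max_score=3.0):
    """Drop (x, y) points that lie unusually far from the median center."""
    # Median center of the point set (robust to a few extreme points).
    cx = statistics.median(p[0] for p in points)
    cy = statistics.median(p[1] for p in points)
    dists = [math.dist(p, (cx, cy)) for p in points]

    # Median and median absolute deviation (MAD) of the distances.
    med = statistics.median(dists)
    mad = statistics.median(abs(d - med) for d in dists)
    if mad == 0:
        return list(points)  # no spread to measure against; keep everything

    # Keep points whose distance is at most max_score MADs above the median.
    return [p for p, d in zip(points, dists) if (d - med) / mad <= max_score]
```

The filtered list can then be clustered in place of the raw points. The test is one-sided, so only points that are too far away are removed.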
Application Scenarios:
- Distribution center location planning, e.g., determining optimal locations for new distribution centers in a supply chain based on cost constraints.
- Multi-point task allocation, e.g., assigning sales representatives to visit geographically clustered retail stores through k-means analysis.
Cluster analysis is exploratory. The results may not solve a problem directly, but they can guide further analysis and decisions.
Parameter Description
Parameter | Default | Description | Type |
---|---|---|---|
Source Dataset | | Specifies the vector dataset to be analyzed. Supports point datasets. | DatasetVector |
Target Datasource | | Specifies the datasource storing the result dataset. | Datasource |
Result Dataset Name | | Specifies the name of the result dataset. | String |
Number of Clusters | 1 | The number of clusters (K) to produce. Typically determined from prior knowledge or by repeated trials. | int |
Output
- Adds a Cluster_ID field to the source dataset, recording the cluster to which each object was assigned.
- Generates a vector dataset containing K cluster centroids.