K-Means Clustering

Feature Description

The k-means clustering algorithm is an iterative clustering algorithm that divides the data into K groups. K objects are first selected as the initial cluster centers, and each object is then assigned to the nearest cluster center. A cluster center together with the objects assigned to it forms a cluster. Each time an object is assigned, the center of its cluster is recalculated from the objects currently in that cluster. This process repeats until one of the following termination conditions is met: no object is reassigned to a different cluster, no cluster center changes, or the maximum number of iterations (default: 300) is reached.
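The iteration described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the initial centers are K input objects drawn at random, distance is plain Euclidean distance, and the function names are hypothetical. It also uses the common batch (Lloyd) update, which recalculates every center once after all objects have been assigned, rather than after each individual assignment.

```python
import random


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, k, max_iter=300, seed=None):
    """Batch k-means sketch. Returns (labels, centers)."""
    rng = random.Random(seed)
    # Assumption: K input objects chosen at random serve as initial centers.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):  # stop condition: maximum iterations reached
        # Assign every object to its nearest current center.
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:  # stop condition: no object reassigned
            break
        labels = new_labels
        # Recompute each center as the mean of the objects assigned to it;
        # an empty cluster keeps its previous center.
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            new_centers.append(mean(members) if members else centers[i])
        if new_centers == centers:  # stop condition: no center moved
            break
        centers = new_centers
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, 2, seed=0)
```

For this toy input, the first three points end up in one cluster and the last three in the other, whichever two points are drawn as starting centers. Other inputs can converge to different local optima depending on the starting centers.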

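The k-means method is sensitive to outliers: because each cluster center is the mean of its members, a few extreme points can pull a center away from the bulk of its cluster. Remove outliers before running the analysis so that they do not distort the clustering results.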
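The effect can be seen by comparing the mean of four points with and without one distant point, then applying a simple pre-filter. The median-distance rule, its `factor` threshold, and the `drop_outliers` name are hypothetical choices for illustration; the tool does not necessarily provide such a filter, and a suitable rule depends on the data.

```python
import math
import statistics


def centroid(points):
    """Mean position of a list of (x, y) tuples, as k-means computes a center."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def drop_outliers(points, factor=3.0):
    """Hypothetical pre-filter: keep points whose distance to the coordinate-wise
    median is at most `factor` times the median of those distances."""
    med = tuple(statistics.median(c) for c in zip(*points))
    dists = [math.dist(p, med) for p in points]
    cutoff = factor * statistics.median(dists)
    return [p for p, d in zip(points, dists) if d <= cutoff]


square = [(0, 0), (1, 0), (0, 1), (1, 1)]
with_outlier = square + [(100, 100)]
print(centroid(square))                       # (0.5, 0.5)
print(centroid(with_outlier))                 # (20.4, 20.4)
print(centroid(drop_outliers(with_outlier)))  # (0.5, 0.5)
```

A single point far from the others moves the center from (0.5, 0.5) to (20.4, 20.4), a location that is close to none of the actual points.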

Application Scenarios:

  • Distribution center location planning, e.g., determining optimal locations for new distribution centers in a supply chain based on cost constraints.
  • Multi-point task allocation, e.g., using k-means to group geographically close retail stores so that each sales representative is assigned one group of stores to visit.
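Both scenarios can be sketched as grouping store locations into K territories, where each territory's center is a candidate distribution-center site or a sales representative's base. Assumptions: coordinates are in a projected planar system (Euclidean distance on raw latitude/longitude is only an approximation), the starting centers use farthest-first selection instead of random choice, and the store names, coordinates, and `plan_territories` function are hypothetical. Plain k-means does not model cost constraints and does not balance the number of stores per representative.

```python
from math import dist


def farthest_first(points, k):
    """Pick k spread-out starting centers: the first point, then repeatedly the
    point farthest from every center chosen so far (an assumed initialization)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def plan_territories(stores, k, max_iter=300):
    """Group named store locations into k territories with batch k-means.

    stores: dict of name -> (x, y) in a projected, planar coordinate system.
    Returns (territory index per store name, one center per territory).
    """
    names = list(stores)
    points = [stores[n] for n in names]
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda i: dist(p, centers[i])) for p in points]
        if new_labels == labels:
            break  # no store changed territory
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return dict(zip(names, labels)), centers


stores = {  # hypothetical store coordinates, in km
    "A": (2, 3), "B": (3, 2), "C": (2, 2),
    "D": (20, 21), "E": (21, 20), "F": (22, 22),
    "G": (40, 3), "H": (41, 2), "I": (39, 4),
}
territory, centers = plan_territories(stores, 3)
```

Stores A to C, D to F, and G to I each form one territory, and the territory centers, such as (21.0, 21.0) for D to F, are the mean positions of their stores.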

Clustering is an exploratory analysis: its results may not solve a problem directly, but they can guide further decisions.

Parameter Description

Parameter            Default  Type           Description
Source Dataset       -        DatasetVector  The vector dataset to analyze. Point datasets are supported.
Target Datasource    -        Datasource     The datasource in which the result dataset is stored.
Result Dataset Name  -        String         The name of the result dataset.
Number of Clusters   1        int            The number of clusters (K) to divide the data into. K is usually chosen from prior knowledge or by comparing the results of several trials.
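One common way to run such trials is the elbow heuristic: cluster the data for several values of K, compute the within-cluster sum of squared distances (WCSS) for each, and look for the K after which WCSS stops dropping sharply. The sketch below shows how these trials could be run outside the tool; the heuristic is a swapped-in technique rather than something the tool is documented to do, and `kmeans_wcss`, its farthest-first starting centers, and the sample points are hypothetical.

```python
from math import dist


def kmeans_wcss(points, k, max_iter=300):
    """Run batch k-means with farthest-first starting centers (an assumption)
    and return the within-cluster sum of squared distances of the result."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda i: dist(p, centers[i])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    # Squared distance of every point to the center of its own cluster.
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centers[lab]))
        for p, lab in zip(points, labels)
    )


points = [(0, 0), (0, 2), (10, 0), (10, 2), (20, 0), (20, 2)]
scores = {k: round(kmeans_wcss(points, k), 1) for k in range(1, 5)}
print(scores)  # {1: 406.0, 2: 138.7, 3: 6.0, 4: 4.0}
```

For these three well-separated pairs, WCSS falls sharply up to K = 3 and barely changes at K = 4, which points to K = 3. The K = 2 run stops at a local optimum (separating one pair from the other two pairs would score 106), which is one reason to repeat trials with different starting centers.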

Output

  1. Adds a Cluster_ID field to the source dataset that records the cluster each object belongs to.
  2. Generates a vector dataset containing K cluster centroids.
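The relationship between the two outputs can be sketched as follows: given each point's cluster label, the code writes a Cluster_ID value onto every record and averages the members of each cluster into one centroid record. The record layout (dicts with `x` and `y` keys), the `build_outputs` name, and numbering clusters from 0 are assumptions for illustration; the tool's actual Cluster_ID values and centroid dataset schema may differ.

```python
def build_outputs(records, labels):
    """records: list of dicts with 'x' and 'y' keys (hypothetical field names).
    labels: cluster index per record, e.g. from a k-means run.
    Adds a Cluster_ID field to each record in place and returns
    (records, centroids), with one centroid record per cluster."""
    sums = {}  # cluster -> (sum of x, sum of y, member count)
    for rec, lab in zip(records, labels):
        rec["Cluster_ID"] = lab
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + rec["x"], sy + rec["y"], n + 1)
    centroids = [
        {"Cluster_ID": lab, "x": sx / n, "y": sy / n}
        for lab, (sx, sy, n) in sorted(sums.items())
    ]
    return records, centroids


records = [{"x": 0, "y": 0}, {"x": 2, "y": 0}, {"x": 10, "y": 10}, {"x": 10, "y": 12}]
records, centroids = build_outputs(records, [0, 0, 1, 1])
```

Because each centroid record carries the same Cluster_ID value as its members, the two outputs can be joined on that field.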