Density Clustering
Feature Description
Spatial density clustering performs spatial aggregation analysis on a collection of points using the DBSCAN density clustering algorithm. Given a search radius and the minimum number of points required within that radius, the algorithm connects regions of the point data that are both dense and spatially close. It groups sufficiently dense areas into clusters and can discover arbitrarily shaped clusters in noisy spatial data. The result is returned as a feature dataset (FeatureRDD).
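To make the roles of the search radius and the point threshold concrete, here is a minimal, self-contained sketch of textbook DBSCAN in Python. It is not the tool's implementation. It assumes planar coordinates with Euclidean distance, a brute-force neighbor search, and the common "at least `min_pts` points, counting the point itself" rule for core points. The names `dbscan`, `eps`, `min_pts`, and `NOISE` are hypothetical.

```python
from math import hypot

NOISE = -1  # label given to points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE.

    Assumption: a point is a core point when its eps-neighborhood,
    counting the point itself, holds at least min_pts points.
    """
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep growing
                queue.extend(reachable)
        cluster_id += 1
    return labels


sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
labels = dbscan(sample, eps=1.5, min_pts=2)
```

With this sample, the first three points form one cluster, the next two form a second, and the isolated last point is labeled as noise.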
Parameter Description
| Parameter Name | Default Value | Parameter Interpretation | Parameter Type |
|---|---|---|---|
| Analysis Dataset | | Connection information for accessing data, which needs to include information such as data type, connection parameters, and dataset name. Set using the format '--key=value', with multiple values separated by a space ' '. For example, connecting to HBase data: --providerType=hbase --hbase.zookeepers=192.168.12.34:2181 --hbase.catalog=demo --dataset=dltb; connecting to DSF data: --providerType=dsf --path=hdfs://ip:9000/dsfdata; for local data: --providerType=dsf --path=/home/dsfdata | String |
| Cluster Radius | | Cluster radius with a unit, known as the ε-neighborhood (Eps) in the DBSCAN algorithm. Enter a value such as "1 Kilometer". Supported units are Meter, Centimeter, Millimeter, Decimeter, Kilometer, Yard, Inch, Foot, Mile, Degree, Second, Minute, and Radian. The default unit is Meter. | JavaDistance |
| Cluster Number Threshold | 2 | Threshold on the number of points within the cluster radius, used to decide whether a point is a core point. The count includes the point itself. In the DBSCAN algorithm, points that satisfy this threshold are called core objects. The value must be greater than or equal to 2. | Integer |
| Field Name for Saving Clustering Category | | Field name used to save the clustering category. | String |
| Fields to Retain in the Result Dataset (Optional) | | Fields that need to be retained in the result dataset. | String |
| Result Dataset | | Connection information for accessing data, which needs to include information such as data type, connection parameters, and dataset name. Set using the format '--key=value', with multiple values separated by a space ' '. For example, connecting to HBase data: --providerType=hbase --hbase.zookeepers=192.168.12.34:2181 --hbase.catalog=demo --dataset=dltb; connecting to DSF data: --providerType=dsf --path=hdfs://ip:9000/dsfdata; for local data: --providerType=dsf --path=/home/dsfdata | String |
| Data Query Conditions (Optional) | | Data query conditions. Both attribute conditions and spatial queries are supported, e.g., SmID<100 and BBOX(the_geom, 120,30,121,31) | String |
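The string-valued parameters above follow two simple formats: space-separated '--key=value' pairs for dataset connections, and "value unit" for the cluster radius. The Python sketch below shows one way to assemble and check such strings before submitting a job. The helper names (`build_connection`, `parse_connection`, `radius_in_meters`) are hypothetical and not part of the product. The sketch assumes that a radius given without a unit is read as meters, in line with the stated default, and it covers only the linear units, since converting the angular units (Degree, Minute, Second, Radian) depends on the coordinate system.

```python
# Standard conversion factors from each linear unit to meters.
LINEAR_UNITS_IN_METERS = {
    "Meter": 1.0,
    "Centimeter": 0.01,
    "Millimeter": 0.001,
    "Decimeter": 0.1,
    "Kilometer": 1000.0,
    "Yard": 0.9144,
    "Inch": 0.0254,
    "Foot": 0.3048,
    "Mile": 1609.344,
}


def build_connection(params):
    """Join a dict of settings into '--key=value' pairs separated by spaces."""
    return " ".join(f"--{key}={value}" for key, value in params.items())


def parse_connection(text):
    """Split a '--key=value ...' string back into a dict, rejecting bad tokens."""
    result = {}
    for token in text.split():
        if not token.startswith("--") or "=" not in token:
            raise ValueError(f"expected --key=value, got {token!r}")
        key, value = token[2:].split("=", 1)
        result[key] = value
    return result


def radius_in_meters(text):
    """Convert a radius such as '1 Kilometer' to meters (linear units only)."""
    parts = text.split()
    if len(parts) == 1:
        value, unit = parts[0], "Meter"  # assumption: a bare number means meters
    elif len(parts) == 2:
        value, unit = parts
    else:
        raise ValueError(f"expected '<value> <unit>', got {text!r}")
    if unit not in LINEAR_UNITS_IN_METERS:
        raise ValueError(f"unsupported or non-linear unit: {unit!r}")
    return float(value) * LINEAR_UNITS_IN_METERS[unit]


hbase_source = build_connection({
    "providerType": "hbase",
    "hbase.zookeepers": "192.168.12.34:2181",
    "hbase.catalog": "demo",
    "dataset": "dltb",
})
```

Building the string from a dictionary keeps each key and value together and makes a malformed pair easy to catch with `parse_connection` before the job runs.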