Density-Based Clustering

Density-based clustering groups spatially dense point distributions into clusters. Three methods are provided: Density-Based Clustering (DBSCAN), Hierarchical Clustering (HDBSCAN), and Ordering Clustering (OPTICS).

Definitions

  • MinPts: Minimum number of points required to form a cluster.
  • Clustering radius ε: Neighborhood search radius. If a point has at least MinPts points within ε, those points form a cluster.
  • Core point: A point with at least MinPts neighboring points within ε.
  • Boundary point: A point within ε of a core point but with fewer than MinPts neighbors.
  • Noise point: A point that is neither a core point nor a boundary point.
  • Core distance: Distance from point X to its k-th nearest neighbor (k = MinPts), denoted core_k(X). (Figure: core distance example with k = 5.)

  • Reachable distance: The larger of a point's core distance and the actual distance between the two points, i.e. the reachable distance of O with respect to P is max(core_k(P), d(P, O)).

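    Example: With MinPts = 3 and ε = d(P, 5), the core distance of point P is d(P, 1). Points 2 and 3 lie no farther from P than point 1, so their reachable distance is the core distance d(P, 1); point 4 lies farther away, so its reachable distance is the actual distance d(P, 4).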
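The two distance definitions above can be sketched in Python as follows. This is an illustration only, not the tool's implementation: it assumes planar coordinates with Euclidean distance and, reading "k-th nearest neighbor" literally, does not count a point as its own neighbor. The function names are hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def core_distance(i: int, points: Sequence[Point], k: int) -> float:
    """core_k: distance from points[i] to its k-th nearest neighbor.

    Assumption: the point itself is not counted as a neighbor.
    Returns math.inf when fewer than k other points exist.
    """
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[k - 1] if len(dists) >= k else math.inf


def reachable_distance(o: int, p: int, points: Sequence[Point], k: int) -> float:
    """Reachable distance of points[o] with respect to points[p]:
    the larger of p's core distance and the actual distance d(p, o)."""
    return max(core_distance(p, points, k), math.dist(points[p], points[o]))


pts = [(0, 0), (1, 0), (0, 2), (3, 0), (0, 5)]  # pts[0] plays the role of P
print(core_distance(0, pts, k=2))          # 2.0 (2nd nearest neighbor of P is (0, 2))
print(reachable_distance(1, 0, pts, k=2))  # 2.0 (closer than core distance: core distance used)
print(reachable_distance(3, 0, pts, k=2))  # 3.0 (farther than core distance: actual distance used)
```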

Principles

  1. Density-Based Clustering (DBSCAN)

    Identifies clusters via spatial density, defining a cluster as a maximal set of density-connected points. Robust to noise and detects arbitrarily shaped clusters.

    Algorithm: Finds core points using the clustering radius ε and MinPts, then expands each cluster from its core points by adding every point density-reachable from them.

    Parameter adjustment: Increasing ε tends to merge clusters; decreasing it tends to split them. Increasing MinPts classifies more points as noise.

    Use case: Identifying accident-prone areas with explicit thresholds (e.g., at least 3 accidents within 2000 m).
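    A minimal DBSCAN sketch in Python, for illustration only and not the tool's implementation. It assumes planar coordinates with Euclidean distance and follows the core point definition above literally, so a point is not counted among its own neighbors (the original DBSCAN formulation and scikit-learn's `min_samples` do count the point itself, so thresholds can differ by one). The names `dbscan` and `region_query` are hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float]
UNVISITED = -2
NOISE = -1  # matches the Cluster_ID convention for noise


def region_query(points: Sequence[Point], i: int, eps: float) -> list[int]:
    """Indices of the other points within eps of points[i] (i itself excluded)."""
    return [j for j, q in enumerate(points) if j != i and math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> list[int]:
    """Return one cluster ID per point; -1 marks noise."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later turn out to be a boundary point
            continue
        # i is a core point: start a new cluster and grow it outward
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # boundary point: joins, but is not expanded
                continue
            if labels[j] != UNVISITED:
                continue  # already assigned
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point
                seeds.extend(j_neighbors)
        cluster_id += 1
    return labels


pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=2))   # [0, 0, 0, 0, 1, 1, 1, -1]
print(dbscan(pts, eps=20.0, min_pts=2))  # [0, 0, 0, 0, 0, 0, 0, -1]
print(dbscan(pts, eps=0.5, min_pts=2))   # [-1, -1, -1, -1, -1, -1, -1, -1]
```

    The three calls mirror the parameter behavior described above: a larger radius (20) merges the two groups into one cluster, while a smaller radius (0.5) leaves every point as noise.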

  2. Hierarchical Clustering (HDBSCAN)

    Separates clusters from noise using variable density thresholds. Creates stable clusters through hierarchical merging.

    Algorithm: Improves on DBSCAN by building a minimum spanning tree weighted by reachable distances (derived from core distances), then splitting clusters according to stability criteria.

    Cluster extraction conditions:

    1. Split a cluster when 1/λ1 + 1/λ2 > 1/λ.
    2. Stop splitting when a subcluster has fewer than MinPts points.
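    The tree-building stage can be sketched as follows: compute core distances, weight every pair of points by the mutual reachability distance max(core_k(a), core_k(b), d(a, b)) used in the HDBSCAN literature, and build a minimum spanning tree with Prim's algorithm. This is an illustrative sketch, not the tool's implementation: it omits the later steps (building the cluster hierarchy, computing stability, extracting clusters), assumes Euclidean distance, does not count a point as its own neighbor, and uses hypothetical function names.

```python
import math
from typing import Sequence

Point = tuple[float, float]
Edge = tuple[int, int, float]


def core_distances(points: Sequence[Point], k: int) -> list[float]:
    """Core distance of every point: distance to its k-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def mutual_reachability_mst(points: Sequence[Point], k: int) -> list[Edge]:
    """Prim's algorithm on the complete graph weighted by mutual reachability
    distance max(core(a), core(b), d(a, b)). Returns MST edges (u, v, weight)."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("need 0 < k < number of points")
    core = core_distances(points, k)

    def mreach(a: int, b: int) -> float:
        return max(core[a], core[b], math.dist(points[a], points[b]))

    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each vertex into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: list[Edge] = []
    for _ in range(n):
        # add the cheapest vertex that is not yet in the tree
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # update the cheapest connection of every remaining vertex
        for v in range(n):
            if not in_tree[v]:
                w = mreach(u, v)
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges


pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
edges = mutual_reachability_mst(pts, k=2)
heaviest = max(edges, key=lambda e: e[2])
print(len(edges))             # 5 (a spanning tree on 6 points)
print(round(heaviest[2], 2))  # 13.45, the edge linking the two groups
```

    Removing tree edges from heaviest to lightest splits the points hierarchically; in this toy data the first cut separates the two groups of three points.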

  3. Ordering Clustering (OPTICS)

    Reduces sensitivity to parameter settings by ordering points according to density connectivity. Computationally intensive, but flexible in detecting clusters of varying density.
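    A compact OPTICS sketch in Python showing how a processing order (compare the ReachOrder field) and reachability distances (compare ReachDist) can be produced from a clustering radius and MinPts. It is an illustration under stated assumptions, not the tool's implementation: Euclidean distance, a point is not counted as its own neighbor, the Compactness parameter is not modeled, and undefined reachability is reported as `math.inf`. Function and variable names are hypothetical.

```python
import heapq
import math
from typing import Sequence

Point = tuple[float, float]


def optics(points: Sequence[Point], eps: float, min_pts: int) -> tuple[list[int], list[float]]:
    """Return (order, reach): point indices in processing order, and each
    point's reachability distance (math.inf = undefined)."""
    n = len(points)
    processed = [False] * n
    reach = [math.inf] * n
    order: list[int] = []

    def neighbors(i: int) -> list[int]:
        return [j for j in range(n) if j != i and math.dist(points[i], points[j]) <= eps]

    def core_dist(i: int, nbrs: list[int]) -> float:
        if len(nbrs) < min_pts:
            return math.inf  # not a core point within eps
        return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    def expand(i: int, seeds: list[tuple[float, int]]) -> None:
        nbrs = neighbors(i)
        cd = core_dist(i, nbrs)
        if cd == math.inf:
            return
        for j in nbrs:
            if processed[j]:
                continue
            new_reach = max(cd, math.dist(points[i], points[j]))
            if new_reach < reach[j]:
                reach[j] = new_reach
                heapq.heappush(seeds, (new_reach, j))  # stale entries skipped on pop

    for start in range(n):
        if processed[start]:
            continue
        processed[start] = True
        order.append(start)
        seeds: list[tuple[float, int]] = []
        expand(start, seeds)
        while seeds:
            _, j = heapq.heappop(seeds)  # closest reachable point first
            if processed[j]:
                continue
            processed[j] = True
            order.append(j)
            expand(j, seeds)
    return order, reach


pts = [(0, 0), (10, 10), (1, 0), (50, 50), (0, 1), (11, 10), (10, 11)]
order, reach = optics(pts, eps=5.0, min_pts=2)
print(order)                      # [0, 2, 4, 1, 5, 6, 3]
print([reach[i] for i in order])  # [inf, 1.0, 1.0, inf, 1.0, 1.0, inf]
```

    Read in processing order, the reachability distances form low valleys separated by undefined (infinite) values; each valley corresponds to one dense group of points.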

Applications

  • Site selection for chain stores using customer position data
  • Emergency station planning for pipe burst clusters
  • Identifying high-accident road sections
  • Ecological population analysis
  • Anomaly detection for data cleaning and security

Feature Entry

  • Spatial Statistics Tab -> Cluster Distribution -> Density-Based Clustering
  • Toolbox -> Spatial Statistics -> Cluster Distribution -> Density-Based Clustering

Parameters

Parameter requirements and output fields vary by method:

Method    Parameters                              Output Fields
DBSCAN    Clustering radius, MinPts               Source_ID, Cluster_ID
HDBSCAN   MinPts                                  Source_ID, Cluster_ID, Prob, Outlier, Exemplar, Stability
OPTICS    Clustering radius, MinPts, Compactness  Source_ID, Cluster_ID, ReachOrder, ReachDist

  • Source Dataset: Input point dataset for analysis.
  • Clustering Method:
    1. DBSCAN: Requires clustering radius and MinPts.
    2. HDBSCAN: Requires MinPts only.
    3. OPTICS: Requires radius, MinPts, and compactness.
  • Clustering Radius:
    • For DBSCAN: Defines neighborhood search range.
    • For OPTICS: Maximum reachable distance threshold.
  • Unit: Distance unit of the clustering radius (default: meters).
  • MinPts: Minimum number of points required to form a cluster.
  • Compactness: Clustering tightness (0-100). Higher values produce denser clusters.

Output Interpretation

Results vary by method, but they are always rendered by Cluster_ID. Key fields:

  • Source_ID: Original point ID.
  • Cluster_ID: Cluster membership (-1=noise).
  • Prob: Probability that the point belongs to its cluster (1 = highest).
  • Outlier: 1 indicates a potential outlier.
  • Exemplar: 1 marks a representative point of its cluster.
  • Stability: Cluster stability score.
  • ReachOrder: Order in which the point was processed.
  • ReachDist: Reachability distance for density assessment.

Case Study

Using OPTICS to analyze the distribution of a restaurant's customers identified four clusters (noise shown in grey). New branches should prioritize areas covered by high-density clusters.

Cluster 3 contains the highest proportion of points, indicating the greatest customer density.
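Per-cluster proportions like these can be computed from the Cluster_ID output field. The sketch below assumes the Cluster_ID values have already been read into a Python list; it measures each cluster's share of all points and leaves noise (-1) out of the result. The sample IDs and the function name are hypothetical and are not the case-study data.

```python
from collections import Counter
from typing import Sequence

NOISE = -1  # Cluster_ID value used for noise points


def cluster_shares(cluster_ids: Sequence[int]) -> dict[int, float]:
    """Share of all points that falls in each cluster (noise excluded from the keys)."""
    total = len(cluster_ids)
    counts = Counter(cid for cid in cluster_ids if cid != NOISE)
    return {cid: n / total for cid, n in sorted(counts.items())}


# Hypothetical Cluster_ID values, not taken from the case study
ids = [1, 1, 2, 3, 3, 3, 3, 4, -1, -1]
shares = cluster_shares(ids)
print(shares)                       # {1: 0.2, 2: 0.1, 3: 0.4, 4: 0.1}
print(max(shares, key=shares.get))  # 3
```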

Related Topics

Hot Spot Analysis (Getis-Ord Gi*)

Analyzing Patterns