High/Low Clustering

Feature Description

High/Low Clustering measures the degree of clustering of either high values or low values in a study area using the Getis-Ord General G statistic. The General G Index is an inferential statistic: its result is interpreted against a null hypothesis that there is no spatial clustering of the feature values. When the returned P-value is small enough to be statistically significant, the null hypothesis can be rejected. In that case, a positive Z-score means the observed General G Index is larger than expected, indicating that high attribute values are clustered in the study area; a negative Z-score means the observed General G Index is smaller than expected, indicating that low attribute values are clustered.
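For reference, the General G statistic and its Z-score can be written as follows. Here x_i is the value of the Evaluation Field for feature i, w_{ij} is the spatial weight between features i and j, n is the number of features, and V[G] is the variance of G under the null hypothesis (its closed form is longer and not reproduced here). This is the standard Getis-Ord definition; the tool's internal computation may differ in details such as weight handling.

```latex
G = \frac{\sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n} w_{ij}\, x_i x_j}{\sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n} x_i x_j},
\qquad
E[G] = \frac{W}{n(n-1)}, \quad W = \sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n} w_{ij},
\qquad
z_G = \frac{G - E[G]}{\sqrt{V[G]}}
```

Because each product x_i x_j is multiplied by its weight, G rises above E[G] when large values sit near other large values, which is why a positive z_G signals clustering of high values.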

Application Cases

  • Detecting unusual spikes in emergency room visits, which may indicate local or regional outbreaks of health problems.
  • Comparing the spatial patterns of different retail industries in an urban area to distinguish sectors that cluster with competitors to benefit from comparison shopping (e.g., car dealerships) from sectors that avoid competitors (e.g., health centers/gyms).
  • Assessing how spatial clustering changes over time, for example, how urban population clustering evolves as a city develops and densifies.

Feature Entry

  • Spatial Statistics Tab -> Analyzing Patterns -> High/Low Clustering.
  • Toolbox -> Spatial Statistics -> Analyzing Patterns -> High/Low Clustering.

Parameter Description

  • Source Dataset: The vector dataset to analyze, supporting point, line, and polygon datasets.
  • Evaluation Field: The numeric field representing attribute values for analysis.
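  • Conceptualization Model: Defines the spatial relationships between features. Choose the model that best reflects how the features actually interact; the more closely the model matches reality, the more reliable the results.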
    • Fixed Distance: Suitable for point data and polygon data with varying sizes.
    • Polygon Adjacent (Common Edges/Intersect): For polygons that share an edge or intersect each other.
    • Polygon Adjacent (Node/Common Edges/Intersect): For polygons that share a node, share an edge, or intersect each other.
    • Inverse Distance: All features influence each other with weights inversely proportional to distance, suitable for continuous data.
    • Inverse Distance Square: Similar to inverse distance but with faster distance decay (weights = 1/distance²).
    • K Nearest Neighbors: Uses K nearest features with equal weights (1). Effective for ensuring minimum neighbors and handling uneven distributions.
    • Spatial Weight Matrix: Uses a spatial weight matrix file (.swmb) to model spatial relationships, ideal for network-based analyses like crime cluster detection.
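    • Undifferentiated Region: Combines the fixed distance and inverse distance models. Features within the distance threshold receive equal weights; beyond the threshold, weights decrease with distance as in the inverse distance model.
  • Distance Threshold: A value of "-1" computes a default distance; "0" means no distance limit, so all features are neighbors; a positive value includes as neighbors only the features within that distance.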
  • Inverse Distance Power: Controls distance decay rate. Higher values reduce distant influences faster.
  • Number of Neighbors: Specifies K for K Nearest Neighbors model.
  • Distance Measurement: Euclidean Distance or Manhattan Distance. See the Spatial Statistics Glossary for details.
  • Spatial Weights Matrix Standardization: Divides each weight by the sum of its row, so weights range from 0 to 1 and each row sums to 1. Recommended for administrative boundaries, where differing numbers of neighbors could otherwise bias the results.
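The distance-based conceptualization models above can be sketched as rules for filling an n × n spatial weight matrix. The Python sketch below is illustrative only: the function names, the model keywords, and the decay formula used beyond the threshold in the Undifferentiated Region model are assumptions, not the tool's API. The "-1" default-distance option, polygon adjacency, and .swmb files are not modeled, because they depend on the tool and on the dataset's geometry.

```python
import math

# Illustrative sketch only: model keywords, parameter names and the decay formula
# beyond the threshold for "zone_of_indifference" are assumptions, not the tool's API.


def distance(p, q, metric="euclidean"):
    """Distance between two (x, y) points."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    if metric == "euclidean":
        return math.hypot(dx, dy)
    if metric == "manhattan":
        return abs(dx) + abs(dy)
    raise ValueError(f"unknown metric: {metric}")


def build_weights(points, model, threshold=0.0, power=1.0, k=1,
                  metric="euclidean", row_standardize=False):
    """Return an n x n weight matrix (list of lists) with zeros on the diagonal.

    threshold: 0 means no distance limit; a positive value keeps only neighbors
    within that distance. The tool's -1 "default distance" is not modeled.
    """
    if threshold < 0:
        raise ValueError("threshold must be >= 0 in this sketch")
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # (distance, index) pairs to every other feature
        others = [(distance(points[i], points[j], metric), j) for j in range(n) if j != i]
        if model == "knn":
            # The k nearest features get weight 1; ties are broken by index order.
            for _, j in sorted(others)[:k]:
                w[i][j] = 1.0
            continue
        for d, j in others:
            within = threshold == 0 or d <= threshold
            if model == "fixed":
                w[i][j] = 1.0 if within else 0.0
            elif model in ("inverse", "inverse_squared"):
                exponent = 2.0 if model == "inverse_squared" else power
                if d == 0:
                    raise ValueError("coincident features: inverse distance is undefined")
                w[i][j] = d ** -exponent if within else 0.0
            elif model == "zone_of_indifference":
                if threshold == 0:
                    raise ValueError("zone of indifference needs a positive threshold")
                # Weight 1 inside the threshold; assumed decay (threshold / d) ** power beyond it.
                w[i][j] = 1.0 if d <= threshold else (threshold / d) ** power
            else:
                raise ValueError(f"unknown model: {model}")
    if row_standardize:
        # Divide each row by its sum so the weights of every feature add up to 1.
        for row in w:
            total = sum(row)
            if total > 0:
                row[:] = [v / total for v in row]
    return w
```

For example, `build_weights([(0, 0), (1, 0), (3, 0)], "inverse", row_standardize=True)` gives the first feature weights of 0.75 and 0.25 for its two neighbors, so its row sums to 1 as described for standardization.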

Result Interpretation

Analysis results are stored in a CAD dataset and displayed on the map.

The High/Low Clustering analysis returns five values: General G Index, Expected Value, Variance, Z-score, and P-value. Because General G is an inferential statistic, a small P-value (for example, below 0.05) allows the null hypothesis to be rejected. A positive Z-score then indicates clustering of high values, and a negative Z-score indicates clustering of low values.
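The five values can be sketched in Python from a list of attribute values and a spatial weight matrix. This sketch uses the analytical expectation and variance of General G given by Getis and Ord; it is not the tool's implementation, and the function name `general_g` and its dictionary keys are hypothetical.

```python
import math


def general_g(values, w):
    """Getis-Ord General G with analytical expectation, variance, z-score and p-value.

    values: attribute values x_i (assumed non-negative)
    w: n x n spatial weight matrix as a list of lists; diagonal entries are ignored
    """
    n = len(values)
    if n < 4:
        raise ValueError("the analytical variance needs at least 4 features")
    x = [float(v) for v in values]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]

    # Observed G: weighted cross-products over pairs i != j divided by unweighted ones.
    num = sum(w[i][j] * x[i] * x[j] for i, j in pairs)
    m1 = sum(x)
    m2 = sum(v ** 2 for v in x)
    m3 = sum(v ** 3 for v in x)
    m4 = sum(v ** 4 for v in x)
    den = m1 ** 2 - m2  # equals the sum of x_i * x_j over all pairs i != j
    g = num / den

    # Weight-matrix constants (diagonal excluded).
    s0 = sum(w[i][j] for i, j in pairs)
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i, j in pairs)
    s2 = sum((sum(w[i][j] for j in range(n) if j != i)
              + sum(w[j][i] for j in range(n) if j != i)) ** 2 for i in range(n))

    expected = s0 / (n * (n - 1))

    # E[G^2] from the Getis-Ord moment coefficients, then V[G] = E[G^2] - E[G]^2.
    b0 = (n * n - 3 * n + 3) * s1 - n * s2 + 3 * s0 ** 2
    b1 = -((n * n - n) * s1 - 2 * n * s2 + 6 * s0 ** 2)
    b2 = -(2 * n * s1 - (n + 3) * s2 + 6 * s0 ** 2)
    b3 = 4 * (n - 1) * s1 - 2 * (n + 1) * s2 + 8 * s0 ** 2
    b4 = s1 - s2 + s0 ** 2
    eg2 = (b0 * m2 ** 2 + b1 * m4 + b2 * m1 ** 2 * m2 + b3 * m1 * m3 + b4 * m1 ** 4) / (
        den ** 2 * n * (n - 1) * (n - 2) * (n - 3))
    variance = eg2 - expected ** 2
    if variance <= 0:
        raise ValueError("non-positive variance: the weights do not discriminate between arrangements")

    z = (g - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return {"G": g, "expected": expected, "variance": variance, "z": z, "p": p}
```

For four features on a line with adjacency weights `[[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]` and values 1, 2, 3, 4, the sketch gives G = 40/70 ≈ 0.571 against an expected value of 0.5; the Z-score is about 0.70, far from significant, as one would expect with so few features.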

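High/Low Clustering works best when values are fairly evenly distributed and you are looking for unexpected spatial peaks of high values. When both high values and low values cluster, their effects tend to cancel out (the observed G is close to the expected G); in that case, consider Spatial Autocorrelation analysis instead.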

Example

Sample Data: Download case data (requires extraction after download).

Viral hepatitis incidence is analyzed with the following settings: Evaluation Field = case count, Conceptualization Model = Inverse Distance, Distance Measurement = Euclidean Distance, and Spatial Weights Matrix Standardization enabled.

Conclusion: The results show a P-value below 0.01 and a Z-score above 2.58. The null hypothesis can therefore be rejected at the 99% confidence level: hepatitis cases show statistically significant clustering of high values.
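The thresholds in this conclusion come from the standard normal distribution: for a two-tailed test, |z| > 2.58 corresponds to p < 0.01 (99% confidence), |z| > 1.96 to p < 0.05 (95%), and |z| > 1.65 to p < 0.10 (90%). The helper names below are hypothetical; the sketch only shows how a Z-score maps to a P-value and a confidence level.

```python
import math

# Two-tailed critical z-scores for common confidence levels (standard normal distribution).
CRITICAL_Z = [(2.58, "0.01", "99%"), (1.96, "0.05", "95%"), (1.65, "0.10", "90%")]


def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def interpret(z):
    """Describe a General G z-score using the critical values above (hypothetical helper)."""
    for critical, alpha, level in CRITICAL_Z:
        if abs(z) > critical:
            kind = "high values" if z > 0 else "low values"
            return f"clustering of {kind} at the {level} confidence level (p < {alpha})"
    return "not significant at the 90% confidence level"
```

For instance, `two_sided_p(2.58)` is about 0.0099, just under the 0.01 cutoff, so any Z-score above 2.58 falls in the 99% band.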

Related Topics

Spatial Autocorrelation

Incremental Spatial Autocorrelation

Average Nearest Neighbor