Hotspot Analysis

Feature Description

Performs hotspot analysis on point, line, or region datasets. The dataset to be analyzed must have an ID field. The result type returned is a feature dataset (FeatureRDD).

Given a set of weighted features, Hotspot Analysis identifies statistically significant hot spots and cold spots using the Getis-Ord Gi* statistic. It examines each feature within the context of its neighboring features. A single isolated high value therefore does not constitute a hotspot; a feature is considered a hotspot only if both it and its neighbors have high values.

Analytical Principle

The General G index is typically used to measure the overall clustering of high and low values; hotspot analysis uses the local Gi* statistic to evaluate each feature individually. In the hotspot analysis tool, the z-score and p-value are measures of statistical significance used to determine whether to reject the null hypothesis for each individual feature. Features with a confidence interval value (Gi_Bin field) of +3 or -3 are statistically significant at the 99% confidence level; those with +2 or -2, at the 95% confidence level; those with +1 or -1, at the 90% confidence level. Clustering of features with a confidence interval value of 0 is not statistically significant.

If a feature has a high positive z-score and a small p-value, it indicates a spatial clustering of high values. If the z-score is strongly negative with a small p-value, it indicates a spatial clustering of low values. The higher (or lower) the z-score, the more intense the clustering. A z-score close to zero suggests no apparent spatial clustering.
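
The relationship between neighboring values and the z-score can be illustrated with a minimal sketch of the standard Getis-Ord Gi* formula. This is not the tool's implementation: the function names `fixed_distance_weights` and `getis_ord_gi_star` are hypothetical, and the weights assume a simple binary fixed-distance model in which each feature counts itself as a neighbor.

```python
import math


def fixed_distance_weights(points, threshold):
    """Binary weights (assumption): 1 if two features lie within
    `threshold` of each other (the feature itself included), else 0."""
    n = len(points)
    return [
        [1.0 if math.dist(points[i], points[j]) <= threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score of every feature.

    Gi* = (sum_j w_ij x_j - mean * sum_j w_ij)
          / (S * sqrt((n * sum_j w_ij^2 - (sum_j w_ij)^2) / (n - 1)))
    where mean and S are the global mean and standard deviation of x.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean * mean)
    z_scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        numerator = sum(wij * x for wij, x in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # A zero denominator means the statistic is undefined; report 0.
        z_scores.append(numerator / denominator if denominator else 0.0)
    return z_scores


# Ten features on a line: a cluster of high values at positions 0-2
# and one isolated high value at position 7.
points = [(float(i), 0.0) for i in range(10)]
values = [9, 9, 9, 1, 1, 1, 1, 9, 1, 1]
z = getis_ord_gi_star(values, fixed_distance_weights(points, 1.0))
print("clustered high value:", round(z[1], 2))
print("isolated high value: ", round(z[7], 2))
```

In this toy dataset the feature inside the cluster of high values receives a much larger z-score than the isolated high value, whose neighborhood average stays close to the global mean.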

Application Scenarios

Application areas include: crime analysis, epidemiology, voting pattern analysis, economic geography, retail analysis, traffic accident analysis, and demography. Some examples of applications are:

  • Where are disease outbreaks concentrated?
  • Where does the proportion of kitchen fires exceed the normal range among all residential fires?
  • Where should emergency evacuation zones be located?
  • Where and when do dense clusters appear?
  • In which locations and during what time periods should we allocate more resources?

Result Output

The result dataset from hotspot analysis includes the z-score (GI_ZSCORE), p-value (GI_PVALUE), and confidence interval (GI_CONFINVL) of each feature. Both the z-score and p-value are measures of statistical significance used to determine whether to reject the null hypothesis for each feature. The confidence interval field identifies statistically significant hot spots and cold spots: positive values mark hot spots and negative values mark cold spots. Features with a confidence interval of +3 or -3 reflect 99% statistical confidence; +2 or -2 reflect 95% confidence; +1 or -1 reflect 90% confidence. Clustering for features with a confidence interval of 0 is not statistically significant. The thresholds are shown in the table below:

Z-Score (Standard Deviation) | P-Value (Probability) | Confidence Level | GI_CONFINVL Value
< -1.65 or > 1.65 | < 0.10 | 90% | -1, 1
< -1.96 or > 1.96 | < 0.05 | 95% | -2, 2
< -2.58 or > 2.58 | < 0.01 | 99% | -3, 3
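
The thresholds in the table can be expressed as a small lookup. The following is a sketch only: `two_sided_p_value` and `confidence_bin` are hypothetical helper names, the p-value assumes a standard normal distribution, and no FDR correction is applied.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def confidence_bin(z):
    """Map a z-score to a GI_CONFINVL-style value using the table thresholds."""
    magnitude = abs(z)
    if magnitude > 2.58:
        level = 3      # 99% confidence
    elif magnitude > 1.96:
        level = 2      # 95% confidence
    elif magnitude > 1.65:
        level = 1      # 90% confidence
    else:
        level = 0      # not statistically significant
    # Positive z-scores are hot spots, negative z-scores are cold spots.
    return level if z > 0 else -level


for z in (2.0, -2.7, 1.0):
    print(z, round(two_sided_p_value(z), 4), confidence_bin(z))
```

The sign of the returned value distinguishes hot spots from cold spots, and its magnitude encodes the confidence level.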

Parameter Description

Parameter Name | Default Value | Parameter Interpretation | Parameter Type
Input Feature Dataset | | The input feature dataset. | FeatureRDD
Evaluation Field | | Evaluation field. | String
Conceptualization Model of Spatial Relation | | Conceptualization model of spatial relation. Supports Inverse Distance, Inverse Distance Squared, Fixed Distance, K-Nearest Neighbors, and Zone of Indifference. | JavaConceptualizationModel
Break Distance Tolerance (Optional) | 0.0 Meter | Break distance tolerance. Input format like "10 Meter". Invalid when the conceptualization model is "K-Nearest Neighbors". | JavaDistance
Inverse Distance Power Index (Optional) | 1.0 | Inverse distance power index. Only valid when the conceptualization model is "Inverse Distance", "Inverse Distance Squared", or "Zone of Indifference". | Double
K-Nearest Neighbor Records (Optional) | 0 | K-nearest neighbor records. Only valid when the conceptualization model is "K-Nearest Neighbors". | Integer
Self Weight Field (Optional) | | Self weight field. | String
Apply FDR (False Discovery Rate) Correction (Optional) | false | Whether to apply FDR (False Discovery Rate) correction. When false, statistical significance is based on the P-value and Z-score fields. Otherwise, the critical P-value for determining the confidence level is lowered to account for multiple testing and spatial dependence. | Boolean
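
To illustrate how an FDR correction can lower the critical p-value, here is a minimal sketch of the Benjamini-Hochberg procedure. The documentation does not state which FDR procedure the tool uses, so the choice of Benjamini-Hochberg and the function name `fdr_critical_p_value` are assumptions made for illustration only.

```python
def fdr_critical_p_value(p_values, alpha):
    """Benjamini-Hochberg critical p-value (assumed procedure).

    Sort the p-values in ascending order and return the largest p(k)
    that satisfies p(k) <= (k / m) * alpha, where m is the number of
    features. Returns 0.0 when no p-value meets the condition.
    """
    m = len(p_values)
    critical = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank / m * alpha:
            critical = p
    return critical


# Ten example p-values tested at the 95% confidence level (alpha = 0.05).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.060, 0.074, 0.205, 0.212, 0.216]
critical = fdr_critical_p_value(p_values, 0.05)
significant = [p for p in p_values if p <= critical]
print(critical, significant)
```

Without the correction, five of these example p-values fall below 0.05; with it, only the features whose p-values are at or below the lowered critical value remain significant.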