Feature Description
Performs hotspot analysis on point, line, or region datasets. The dataset to be analyzed must have an ID field. The result type returned is a feature dataset (FeatureRDD).
Given a set of weighted features, hotspot analysis identifies statistically significant hot spots and cold spots using the Getis-Ord Gi* statistic. It examines each feature within the context of its neighboring features. A single isolated high value therefore does not constitute a hotspot: a feature is considered a hotspot only if both it and its neighbors have high values.
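For reference, the Gi* statistic in its commonly published form is given below. The symbols are notation introduced here, not identifiers from the tool: x_j is the evaluation-field value of feature j, w_{ij} is the spatial weight between features i and j (including j = i), and n is the number of features. How the tool builds the weights internally is not stated in this document.

```latex
G_i^{*} = \frac{\sum_{j=1}^{n} w_{ij}\, x_j \;-\; \bar{X} \sum_{j=1}^{n} w_{ij}}
               {S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{ij}^{2} - \left( \sum_{j=1}^{n} w_{ij} \right)^{2}}{n - 1}}},
\qquad
\bar{X} = \frac{1}{n} \sum_{j=1}^{n} x_j,
\qquad
S = \sqrt{\frac{1}{n} \sum_{j=1}^{n} x_j^{2} - \bar{X}^{2}}
```

The value of G_i^{*} is itself a z-score, so no further standardization is needed before testing significance.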
Analytical Principle
The Getis-Ord General G index is typically used to measure the overall clustering of high or low values across a dataset, while hotspot analysis applies the local Gi* statistic to each feature. In the hotspot analysis tool, the z-score and p-value are measures of statistical significance used to determine whether to reject the null hypothesis for each individual feature. Features with a confidence interval value (GI_CONFINVL field) of +3 or -3 are statistically significant at the 99% confidence level; values of +2 or -2 correspond to the 95% confidence level, and values of +1 or -1 to the 90% confidence level. Clustering of features with a confidence interval value of 0 is not statistically significant.
A high positive z-score with a small p-value indicates spatial clustering of high values. A low negative z-score with a small p-value indicates spatial clustering of low values. The higher (or lower) the z-score, the more intense the clustering. A z-score close to zero suggests no apparent spatial clustering.
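The relationship between the z-score and the p-value can be illustrated with a minimal pure-Python sketch of the textbook Gi* computation. This is not the tool's implementation: the function name `gi_star`, the dense weight matrix, and the toy data are all assumptions made for illustration, and the p-value is taken as two-sided under a standard normal distribution.

```python
import math

def gi_star(values, weights):
    """Return a list of (z_score, p_value) pairs, one per feature.

    values  : list of n numbers (hypothetical evaluation-field values)
    weights : n x n list of lists; weights[i][j] is the spatial weight of
              feature j as seen from feature i, with weights[i][i] > 0 so
              that each feature counts itself (the "star" in Gi*)
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the evaluation field.
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    results = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z = numerator / denominator
        # Two-sided p-value under the standard normal distribution.
        p = math.erfc(abs(z) / math.sqrt(2))
        results.append((z, p))
    return results

# Toy data: six features on a line, each neighboring the features next to it.
values = [10, 9, 8, 1, 2, 1]
weights = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(6)] for i in range(6)]
scores = gi_star(values, weights)
```

In this toy input, the second feature sits among high values and the fifth among low values, so their z-scores fall beyond +1.96 and -1.96 respectively. The sketch does not guard against degenerate inputs such as a constant evaluation field, where the standard deviation is zero.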
Application Scenarios
Application areas include: crime analysis, epidemiology, voting pattern analysis, economic geography, retail analysis, traffic accident analysis, and demography. Some examples of applications are:
- Where are disease outbreaks concentrated?
- Where does the proportion of kitchen fires exceed the normal range among all residential fires?
- Where should emergency evacuation zones be located?
- Where and when do dense clusters appear?
- In which locations and during what time periods should we allocate more resources?
Result Output
The result dataset from hotspot analysis includes the z-score (GI_ZSCORE), p-value (GI_PVALUE), and confidence interval (GI_CONFINVL) fields. Both the z-score and p-value are measures of statistical significance used to determine whether to reject the null hypothesis for each feature. The confidence interval field identifies statistically significant hot spots and cold spots. Features with a confidence interval value of +3 or -3 are significant at the 99% confidence level; +2 or -2 at the 95% confidence level; +1 or -1 at the 90% confidence level. Clustering for features with a confidence interval value of 0 is not statistically significant. The thresholds are summarized in the following table:
| Z-Score (Standard Deviation) | P-Value (Probability) | Confidence Level | GI_CONFINVL Value |
|---|---|---|---|
| < -1.65 or > 1.65 | < 0.10 | 90% | -1, +1 |
| < -1.96 or > 1.96 | < 0.05 | 95% | -2, +2 |
| < -2.58 or > 2.58 | < 0.01 | 99% | -3, +3 |
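The thresholds in the table can be expressed as a small lookup. The sketch below is illustrative only: `confidence_bin` is a hypothetical helper, not a function of the tool, and it assumes no FDR correction is applied.

```python
def confidence_bin(z_score, p_value):
    """Map a z-score / p-value pair to a confidence value in -3..3.

    The magnitude comes from the p-value thresholds in the table;
    the sign comes from the z-score (positive = hot spot, negative = cold spot).
    """
    if p_value < 0.01:
        level = 3      # 99% confidence
    elif p_value < 0.05:
        level = 2      # 95% confidence
    elif p_value < 0.10:
        level = 1      # 90% confidence
    else:
        return 0       # not statistically significant
    return level if z_score > 0 else -level
```

For example, a feature with a z-score of -2.1 and a p-value of 0.036 would fall in bin -2, a cold spot at the 95% confidence level.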
Parameter Description
| Parameter Name | Default Value | Parameter Interpretation | Parameter Type |
|---|---|---|---|
| Input Feature Dataset | | The input feature dataset. | FeatureRDD |
| Evaluation Field | | Name of the field whose values are evaluated. | String |
| Conceptualization Model of Spatial Relation | | Conceptualization model of spatial relationships. Supported models: Inverse Distance, Inverse Distance Squared, Fixed Distance, K-Nearest Neighbors, and Zone of Indifference. | JavaConceptualizationModel |
| Break Distance Tolerance (Optional) | 0.0 Meter | Break distance tolerance, given as a value and a unit, for example "10 Meter". Ignored when the conceptualization model is "K-Nearest Neighbors". | JavaDistance |
| Inverse Distance Power Index (Optional) | 1.0 | Inverse distance power index. Only valid when the conceptualization model is "Inverse Distance", "Inverse Distance Squared", or "Zone of Indifference". | Double |
| K-Nearest Neighbor Records (Optional) | 0 | Number of nearest neighbors (K). Only valid when the conceptualization model is "K-Nearest Neighbors". | Integer |
| Self Weight Field (Optional) | | Name of the self-weight field. | String |
| Apply FDR (False Discovery Rate) Correction (Optional) | false | Whether to apply FDR (False Discovery Rate) correction. When false, statistical significance is based on the p-value and z-score fields. When true, the critical p-value used to determine the confidence level is lowered to account for multiple testing and spatial dependence. | Boolean |
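To give a sense of how an FDR correction lowers the critical p-value, the sketch below uses the Benjamini-Hochberg procedure. This document does not say which procedure the tool actually applies, so treat this as an assumption; `fdr_critical_p` and the sample p-values are hypothetical.

```python
def fdr_critical_p(p_values, alpha):
    """Benjamini-Hochberg critical p-value for a nominal level alpha.

    Sort the p-values in ascending order and return the largest p_(k) that
    satisfies p_(k) <= (k / n) * alpha. Features with a p-value at or below
    the returned value stay significant. Returns 0.0 when none qualifies.
    """
    n = len(p_values)
    critical = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank / n * alpha:
            critical = p
    return critical

# Hypothetical p-values for five features, tested at the 95% level.
sample = [0.001, 0.008, 0.039, 0.041, 0.20]
critical = fdr_critical_p(sample, 0.05)
```

Without correction, four of the five sample p-values are below 0.05. With the corrected threshold, only the two smallest remain significant, which is the kind of tightening the parameter description refers to.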