Hot spot analysis

Instructions for Use

Performs hot spot analysis on point, line, and polygon datasets. The dataset to be analyzed must have an ID. The result is returned as a feature dataset (FeatureRDD). Given a set of weighted elements, hot spot analysis uses the local General G statistic (Getis-Ord Gi*) to identify statistically significant hot spots and cold spots. The analysis evaluates each element in the context of its neighboring elements, so an isolated high value does not by itself constitute a hot spot. An element is considered a hot spot only when both it and its neighbors have high values.

Analysis Principle

Measuring the clustering of high and low values usually relies on the General G index. In the hot spot analysis tool, the z-score and p-value are both measures of statistical significance and are used to decide, element by element, whether to reject the null hypothesis. Elements with a confidence interval value (Gi_Bin field) of +3 or -3 are statistically significant at the 99% confidence level, elements with a value of +2 or -2 are significant at the 95% confidence level, and elements with a value of +1 or -1 are significant at the 90% confidence level. The clustering of elements with a value of 0 is not statistically significant. A high positive z-score with a small p-value indicates spatial clustering of high values (a hot spot). A low negative z-score with a small p-value indicates spatial clustering of low values (a cold spot). The larger the absolute value of the z-score, the more intense the clustering. A z-score close to zero indicates no significant spatial clustering.
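
To make the idea of a per-element z-score concrete, the following is a minimal sketch of the standard Getis-Ord Gi* statistic in plain Python. It assumes a binary fixed-distance weight (1 for every element within a cutoff distance, including the element itself, and 0 otherwise). The function name `gi_star_zscores` and the sample data are hypothetical, and the tool's actual implementation may differ.

```python
import math

def gi_star_zscores(points, values, cutoff):
    """Return the Getis-Ord Gi* z-score of every point.

    Assumption: binary fixed-distance weights, w_ij = 1 when the distance
    between i and j is <= cutoff (j == i included), otherwise 0.
    """
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation of the evaluated values (clamped at 0
    # to guard against tiny negative rounding errors).
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    scores = []
    for xi, yi in points:
        weights = [1.0 if math.hypot(xi - xj, yi - yj) <= cutoff else 0.0
                   for xj, yj in points]
        w_sum = sum(weights)
        w_sq_sum = sum(w * w for w in weights)
        local_sum = sum(w * v for w, v in zip(weights, values))
        # Gi* = (sum_j w_ij x_j - mean * sum_j w_ij)
        #       / (s * sqrt((n * sum_j w_ij^2 - (sum_j w_ij)^2) / (n - 1)))
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt(max(0.0, n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores

# Hypothetical data: a cluster of high values and a cluster of low values.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
values = [9, 10, 11, 1, 2, 1]
scores = gi_star_zscores(points, values, cutoff=1.5)
```

In this sample, the middle element of the high-value cluster receives a positive z-score and the middle element of the low-value cluster receives a negative z-score of the same magnitude, because each is compared against the global mean together with its neighbors.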

Application Cases

Application areas include crime analysis, epidemiology, voting pattern analysis, economic geography, retail analysis, traffic accident analysis, and demographics. Example questions include:

* Where are disease outbreaks concentrated?
* Where does the proportion of kitchen fires among all residential fires exceed the normal range?
* Where should emergency evacuation areas be located?
* Where and when do dense areas occur?
* Where and during what time periods should more resources be allocated?

Return Results

The result dataset of hot spot analysis includes a z-score (GI_ZSCORE), a p-value (GI_PVALUE), and a confidence interval (GI_CONFINVL). The z-score and p-value are both measures of statistical significance and are used to decide, element by element, whether to reject the null hypothesis. The confidence interval field identifies statistically significant hot spots and cold spots. Elements with a confidence interval value of +3 or -3 are statistically significant at the 99% confidence level, elements with a value of +2 or -2 are significant at the 95% confidence level, elements with a value of +1 or -1 are significant at the 90% confidence level, and the clustering of elements with a value of 0 is not statistically significant. As shown in the table below:

| Z-score (standard deviations) | P-value (probability) | Confidence level | GI_CONFINVL value |
|:-----|:----|:-------|:----|
| < -1.65 or > 1.65 | < 0.10 | 90% | -1, 1 |
| < -1.96 or > 1.96 | < 0.05 | 95% | -2, 2 |
| < -2.58 or > 2.58 | < 0.01 | 99% | -3, 3 |
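
The thresholds in the table can be expressed as a small lookup. The sketch below is illustrative only: the helper names `two_sided_p_value` and `confidence_bin` are hypothetical, and it assumes no FDR correction is applied, so the bin depends on the z-score alone.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def confidence_bin(z):
    """Map a z-score to a confidence bin in {-3, ..., 3} using the table thresholds."""
    magnitude = abs(z)
    if magnitude > 2.58:
        level = 3      # 99% confidence
    elif magnitude > 1.96:
        level = 2      # 95% confidence
    elif magnitude > 1.65:
        level = 1      # 90% confidence
    else:
        level = 0      # not statistically significant
    # Positive bins mark hot spots, negative bins mark cold spots.
    return level if z > 0 else -level
```

For example, a z-score of -2.0 falls in the 95% band on the negative side, so `confidence_bin(-2.0)` yields -2 (a cold spot at the 95% confidence level).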

## Parameter Description

|Parameter Name | Default Value | Parameter Definition | Parameter Type|
|:-----|:----|:-------|:----|
|Imported Feature Dataset | | Imported feature dataset | FeatureRDD|
|Evaluation Field | | Evaluation field | String|
|Spatial Relationship Conceptualization Model | | Spatial relationship conceptualization model. Supports Inverse Distance, Inverse Distance Square, Fixed Distance, K-Nearest Neighbor, and Undifferentiated Region | Java Conceptualization Model|
|Interrupt Distance Tolerance (Optional) | 0.0 Meter | Interrupt distance tolerance, entered in a format such as "10 Meter". Ignored when the conceptualization model is "K-Nearest Neighbor" | JavaDistance|
|Inverse distance power exponent (Optional) | 1.0 | Inverse distance power exponent. Valid only when the conceptualization model is "Inverse Distance", "Inverse Distance Square", or "Undifferentiated Region" | Double|
|Number of Adjacent Objects (Optional) | 0 | Number k of adjacent objects. Valid only when the conceptualization model is "K-Nearest Neighbor" | Integer|
|Self weight field (Optional) | | Self weight field | String|
|Whether to perform FDR (false discovery rate) correction (Optional) | false | Whether to perform FDR (false discovery rate) correction. When false, statistical significance is based on the p-value and z-score fields. When true, the critical p-value used to determine the confidence level is reduced to account for multiple testing and spatial dependence | Boolean|
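
The documentation does not state which FDR procedure the tool uses. As an illustration of how a critical p-value can be reduced to account for multiple testing, the following sketch applies the Benjamini-Hochberg procedure, which is an assumption here and not a confirmed description of the tool. The function name `fdr_critical_p_value` and the sample p-values are hypothetical.

```python
def fdr_critical_p_value(p_values, alpha):
    """Return a Benjamini-Hochberg critical p-value for significance level alpha.

    Elements whose p-value is at or below the returned threshold are treated
    as significant. Returns 0.0 when no p-value passes the test.
    """
    m = len(p_values)
    critical = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        # The k-th smallest p-value must not exceed (k / m) * alpha.
        if p <= rank / m * alpha:
            critical = p
    return critical

# Hypothetical p-values for ten elements.
sample = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
threshold = fdr_critical_p_value(sample, alpha=0.05)
significant = [p for p in sample if p <= threshold]
```

Without correction, five of the sample p-values fall below 0.05. With this correction the threshold drops to 0.008, so only two elements remain significant at the 95% confidence level.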