Hotspot Analysis uses the local Getis-Ord Gi* statistic to identify statistically significant hot spots and cold spots in a set of weighted features. It evaluates each feature in the context of its neighbors, so a single isolated high value does not constitute a hot spot. A hot spot is a feature with a high value that is also surrounded by features with high values, that is, a cluster of high values. Conversely, a cold spot is a feature with a low value that is surrounded by features with low values, that is, a cluster of low values.
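To make the statistic concrete, the following Python sketch computes a Getis-Ord Gi* z-score for each feature from its values and a spatial weight matrix, using the standard published Gi* formula (global mean and standard deviation of the values, row sums of the weights, self weight included on the diagonal). It is an illustration under that assumption, not SuperMap's implementation; the function name `gi_star` is hypothetical, and the p-value assumes a two-tailed test against the standard normal distribution.

```python
import math


def gi_star(values, weights):
    """Getis-Ord Gi* z-score and two-tailed p-value for every feature.

    values  -- evaluation field values x_1..x_n
    weights -- n x n matrix; weights[i][j] is the spatial weight between
               feature i and feature j, with weights[i][i] the self weight
    """
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation S used by the Gi* formula
    s = math.sqrt(max(sum(x * x for x in values) / n - mean * mean, 0.0))
    results = []
    for row in weights:
        sum_w = sum(row)
        sum_w2 = sum(w * w for w in row)
        local = sum(w * x for w, x in zip(row, values))
        numerator = local - mean * sum_w
        # Clamp tiny negative values caused by floating-point rounding
        var = max((n * sum_w2 - sum_w * sum_w) / (n - 1), 0.0)
        denominator = s * math.sqrt(var)
        # Gi* is already a z-score; a degenerate row or constant data gives 0
        z = numerator / denominator if denominator > 0 else 0.0
        # Two-tailed p-value under the standard normal distribution
        p = math.erfc(abs(z) / math.sqrt(2))
        results.append((z, p))
    return results


# Six features on a line; neighbors are within one unit, self weight 1
values = [1, 1, 1, 9, 9, 9]
weights = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(6)] for i in range(6)]
res = gi_star(values, weights)
```

In this small example the features at the high end of the line receive positive z-scores and those at the low end receive negative z-scores of equal magnitude, which is the "high values surrounded by high values" pattern described above.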
Applications
Application areas include crime analysis, epidemiology, voting pattern analysis, economic geography, retail analysis, traffic accident analysis, and demography. Some examples of applications include:
- Where are disease outbreaks concentrated?
- Where do kitchen fires account for a higher-than-normal proportion of all residential fires?
- Where should emergency evacuation zones be located?
- Where/when does the peak concentration occur?
- Where and when should we allocate more resources?
Function entrance
- Spatial Statistical Analysis tab-> Clustering Distribution-> Hotspot Analysis. (iDesktopX)
- Toolbox-> Spatial Statistical Analysis-> Clustering Distribution-> Hotspot Analysis. (iDesktopX)
Main parameters
- Source Data: The vector dataset to analyze. Point, line, and region datasets are supported.
- Evaluation Field: The attribute field whose values are analyzed. Only numeric fields are supported.
- Conceptual Model: How features interact with one another in space. Choose the model that best reflects the inherent relationships among the features being analyzed; the more realistically the model represents those relationships, the more accurate the results. The options are:
- Fixed Distance: Suitable for point data and for region data in which feature sizes vary widely.
- Polygon Adjacent (Common Edges/Intersect): Suitable for region data; features that share an edge or overlap are neighbors.
- Polygon Adjacent (Node/Common Edges/Intersect): Suitable for region data; features that share a node (vertex) or an edge, or that overlap, are neighbors.
- Inverse Distance: Every feature is a neighbor of every other feature. All features influence the target feature, but their influence decreases as distance increases: the weight between two features is the inverse of the distance between them. Suitable for continuous data.
- Inverse Distance Square: Similar to Inverse Distance, but influence decreases more rapidly as distance increases: the weight between two features is the inverse of the squared distance between them.
- K Nearest Neighbors: The K features closest to the target feature are included in its calculation (weight 1), and all other features are excluded (weight 0). Use this option to guarantee that every feature has a minimum number of neighbors. It works well when the distribution of the data varies across the study area so that some features are far from all others. When keeping the number of neighbors fixed matters more than keeping the analysis scale fixed, K Nearest Neighbors is the better choice.
- Spatial Weight Matrix: Requires a spatial weight matrix file. Spatial weights are numbers that reflect the distance, travel time, or other cost between each feature and every other feature in the dataset. When modeling the accessibility of urban services, or finding areas where urban crime is concentrated, modeling spatial relationships with a network is a good approach. Before the analysis, create a spatial weight matrix file (.swmb) with the Generate Network Spatial Weights tool, then specify the full path to the .swmb file you created.
- Undifferentiated Region: Combines Inverse Distance and Fixed Distance, and treats every feature as a neighbor of every other feature. Features within the specified fixed distance have equal weight (weight 1); beyond that distance, a feature's influence decreases as distance increases. This option is not suitable for large datasets.
- Break Distance Tolerance: A value of -1 means a default distance is calculated and applied; this default ensures that every feature has at least one neighbor. A value of 0 means no distance cutoff is applied, so every feature is a neighbor of every other feature. A positive value means that features closer to each other than this distance are neighbors.
- Inverse Distance Power Index: The exponent that controls how strongly distance affects the weights. The higher the power, the faster the weights decay with distance, so distant features have less influence.
- Number of Adjacent Elements: A positive integer K; the K features nearest to the target feature are its neighbors.
- Measure Distance Method: Euclidean distance or Manhattan distance. For details on both, refer to the Basic Vocabulary of Spatial Statistical Analysis.
- FDR Correction or not: If False Discovery Rate (FDR) correction is enabled, statistical significance is based on FDR-corrected thresholds. Otherwise, it is based on the p-value and z-score fields.
- Self Weight Field: A numeric field that holds each feature's weight (or distance) with respect to itself. Only numeric fields are supported.
- Result Settings: Set the Datasource and Dataset Name where the Result Data will be saved.
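The conceptual models and distance options above can be sketched as a weight-matrix builder. This is a minimal Python illustration under stated assumptions, not the tool's code: the model names ("fixed_distance", "inverse_distance", "knn", "undifferentiated_region") and all function names are hypothetical; Inverse Distance Square corresponds to `inverse_distance` with `power=2`; the default tolerance of -1 is assumed to mean the largest nearest-neighbor distance (the smallest band that gives every feature a neighbor); and the decay used beyond the band for Undifferentiated Region is an assumed formula. The polygon adjacency and spatial weight matrix file options are not covered.

```python
import math


def euclidean(a, b):
    # Straight-line distance between two (x, y) points
    return math.hypot(a[0] - b[0], a[1] - b[1])


def manhattan(a, b):
    # Sum of absolute coordinate differences ("city block" distance)
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def default_threshold(points, dist=euclidean):
    # Largest nearest-neighbor distance: the smallest band that gives every
    # feature at least one neighbor (assumed meaning of tolerance -1)
    return max(
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


def build_weights(points, model, dist=euclidean, threshold=-1.0,
                  power=1.0, k=1, self_weight=0.0, standardize=False):
    """Return an n x n weight matrix for a hypothetical model name.

    Assumes no two points share a location (inverse distance would divide by 0).
    """
    n = len(points)
    if threshold < 0:
        threshold = default_threshold(points, dist)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = [dist(points[i], points[j]) for j in range(n)]
        # Other features ordered by distance; ties broken by index
        others = sorted((d[j], j) for j in range(n) if j != i)
        for rank, (dij, j) in enumerate(others):
            if model == "fixed_distance":
                w[i][j] = 1.0 if dij <= threshold else 0.0
            elif model == "inverse_distance":
                # Threshold 0 means no cutoff: every feature is a neighbor
                if threshold == 0 or dij <= threshold:
                    w[i][j] = 1.0 / dij ** power
            elif model == "knn":
                w[i][j] = 1.0 if rank < k else 0.0
            elif model == "undifferentiated_region":
                # Weight 1 inside the band; (threshold / d) ** power beyond it
                # (an assumed decay, continuous at the threshold)
                w[i][j] = 1.0 if dij <= threshold else (threshold / dij) ** power
            else:
                raise ValueError(f"unknown model: {model}")
        # Gi* counts each feature in its own neighborhood via this value
        w[i][i] = self_weight
        if standardize:
            # Row standardization: each row sums to 1
            row_sum = sum(w[i])
            if row_sum > 0:
                w[i] = [x / row_sum for x in w[i]]
    return w


pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
fixed = build_weights(pts, "fixed_distance")
```

Row standardization (`standardize=True`) is one common way to normalize a spatial weight matrix; it keeps features with many neighbors from dominating simply because their rows sum to larger values.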
Explanation of results
The result dataset contains three attribute fields: z-score (Gi_Zscore), p-value (Gi_Pvalue), and confidence interval (Gi_ConfInvl). The map renders the Gi_ConfInvl field, and a histogram of the evaluation field is displayed in the statistic chart window. The fields are interpreted as follows:
Z Score (SD) | Meaning | Hot/cold spot |
---|---|---|
Z > 0 and P small | A spatial cluster of high values. The higher the z-score, the more intense the clustering. | Hot spot; the corresponding Gi_ConfInvl value is positive. |
Z close to 0 | No obvious spatial clustering. | -- |
Z < 0 and P small | A spatial cluster of low values. The lower the z-score, the more intense the clustering. | Cold spot; the corresponding Gi_ConfInvl value is negative. |
The following table shows how the values in the attribute table correspond to one another:
Z Score (SD) | P-value (probability) | Gi_ConfInvl value | Confidence level | Result |
---|---|---|---|---|
< -2.58 | < 0.01 | -3 | 99% | Cold spot, statistically significant at the 99% confidence level. |
< -1.96 | < 0.05 | -2 | 95% | Cold spot, statistically significant at the 95% confidence level. |
< -1.65 | < 0.10 | -1 | 90% | Cold spot, statistically significant at the 90% confidence level. |
Close to 0 | -- | 0 | -- | Not statistically significant. |
> 1.65 | < 0.10 | 1 | 90% | Hot spot, statistically significant at the 90% confidence level. |
> 1.96 | < 0.05 | 2 | 95% | Hot spot, statistically significant at the 95% confidence level. |
> 2.58 | < 0.01 | 3 | 99% | Hot spot, statistically significant at the 99% confidence level. |
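The table can be read as a binning rule. The sketch below is a hypothetical helper, not the tool's code: it assigns a Gi_ConfInvl-style value from each feature's p-value and the sign of its z-score. With `fdr=True` it replaces the fixed 0.01/0.05/0.10 thresholds with Benjamini-Hochberg cutoffs, which is an assumption about how the FDR Correction option adjusts significance.

```python
# (alpha, |Gi_ConfInvl|), strictest level first
LEVELS = [(0.01, 3), (0.05, 2), (0.10, 1)]


def bh_cutoff(pvalues, alpha):
    """Largest p-value accepted by the Benjamini-Hochberg procedure at
    level alpha, or -1.0 if no p-value is accepted."""
    m = len(pvalues)
    cutoff = -1.0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= rank / m * alpha:
            cutoff = p  # keep the largest p that passes its rank test
    return cutoff


def conf_invl(zscores, pvalues, fdr=False):
    """Return a value in -3..3 per feature: sign from z, magnitude from p."""
    if fdr:
        thresholds = [bh_cutoff(pvalues, alpha) for alpha, _ in LEVELS]
    else:
        thresholds = [alpha for alpha, _ in LEVELS]
    bins = []
    for z, p in zip(zscores, pvalues):
        level = 0
        for (_, lv), t in zip(LEVELS, thresholds):
            # BH accepts p <= cutoff; the fixed thresholds use p < alpha
            if (p <= t) if fdr else (p < t):
                level = lv
                break
        bins.append(level if z > 0 else -level)
    return bins


zs = [3.0, 2.0, 1.7, 0.1, -2.7]
ps = [0.001, 0.04, 0.09, 0.9, 0.007]
plain = conf_invl(zs, ps)
corrected = conf_invl(zs, ps, fdr=True)
```

With these sample values, FDR correction lowers the effective thresholds, so some features drop to a lower confidence level or lose significance entirely; this is the intended effect of guarding against false positives when many features are tested at once.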
Example
Hotspot Analysis is performed on the incidence of viral hepatitis in a region in 2013, with the evaluation field set to the number of cases in 2013, the conceptual model set to Inverse Distance, and the measure distance method set to Euclidean Distance. The spatial weight matrix is normalized, and all other parameters use their default values. The attribute table of the result dataset is as follows:
Under the null hypothesis of a random distribution, the results show that:
- The red area in the northwest of the region has Z values greater than 2.58. These features are surrounded by high values, forming a spatial cluster with a high number of cases. About 5 significant areas in the northwest show obvious high-value clustering; they are high-risk areas for viral hepatitis where preventive measures should be taken.
- The Z values in the gray area are close to 0, indicating no statistically significant clustering.