High/Low Clustering measures the degree to which high or low values cluster, using the Getis-Ord General G statistic. The General G index is an inferential statistic: it uses the available data to draw conclusions about the overall pattern, and its results are interpreted against a null hypothesis of complete spatial randomness. When the analysis returns a small, statistically significant p-value, the null hypothesis can be rejected. In that case, if the z-score is positive, the observed General G index is larger than the expected General G index, indicating that high values of the attribute cluster in the study area. If the z-score is negative, the observed General G index is smaller than the expected General G index, indicating that low values of the attribute cluster in the study area.
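The statistic described above can be sketched in plain Python. This is a minimal sketch, assuming the standard Getis-Ord formulation also used by open-source implementations such as PySAL's `esda.G`: G is the sum of w_ij * x_i * x_j over all pairs i != j divided by the sum of x_i * x_j over the same pairs, the expected value is the sum of weights divided by n(n-1), and the variance uses the analytical moments under randomization. It is not taken from iDesktopX, whose output may differ in detail. The function name `general_g` and the dense weight-matrix input are illustrative assumptions, and the evaluation field is assumed to hold non-negative values, as General G requires.

```python
import math

def general_g(values, weights):
    """Getis-Ord General G with expected value, variance, z-score and p-value.

    values  -- list of n non-negative numbers (the evaluation field)
    weights -- n x n nested list of spatial weights; the diagonal is ignored
    """
    n = len(values)
    if n < 4:
        raise ValueError("the analytical variance needs at least 4 features")
    x = values
    off_diag = [(i, j) for i in range(n) for j in range(n) if i != j]

    # Observed G: weighted cross-products over all cross-products (i != j).
    m1 = sum(x)
    m2 = sum(v ** 2 for v in x)
    m3 = sum(v ** 3 for v in x)
    m4 = sum(v ** 4 for v in x)
    numerator = sum(weights[i][j] * x[i] * x[j] for i, j in off_diag)
    g = numerator / (m1 ** 2 - m2)  # sum_{i!=j} x_i x_j == (sum x)^2 - sum x^2

    # Moments of the weights (off-diagonal entries only).
    s0 = sum(weights[i][j] for i, j in off_diag)
    s1 = 0.5 * sum((weights[i][j] + weights[j][i]) ** 2 for i, j in off_diag)
    s2 = sum((sum(weights[i][j] for j in range(n) if j != i)
              + sum(weights[j][i] for j in range(n) if j != i)) ** 2
             for i in range(n))

    # Expected value and analytical variance under the null hypothesis.
    expected = s0 / (n * (n - 1))
    b0 = (n ** 2 - 3 * n + 3) * s1 - n * s2 + 3 * s0 ** 2
    b1 = -((n ** 2 - n) * s1 - 2 * n * s2 + 6 * s0 ** 2)
    b2 = -(2 * n * s1 - (n + 3) * s2 + 6 * s0 ** 2)
    b3 = 4 * (n - 1) * s1 - 2 * (n + 1) * s2 + 8 * s0 ** 2
    b4 = s1 - s2 + s0 ** 2
    eg2 = (b0 * m2 ** 2 + b1 * m4 + b2 * m1 ** 2 * m2 + b3 * m1 * m3
           + b4 * m1 ** 4) / ((m1 ** 2 - m2) ** 2 * n * (n - 1) * (n - 2) * (n - 3))
    variance = eg2 - expected ** 2

    z = (g - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, standard normal
    return g, expected, variance, z, p

# Usage: six features along a line, each a neighbor of the next (binary weights).
n = 6
W = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
g, expected, variance, z, p = general_g([10, 9, 8, 1, 1, 1], W)
print(f"G={g:.4f} E[G]={expected:.4f} z={z:.2f} p={p:.3f}")
```

In this toy example the three high values sit next to each other, so the observed G (344/652, about 0.528) exceeds E[G] = 1/3 and the z-score is positive, which matches the "high values cluster" reading.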
Application case
- Look for unusual spikes in the number of A&E (accident and emergency) visits, which may indicate outbreaks of local or regional health problems.
- Compare the spatial patterns of different types of retail in a city to see which types cluster with competitors to benefit from comparison shopping (e.g., car dealerships) and which types avoid competitors (e.g., fitness centers/gyms).
- Summarize the degree of clustering of a spatial phenomenon to examine changes across periods or locations. For example, cities and their populations are known to cluster; High/Low Clustering analysis lets you compare how strongly a city's population clusters over time (urban development and density analysis).
Function entrance
- Spatial Statistical Analysis tab > Analysis Mode > High/Low Clustering. (iDesktopX)
- Toolbox > Spatial Statistical Analysis > Analysis Mode > High/Low Clustering. (iDesktopX)
Main parameters
- Source Data: Set the vector dataset to analyze. Point, line, and polygon datasets are supported.
- Evaluation Field: Set the attribute field whose values are analyzed. Only numeric fields are supported.
- Conceptual Model: Sets how features interact with each other in space. The choice should reflect the inherent relationships among the features being analyzed; the more closely the model matches reality, the more accurate the results.
- Fixed Distance: Suitable for point data and for polygon data in which feature sizes vary widely.
- Polygon Adjacent (Common Edges/Intersect): For polygon data; polygons that share an edge or intersect are treated as neighbors.
- Polygon Adjacent (Node/Common Edges/Intersect): For polygon data; polygons that share a node, share an edge, or intersect are treated as neighbors.
- Inverse Distance: Every feature is treated as a neighbor of every other feature. All features contribute to the target feature, but their contribution decreases as distance increases: the weight is the reciprocal of the distance. Suitable for continuous data.
- Inverse Distance Square: Similar to Inverse Distance, but influence decreases more rapidly with distance: the weight between features is the reciprocal of the squared distance.
- K Nearest Neighbors: The K features closest to the target feature are included in its calculation (weight 1); all other features are excluded (weight 0). Use this option to guarantee that every feature has a minimum number of neighbors. It works well when the data distribution varies across the study area, so that some features are far from all others. When keeping a fixed analysis scale matters less than keeping a fixed number of neighbors, K Nearest Neighbors is the more suitable choice.
- Spatial Weight Matrix: Requires a spatial weight matrix file. Spatial weights are numbers that reflect the distance, time, or other cost between each feature and every other feature in the dataset. For example, to model the accessibility of urban services or to find concentrations of urban crime, modeling spatial relationships through a network is a good approach. You can select an existing spatial weight matrix file (.swmb) or create a new one from the source dataset.
- Undifferentiated Region: A combination of Inverse Distance and Fixed Distance in which every feature is treated as a neighbor of every other feature. Features within the specified fixed distance have equal weight (weight 1); beyond that distance, a feature's influence decreases as distance increases. This option is not suitable for large datasets.
- Break Distance Tolerance: "-1" computes and applies a default distance that ensures every feature has at least one neighbor. "0" applies no distance threshold, so every feature is a neighbor of every other feature. A positive value treats features as neighbors when the distance between them is less than this value.
- Inverse Distance Power Index: An exponent that controls how strongly distance affects the weights. The higher the value, the faster influence declines with distance, so distant features contribute less.
- Number of Adjacent Elements: A positive integer K; the K features nearest to the target feature are its neighbors. Required when "K Nearest Neighbors" is selected as the conceptual model.
- Measure Distance Method: Either Euclidean distance or Manhattan distance. For details on both, refer to Basic Vocabulary of Spatial Statistical Analysis.
- Spatial Weights Matrix Standardization: Recommended when the distribution of features may be biased by the sampling design or by an imposed aggregation scheme. With standardization, each weight is divided by its row sum (the sum of the weights of all neighbors of that feature). Standardized weights are typically used with fixed-distance neighbors and almost always with neighbors based on polygon adjacency. Standardization reduces the bias that arises when features have different numbers of neighbors, and it scales all weights to between 0 and 1, creating a relative (rather than absolute) weighting scheme. Consider selecting the Spatial Weights Matrix Standardization option whenever you work with polygon features that represent administrative boundaries.
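The conceptual models and related parameters above can be illustrated by building a weight matrix directly. The sketch below is hypothetical: the function names and the model labels ("fixed", "inverse", "zone", "knn") are illustrative, and two behaviors are assumptions rather than documented iDesktopX internals. A Break Distance Tolerance of -1 is resolved to the largest nearest-neighbor distance (the smallest distance that gives every feature at least one neighbor), and the Undifferentiated Region decay beyond the threshold is modeled as (threshold / d) ** power. Polygon adjacency and .swmb files are not covered; features are represented by point coordinates (for polygons, for example, their centroids).

```python
import math

def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def resolve_tolerance(points, tolerance, dist):
    """Turn a Break Distance Tolerance into a cutoff distance.

    -1 (any negative) -> largest nearest-neighbor distance (ASSUMED default rule)
     0                -> no cutoff: every feature neighbors every other feature
    >0                -> the value itself
    """
    if tolerance == 0:
        return math.inf
    if tolerance < 0:
        return max(min(dist(p, q) for j, q in enumerate(points) if j != i)
                   for i, p in enumerate(points))
    return tolerance

def build_weights(points, model, dist=euclidean, tolerance=-1, power=1.0, k=1,
                  row_standardize=False):
    """Hypothetical n x n spatial weight matrix for a few conceptual models."""
    n = len(points)
    cutoff = resolve_tolerance(points, tolerance, dist)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if model == "knn":
            # The K nearest features get weight 1; all others keep weight 0.
            nearest = sorted(others, key=lambda j: dist(points[i], points[j]))[:k]
            for j in nearest:
                W[i][j] = 1.0
            continue
        for j in others:
            d = dist(points[i], points[j])
            if model == "fixed":
                # Binary weight inside the cutoff (<= so the default cutoff is inclusive).
                W[i][j] = 1.0 if d <= cutoff else 0.0
            elif model == "inverse":
                # power=1: Inverse Distance; power=2: Inverse Distance Square.
                # Coincident points (d == 0) would raise ZeroDivisionError.
                W[i][j] = 1.0 / d ** power if d <= cutoff else 0.0
            elif model == "zone":
                # Weight 1 inside the cutoff; ASSUMED decay (cutoff / d) ** power beyond it.
                W[i][j] = 1.0 if d <= cutoff else (cutoff / d) ** power
            else:
                raise ValueError(f"unknown model: {model}")
    if row_standardize:
        # Divide each weight by its row sum so every non-empty row sums to 1.
        for row in W:
            total = sum(row)
            if total > 0:
                row[:] = [w / total for w in row]
    return W

points = [(0, 0), (1, 0), (3, 0), (6, 0)]
print(build_weights(points, "fixed"))
```

For example, `build_weights(points, "inverse", tolerance=0, power=2, row_standardize=True)` approximates Inverse Distance Square with standardization, and passing `dist=manhattan` switches the distance measure.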
Explanation of results
The analysis result is a CAD dataset and is displayed in a map.
The High/Low Clustering result includes five values: the General G index, the expected value, the variance, the z-score, and the p-value. High/Low Clustering is an inferential statistic, so the result is interpreted relative to the null hypothesis. If the result returns a small, statistically significant p-value, you can reject the null hypothesis, and the sign of the z-score then becomes important. If the z-score is positive, the observed General G index is larger than the expected General G index, indicating that high values of the attribute cluster in the study area. If the z-score is negative, the observed General G index is smaller than the expected General G index, indicating that low values of the attribute cluster in the study area. This is shown in the following figure:
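The decision rule described above can be written as a small helper. This is a minimal sketch, assuming a two-sided test against the standard normal distribution; the name `interpret_general_g` and its labels are illustrative and not part of iDesktopX.

```python
import math

def interpret_general_g(z, alpha=0.05):
    """Classify a High/Low Clustering result from its z-score.

    z     -- z-score of the observed General G
    alpha -- significance level (0.05 corresponds to 95% confidence)
    Returns (label, p_value).
    """
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, standard normal
    if p >= alpha:
        return "not significant", p        # the null hypothesis cannot be rejected
    if z > 0:
        return "high-value clustering", p  # observed G > expected G
    return "low-value clustering", p       # observed G < expected G

print(interpret_general_g(3.0))
print(interpret_general_g(2.0, alpha=0.01))
```

The same z-score can be significant at one confidence level and not at another, which is why the significance level is a parameter here rather than a fixed constant.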
High/Low Clustering analysis is most appropriate when values are fairly evenly distributed and you are looking for unexpected spatial spikes of high values. When high and low values both cluster, they tend to cancel each other out, so the observed General G index can be close to the expected General G index even though clustering is present. In that case, use Spatial Autocorrelation analysis instead.
Instance
Case data: Click here to download the case data. After downloading, unzip it before use.
For the viral hepatitis data of a province in a given year, perform High/Low Clustering analysis on the number of viral hepatitis cases (Pneumonia): set the evaluation field to the number of cases, the conceptual model to Inverse Distance, and the Measure Distance Method to Euclidean Distance, enable Spatial Weights Matrix Standardization, and keep the default values for all other parameters. The analysis result is as follows:
The following conclusions can be drawn from the analysis result:
Under the null hypothesis of a random distribution, the p-value is < 0.01 and the z-score is > 2.58, so the result for the number of hepatitis cases in the province that year is significant at the 99% confidence level. The observed General G index is higher than the expected General G index and the z-score is significant, so the number of cases shows high-value clustering: cases are concentrated in high-value areas.
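The thresholds quoted in this example come from the standard normal distribution. The sketch below, assuming a two-sided test, shows how a confidence level maps to a critical z-score and how a z-score maps back to a p-value; the helper names are illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def critical_z(confidence):
    """Two-sided critical z-score for a confidence level such as 0.99."""
    return STANDARD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)

def two_sided_p(z):
    """Two-sided p-value for a z-score."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: |z| > {critical_z(level):.3f}")
print(f"p-value at z = 2.58: {two_sided_p(2.58):.4f}")
```

Rounded to two decimals, the 99% critical value is 2.58; a z-score of exactly 2.58 gives a two-sided p-value of about 0.0099, just under 0.01.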
Related topics
Incremental Spatial Autocorrelation