Geographic Detector

Economic activity, land use, biodiversity, and climate are all unevenly distributed in space; this spatial differentiation arises in the course of natural and socio-economic development. Spatial hierarchical heterogeneity (often called spatial stratified heterogeneity) is the geographical phenomenon in which the variance within a layer is smaller than the variance between layers. It shows up as classification or zoning, for example geographical zoning, climate zones, land use maps, and national major function zoning. Different habitats, such as different landforms, soil types, and climates, provide shelter for a large number of species, so spatial stratification and heterogeneity are of great significance to geographical research. Spatial hierarchical heterogeneity describes the layered regularity within spatial heterogeneity.

Geographic Detector is a spatial analysis method that detects spatial differentiation and reveals the driving forces behind it. Its core assumption is that if an independent variable has an important effect on a dependent variable, the spatial distributions of the two variables should be similar. Geographic Detector provides the statistics for analyzing such differentiation, and it has two advantages. First, it handles both numerical and qualitative data. Second, it detects the interaction of two factors on the dependent variable: by calculating and comparing the Q value of each single factor with the Q value of the two factors superposed, it can judge whether the two factors interact, as well as the strength, direction, and linearity or nonlinearity of the interaction. The superposition of two factors covers not only multiplicative relationships but any other relationship as well; as long as a relationship exists, it can be tested.

Note: The functional principle and the case are cited from: Wang Jinfeng, Xu Chengdong. Geographic Detector: Principle and Prospect [J]. Acta Geographica Sinica, 2017, 72(1): 116-134. (http://geoscien.neigae.ac.cn/article/2017/0375-5444/0375-5444-72-1-116.shtml)

Functional principle

Geographic Detector is used to analyze spatial hierarchical heterogeneity and comprises four detectors: Factor Detector, Risk Area Detector, Ecological Detector, and Interaction Detector. Their results answer the following questions, respectively: ① Is there spatial heterogeneity, and which factors contribute to it? ② Is there a significant regional difference in variable Y? ③ What is the relative importance of each factor X? ④ Does factor X act on dependent variable Y independently, or does it interact with other factors?

Differentiation and Factor Detection

Factor detection measures the spatial differentiation of Y and the extent to which a factor X explains the spatial differentiation of attribute Y. Both are measured by the Q value (Wang et al., 2010b), expressed as:

$$Q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N \sigma^2} = 1 - \frac{SSW}{SST}, \qquad SSW = \sum_{h=1}^{L} N_h \sigma_h^2, \quad SST = N \sigma^2$$

Where h = 1, …, L is the stratification of variable Y or factor X, i.e., its classification or partition; $N_h$ and $N$ are the number of units in layer h and in the whole area, respectively; $\sigma_h^2$ and $\sigma^2$ are the variances of the Y values in layer h and in the whole area, respectively. SSW is the sum of the within-layer variances (Within Sum of Squares) and SST is the total variance of the whole area (Total Sum of Squares). Q takes values in [0, 1]: the larger the value, the more obvious the spatial differentiation of Y. If the stratification is generated by the independent variable X, a larger Q means that X explains attribute Y more strongly, and a smaller Q means that it explains Y more weakly. In the extreme cases, Q = 1 indicates that factor X completely controls the spatial distribution of Y, and Q = 0 indicates that factor X has nothing to do with Y. In general, a value of Q means that X explains 100 × Q% of Y.
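The Q value described above can be illustrated with a minimal Python sketch. The function name `q_statistic` is hypothetical and is not part of iDesktopX. The sketch assumes population variances, so that the product of a layer's size and its variance equals the layer's sum of squared deviations from its own mean.

```python
from collections import defaultdict


def q_statistic(y, strata):
    """Return Q = 1 - SSW / SST for values y grouped by stratum labels.

    Assumption: population variances are used, so N_h * sigma_h^2 is the
    sum of squared deviations of layer h from its own mean.
    """
    n = len(y)
    overall_mean = sum(y) / n
    sst = sum((v - overall_mean) ** 2 for v in y)  # N * sigma^2
    if sst == 0:
        raise ValueError("Y is constant, so Q is undefined")

    # Collect the Y values that fall into each layer.
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)

    ssw = 0.0
    for values in groups.values():
        layer_mean = sum(values) / len(values)
        ssw += sum((v - layer_mean) ** 2 for v in values)  # N_h * sigma_h^2
    return 1.0 - ssw / sst


# Layers that separate Y perfectly give Q = 1; uninformative layers give Q = 0.
q_perfect = q_statistic([1, 1, 5, 5], ["a", "a", "b", "b"])
q_none = q_statistic([1, 5, 1, 5], ["a", "a", "b", "b"])
```

The two calls at the end reproduce the extreme cases mentioned in the text: strata that fully determine Y, and strata unrelated to Y.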

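A simple transformation of the Q value follows a non-central F distribution (Wang et al., 2016a):

$$F = \frac{N - L}{L - 1} \cdot \frac{Q}{1 - Q} \sim F(L - 1, N - L; \lambda)$$

$$\lambda = \frac{1}{\sigma^2}\left[\sum_{h=1}^{L} \bar{Y}_h^2 - \frac{1}{N}\left(\sum_{h=1}^{L} \sqrt{N_h}\,\bar{Y}_h\right)^2\right]$$

Where λ is the non-centrality parameter and $\bar{Y}_h$ is the mean value of layer h.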
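As a rough illustration of this transformation, the sketch below computes the F value and the non-centrality parameter in plain Python. It assumes the form F = (N - L) / (L - 1) × Q / (1 - Q), with λ computed from the layer means as given in the cited literature, and it uses population variances. The function name `f_and_lambda` is hypothetical. Looking up the p-value from the non-central F distribution is left out.

```python
from collections import defaultdict
from math import sqrt


def f_and_lambda(y, strata):
    """Return (F, lambda) for the Q value of y under the given strata.

    Assumptions: population variances; F and lambda follow the forms
    reported in the cited literature. Undefined when Q = 1 or L = 1.
    """
    n = len(y)
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    n_layers = len(groups)

    overall_mean = sum(y) / n
    sst = sum((v - overall_mean) ** 2 for v in y)
    layer_means = {k: sum(v) / len(v) for k, v in groups.items()}
    ssw = sum((v - layer_means[k]) ** 2 for k, vs in groups.items() for v in vs)
    q = 1.0 - ssw / sst

    f_value = (n - n_layers) / (n_layers - 1) * q / (1.0 - q)

    sigma2 = sst / n  # population variance of Y over the whole area
    sum_sq_means = sum(m ** 2 for m in layer_means.values())
    weighted = sum(sqrt(len(groups[k])) * layer_means[k] for k in groups)
    lam = (sum_sq_means - weighted ** 2 / n) / sigma2
    return f_value, lam


f_value, lam = f_and_lambda([1, 2, 3, 4, 5, 6], ["a", "a", "a", "b", "b", "b"])
```

The statistic is then compared against a non-central F distribution with (L - 1, N - L) degrees of freedom and non-centrality λ; a statistics library would be needed for that step.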

Interaction Detection

Interaction detection identifies the interaction between different risk factors, that is, it assesses whether factors X1 and X2 together increase or decrease the explanatory power for dependent variable Y, or whether the effects of these factors on Y are independent of each other. The evaluation proceeds as follows: first, calculate the Q values of the two factors X1 and X2 for Y, namely Q(X1) and Q(X2); then calculate the Q value of their interaction, Q(X1 ∩ X2), using the new polygon distribution formed by overlaying the layers of X1 and X2; finally, compare Q(X1), Q(X2), and Q(X1 ∩ X2). The relationship between the two factors falls into one of the following categories:

  • Nonlinear weakening: Q(X1 ∩ X2) < Min(Q(X1), Q(X2));
  • Single-factor nonlinear weakening: Min(Q(X1), Q(X2)) < Q(X1 ∩ X2) < Max(Q(X1), Q(X2));
  • Two-factor enhancement: Q(X1 ∩ X2) > Max(Q(X1), Q(X2));
  • Independent: Q(X1 ∩ X2) = Q(X1) + Q(X2);
  • Nonlinear enhancement: Q(X1 ∩ X2) > Q(X1) + Q(X2).
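The comparison of Q(X1), Q(X2), and Q(X1 ∩ X2) can be sketched in Python as follows. The overlay is approximated by pairing the two stratum labels of each unit, and the thresholds follow the usual Geographic Detector criteria based on the minimum, the maximum, and the sum of the single-factor Q values. The function names, the tolerance `tol`, and the sample data are illustrative assumptions; the returned labels mirror the interaction types listed under "Explanation of results".

```python
from collections import defaultdict


def q_statistic(y, strata):
    """Q = 1 - SSW / SST, using population variances (an assumption)."""
    mean = sum(y) / len(y)
    sst = sum((v - mean) ** 2 for v in y)
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    ssw = 0.0
    for values in groups.values():
        m = sum(values) / len(values)
        ssw += sum((v - m) ** 2 for v in values)
    return 1.0 - ssw / sst


def classify_interaction(q1, q2, q12, tol=1e-9):
    """Classify the interaction from Q(X1), Q(X2), and Q(X1 ∩ X2)."""
    low, high = min(q1, q2), max(q1, q2)
    if q12 < low:
        return "Weaken, nonlinear"
    if q12 < high:
        return "Weaken, uni-"
    if abs(q12 - (q1 + q2)) <= tol:  # equality tested with a tolerance
        return "Independent"
    if q12 > q1 + q2:
        return "Enhance, nonlinear"
    return "Enhance, bi-"


# Hypothetical data: Y is an additive combination of the two factors.
y = [1, 1, 5, 5, 9, 9, 13, 13]
x1 = ["a", "a", "a", "a", "b", "b", "b", "b"]
x2 = ["u", "u", "v", "v", "u", "u", "v", "v"]

overlay = list(zip(x1, x2))  # each distinct label pair is one overlay layer
q1, q2, q12 = q_statistic(y, x1), q_statistic(y, x2), q_statistic(y, overlay)
interaction = classify_interaction(q1, q2, q12)
```

Because exact equality rarely holds with real data, the "Independent" case is checked with a small tolerance here; how the software handles that boundary is not stated in this document.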

Risk Area Detection

Risk area detection judges whether there is a significant difference between the attribute means of two sub-regions, using the t statistic:

$$t = \frac{\bar{Y}_{h=1} - \bar{Y}_{h=2}}{\left[\dfrac{\mathrm{Var}(\bar{Y}_{h=1})}{n_{h=1}} + \dfrac{\mathrm{Var}(\bar{Y}_{h=2})}{n_{h=2}}\right]^{1/2}}$$

Where $\bar{Y}_h$ is the mean of an attribute, such as incidence or prevalence, in sub-region h; $n_h$ is the number of samples in sub-region h; and Var is the variance. The statistic t approximately follows Student's t distribution, where the degrees of freedom are calculated as:

The null hypothesis is H0: $\bar{Y}_{h=1} = \bar{Y}_{h=2}$. If H0 is rejected at significance level α, the attribute means of the two sub-regions are considered significantly different.
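A minimal Python sketch of this comparison is given below. It assumes sample variances (divisor n - 1) and, for the degrees of freedom, the standard Welch-Satterthwaite approximation used for t tests with unequal variances; the exact formula applied by the software may differ. The function name `risk_t_test` and the sample values are hypothetical.

```python
from math import sqrt


def risk_t_test(y1, y2):
    """Return (t, df) comparing the attribute means of two sub-regions.

    Assumptions: sample variances with divisor n - 1, and degrees of
    freedom from the Welch-Satterthwaite approximation.
    """
    n1, n2 = len(y1), len(y2)
    mean1, mean2 = sum(y1) / n1, sum(y2) / n2
    var1 = sum((v - mean1) ** 2 for v in y1) / (n1 - 1)
    var2 = sum((v - mean2) ** 2 for v in y2) / (n2 - 1)

    se1, se2 = var1 / n1, var2 / n2  # squared standard errors of the means
    t = (mean1 - mean2) / sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical incidence values in two sub-regions.
t_value, dof = risk_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

The resulting t value is compared with the critical value of Student's t distribution for the computed degrees of freedom at significance level α.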

Ecological Detection

Ecological detection compares whether the effects of two factors X1 and X2 on the spatial distribution of attribute Y differ significantly. It is measured by the F statistic:

$$F = \frac{N_{X1}\,(N_{X2} - 1)\,SSW_{X1}}{N_{X2}\,(N_{X1} - 1)\,SSW_{X2}}, \qquad SSW_{X1} = \sum_{h=1}^{L1} N_h \sigma_h^2, \quad SSW_{X2} = \sum_{h=1}^{L2} N_h \sigma_h^2$$

In the formula, $N_{X1}$ and $N_{X2}$ are the sample sizes of the two factors X1 and X2; $SSW_{X1}$ and $SSW_{X2}$ are the sums of the within-layer variances of the layers formed by X1 and X2, respectively; L1 and L2 are the numbers of layers of X1 and X2, respectively. The null hypothesis is H0: $SSW_{X1} = SSW_{X2}$. If H0 is rejected at significance level α, the effects of the two factors X1 and X2 on the spatial distribution of attribute Y differ significantly.
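The F statistic can be sketched in Python as below, assuming the form F = N_X1 (N_X2 - 1) SSW_X1 / (N_X2 (N_X1 - 1) SSW_X2) from the cited literature and population variances within each layer. Both factors are evaluated on the same sample units here, so the two sample sizes are equal and the statistic reduces to the ratio of the two within-layer sums of squares. The function names and the data are hypothetical.

```python
from collections import defaultdict


def within_sum_of_squares(y, strata):
    """SSW: sum over layers of N_h * sigma_h^2 (population variance)."""
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    ssw = 0.0
    for values in groups.values():
        m = sum(values) / len(values)
        ssw += sum((v - m) ** 2 for v in values)
    return ssw


def ecological_f(y, strata_x1, strata_x2):
    """F statistic comparing the within-layer variation under X1 and X2."""
    n_x1 = n_x2 = len(y)  # assumption: both factors cover the same units
    ssw_x1 = within_sum_of_squares(y, strata_x1)
    ssw_x2 = within_sum_of_squares(y, strata_x2)
    return (n_x1 * (n_x2 - 1) * ssw_x1) / (n_x2 * (n_x1 - 1) * ssw_x2)


y = [1, 1, 5, 5, 9, 9, 13, 13]
x1 = ["a", "a", "a", "a", "b", "b", "b", "b"]
x2 = ["u", "u", "v", "v", "u", "u", "v", "v"]
f_stat = ecological_f(y, x1, x2)
```

A value far from 1 suggests that one factor leaves much less within-layer variation than the other; the significance decision itself requires the critical value of the F distribution, which is not computed in this sketch.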

Function entrance

  • Spatial Statistical Analysis tab -> Analysis Mode -> Geographic Detector. (iDesktopX)
  • Toolbox -> Spatial Statistical Analysis -> Analysis Mode -> Geographic Detector. (iDesktopX)

Main parameters

  • Source Data: Set the dataset to be analyzed. Point, line, and region datasets and attribute table (tabular) datasets are supported.
  • Dependent variable field: The measured or recorded variable, which changes as one or more other variables change. It must be a numerical quantity, such as the incidence of neural tube birth defects (NTDs) in each village.
  • Independent Variable: The factor or condition that causes the dependent variable to change, that is, the explanatory variable of the dependent variable. Multiple explanatory variables can be set, such as soil type, elevation, and hydrological watershed. Note that the independent variable must be categorical. A numerical variable has to be grouped or stratified first, so as to minimize the variance within groups and maximize the variance between groups. The grouping can be based on expert knowledge, produced by k-means, or obtained by sorting the values and dividing them into equal parts. Each group or stratum of a categorical variable must contain at least two sample units of the dependent variable, so that the mean and variance of the stratum can be calculated.
  • Result Data: Specify the datasource in which the analysis results are saved. The results of the four detectors are written to new tabular datasets stored in that datasource.
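One of the grouping options mentioned for numerical independent variables, sorting the values and dividing them into equal parts, can be sketched as follows. The function name `equal_count_strata` and the sample values are hypothetical; the software's own discretization, if any, may work differently.

```python
from collections import Counter


def equal_count_strata(values, k):
    """Assign each value to one of k strata of (nearly) equal size.

    Values are ranked in ascending order and the ranks are cut into k
    consecutive blocks. Tied values may land in different strata.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices by value
    labels = [0] * n
    for rank, index in enumerate(order):
        labels[index] = rank * k // n
    return labels


# Hypothetical elevation values stratified into three classes.
elevation = [10, 3, 7, 1, 8, 5]
strata = equal_count_strata(elevation, 3)

# Each stratum needs at least two units so its variance can be computed.
smallest_stratum = min(Counter(strata).values())
```

The check at the end reflects the requirement above that every stratum contain at least two sample units.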

Explanation of results

The results of all detectors are written to new tabular datasets stored in the datasource, and the Output Analysis Results area of the Geographic Detector panel on the right shows the results of each detector as follows:

  • Factor Detector: Detects the spatial hierarchical heterogeneity of variable Y and the extent to which a factor X explains the spatial differentiation of Y, measured by the Q value. If the stratification is generated by the independent variable X, a larger Q value means that the spatial distributions of X and Y are more consistent and that X explains attribute Y more strongly; a smaller Q value means the opposite. The FactorDetector_result tabular dataset holds the factor detection result.
  • Ecological Detector: Compares whether the impacts of different factors on the spatial distribution of the attribute values differ significantly. The EcologicalDetector_result tabular dataset holds the ecological detection result.
  • Interaction Detector: Identifies the interaction between different explanatory variables and evaluates whether two factors acting together increase or decrease the explanatory power for the dependent variable, or whether their effects are independent of each other. The InteractionDetector_result tabular dataset holds the interaction detection result. The types of interaction between explanatory variables include:
    • Weaken, nonlinear: nonlinear weakening;
    • Weaken, uni-: single-factor nonlinear weakening;
    • Enhance, bi-: two-factor enhancement;
    • Independent: independent;
    • Enhance, nonlinear: nonlinear enhancement.
  • Risk Detector: Judges whether the attribute means of different regions differ significantly. The RiskDetector_result tabular dataset holds the risk area detection result.

Instance

Case data: Download the GeoDetector case data and unzip it before use.

In this example, the Geographic Detector function is used to analyze the incidence of neural tube birth defects (NTDs) in a county. The environmental factor variables are soil type, elevation, and hydrological watershed. The following figure shows the environmental factor data used in the analysis:

The analysis results are as follows:

  • Factor Detector: Show Result lists the calculated Q values of all factors. The hydrological watershed variable (watershed) has the highest Q value, indicating that, among these variables, the watershed is the most important environmental factor determining the spatial pattern of NTDs.

  • Ecological Detector: The results are tested with the F test at a significance level of 0.05. "√" means significant, and "×" means not significant. The effect of soil type on the spatial distribution of NTDs differs significantly from the effects of the other variables.

  • Interaction Detector: Taking the elevation factor as an example, the results show that the interaction effect of any two variables on the spatial distribution of NTDs is greater than the effect of the first variable alone, and that the interaction between the two explanatory variables is a two-factor enhancement.

  • Risk Detector: Show Results displays the risk area detection results for a single risk factor. Taking soil type as an example, the x-axis of the histogram is Unique Value, the number of each layer of the environmental factor; the y-axis is Mean of explained variable, the average incidence of NTDs in each soil type area. The significance comparison checks, for each pair of soil types (1-5), whether the incidence of NTDs on one soil type is significantly greater than that on the other, using the t test at a significance level of 0.05. "√" indicates a significant difference, and "×" indicates no significant difference.

Related topics

Spatial Autocorrelation

High/Low Clustering

Incremental Spatial Autocorrelation

Average Nearest Neighbor