Spatial Statistical Analysis is grounded in statistics. Understanding some basic statistical vocabulary and concepts will help you understand the Spatial Statistical Analysis features.
The null hypothesis
The null hypothesis is a hypothesis that is stated in advance when a statistical test is performed. If the null hypothesis is true, the test statistic follows a known probability distribution. If the calculated value of the statistic falls into the rejection region, an event of small probability has occurred, and the null hypothesis should be rejected. In Spatial Statistical Analysis, the null hypothesis is spatial randomness: either the spatial features themselves, or the values associated with them, are randomly distributed in space. As shown in the figure below, if the calculated result (the z-score) falls between about -2 and 2 (±1.96 at the 95% confidence level), the null hypothesis cannot be rejected; if it falls outside this range, a small-probability event has occurred and the null hypothesis can be rejected.
For example, suppose N crimes occur in a city in February. Without any additional conditions, these crimes would be expected to be distributed randomly over every area of the city; this is the "null hypothesis". In spatial statistics, the null hypothesis refers to a completely random (uniform) distribution of locations within a study area. Even randomly distributed crimes are not spread perfectly evenly: some areas have none and others have several. We therefore need to analyze the p-value and the z-score to determine whether the null hypothesis should be accepted or rejected.
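To see why "random" does not mean "evenly spread", the following minimal Python sketch simulates the null hypothesis directly. It assumes the city is the unit square divided into a grid of equal cells; the function name `simulate_csr_counts` and its parameters are illustrative only and are not part of any Spatial Statistical Analysis API.

```python
import random

def simulate_csr_counts(n_events, grid_size, seed=None):
    """Scatter n_events points uniformly at random over the unit square
    (complete spatial randomness) and count how many fall in each cell
    of a grid_size x grid_size grid."""
    rng = random.Random(seed)
    counts = [[0] * grid_size for _ in range(grid_size)]
    for _ in range(n_events):
        x, y = rng.random(), rng.random()  # each coordinate in [0, 1)
        col = int(x * grid_size)           # column index 0 .. grid_size - 1
        row = int(y * grid_size)           # row index 0 .. grid_size - 1
        counts[row][col] += 1
    return counts

counts = simulate_csr_counts(n_events=100, grid_size=5, seed=42)
for row in counts:
    print(row)
```

Different seeds give different counts per cell, yet every run is a realization of complete spatial randomness. This is why a test statistic with a known distribution, rather than a visual check for evenness, is needed to decide whether a pattern departs from the null hypothesis.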
P value
The p-value is a probability. In Spatial Statistical Analysis, it is the probability of obtaining a spatial pattern at least as extreme as the observed one if the null hypothesis of spatial randomness were true. A small p-value means that the observed spatial pattern is unlikely to have been generated by a random process, so the null hypothesis can be rejected.
Z score
The z-score is a multiple of the standard deviation: it states how many standard deviations a value lies from the mean. The standard deviation measures the dispersion of a dataset. Both the z-score and the p-value are associated with the standard normal distribution. The critical values of the z-score and the p-value are shown in the following figure:
As shown in the figure above, the p-value and the z-score generally appear together. Very high or very low (negative) z-scores lie in the tails of the normal distribution and are associated with very small p-values. If an analysis returns a very small p-value together with a very high or very low z-score, the observed spatial pattern is unlikely to reflect the theoretical random pattern represented by the null hypothesis. The null hypothesis can then be rejected, and the data are considered either clustered or dispersed.
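The link between the two numbers can be made concrete. The sketch below computes a z-score from an observed statistic, its expected value under the null hypothesis, and its standard deviation, then converts the z-score to a two-sided p-value using the standard normal distribution, P(|Z| ≥ |z|) = erfc(|z| / √2). The helper names are hypothetical, and the sketch assumes the statistic is approximately normally distributed under the null hypothesis.

```python
import math

def z_score(observed, expected, std_dev):
    """Number of standard deviations between the observed and expected values."""
    return (observed - expected) / std_dev

def two_sided_p_value(z):
    """Two-sided p-value of a z-score under the standard normal distribution:
    P(|Z| >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))

for z in (0.5, 1.96, -2.58):
    print(f"z = {z:+.2f}  p = {two_sided_p_value(z):.4f}")
```

Because the p-value depends only on |z|, a very high and a very low z-score of the same magnitude give the same small p-value; the sign only tells you in which direction the pattern departs from randomness.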
Confidence level
Spatial Statistical Analysis uses inferential statistics: before a test is run, a "null hypothesis" is established which assumes that the values of the features, or the relationships between them, form a random spatial pattern. The p-value measures how probable the observed pattern would be if the "null hypothesis" were true and is used to decide whether to accept or reject it. The z-score, expressed in standard deviations, is used to determine whether the data are clustered, dispersed, or random. Typical confidence levels are 90%, 95%, and 99%. For example, a calculated p-value below 0.10 means that a pattern this extreme would arise by chance less than 10% of the time if the data were spatially random. In this case, the "null hypothesis" can be rejected at the 90% confidence level, and the data can be considered clustered or dispersed.
The following table shows the uncorrected critical p-values and critical z-scores for different confidence levels. Corrected critical p-values can be obtained by applying a False Discovery Rate (FDR) correction; these corrected values are equal to or less than the values shown in the table.
z-score (standard deviations) | p-value (probability) | Confidence level |
---|---|---|
z < -1.65 or z > 1.65 | p < 0.10 | 90% |
z < -1.96 or z > 1.96 | p < 0.05 | 95% |
z < -2.58 or z > 2.58 | p < 0.01 | 99% |
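The table can be applied mechanically by comparing |z| with each critical value, starting from the strictest level. The sketch below does this with the uncorrected values from the table; `confidence_level` is a hypothetical helper, and it does not apply an FDR correction.

```python
# Uncorrected critical z-scores from the table, strictest level first.
CRITICAL_Z = [(2.58, 99), (1.96, 95), (1.65, 90)]

def confidence_level(z):
    """Return the highest confidence level (in percent) at which the null
    hypothesis of spatial randomness can be rejected, or None if |z| does
    not exceed any critical value."""
    for critical, level in CRITICAL_Z:
        if abs(z) > critical:
            return level
    return None

for z in (1.0, 1.7, -2.1, 3.2):
    print(z, confidence_level(z))
```

Checking the strictest threshold first guarantees that a z-score such as 3.2, which exceeds all three critical values, is reported at the 99% level rather than the first level it happens to pass.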
Moran's I
Moran's I is an important index for measuring spatial autocorrelation. It is a real number; after variance normalization, its value generally falls between -1.0 and 1.0. Moran's I > 0 indicates positive spatial autocorrelation: the larger the value, the stronger the clustering of similar values. Moran's I < 0 indicates negative spatial autocorrelation: the smaller the value, the greater the difference between neighboring values. A value close to 0 (strictly, close to its expected value of -1/(N-1) under randomness, where N is the number of features) indicates a random spatial pattern.
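As an illustration of how the index is computed, here is a minimal sketch of the standard global Moran's I formula, $I = \frac{N}{W}\cdot\frac{\sum_i\sum_j w_{ij}(x_i-\bar{x})(x_j-\bar{x})}{\sum_i (x_i-\bar{x})^2}$, where $W$ is the sum of all weights $w_{ij}$. It assumes plain (not row-standardized) binary contiguity weights passed as a list of lists; `morans_i` is a hypothetical helper, and the product's own implementation may standardize the weights or handle edge cases differently.

```python
def morans_i(values, weights):
    """Global Moran's I.

    values  -- list of N attribute values
    weights -- N x N list of lists; weights[i][j] is the spatial weight w_ij
               (the diagonal is normally 0)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]               # deviations from the mean
    w_sum = sum(sum(row) for row in weights)       # W, the sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]    # weighted cross-products
                for i in range(n) for j in range(n))
    variance_term = sum(d * d for d in dev)
    return (n / w_sum) * (cross / variance_term)

# Four cells in a row; cells that share an edge are neighbours (weight 1).
w = [[1 if abs(i - j) == 1 else 0 for j in range(4)] for i in range(4)]
print(morans_i([1, 1, 5, 5], w))  # similar values side by side: positive I
print(morans_i([1, 5, 1, 5], w))  # alternating values: negative I
```

With four features the expected value under randomness is -1/3, so the first arrangement lies above it (clustered) and the second below it (dispersed).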
Center of mass
Some Spatial Statistical Analysis functions accept point, line, or region (polygon) datasets as input. However, spatial relationships such as inverse distance or fixed distance require an actual spatial distance to calculate the spatial weights. Therefore, for point, line, and region objects, distances are measured between the centroids of the objects. The centroid of an object is the weighted mean center of all its sub-objects. The weight is 1 for a point sub-object (so the centroid of a single point is the point itself), the length for a line sub-object, and the area for a region sub-object.
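The weighted mean center described above can be sketched in a few lines. This assumes each sub-object has already been reduced to its own center and weight (1 for a point, length for a line part, area for a region part); `weighted_mean_center` and the sample object are hypothetical.

```python
def weighted_mean_center(parts):
    """Weighted mean center of an object's sub-objects.

    parts -- list of (x, y, weight) tuples, one per sub-object, where (x, y)
             is the sub-object's own center and weight is 1 for a point,
             the length for a line, or the area for a region.
    """
    total = sum(w for _, _, w in parts)
    cx = sum(x * w for x, _, w in parts) / total
    cy = sum(y * w for _, y, w in parts) / total
    return cx, cy

# A hypothetical two-part region: a part of area 4 centred at (0, 0)
# and a part of area 1 centred at (5, 5).
print(weighted_mean_center([(0.0, 0.0, 4.0), (5.0, 5.0, 1.0)]))
```

The larger part pulls the centroid toward itself, which is the intended effect of weighting by length or area.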
Self weight
Some Spatial Statistical Analysis features allow you to provide a numeric field that holds each feature's self weight. The self weight is the distance or weight between a feature and itself. By default it is 0; if you specify a self-weight field, the values in that field are used in the calculation instead of 0.
Distance
Spatial Statistical Analysis uses two types of distance: Euclidean distance and Manhattan distance.
Euclidean distance is the most commonly used distance measure in a rectangular (Cartesian) coordinate system: the straight-line distance between two points in the plane. If the coordinates of the two points are $(x_1, y_1)$ and $(x_2, y_2)$, the Euclidean distance is:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
Manhattan distance measures distance differently from Euclidean distance. The distance between two points is not the straight-line distance but the sum of the lengths of the projections, onto the axes of a fixed rectangular coordinate system, of the line segment joining the two points. For points $(x_1, y_1)$ and $(x_2, y_2)$, the calculation formula is:

$$d = |x_1 - x_2| + |y_1 - y_2|$$
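Both formulas translate directly into code. The following minimal sketch, with hypothetical function names, compares the two measures for the same pair of points.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between p = (x1, y1) and q = (x2, y2)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def manhattan_distance(p, q):
    """Sum of the absolute coordinate differences along the x and y axes."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

a, b = (0.0, 0.0), (3.0, 4.0)
print(euclidean_distance(a, b))  # 5.0
print(manhattan_distance(a, b))  # 7.0
```

For any pair of points the Manhattan distance is at least as large as the Euclidean distance; the two are equal only when the points share an x or a y coordinate.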
Regression analysis
Regression analysis is a statistical method for determining the quantitative relationship between two or more variables. Depending on the form of the relationship between the independent and dependent variables, it is divided into linear regression analysis and nonlinear regression analysis. If a regression analysis includes only one independent variable and one dependent variable, and the relationship between them can be approximated by a straight line, it is called simple (univariate) linear regression. If it includes two or more independent variables and the dependent variable is linearly related to them, it is called multiple linear regression.
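To make simple linear regression concrete, the sketch below fits y = a + b·x by ordinary least squares, the usual estimator for linear regression. It is an illustration of the concept only; `simple_linear_regression` is a hypothetical helper, and the regression tools in Spatial Statistical Analysis may use other estimation methods.

```python
def simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = a + b * x with one independent variable.
    Returns (a, b): the intercept and the slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the fitted line passes through the means
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]      # lies exactly on y = 1 + 2x
print(simple_linear_regression(xs, ys))
```

Multiple linear regression generalizes the same least-squares idea to several independent variables, usually solved with matrix algebra rather than the closed-form sums above.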
Spatial Weight Matrix
The spatial weight matrix is a representation of the spatial structure of the data. It quantifies the spatial relationships that exist between the features of a dataset (or, at least, the way those relationships are conceptualized). Because the spatial weight matrix imposes a structure on the data, you should choose the conceptualization that best reflects how the features actually interact with each other (and, of course, what you are trying to measure). For example, to measure the clustering of a particular seed-propagating tree species in a forest, some form of inverse distance may be most appropriate. To assess the geographic distribution of commuters in a region, however, travel time or travel cost may be a better choice.
Although it can be physically implemented in various ways, the spatial weight matrix is conceptually an N × N table, where N is the number of features in the dataset. Each feature has exactly one row and one column. The value in any given row/column cell is the weight that quantifies the spatial relationship between the feature of that row and the feature of that column.
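As a sketch of the N × N table described above, the following code builds an inverse-distance weight matrix from feature coordinates (for example, their centroids). Off-diagonal cells hold 1 / d^power; diagonal cells hold the self weight, which defaults to 0 or is taken from an optional list of per-feature values. The function name and the `power` and `self_weights` parameters are hypothetical, the sketch assumes no two features share the same location, and `math.dist` requires Python 3.8 or later.

```python
import math

def inverse_distance_weights(points, power=1.0, self_weights=None):
    """Build an N x N inverse-distance spatial weight matrix.

    points       -- list of (x, y) coordinates, e.g. feature centroids;
                    assumed to be distinct (coincident points would divide by 0)
    power        -- exponent on distance: w_ij = 1 / d_ij ** power
    self_weights -- optional list of N diagonal values; when omitted,
                    every self weight is 0
    """
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal cell: the feature's weight with itself.
                matrix[i][j] = self_weights[i] if self_weights is not None else 0.0
            else:
                d = math.dist(points[i], points[j])  # Euclidean distance
                matrix[i][j] = 1.0 / d ** power
    return matrix

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
for row in inverse_distance_weights(pts):
    print(row)
```

Row i and column i both belong to feature i, so the matrix is symmetric for a symmetric relationship such as inverse distance; other conceptualizations (fixed distance, contiguity, travel time) would fill the same table with different values.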