Regression analysis models, examines, and explores spatial relationships. It helps explain the relationship between a dependent variable and its independent (explanatory) variables, clarifies which key factors affect the modeled variable, and predicts unknown values. Regression analysis is used for the following reasons:
- Find the cause: Model a phenomenon to better understand its causes, analyze how the explanatory variables influence it, and decide what measures should be taken. For example, understanding the main characteristics of an endangered species' living environment (such as temperature, humidity, food, hunting, and natural enemies) can support the protection of the species through environmental measures, legal provisions, and other means.
- Prediction: Based on known values of the dependent variable and its relationship with the explanatory variables, predict the value of the dependent variable at a given time or place. For example, fiscal revenue for the first half of this year can be predicted from this year's first-half total industrial output value, retail sales of consumer goods, total exports, and fixed-asset investment, using the relationship observed last year between fiscal revenue and those same indicators.
- Data mining: Explore or test whether two phenomena are positively or negatively correlated. For example, are burglaries more common in wealthy areas or in poor areas?
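The correlation question in the last item can be sketched in a few lines. This is a minimal illustration, not a prescribed method: it assumes the Pearson correlation coefficient as the measure, and the function name `pearson_r` and the income and burglary figures are hypothetical values invented for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data, invented for illustration only:
# median household income (thousands) and burglaries per 1,000 homes, by district
income = [25, 32, 41, 48, 56, 63, 70]
burglaries = [9.1, 8.4, 7.9, 6.5, 6.8, 5.2, 4.9]

r = pearson_r(income, burglaries)
direction = "positive" if r > 0 else "negative"
print(f"r = {r:.3f} ({direction} correlation)")
```

The sign of `r` gives the direction of the relationship. It does not by itself show that one phenomenon causes the other.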
Application of regression analysis
Regression analysis can address questions such as: Why does a phenomenon persist, and what factors cause it? How can a phenomenon be modeled to predict its values at other places or other times?
Regression analysis is widely used to explain market share, sales, brand preference, and marketing effectiveness. Its goal is to express the quantitative relationship between two or more interval-scale or ratio-scale variables in functional form. Some example applications include:
- Model secondary school retention to better understand the factors that contribute to children staying in school.
- Construct a function relating traffic accidents to speed, road conditions, weather, and other factors, giving the police reference data to help reduce the traffic accident rate.
- Construct a function relating property damage from fire to variables such as fire department involvement, response time, or property value. If response time is found to be a critical factor, more fire stations may be needed. If the level of involvement is found to be critical, more equipment and firefighters may be needed.
Principles of analysis
A regression equation is a mathematical formula that uses one or more explanatory variables to best predict the dependent variable. The dependent variable in a regression equation is always labeled y, and the independent or explanatory variables are always labeled X, which may be inconvenient for geographers, who are used to treating X and y as coordinates. Each explanatory variable is associated with a regression coefficient that describes the strength and sign of its relationship with the dependent variable. The terms of the regression equation are y, the dependent variable; X, the explanatory variables; and β, the regression coefficients.
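One common way to write such an equation, using the symbols y, X, and β defined here, is shown below. The intercept β₀, the residual term ε, and the subscript n for the number of explanatory variables are assumptions added for completeness; they are not named in the text.

```latex
y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon
```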
- Dependent variable (y): This variable represents the observation you are trying to predict or understand (e.g., house price, number of burglaries). In the regression equation, the dependent variable is on the left side of the equals sign. When building a model, you start from a set of known y values and use them to fit the regression model.
- Explanatory variables (X): The independent variables, located to the right of the equals sign, used to model or predict the value of the dependent variable. For example, to predict annual purchases at a store, you might include explanatory variables that represent the number of potential customers, the distance to competitors, the visibility of the store, and local consumption patterns.
- Regression coefficient (β): The regression coefficients are numerical values indicating the strength and type of relationship between each explanatory variable and the dependent variable. Each explanatory variable has a corresponding regression coefficient. When the explanatory variable and the dependent variable are positively related, the sign of the regression coefficient is positive; when they are negatively related, the sign is negative. If the relationship is strong, the coefficient is relatively large; if it is weak, the coefficient is close to zero.
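The three terms above can be tied together with a small worked example. The following is a minimal sketch, assuming a linear model with an intercept fitted by least squares through the normal equations. The helper names `solve_linear_system`, `fit_ols`, and `predict` are hypothetical, and the house-price data is synthetic: it is generated from a known relationship so that the recovered coefficients can be checked.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(beta, row):
    """Evaluate the fitted equation for one set of explanatory values."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

# Synthetic data: (floor area, distance to city centre) for six houses.
# Prices are generated from price = 50 + 3 * area - 4 * distance (no noise).
rows = [(50, 2), (60, 5), (80, 1), (100, 8), (120, 3), (75, 6)]
y = [50 + 3 * area - 4 * dist for area, dist in rows]

beta = fit_ols(rows, y)
print("coefficients:", [round(b, 6) for b in beta])
print("predicted price:", round(predict(beta, (90, 4)), 6))
```

The coefficient for floor area comes out positive and the coefficient for distance negative, matching the sign rule described in the last item. Real data contains noise, so the fitted coefficients would only approximate the underlying relationship.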
The first step in Geographically Weighted Regression Analysis is to determine a study area and use the spatial location of each feature to calculate the decay function, which is a continuous function. When you put the spatial position of each feature (usually its coordinates (x, y)) and the value of the feature into this function, you obtain a weight value that is then used in the regression equation. In the decay function, W(ui, vi) is the spatial weight matrix, where (ui, vi) are the coordinates of feature i.
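The idea of a distance-decay weight can be sketched as follows. This is only an illustration under stated assumptions: it uses a Gaussian kernel with a fixed bandwidth, which is one common choice and not necessarily the decay function used by this tool. The function name `gaussian_weights`, the coordinates, and the bandwidth value are all hypothetical.

```python
from math import exp, hypot

def gaussian_weights(target, locations, bandwidth):
    """Weight of each location relative to `target` under a Gaussian kernel.

    The weight is 1.0 at the target itself and decays smoothly toward 0
    as the straight-line distance grows relative to the bandwidth.
    """
    u, v = target
    return [
        exp(-0.5 * (hypot(x - u, y - v) / bandwidth) ** 2)
        for x, y in locations
    ]

# Hypothetical feature coordinates, invented for illustration only
locations = [(0, 0), (1, 0), (3, 4), (10, 0)]
weights = gaussian_weights(target=(0, 0), locations=locations, bandwidth=5.0)

for (x, y), w in zip(locations, weights):
    print(f"feature at ({x}, {y}): weight {w:.4f}")
```

For a given target location (ui, vi), these weights would form the diagonal of the weight matrix W(ui, vi), so that nearby features influence the local regression more than distant ones.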
Ordinary Least Squares
Ordinary Least Squares (OLS) is the simplest and most commonly used regression method. It provides a global model of the variable or process being predicted and creates a single regression equation to represent that process. OLS outputs a range of diagnostics that indicate whether a useful model has been found, and under the standard assumptions its coefficient estimates have good statistical properties: they are linear, unbiased, and efficient. Geographically weighted regression (GWR) is one of several spatial regression methods that are increasingly used in geography and other disciplines. By fitting a regression equation to each feature in the dataset, it provides a local model of the variable or process being predicted. When used properly, these methods provide robust and reliable statistics for examining and estimating linear relationships.
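As an illustration of what a global OLS model and one of its diagnostics look like, the sketch below fits a single-variable equation and reports R², the share of variation in y explained by the model. It is a minimal sketch, not the tool's implementation: the function name `simple_ols` and the observations are hypothetical, and R² is only one of the diagnostics an OLS tool may report.

```python
def simple_ols(xs, ys):
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give the R^2 diagnostic
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1.0 - ss_res / ss_tot

# Hypothetical observations, invented for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope, r_squared = simple_ols(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} * x, R^2 = {r_squared:.4f}")
```

An R² close to 1 suggests the single global equation describes the data well. A low value is one sign that important explanatory variables are missing or that the relationship varies across the study area.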
Both OLS and GWR are linear regression methods. Regression analysis can show how likely it is that one or more variables are associated with a positive or negative change in another variable. For example, if a higher density of garbage cans goes with a cleaner environment, the relationship is a positive correlation.
Regression analysis should start with Ordinary Least Squares (OLS): first obtain a properly specified OLS model, then run GWR with the same explanatory variables. For example, to predict next year's housing prices, you can first model with OLS to find the important factors affecting housing prices and build a more accurate prediction model, and then analyze with GWR.
Geographically Weighted Regression
Geographically weighted regression (GWR) is one of several spatial regression techniques increasingly used in geography and other disciplines. By fitting a regression equation to each feature in the dataset, GWR provides a local model of the variable or process you are trying to understand and predict. GWR constructs these separate equations by combining the dependent and explanatory variables of the features that fall within the bandwidth of each target feature. The shape and size of the bandwidth depend on the kernel type, bandwidth method, distance, and number of neighbors parameters entered by the user. GWR is a local form of linear regression used to model spatially varying relationships; it addresses the spatial non-stationarity of relationships or structures between variables caused by changes in geographical location.
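The local-model idea can be sketched with one explanatory variable. This is a simplified illustration under stated assumptions, not the algorithm used by any particular GWR tool: it assumes a Gaussian kernel with a fixed bandwidth and fits a weighted least-squares line at each feature. The function name `local_fit` is hypothetical, and the data is synthetic, built so that the true slope grows from west to east.

```python
from math import exp, hypot

def local_fit(target, coords, xs, ys, bandwidth):
    """Weighted simple regression centred on `target`.

    Features closer to `target` get larger Gaussian weights.
    Returns (intercept, slope) of the local equation.
    """
    w = [
        exp(-0.5 * (hypot(cx - target[0], cy - target[1]) / bandwidth) ** 2)
        for cx, cy in coords
    ]
    sw = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, xs)) / sw
    mean_y = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, xs))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic data: ten features along a west-to-east line.
coords = [(i, 0) for i in range(10)]
xs = [1, 4, 2, 5, 3, 6, 2, 5, 1, 4]                    # explanatory variable
ys = [(1 + 0.5 * i) * x for i, x in enumerate(xs)]     # true slope rises eastward

# One local equation per feature, as in GWR
slopes = [local_fit(c, coords, xs, ys, bandwidth=2.0)[1] for c in coords]
for c, s in zip(coords, slopes):
    print(f"feature at {c}: local slope {s:.2f}")
```

A single global OLS fit would report one slope for the whole study area. The per-feature slopes here differ between the western and eastern ends, which is the kind of spatial non-stationarity GWR is meant to reveal.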
Related topics
Geographically Weighted Regression Analysis
Measuring Geographic Distributions