Regression Analysis

Regression analysis can be used to model, examine, and explore spatial relationships. It can also help explain the relationships between a dependent variable and its explanatory (independent) variables, identifying the key factors that influence the dependent variable so that you can model it and predict unknown values. Common reasons for using regression analysis include:

  • Finding causes: model a phenomenon to understand what causes it, measure how strongly each explanatory variable influences it, and decide which measures need to be taken. For example, understanding the main characteristics of an endangered species' habitat (such as temperature, humidity, food, hunting, and natural enemies) can support the protection of that species through environmental protection, legal regulations, and other measures.
  • Prediction: predict the value of the dependent variable from a set of explanatory variables and their relationships with it. For example, if last year's data describe how fiscal revenue relates to total industrial output, retail sales of consumer goods, total exports, and investment in fixed assets, then given the values of these four variables for the first half of this year, we can predict the fiscal revenue for the first half of this year.
  • Data Mining: explore or check whether two phenomena are positively or negatively correlated. For example, are there more burglaries in economically rich areas or in poor areas?
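A correlation coefficient is one quick way to check the direction of the kind of relationship described in the data-mining bullet above. The sketch below computes Pearson's r in plain Python; the function name pearson_r is hypothetical, and the area values are made up purely for illustration. A negative r would mean fewer burglaries in richer areas; correlation alone does not show which factor drives the other.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five areas (illustration only, not real data)
median_income = [32.0, 45.0, 51.0, 60.0, 78.0]  # thousand dollars per household
burglaries = [40, 35, 30, 28, 20]               # incidents per year

r = pearson_r(median_income, burglaries)
print(f"r = {r:.3f}")  # sign < 0: negative correlation, > 0: positive correlation
```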

Regression Analysis Applications

Regression analysis can answer questions such as: Why does a phenomenon keep occurring, and what factors contribute to it? Can the phenomenon be modeled to predict its values at other locations or at other times?

Regression analysis is widely used to explain market share, sales, brand preference, and marketing effectiveness. More generally, expressing the relationship between two or more variables measured on an interval or ratio scale is a regression analysis problem. Some application examples include:

  • Modeling high school retention rates to better understand the factors that help keep kids in school.
  • Modeling traffic accidents as a function of speed, road conditions, weather, and so forth, to inform policy aimed at decreasing accidents.
  • Modeling property loss from fire as a function of variables such as degree of fire department involvement, response time, or property values. If you find that response time is the key factor, you might need to build more fire stations. If you find that involvement is the key factor, you may need to add equipment and increase the number of officers dispatched.

Analysis Theory

Regression equation: the mathematical formula applied to the explanatory variables to best predict the dependent variable you are trying to model. Unfortunately for those in the geosciences who think of x and y as coordinates, regression notation always uses y for the dependent variable and X for the independent, or explanatory, variables. Each independent variable is associated with a regression coefficient describing the strength and the sign of that variable's relationship to the dependent variable. A regression equation might look like this, where y is the dependent variable, the Xs are the explanatory variables, β0 is the intercept, the other βs are the regression coefficients, and ε is the random error term (residual):

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon$$

  • Dependent Variable (y): the observation you are trying to predict or understand (e.g. house price or number of burglaries). In the regression equation, the dependent variable is on the left side of the equal sign. To build a model, you supply a set of known y values, from which iDesktopX constructs the regression model.
  • Explanatory Variable (X): the independent variables on the right side of the equal sign, used to model and predict the value of the dependent variable. For example, to predict a store's annual sales, your model might include explanatory variables representing the number of potential customers, the distance to competitors, the store's visibility, and local spending patterns.
  • Regression Coefficient (β): values that indicate the strength and type of the relationship between each explanatory variable and the dependent variable. A positive relationship has a positive coefficient; a negative relationship has a negative coefficient. A strong relationship has a coefficient of relatively large magnitude, while a weak relationship has a coefficient close to 0 (when comparing variables, keep in mind that a coefficient's size also depends on the units of its variable).
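To make the roles of y, X, and β concrete, here is a minimal Python sketch that evaluates a regression equation for one observation. The function name predict and all coefficient values are hypothetical, chosen only for illustration (they are not fitted to real data); the error term ε is left out because a prediction uses the fitted, expected value.

```python
def predict(intercept, coefficients, x_values):
    """Evaluate y = b0 + b1*x1 + ... + bn*xn for one observation (error term omitted)."""
    if len(coefficients) != len(x_values):
        raise ValueError("need exactly one coefficient per explanatory variable")
    return intercept + sum(b * x for b, x in zip(coefficients, x_values))


# Hypothetical store-sales model (illustrative values only):
#   X1 = potential customers (thousands)         -> positive coefficient
#   X2 = distance to nearest competitor (km)     -> positive coefficient
#   X3 = competitors within 1 km (hypothetical)  -> negative coefficient
b0 = 50.0
betas = [12.0, 3.5, -8.0]

print(predict(b0, betas, [10.0, 2.0, 3.0]))  # 50 + 120 + 7 - 24 = 153.0
```

Each coefficient's sign gives the direction of its variable's effect: in this made-up model, one more competitor within 1 km lowers the predicted sales by 8 units while the other variables stay fixed.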

Geographically weighted regression starts by identifying a study area. A distance-decay function is then evaluated from the spatial position of each feature. The decay function is a continuous function that gives smaller weights to features that are farther away. Given the spatial location of each feature (typically its coordinates (x, y)), the decay function returns a weight for every feature relative to that location; these weights, together with the feature values, are used to estimate local regression coefficients (β) for that location in the regression equation. For the location (u_i, v_i) of feature i, the weights form the spatial weight matrix W(u_i, v_i).
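The weighting scheme can be written in the standard weighted-least-squares form of GWR. The Gaussian kernel below is shown as one common choice of decay function; it is an assumption here, since this section does not say which kernel is used. d_ij is the distance between features i and j, b is the bandwidth, N is the number of features, n is the number of explanatory variables, and X is the design matrix whose first column is all ones.

```latex
% Local regression model at the location (u_i, v_i) of feature i
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{n} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i

% Gaussian distance-decay weight of feature j relative to feature i (one common choice)
w_{ij} = \exp\!\left(-\tfrac{1}{2}\left(\tfrac{d_{ij}}{b}\right)^{2}\right)

% Spatial weight matrix for location (u_i, v_i)
W(u_i, v_i) = \operatorname{diag}\left(w_{i1}, w_{i2}, \ldots, w_{iN}\right)

% Weighted least squares estimate of the local coefficients
\hat{\boldsymbol\beta}(u_i, v_i) = \left(X^{\top} W(u_i, v_i)\, X\right)^{-1} X^{\top} W(u_i, v_i)\, \mathbf{y}
```

Setting every weight to 1 turns W(u_i, v_i) into the identity matrix and recovers the single global OLS estimate, which is why GWR can be viewed as a local form of OLS.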

Ordinary Least Squares

Among all regression methods, ordinary least squares (OLS) is the simplest and most commonly used. It provides a global model of the variable or process you are trying to predict, creating a single regression equation to represent that process. OLS outputs a variety of diagnostics that help you judge whether you have found a useful model. When its assumptions hold, the OLS coefficient estimates have good statistical properties: they are the best linear unbiased estimators, that is, linear, unbiased, and efficient (with the smallest variance among linear unbiased estimators). Geographically Weighted Regression (GWR) is one of several spatial regression methods used in geography and other disciplines. It provides a local model of the variable or process to be predicted by fitting a regression equation to every feature in the dataset. When used properly, these methods are powerful and reliable statistical tools for examining and estimating linear relationships.
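A minimal NumPy sketch of fitting a global OLS model, assuming the explanatory variables are held in a two-dimensional array with one row per feature. The helper ols_fit and the synthetic data are hypothetical; np.linalg.lstsq solves the least squares problem directly instead of explicitly inverting XᵀX. The fit returns one set of coefficients for the whole dataset, which is what makes OLS a global model.

```python
import numpy as np


def ols_fit(X, y):
    """Fit y = b0 + b1*x1 + ... + bn*xn by ordinary least squares (hypothetical helper).

    X: (N, n) array of explanatory variables; y: (N,) array of the dependent variable.
    Returns an (n + 1,) array [b0, b1, ..., bn].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Synthetic data (hypothetical): y = 3 + 2*x1 - 1.5*x2 + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.5, size=200)

coef = ols_fit(X, y)
print(np.round(coef, 2))  # estimates should be close to [3.0, 2.0, -1.5]
```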

Both OLS and GWR are linear regression methods. Regression analysis can indicate whether one or more variables are related to another variable, and whether each relationship is positive or negative. For example, if the environment becomes cleaner as the density of trash cans increases, the relationship is a positive correlation.

When performing regression analysis, start with ordinary least squares (OLS). First obtain a properly specified OLS model, and then run GWR using the same explanatory variables. For example, to predict next year's housing prices, you can first use OLS to find the key factors affecting housing prices and build a more accurate model, and then analyze it with GWR to see how those relationships vary across the study area.

Geographically Weighted Regression

Geographically Weighted Regression (GWR) is one of several spatial regression techniques increasingly used in geography and other disciplines. GWR provides a local model of the variable or process you are trying to understand or predict by fitting a regression equation to every feature in the dataset. GWR constructs these separate equations by incorporating the dependent and explanatory variables of the features falling within the bandwidth of each target feature. The shape and size of the bandwidth depend on the user-specified Kernel type, Bandwidth method, Distance, and Number of neighbors parameters. GWR is thus a local form of linear regression used to model spatially varying relationships.
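A minimal NumPy sketch of the idea, assuming Euclidean distances between projected coordinates, a Gaussian or bisquare kernel, and a bandwidth given either as a fixed distance or as a number of neighbors (loosely mirroring the Kernel type, Bandwidth method, Distance, and Number of neighbors parameters). The functions kernel_weights and gwr_fit are hypothetical and are not the iDesktopX implementation; the sketch also omits bandwidth optimization and the diagnostics a real tool reports.

```python
import numpy as np


def kernel_weights(d, bandwidth, kernel="gaussian"):
    """Distance-decay weights for an array of distances d and a bandwidth (hypothetical helper)."""
    u = d / bandwidth
    if kernel == "gaussian":
        return np.exp(-0.5 * u ** 2)                          # every feature gets a positive weight
    if kernel == "bisquare":
        return np.where(u < 1.0, (1.0 - u ** 2) ** 2, 0.0)  # zero beyond the bandwidth
    raise ValueError(f"unknown kernel: {kernel}")


def gwr_fit(coords, X, y, kernel="bisquare", distance=None, neighbors=None):
    """Fit one weighted least squares regression per feature (hypothetical helper).

    Give either a fixed `distance` bandwidth or `neighbors`, an adaptive bandwidth
    equal to the distance to that many nearest neighbours.
    Returns an (N, n + 1) array of local coefficients [b0, b1, ..., bn].
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if (distance is None) == (neighbors is None):
        raise ValueError("give exactly one of distance or neighbors")
    if neighbors is not None and not 0 < neighbors < len(y):
        raise ValueError("neighbors must be between 1 and N - 1")
    design = np.column_stack([np.ones(len(y)), X])  # intercept column + X
    local = np.empty((len(y), design.shape[1]))
    for i in range(len(y)):
        d = np.linalg.norm(coords - coords[i], axis=1)  # Euclidean distances to feature i
        # Adaptive bandwidth: sorted index 0 is feature i itself (distance 0)
        bw = distance if distance is not None else np.sort(d)[neighbors]
        sw = np.sqrt(kernel_weights(d, bw, kernel))
        # Weighted least squares: scale each row by sqrt(w_ij), then solve ordinary least squares
        local[i], *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return local


# Synthetic example (hypothetical data): the slope grows from west to east
rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(150, 2))
x1 = rng.uniform(0, 10, size=150)
true_slope = 1.0 + coords[:, 0] / 50.0
y = 2.0 + true_slope * x1 + rng.normal(0, 0.2, size=150)

local = gwr_fit(coords, x1[:, None], y, kernel="bisquare", neighbors=30)
print(local[:3].round(2))  # local [b0, b1] for the first three features
```

With a fixed distance and the bisquare kernel, a bandwidth that is too small can leave a feature with too few weighted neighbors for a stable local fit; a number-of-neighbors bandwidth adapts to sparse areas instead. Comparing the local slope column with the single OLS coefficient shows where a relationship is stronger or weaker than the global average.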

Related Topics

Ordinary Least Squares

Geographically Weighted Regression

Basic Concepts

Measuring Geographic Distribution

Cluster Distributions

Analyzing Mode