Predictive Analytics | Site Perfect

Predictive analytics is an area of statistical analysis that deals with extracting information from data and using it to predict future trends and behavior patterns. At its core, predictive analytics captures the relationships between explanatory variables and predicted variables in past occurrences, then exploits those relationships to predict future outcomes. It is important to note, however, that the accuracy and usability of the results depend greatly on how well the user understands both the business and the data. Site Perfect Research has the know-how to leverage several different statistical techniques to fit the location and forecasting challenges you face.

Multivariate Regression Modeling

Regression models are the mainstay of predictive analytics. The focus lies on establishing a mathematical equation as a model to represent the interactions between the different variables in consideration.

Linear regression models analyze the relationship between the dependent variable (e.g., sales volumes) and a set of independent or predictor variables (e.g., demographics, lifestyle categories, competitive impacts, and site factors). This relationship is expressed as an equation that predicts the response variable as a linear function of the parameters. These parameters are adjusted so that a measure of fit is optimized. Much of the effort in model fitting is focused on minimizing the size of the residuals (the differences between actual and predicted values), as well as ensuring that they are randomly distributed with respect to the model predictions.
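As a rough illustration of the fitting step described above, the following Python sketch estimates the parameters by ordinary least squares, which is one common measure of fit (the sum of squared residuals is minimized). The store data, the two predictors, and the function names `fit_ols` and `predict` are hypothetical and chosen only for this example; they are not Site Perfect's actual model.

```python
def fit_ols(rows, y):
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Solves the normal equations (X'X) b = X'y with Gauss-Jordan elimination.
    """
    X = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for the intercept
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Apply the fitted linear equation to one set of predictor values."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


# Hypothetical stores: (population in thousands, nearby competitors) -> weekly sales
stores = [(10, 1), (20, 3), (30, 2), (40, 5), (50, 4), (60, 6)]
sales = [52, 88, 141, 165, 220, 250]

coefs = fit_ols(stores, sales)
residuals = [s - predict(coefs, r) for r, s in zip(stores, sales)]
print("coefficients:", coefs)
print("residuals:", residuals)
```

In practice the residuals would then be plotted against the predictions to check that no pattern remains; a visible trend usually suggests a missing variable or a relationship that is not linear.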

CHAID Models

CHAID is a type of decision tree technique, based upon adjusted significance testing. CHAID can be used for prediction (in a similar fashion to regression analysis) as well as classification, and for detection of interaction between variables. CHAID stands for CHi-squared Automatic Interaction Detector.

In practice, CHAID is often used in direct marketing to select groups of consumers and predict how their responses to some variables affect other variables. Like other decision trees, CHAID has the advantage that its output is highly visual and easy to interpret. Because it uses multiway splits by default, however, it needs rather large sample sizes to work effectively: with small samples, the respondent groups can quickly become too small for reliable analysis.

CHAID detects interaction between variables in the data set. Using this technique it is possible to establish relationships between a ‘dependent variable’, like sales, and other explanatory variables such as distance, store format, and demographic/lifestyle categories. CHAID does this by identifying discrete groups of respondents and using their values on the explanatory variables to predict the impact on the dependent variable.
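The splitting idea behind CHAID can be sketched in a few lines of Python. This is a simplified illustration, not a full CHAID implementation: it scores each candidate explanatory variable with the Pearson chi-squared statistic against the dependent variable and picks the strongest one for a single split. Real CHAID also merges similar categories and compares Bonferroni-adjusted p-values rather than raw statistics, so comparing raw statistics as done here is only reasonable when the candidates have the same number of categories. The records, field names, and the function names `chi_squared` and `best_split` are hypothetical.

```python
from collections import Counter


def chi_squared(xs, ys):
    """Pearson chi-squared statistic for the contingency table of xs against ys."""
    n = len(xs)
    cell = Counter(zip(xs, ys))  # observed count for each (x, y) combination
    row = Counter(xs)
    col = Counter(ys)
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n  # count expected under independence
            stat += (cell[(a, b)] - expected) ** 2 / expected
    return stat


def best_split(records, predictors, target):
    """Return the predictor most strongly associated with the target, plus all scores."""
    ys = [r[target] for r in records]
    scores = {p: chi_squared([r[p] for r in records], ys) for p in predictors}
    return max(scores, key=scores.get), scores


# Hypothetical stores with two explanatory variables and a sales band
records = [
    {"format": "mall", "density": "urban", "sales": "high"},
    {"format": "mall", "density": "rural", "sales": "high"},
    {"format": "mall", "density": "urban", "sales": "high"},
    {"format": "mall", "density": "rural", "sales": "low"},
    {"format": "strip", "density": "urban", "sales": "low"},
    {"format": "strip", "density": "rural", "sales": "low"},
    {"format": "strip", "density": "urban", "sales": "low"},
    {"format": "strip", "density": "rural", "sales": "high"},
]

winner, scores = best_split(records, ["format", "density"], "sales")
print(winner, scores)
```

With only eight records this is purely illustrative; as noted above, the groups produced by each split shrink quickly, which is why the technique needs large samples.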

The following example shows CHAID used to predict the basic radius (in miles) of primary trade areas.

Normal Curves

Normal curves, as the term is used in sales forecasting, differ from the normal curve of statistical probability (the traditional bell curve). They represent one of the oldest sales forecasting techniques.

Sales penetrations are graphed against one or more variables to depict their impact on sales performance as their values change. For example, one common normal curve would show the decrease in sales per capita over distance for three different population density ranges as represented by three curvilinear lines (distance on the x-axis and sales per capita on the y-axis).
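One way to derive the points for such a chart is to group observations by density range and distance band, then average sales per capita within each group. The Python sketch below does this under stated assumptions: the observations, the one-mile band width, and the function name `normal_curve_points` are hypothetical, and a production curve would typically be smoothed rather than plotted from raw band averages.

```python
from collections import defaultdict


def normal_curve_points(observations, band_width=1.0):
    """Average sales per capita for each (density class, distance band).

    Returns {density: [(band midpoint in miles, mean sales per capita), ...]},
    with each density's points ordered by increasing distance.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (density, band) -> [running total, count]
    for density, distance, sales_per_capita in observations:
        band = int(distance // band_width)
        acc = sums[(density, band)]
        acc[0] += sales_per_capita
        acc[1] += 1
    curves = defaultdict(list)
    for (density, band), (total, count) in sorted(sums.items()):
        midpoint = (band + 0.5) * band_width
        curves[density].append((midpoint, total / count))
    return dict(curves)


# Hypothetical observations: (density range, distance in miles, sales per capita)
observations = [
    ("high", 0.4, 12.0), ("high", 0.8, 10.0), ("high", 1.5, 6.0), ("high", 2.3, 2.0),
    ("medium", 0.5, 9.0), ("medium", 1.2, 7.0), ("medium", 1.9, 5.0), ("medium", 2.6, 3.0),
    ("low", 0.7, 6.0), ("low", 1.4, 5.0), ("low", 2.2, 4.0), ("low", 2.9, 4.0),
]

curves = normal_curve_points(observations)
for density, points in curves.items():
    print(density, points)
```

Each density range yields one line on the chart, with the band midpoint on the x-axis and the average sales per capita on the y-axis.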

Normal curves are especially effective when used in conjunction with analogs. They also have the advantage of visual representation of data that can be effective in communicating the relative influence of key variables to specialists and lay persons alike.

Below are two examples: