Predictive modeling involves working out how one set of variables predicts another variable. There are two 'flavors' of this type of analysis that are commonplace: driver analysis and targeting.
Driver analysis seeks to work out the relative role that different drivers of preference play in a market. For example, if one question measures overall satisfaction and another measures satisfaction with different areas of a company (e.g., customer service, price, quality), then driver analysis can be used to quantify the relative impact of each of these areas on overall satisfaction. Driver analysis is usually conducted using Regression. Choice Modeling is a type of driver analysis.
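As a rough illustration of driver analysis by regression, the sketch below fits an ordinary least-squares model of overall satisfaction on three area ratings, using only the Python standard library. The ratings, the coefficients used to generate the synthetic overall scores, and the helper names `solve_linear_system` and `fit_ols` are all assumptions made for this example; they do not come from any real survey or package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations update both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_ols(predictors, outcome):
    """Return [intercept, b1, b2, ...] that minimizes the squared error."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend the intercept column
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical 1-5 ratings: (customer service, price, quality) per respondent.
ratings = [
    (5, 3, 4), (2, 4, 3), (4, 2, 5), (1, 1, 2),
    (3, 5, 1), (5, 5, 5), (2, 2, 4), (4, 3, 1),
]
# Synthetic, noise-free overall satisfaction built from assumed weights.
overall = [1.0 + 0.5 * s + 0.1 * p + 0.3 * q for s, p, q in ratings]

coefficients = fit_ols(ratings, overall)
for name, value in zip(["intercept", "customer service", "price", "quality"], coefficients):
    print(f"{name}: {value:.2f}")
```

Because the synthetic data is noise-free, the fit should recover the generating weights up to rounding. Raw coefficients are only directly comparable as measures of relative impact when the drivers share a scale, as they do here.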
Targeting seeks to identify the distinguishing characteristics of a group of people of particular interest (e.g., the demographic profile of the heaviest users of a service). Dozens of techniques are in widespread use for targeting. However, most of these techniques are very complicated and easy to misinterpret. The only one of these tools that can readily be used by non-statisticians is Predictive Trees.
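To give a feel for how a predictive tree finds distinguishing characteristics, the sketch below searches for the single best split of one variable, which is the basic step that tree methods repeat on each resulting group. It uses Gini impurity as the split criterion, which is one common choice and an assumption here; the ages, the heavy-user flags and the function names `gini` and `best_split` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the group is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns None if all values are identical, since no split is possible.
    """
    best = None
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        # Weight each side's impurity by its share of respondents.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical respondents: age and whether they are a heavy user (1) or not (0).
ages = [19, 23, 27, 31, 38, 44, 52, 60]
heavy_user = [1, 1, 1, 1, 0, 0, 0, 0]

split = best_split(ages, heavy_user)
print(split)
```

In this deliberately clean example the heavy users are all younger than the others, so one split separates the two groups perfectly. Real survey data would rarely split this cleanly, and a full tree would go on to split each side again on other variables.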
Segmentation involves finding groups of people who have given similar responses in a survey, with the goal that similar strategies can be developed for each segment. For example, if a study has looked at the types of leisure activities that people undertake, then segmentation will identify groups of people who undertake similar types of leisure activities. There are three main ways that people segment: judgment, cluster analysis and latent class analysis.
When judgment is used to create segments, it usually involves selecting a single variable, or a combination of a small number of variables, that are known to relate to many of the key variables in the survey. For example, if the crosstabs revealed that age was correlated with many of the other questions in the survey, then it may make sense to segment using age.
The traditional approach to conducting segmentation has been to use Cluster Analysis. Cluster analysis assumes that:
- There is no missing data (i.e., each respondent has provided data on all the variables).
- All the variables are numeric.
- All the variables have the same range (e.g., the same highest and lowest values).
It is often the case that one or more of these assumptions is not met. There are various techniques that can be used to try to work around violations of these assumptions (see the SurveyAnalysis.org pages on segmentation for more information).
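The sketch below shows one simple way the three assumptions listed above might be handled before clustering: respondents with missing values are dropped, each variable is rescaled to a common 0-1 range, and a basic k-means procedure is then run. K-means is used here as a familiar example of cluster analysis, and dropping incomplete respondents is only one of several possible workarounds; the data and the names `rescale_to_unit_range` and `k_means` are assumptions made for illustration.

```python
def rescale_to_unit_range(rows):
    """Rescale every column to [0, 1] so all variables share the same range."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, initial_centers, iterations=20):
    """Basic k-means: assign each row to its nearest center, then update centers."""
    centers = [list(c) for c in initial_centers]
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centers)), key=lambda k: squared_distance(row, centers[k]))
            for row in rows
        ]
        for k in range(len(centers)):
            members = [row for row, a in zip(rows, assignment) if a == k]
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


# Hypothetical respondents: (hours of sport per week, monthly spend on dining out).
# None marks a question the respondent did not answer.
responses = [
    (1, 200), (2, 220), (1, 180),
    (9, 40), (10, 60), (8, 50),
    (5, None),
]

# Assumption 1: no missing data, handled here by dropping incomplete respondents.
complete = [row for row in responses if None not in row]
# Assumptions 2 and 3: numeric variables with the same range.
scaled = rescale_to_unit_range(complete)

# Fixed starting centers keep the example reproducible.
assignment, centers = k_means(scaled, initial_centers=[scaled[0], scaled[3]])
print(assignment)
```

Without the rescaling step, the spend variable would dominate the distance calculation simply because its values are larger. Dropping incomplete respondents is easy but discards data, which is part of why other approaches are often preferred.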
Latent class analysis
The modern alternative to cluster analysis is Latent Class Analysis. Modern latent class analysis programs automatically relax each of the assumptions of cluster analysis listed above.
Perceptual maps are charts that show the relationships between different categories in a survey. For example, the map below shows how different brands of cola are perceived. Most perceptual maps used in survey analysis are created using Correspondence Analysis. The example below was created in Displayr.
Sometimes it is useful to understand whether there are correlations between large numbers of variables. For example, this can be useful to:
- Understand how attitudes and/or behaviors are interrelated.
- Identify redundant questions in a questionnaire if there is a need to simplify it.
- Identify redundant concepts in a concept test.
- Summarize data.
- Transform data prior to the application of other multivariate techniques (e.g., Cluster Analysis and Regression).
The identification of correlated variables is generally done using Principal Components Analysis. Most of the time, researchers who conduct principal components analysis refer to it as factor analysis. Factor analysis is technically a different technique, but for most practical purposes the differences are small.
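As a minimal sketch of the idea, the code below computes a correlation matrix for three questions, flags pairs that are highly correlated (candidates for redundancy), and extracts the first principal component by power iteration. The answers, the 0.9 cut-off and the function names are assumptions chosen for illustration; statistical packages use more robust routines and extract all components rather than just the first.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    return [[pearson(a, b) for b in columns] for a in columns]


def first_principal_component(corr, iterations=200):
    """Power iteration: return (largest eigenvalue, unit-length loadings).

    Starts from a uniform vector, which works as long as that vector is not
    orthogonal to the leading component (true for the example data below).
    """
    k = len(corr)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)
    )
    return eigenvalue, v


# Hypothetical answers from six respondents to three rating questions.
q1 = [1, 2, 3, 4, 5, 4]
q2 = [2, 2, 3, 5, 5, 4]
q3 = [3, 1, 4, 2, 5, 1]

corr = correlation_matrix([q1, q2, q3])

# Flag pairs of questions whose correlation exceeds an assumed cut-off of 0.9.
redundant = [
    (i, j)
    for i in range(len(corr))
    for j in range(i + 1, len(corr))
    if abs(corr[i][j]) > 0.9
]

eigenvalue, loadings = first_principal_component(corr)
share_explained = eigenvalue / len(corr)  # total variance equals the number of variables
print(redundant, round(share_explained, 2))
```

Here the first two questions move together closely, so they are flagged as a potentially redundant pair and both load heavily on the first component. The share of variance explained by that component indicates how much of the data a single summary variable would capture.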