# Decompositions

A *decomposition* involves breaking something apart. It is a useful way to estimate many things. Let us start with a simple estimation problem: how many Japanese people are there in Australia with dentures (fake teeth that can be taken in and out)? And yes, this was a real-world consulting project.

The simplest and laziest approach to this problem is to do a survey. For example, you might email 1,000 people in Australia and ask how many people in each household are Japanese and have dentures. The result of this would be an estimate of the proportion of Australians that say they are Japanese and have dentures. To get to our required result we would multiply this by the number of Australians. Thus, we have used the following model:

Number of Japanese people with dentures = Proportion of Australians who are Japanese with dentures × Number of Australians

This formula is a decomposition. We have decomposed the thing we are trying to estimate into two separate estimates, the proportion and the population size.

As research designs go, this is a poor one. The proportion we are trying to estimate is likely to be very small (e.g., less than 0.1%). Thus, we would expect to interview well over 1,000 people before we identified a single Japanese person with dentures, and perhaps 50,000 or 100,000 people before we obtained a sufficiently precise estimate, at a cost of millions of dollars. People researching bizarre topics are usually short of funds, so we can feel confident that this research design is inappropriate. Furthermore, we probably would not get a very precise answer anyway, as Japanese speakers are relatively less likely to participate. So, how can we resolve this? There are lots of other possible decompositions. For example:

Number of Japanese people with dentures = Number of people with dentures × Proportion of people who are Japanese

This decomposition requires two completely different inputs: the number of people with dentures and the proportion of people that are Japanese. Both of these numbers may be available from publicly available sources (e.g., trade associations, government statistics), so we might be able to do this very cheaply. Of course, the result may also be quite inaccurate, because this decomposition implicitly assumes that people of Japanese origin are neither more nor less likely to have dentures than the rest of the population.
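A minimal Python sketch of this second decomposition follows. The input values are illustrative placeholders, not real statistics, and the `relative_rate` parameter is a hypothetical addition that makes the implicit assumption explicit: a value of 1.0 reproduces the formula above, while other values show how far the estimate moves if people of Japanese origin are more or less likely to have dentures than the rest of the population.

```python
def japanese_with_dentures(people_with_dentures, proportion_japanese,
                           relative_rate=1.0):
    """Estimate the number of Japanese people with dentures.

    relative_rate is the denture rate among Japanese people divided by the
    denture rate in the whole population. The decomposition in the text
    implicitly assumes relative_rate == 1.0.
    """
    return people_with_dentures * proportion_japanese * relative_rate


# Placeholder inputs for illustration only (NOT real figures).
PEOPLE_WITH_DENTURES = 2_000_000
PROPORTION_JAPANESE = 0.002

baseline = japanese_with_dentures(PEOPLE_WITH_DENTURES, PROPORTION_JAPANESE)
half_as_likely = japanese_with_dentures(
    PEOPLE_WITH_DENTURES, PROPORTION_JAPANESE, relative_rate=0.5
)

print(f"Assuming equal denture rates:        {baseline:,.0f}")
print(f"If Japanese people are half as likely: {half_as_likely:,.0f}")
```

If the true relative rate were 0.5 rather than 1.0, the cheap estimate would be twice the true figure, which is the kind of inaccuracy the paragraph above warns about.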

So, we can use different decompositions to solve the same problem. The trick is to trade off which will be cheapest against which will be most accurate (i.e., most precise).
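To put rough numbers on the cost side of this trade-off, the sketch below estimates how many interviews the survey design would need. It assumes simple random sampling, the usual normal approximation for a proportion, a true proportion of 0.1% (the upper end of the range suggested earlier), and a 95% confidence level; the function names and the choice of a 25% relative margin of error are illustrative assumptions, not part of the original project.

```python
import math


def interviews_to_find_one(p):
    """Expected number of interviews before meeting one qualifying person."""
    return 1 / p


def sample_size_for_relative_error(p, relative_error, z=1.96):
    """Sample size so the confidence interval is within +/- relative_error * p.

    Uses the normal approximation: n = z^2 * p * (1 - p) / margin^2.
    """
    margin = relative_error * p
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


p = 0.001  # assumed proportion: 0.1% of Australians
n_first = interviews_to_find_one(p)
n_precise = sample_size_for_relative_error(p, relative_error=0.25)

print(f"Expected interviews to find one person: {n_first:,.0f}")
print(f"Interviews for a +/-25% relative margin: {n_precise:,}")
```

Under these assumptions the required sample falls between 50,000 and 100,000 interviews, and it grows further if the true proportion is smaller than 0.1%.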

Now let us solve a more traditional problem. Consider the problem of trying to predict sales of a new brand of laundry detergent. A standard decomposition for this is:

Sales = Market Share × Market Size

There are lots and lots of ways to estimate market share. One of them is to present people with a screen showing a picture of a supermarket shelf, including the new brand, and ask people to choose one; the proportion of people that choose the new brand is then an estimate of the market share. The market size can usually be obtained from historic sales data.

An alternative decomposition of sales of a new product is:

Sales = Intention to purchase × Purchase frequency × Population size

Intention to purchase is estimated by showing somebody a picture of the proposed new product, and asking them if they will buy it or not (this is called Concept Testing). Purchase frequency can be estimated by asking people how many times they will buy it (although looking at historical purchase rates of similar products will often be more valuable). The population size can be obtained from government statistical agencies.

When designing research we need to find the most cost-effective way of producing sufficiently precise estimates. Consider the decomposition of:

Sales = Market Share × Market Size

Let us say we are trying to forecast sales of a new laundry detergent for next year. To produce our sales estimate we need to estimate market share and market size. The market size will almost certainly be broadly similar to the previous year's sales, with a little growth. So, if the market size last year was $13 billion, the market size next year will probably be between $13 billion and $14 billion.

Now think about the market share estimate. If the new product is a 'dog', it will get 0% market share. If it is wonderful, perhaps it can get 20% of the market. So, if we multiply the lower-bound estimates of market size and market share, we compute a lower-bound estimate of sales of 0% of $13 billion = $0; multiplying the upper-bound estimates gives an upper bound of 20% of $14 billion = $2.8 billion. Our range of estimates for market size makes comparatively little difference to our forecast. If we assume that the market size is $13 billion, this drops the upper bound from $2.8 billion to $2.6 billion. By contrast, changing the estimated market share from 20% to 0% drops the upper bound all the way down to $0. Thus, the estimate of sales is most sensitive to the estimated market share, and it follows that our focus in producing an estimate of sales should be on estimating the market share, if we are decomposing sales into market share and market size. This process, of working out which part of the research design will most affect the precision of an estimate, is known as sensitivity analysis.
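The arithmetic of this sensitivity analysis can be sketched in a few lines of Python. The bounds are the ones used above ($13 billion to $14 billion for market size, 0% to 20% for market share); the variable names and the one-input-at-a-time approach are illustrative choices rather than a prescribed method.

```python
def sales(market_share, market_size):
    """Sales = Market Share x Market Size."""
    return market_share * market_size


SIZE_LOW, SIZE_HIGH = 13e9, 14e9   # market size bounds in dollars
SHARE_LOW, SHARE_HIGH = 0.0, 0.20  # market share bounds

# Start from the upper-bound forecast, then move one input at a time
# to its lower bound and record how far the forecast falls.
upper_bound = sales(SHARE_HIGH, SIZE_HIGH)
swing_from_size = upper_bound - sales(SHARE_HIGH, SIZE_LOW)
swing_from_share = upper_bound - sales(SHARE_LOW, SIZE_HIGH)

print(f"Upper-bound forecast:        ${upper_bound / 1e9:.1f} billion")
print(f"Swing from market size:      ${swing_from_size / 1e9:.1f} billion")
print(f"Swing from market share:     ${swing_from_share / 1e9:.1f} billion")
```

The swing from market share is the whole forecast, while the swing from market size is a small fraction of it, which is why the research effort belongs on the market share estimate.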