What is Multilevel Modeling?

Multilevel models are known by several monikers, such as hierarchical linear models, mixed models and random-effects models, and are used in medical research, economics, sociology, marketing, educational psychology and numerous other fields. Marketing-related applications of multilevel modeling include some forms of advanced survey analytics, market response modeling and CRM.

Multilevel modeling is quite a complex subject, but here is a succinct, non-mathematical definition of it, excerpted from Multilevel Analysis: Techniques and Applications, Third Edition (Hox et al.):

“The term ‘multilevel’ refers to a hierarchical or nested data structure, usually subjects within organizational groups, but the nesting may also consist of repeated measures within subjects, or respondents within clusters, as in cluster sampling. The expression multilevel model is used as a generic term for all models for nested data. Multilevel analysis is used to examine relations between variables measured at different levels of the multilevel data structure…multilevel modeling has contributed to the analysis of traditional individuals within groups data, repeated measures and longitudinal data, sociometric modeling, twin studies, meta-analysis and analysis of cluster randomized trials.” 

The book is an excellent overview of multilevel modeling from a behavioral and social science perspective, and below I’ve included another selection. Any copy/paste and editing errors are mine.

“Social research regularly involves problems that investigate the relationship between individuals and the social contexts in which they live, work, or learn. The general concept is that individuals interact with the social contexts to which they belong, that individual persons are influenced by the contexts or groups to which they belong, and that those groups are in turn influenced by the individuals who make up that group. 

The individuals and the social groups are conceptualized as a hierarchical system of individuals nested within groups, with individuals and groups defined at separate levels of this hierarchical system. Naturally, such systems can be observed at different hierarchical levels, and variables may be defined at each level. This leads to research into the relationships between variables characterizing individuals and variables characterizing groups, a kind of research that is generally referred to as ‘multilevel research’. 

In multilevel research, the data structure in the population is hierarchical, and the sample data are a sample from this hierarchical population. For example, in educational research, the population typically consists of classes and pupils within these classes, with classes organized within schools. The sampling procedure often proceeds in successive stages: first, we take a sample of schools, next we take a sample of classes within each sampled school, and finally we take a sample of pupils within each sampled class.

In multilevel research, variables can be defined at any level of the hierarchy. Some of these variables may be measured directly at their ‘own’ natural level; for example, at the school level we may measure school size and denomination, at the class level we measure class size, and at the pupil level, intelligence and school success. In addition, we may move variables from one level to another by aggregation or disaggregation. 

Aggregation means that the variables at a lower level are moved to a higher level, for instance, by assigning to the classes the class mean of the pupils’ intelligence scores. Disaggregation means moving variables to a lower level, for instance by assigning to all pupils in the schools a variable that indicates the denomination of the school they belong to.

The lowest level (level 1) is usually defined by the individuals. However, this is not always the case. For instance, in longitudinal designs, repeated measures within individuals are the lowest level. In such designs, the individuals are at level two, and groups are at level three. Most software allows for at least three levels, and some software has no formal limit to the number of levels. However, models with many levels can be difficult to estimate, and even if estimation is successful, they are unquestionably more difficult to interpret. 


At each level in the hierarchy, we may have several types of variables. The distinctions made in the following are based on the typology offered by Lazarsfeld and Menzel (1961), with some simplifications. In our typology, we distinguish between global, structural and contextual variables. 

Global variables are variables that refer only to the level at which they are defined, without reference to other units or levels. A pupil’s intelligence or gender would be a global variable at the pupil level. School denomination and class size would be global variables at the school and class level. Simply put: a global variable is measured at the level at which that variable actually exists. 

Structural variables are operationalized by referring to the sub-units at a lower level. They are constructed from variables at a lower level, for example, in defining the class variable ‘mean intelligence’ as the mean of the intelligence scores of the pupils in that class. Using the mean of a lower-level variable as an explanatory variable at a higher level is called aggregation, and it is a common procedure in multilevel analysis. Other functions of the lower-level variables are less common, but may also be valuable. For instance, using the standard deviation of a lower-level variable as an explanatory variable at a higher level could be used to test hypotheses about the effect of group heterogeneity on the outcome variable (cf. Klein and Kozlowski, 2000). 

Contextual variables are the result from disaggregation; all units at the lower level receive the value of a global variable for the context to which they belong at the higher level. For instance, we can assign to all pupils in a school the school size, or the mean intelligence, as a pupil-level variable. Disaggregation is not needed in a proper multilevel analysis. For convenience, multilevel data are often stored in a single data file, in which the group-level variables are repeated for each individual within a group, but the statistical model and the software will correctly recognize these as a single value at a higher level. The term contextual variable, however, is still used to denote a variable that models how the context influences an individual. 

In order to analyze multilevel models, it is not important to assign each variable to its proper place in the typology. The benefit of the scheme is conceptual; it makes clear to which level a measurement properly belongs. Historically, multilevel problems have led to analysis approaches that moved all variables by aggregation or disaggregation to one single level of interest followed by an ordinary multiple regression, analysis of variance, or some other ‘standard’ analysis method. However, analyzing variables from different levels at one single common level is inadequate, and leads to two distinct types of problems.

The first problem is statistical. If data are aggregated, the result is that different data values from many sub-units are combined into fewer values for fewer higher-level units. As a result, much information is lost, and the statistical analysis loses power. On the other hand, if data are disaggregated, the result is that a few data values from a small number of super-units are ‘blown up’ into many more values for a much larger number of sub-units. 

Ordinary statistical tests treat all these disaggregated data values as independent information from the much larger sample of sub-units. The proper sample size for these variables is of course the number of higher-level units. Using the larger number of disaggregated cases for the sample size leads to significance tests that reject the null-hypothesis far more often than the nominal alpha level suggests. In other words, investigators come up with many ‘significant’ results that are totally spurious. 

The second problem is conceptual. If the analyst is not very careful in the interpretation of the results, s/he may commit the fallacy of the wrong level, which consists of analyzing the data at one level, and formulating conclusions at another level. Probably the best-known fallacy is the ecological fallacy, which is interpreting aggregated data at the individual level. It is also known as the ‘Robinson effect’ after Robinson (1950). Robinson presents aggregated data describing the relationship between the percentage of blacks and the illiteracy level in nine geographic regions in 1930. 

The ecological correlation, that is, the correlation between the aggregated variables at the region level is 0.95. In contrast, the individual-level correlation between these global variables is 0.20. Robinson concludes that in practice an ecological correlation is almost certainly not equal to its corresponding individual-level correlation. For a statistical explanation, see Robinson (1950) or Kreft and de Leeuw (1987). Formulating inferences at a higher level based on analyses performed at a lower level is just as misleading. This fallacy is known as the atomistic fallacy. 

A better way to look at multilevel data is to realize that there is not one ‘proper’ level at which the data should be analyzed. Rather, all levels present in the data are important in their own way. This becomes clear when we investigate cross-level hypotheses, or multilevel problems. A multilevel problem is a problem that concerns the relationships between variables that are measured at a number of different hierarchical levels. For example, a common question is how a number of individual and group variables influence one single individual outcome variable. Typically, some of the higher-level explanatory variables may be structural variables, for example the aggregated group means of lower-level global (individual) variables. 

The goal of the analysis is to determine the direct effect of individual- and group-level explanatory variables, and to determine if the explanatory variables at the group level serve as moderators of individual-level relationships. If group-level variables moderate lower-level relationships, this shows up as a statistical interaction between explanatory variables from different levels. In the past, such data were analyzed using conventional multiple regression analysis with one dependent variable at the lowest (individual) level and a collection of disaggregated explanatory variables from all available levels (cf. Boyd & Iversen, 1979). This approach is completely outdated. Since it analyzes all available data at one single level, it suffers from all of the conceptual and statistical problems mentioned above.”

Source: Multilevel Analysis: Techniques and Applications, Third Edition (Hox et al.)
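
To make the excerpt's points about aggregation, disaggregation and the ecological fallacy a little more concrete, here is a minimal Python sketch of my own. It is not from the book. The data are made up (nine hypothetical pupils in three hypothetical classes), the names pupils, class_means and pearson are illustrative only, and the numbers were chosen so that the pattern is easy to see rather than to resemble any real study.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: (class_id, pupil_x, pupil_y). All values are made up.
pupils = [
    ("A", 1, 3), ("A", 2, 2), ("A", 3, 1),
    ("B", 4, 6), ("B", 5, 5), ("B", 6, 4),
    ("C", 7, 9), ("C", 8, 8), ("C", 9, 7),
]


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Group the pupil rows by class.
by_class = defaultdict(list)
for class_id, x, y in pupils:
    by_class[class_id].append((x, y))

# Aggregation: move pupil-level variables up to the class level as class means.
class_means = {
    class_id: (sum(x for x, _ in rows) / len(rows),
               sum(y for _, y in rows) / len(rows))
    for class_id, rows in by_class.items()
}

# Disaggregation: copy the class-level mean of x back onto every pupil row.
disaggregated = [(class_id, x, y, class_means[class_id][0])
                 for class_id, x, y in pupils]

# Correlation between the class means (the "ecological" correlation).
ecological_r = pearson([m[0] for m in class_means.values()],
                       [m[1] for m in class_means.values()])

# Correlation across all pupils, ignoring which class they belong to.
individual_r = pearson([x for _, x, _ in pupils], [y for _, _, y in pupils])

# Correlation among the pupils inside each class.
within_r = {class_id: pearson([x for x, _ in rows], [y for _, y in rows])
            for class_id, rows in by_class.items()}

print(f"ecological r = {ecological_r:.2f}")
print(f"individual r = {individual_r:.2f}")
print("within-class r =", {k: round(v, 2) for k, v in within_r.items()})
```

With these made-up numbers, the correlation between class means is 1.0, the pooled pupil-level correlation is 0.8, and the correlation inside every class is -1.0. The three analyses of the same nine rows give three different answers, which is the sense in which conclusions drawn at one level cannot simply be carried over to another. The sketch only describes the data; it does not fit a multilevel model, which would need dedicated software.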

