If you have ever run an ordinary least squares (OLS) regression for an equation \(y_i=\beta_0+\sum_{j=1}^p \beta_j x_{ij}+\epsilon_i\), you have probably come across the problem of identifying how much a particular regressor \(x_j\) contributes to the overall explanation of variation in the model. The overall goodness of fit is measured with the \(R^2\) ("R-squared"), the ratio of the explained sum of squares to the total sum of squares. But is it possible to decompose the \(R^2\) of the full model into partial \(R^2_j\) so that \(R^2=\sum_j R^2_j\)? Being able to report such a decomposition is very useful for assessing the explanatory power of individual regressors or groups of regressors (such as sets of indicator variables). Previous attempts at providing such influence measures were easy to compute but flawed: simple marginal or sequential measures either depend on the order in which regressors enter the model or fail to sum to the full-model \(R^2\). But there's a new kid in town. It's called the Shapley-Owen value, and it provides a theoretically sound decomposition.

The ideas presented here have been circulating for a while. A useful summary and software in the R language can be found on the web site of Ulrike Grömping, "Relative Importance of Regressors". A more theoretical discussion is in Prof. Grömping's article "Estimators of relative importance in linear regression based on variance decomposition" in the *American Statistician* 61(2) from May 2007. A more recent contribution is the paper by Frank Huettner and Marco Sunder of the University of Leipzig, entitled "Axiomatic arguments for decomposing goodness of fit according to Shapley and Owen values". An implementation in Stata is provided as the module rego.

In game theory, the Shapley value is a way to distribute the total gains of a game fairly among its players. The properties of this value make it particularly useful for analyzing regression results as well. Calculating the Shapley value for a particular regressor \(j\) requires the computation of all \(2^p\) possible submodels, one for each subset of the \(p\) regressors. Clearly, this number grows exponentially as a model acquires more regressors of interest: with anything in excess of about twenty parameters of interest, computing \(R^2\) for all \(2^p\) models quickly becomes very expensive.

So what exactly is the Shapley value, and how is it computed? The Shapley value of regressor \(x_j\) sums the marginal contributions to the \(R^2\) from adding \(x_j\) to each submodel, weighted by the number of orderings of the regressors that each submodel represents. The partial \(R^2_j\) for regressor \(j\) is given by \[R^2_j= \sum_{T\subseteq Z\setminus\{x_j\}} \frac{k!\cdot(p-k-1)!}{p!}\left[R^2(T\cup\{x_j\})-R^2(T)\right] \] Here, \(Z\) is the set of all \(p\) regressors, \(T\) is a submodel with \(k=|T|\) regressors drawn from \(Z\) but without regressor \(x_j\), and \(T\cup\{x_j\}\) is the same model with \(x_j\) included.
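As an illustrative sketch of the formula (in Python with NumPy rather than the article's SAS IML; the function names are my own), a brute-force implementation simply loops over all submodels \(T\) not containing regressor \(j\):

```python
from itertools import combinations
from math import factorial

import numpy as np

def r_squared(X, y, cols):
    """R^2 of an OLS regression of y on the columns in `cols`, plus an intercept."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def shapley_r2(X, y):
    """Shapley decomposition of R^2: values[j] is R^2_j, and the values sum
    to the full-model R^2."""
    p = X.shape[1]
    values = np.zeros(p)
    for j in range(p):
        rest = [i for i in range(p) if i != j]
        for k in range(p):                      # size of the submodel T
            for T in combinations(rest, k):
                # weight = number of regressor orderings this submodel represents
                w = factorial(k) * factorial(p - k - 1) / factorial(p)
                values[j] += w * (r_squared(X, y, T + (j,)) - r_squared(X, y, T))
    return values
```

This version refits each submodel from scratch, so it is only meant to make the weighting scheme concrete; the efficient route via the covariance matrix is described next.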

Computationally, all \(2^p\) of the \(R^2(\cdot)\) values can be computed efficiently from the variance-covariance matrix of the regressors and the dependent variable. Once a vector of these values is available, the Shapley values can be computed by iterating over each regressor and summing the weighted marginal contributions. The SAS code fragment below shows some nifty housekeeping of the model indices, exploiting the fact that the \(2^p\) combinations can be indexed by bitmasks for inclusion or exclusion of individual regressors.
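The same bitmask bookkeeping can be sketched in Python (again, this is my own illustrative code, not the article's SAS IML fragment): bit \(j\) of the index is set exactly when regressor \(j\) is in the submodel, and each \(R^2\) comes from the covariance matrix alone, with no refitting of the data.

```python
import numpy as np

def all_subset_r2(X, y):
    """R^2 for every subset of regressors, indexed by bitmask:
    bit j of the index is set iff regressor j is in the submodel.
    Only the (p+1)x(p+1) covariance matrix of [X, y] is needed."""
    p = X.shape[1]
    C = np.cov(np.column_stack([X, y]), rowvar=False)
    Sxx, sxy, syy = C[:p, :p], C[:p, p], C[p, p]
    r2 = np.zeros(2 ** p)                 # r2[0] = 0: empty model
    for mask in range(1, 2 ** p):
        idx = [j for j in range(p) if mask >> j & 1]
        S = Sxx[np.ix_(idx, idx)]         # covariance block of included regressors
        s = sxy[idx]                      # their covariances with y
        r2[mask] = s @ np.linalg.solve(S, s) / syy
    return r2
```

Given this vector, the Shapley value of regressor \(j\) only requires iterating over the masks without bit \(j\) set and accumulating the weighted differences `r2[mask | 1 << j] - r2[mask]`.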

The Owen value is an extension of the Shapley value to cooperative games with an a priori coalition structure. The key property is that the Owen value works the same way as the Shapley value, but for groups of regressors. When we have a model with dozens of indicator variables, but we are only interested in the influence of the entire group of indicator variables, the Owen value reduces the complexity of the goodness-of-fit decomposition. In most models, the number of regressor groups that we are really interested in is relatively small. For \(p=16\), there are only 65,534 submodels to compute (plus the full model and the model without any regressors).
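A minimal Python sketch of the group-level shares (my own code and naming, not the rego module) treats each group of regressor columns as a single player in the Shapley formula; for disjoint groups, these group totals coincide with the Owen values summed within each group:

```python
from itertools import combinations
from math import factorial

import numpy as np

def group_r2_decomposition(X, y, groups):
    """Group-level decomposition of R^2: `groups` is a list of lists of
    column indices; returns one share per group, summing to the full R^2."""
    g = len(groups)

    def r2(group_ids):
        cols = sorted(c for i in group_ids for c in groups[i])
        if not cols:
            return 0.0
        Z = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        return 1 - np.sum((y - Z @ beta) ** 2) / np.sum((y - y.mean()) ** 2)

    values = np.zeros(g)
    for i in range(g):
        rest = [h for h in range(g) if h != i]
        for k in range(g):                      # size of the group coalition T
            for T in combinations(rest, k):
                w = factorial(k) * factorial(g - k - 1) / factorial(g)
                values[i] += w * (r2(T + (i,)) - r2(T))
    return values
```

With, say, four groups, only \(2^4\) coalitions of groups enter the outer sum, no matter how many indicator variables each group contains.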

The code fragment below provides an implementation in the IML matrix language of the SAS system. It is embedded in a SAS macro with parameters for the database name, the dependent variable, the regressors, and the regressor-to-group correspondence. It outputs the OLS results along with the decomposition of the goodness of fit. Of course, the code can be used for individual regressors as well as groups of regressors: if the group names are identical to the variable names, each group contains only a single regressor.

In many practical applications, the number of regressor groups will be quite small, and the code will execute in seconds. SAS IML is perhaps not the most efficient way to implement it, but the code is—hopefully—quite transparent.

Expect to see many academic papers report \(R^2\) decompositions.