Decomposing R-squared

If you have ever run an ordinary least squares (OLS) regression for an equation \(y_i=\beta_0+\sum_{j=1}^p \beta_j x_{ij}+\epsilon_i\), you have probably come across the problem of identifying how much a particular regressor \(x_j\) contributes to the overall explanation of variation in the model. The overall goodness of fit is measured by the \(R^2\) ("R-squared"), the ratio of the explained sum of squares to the total sum of squares. But is it possible to decompose the \(R^2\) of the full model into partial \(R^2_j\) so that \(R^2=\sum_j R^2_j\)? Being able to report such a decomposition is very useful for assessing the explanatory power of individual regressors or groups of regressors (such as sets of indicator variables). Previous attempts at providing such influence measures were easy to compute but each suffered from particular problems. But there is a new kid in town: the Shapley-Owen value, which provides a theoretically sound decomposition.

The ideas presented here have been circulating for a while. A useful summary and software in the R language can be found on Ulrike Grömping's web site Relative Importance of Regressors. A more theoretical discussion appears in Prof. Grömping's article "Estimators of relative importance in linear regression based on variance decomposition" in The American Statistician 61(2), May 2007. A more recent contribution is the paper by Frank Huettner and Marco Sunder of the University of Leipzig, entitled "Axiomatic arguments for decomposing goodness of fit according to Shapley and Owen values". An implementation in Stata is available as the module rego.

In game theory, the Shapley value is a way to distribute the total gains of a cooperative game fairly among its players. The properties of this value make it particularly useful for analyzing regression results as well. Calculating the Shapley value for a particular regressor \(x_j\) requires the computation of all \(2^p\) possible submodels: one for each of the \(\binom{p}{k}\) combinations of \(k\) regressors, for every \(k\) from 0 to \(p\). This number grows exponentially in the number of regressors: for \(p=20\) there are already more than a million submodels, and with anything much in excess of about twenty regressors of interest it quickly becomes very expensive to calculate \(R^2\) for all \(2^p\) models.

So what exactly is the Shapley value, and how is it computed? The Shapley value of regressor \(x_j\) sums the marginal contributions to the \(R^2\) from adding \(x_j\) to each submodel that excludes it, with each submodel weighted by the fraction of regressor orderings (permutations) that it represents. The partial \(R^2_j\) for regressor \(j\) is given by \[R^2_j= \sum_{T\subseteq Z\setminus\{x_j\}} \frac{k!\cdot(p-k-1)!}{p!}\left[R^2(T\cup\{x_j\})-R^2(T)\right] \] Here, \(Z\) is the set of all \(p\) regressors, \(T\) is a submodel with \(k\) regressors that does not include \(x_j\), and \(T\cup\{x_j\}\) is the same submodel with \(x_j\) added.
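As a quick check of the weights, consider a model with just two regressors \(x_1\) and \(x_2\) (\(p=2\)). For \(x_1\), the sum runs over \(T=\emptyset\) (with \(k=0\)) and \(T=\{x_2\}\) (with \(k=1\)), and both weights equal \(1/2\): \[ R^2_1 = \tfrac{1}{2}\left[R^2(\{x_1\})-R^2(\emptyset)\right] + \tfrac{1}{2}\left[R^2(\{x_1,x_2\})-R^2(\{x_2\})\right] \] Writing out the analogous expression for \(R^2_2\) makes it easy to verify that \(R^2_1+R^2_2=R^2(\{x_1,x_2\})\), since \(R^2(\emptyset)=0\).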

Computationally, all \(2^p\) of the \(R^2(\cdot)\) values can be computed efficiently from the variance-covariance matrix—or equivalently from the cross-product matrices \(X'X\), \(X'y\), and \(y'y\), which is what the code below uses and which only need to be formed once. Once a vector of these values is available, the Shapley values are computed by iterating over each regressor and summing up the weighted marginal contributions. The SAS code fragment below shows some nifty ways of handling the housekeeping of the model indices, exploiting the fact that the \(2^p\) combinations can be indexed by bitmasks that flag the inclusion or exclusion of individual regressors.
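In more detail, for a regressor subset \(S\) (always including the intercept), the submodel's residual sum of squares follows directly from the corresponding rows and columns of \(X'X\) and \(X'y\), so the raw data never need to be revisited: \[ R^2(S) = 1-\frac{y'y-(X_S'y)'\,(X_S'X_S)^{-1}\,(X_S'y)}{TSS} \] where \(X_S\) denotes the columns of \(X\) belonging to \(S\). This is what the r2sub function in the code fragment below computes.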

The Owen value extends the Shapley value to cooperative games with a coalition structure, that is, games in which the players are organized into predefined groups. The key property is that the Owen value works the same way as the Shapley value, but for groups of regressors. When we have a model with dozens of indicator variables but are only interested in the influence of the entire group of indicator variables, the Owen value reduces the complexity of the goodness-of-fit decomposition. In most models, the number of regressor groups that we are really interested in is relatively small. With sixteen regressor groups, for example, there are only 65,534 submodels to compute (plus the full model, and the model without any regressors).
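To get a sense of the savings, consider a hypothetical model with 30 regressors organized into 5 groups: a regressor-level Shapley decomposition would require \(2^{30}\approx 1.07\times 10^9\) submodels, while the group-level Owen decomposition needs only \(2^5-2=30\) submodels in addition to the full model and the empty model.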

The code fragment below provides an implementation in the IML matrix language of the SAS system. It is wrapped in a SAS macro whose parameters are the name of the dataset, the name of the dependent variable, and a dataset that maps each regressor to its group. It outputs the OLS results along with the decomposition of the goodness of fit. Of course, the code can be used for individual regressors as well as groups of regressors: if each group name is identical to the corresponding variable name, every group contains just a single regressor.

*-------------------------------------------------------------+
| Shapley-Owen Decomposition of R-Squared for OLS Regressions  |
| (C) 2014 Werner Antweiler, University of British Columbia    |
+-------------------------------------------------------------;
%MACRO SHAPLEYOWEN(DBASE,DVAR,VNAMES);
*-------------------------------------------------------------+
| DBASE  -- database with dependent variable and regressors    |
| DVAR   -- name of dependent variable                         |
| VNAMES -- database of regressor names with two columns       |
|           vname : name of regressor                          |
|           gname : name of group the regressor belongs to     |
+-------------------------------------------------------------;
data vnames;
  set &VNAMES end=verylast;
  length _varlist $512;
  retain _varlist;
  if _N_=1 then _varlist=trim(vname);
  else _varlist=trim(_varlist)||' '||trim(vname);
  if verylast then call symput('VARS',_varlist);
  drop _varlist;
data gnames;
  set &VNAMES;
  keep gname;
proc sort data=gnames nodup; by gname;
run;
proc iml;
*--------------------------------------------------+
| Read data                                         |
+--------------------------------------------------;
use &DBASE;
read all var { &DVAR } into y;
read all var { &VARS } into x;
n=nrow(x); k=ncol(x);
intercept=J(n,1,1);
x=x||intercept;
close &DBASE;
*--------------------------------------------------+
| Group correspondence                              |
+--------------------------------------------------;
use vnames;
read all var { vname } into names;
read all var { gname } into group;
close vnames;
use gnames;
read all var { gname } into groupname;
close gnames;
names=names//{"intercept"};
g=nrow(groupname);
map=J(g,k,0);
do i=1 to k;
  j=loc(groupname=group[i]);
  map[j,i]=1;
end;
*--------------------------------------------------+
| Ordinary Least Squares Regression                 |
| Input:                                            |
|   y     - dependent variable                      |
|   x     - independent variables                   |
|   names - vector of independent variable names    |
+--------------------------------------------------;
xpx=x`*x; xpy=x`*y; ypy=ssq(y);
xpxi=inv(xpx);
b=xpxi*xpy;
u=y-x*b;
RSS=ssq(u);
df=n-(k+1);
s=sqrt(vecdiag(xpxi#(RSS/df)));
t=b/s;
pv=1-probf(t#t,1,df);
est=b||s||t||pv;
TSS=ssq(y-sum(y)/n);
R2full=1-RSS/TSS;
items={"Estimate","Std_Error","Z_Score","P_Value"};
print est[r=names c=items f=8.4];
*--------------------------------------------------+
| RSquare Subset Computation                        |
| Global:                                           |
|   xpx  - x-prime-x matrix [K,K]                   |
|   xpy  - x-prime-y matrix [K,1]                   |
|   ypy  - y-prime-y scalar [1,1]                   |
|   TSS  - total sum of squares                     |
| Input:                                            |
|   mask - mask to apply                            |
| Output:                                           |
|   R2   - rsquare                                  |
+--------------------------------------------------;
start r2sub(mask) global(ypy,xpx,xpy,TSS);
  xpx_sub=xpx[mask,mask];
  xpy_sub=xpy[mask,];
  RSS=ypy-xpy_sub`*solve(xpx_sub,xpy_sub);
  return(1-RSS/TSS);
finish r2sub;
*--------------------------------------------------+
| Compute all combinations of models and their R2   |
+--------------------------------------------------;
p=2**g;
r2list=J(p,1,.);
r2list[1]=0;       * model without any regressors ;
r2list[p]=r2full;  * model with all regressors ;
do i=2 to p-1;
  mask=k+1;        * always include intercept in model;
  m=i-1;
  do j=1 to g;
    * check if we need to include regressor-group j ;
    if (mod(m,2)>0) then do;
      * include regressors from regressor-group j ;
      do l=1 to k;
        if (map[j,l]>0) then mask=mask//l;
      end;
    end;
    m=int(m/2);
  end;
  r2list[i]=r2sub(mask);
end;
*--------------------------------------------------+
| Bitmask computations                              |
| (how many regressors are in a given model)        |
+--------------------------------------------------;
start bitcount(value,n);
  count=0; x=value;
  do i=1 to n;
    if (mod(x,2)=1) then count=count+1;
    x=int(x/2);
  end;
  return(count);
finish bitcount;
*--------------------------------------------------+
| Compute Shapley-Owen Values                       |
+--------------------------------------------------;
shapley=J(g,1,0);
fact_g=fact(g);
* iterate over each regressor-group ;
do j=1 to g;
  w=0;
  bitmask=2**(j-1);
  * consider all models except full model;
  do i=0 to p-2;
    * identify if current model excludes regressor-group j ;
    if (mod(int(i/bitmask),2)=0) then do;
      * count number of regressor-groups in model ;
      m=bitcount(i,g);
      * compute weight ;
      gamma=fact(m)*fact(g-m-1)/fact_g;
      * compute Shapley value and add up ;
      w=w+gamma*(r2list[1+i+bitmask]-r2list[1+i]);
    end;
  end;
  shapley[j]=w;
end;
print shapley[r=groupname f=8.5];
R2sum=sum(shapley);
print R2full[f=8.5] R2sum[f=8.5];
quit;
run;
%MEND;
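As a usage illustration (the dataset and variable names here are hypothetical and merely serve as an example, they are not part of the macro), a call of the macro needs two datasets: one with the dependent variable and the regressors, and one that lists each regressor name together with its group name:

* Hypothetical example with four regressors in two groups ;
* educ and exper form group "skill" while female and      ;
* married form group "demog"                              ;
* the dataset mydata holds lwage and the four regressors  ;
data myvars;
  length vname gname $32;
  input vname $ gname $;
  datalines;
educ    skill
exper   skill
female  demog
married demog
;
run;

%SHAPLEYOWEN(mydata,lwage,myvars);

The macro then prints the OLS estimates, the group-level Shapley-Owen values, and the full-model \(R^2\) alongside the sum of the group contributions, which should coincide.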

In many practical applications, the number of regressor groups will be quite small, and the code will execute in seconds. SAS IML is perhaps not the most efficient way to implement it, but the code is—hopefully—quite transparent.

Expect to see many academic papers report \(R^2\) decompositions.

Posted on Friday, October 10, 2014 at 08:00 — #Econometrics