In economic analysis we often require a measure of welfare that explicitly captures the notion of income inequality. When we introduce an economic policy, such as a carbon tax or a road toll, this policy also has effects on the income distribution. Thus there can be trade-offs between economic efficiency and income inequality. How do we balance such policies? Microeconomic theory has a number of useful tools, although they struggle with one obvious caveat: what is society's preference for income equality, or put another way, what is its tolerance of inequality? Of course, there isn't a single answer to this question. A lot of rich people are probably quite happy with inequality, while a lot of poor people are probably very unhappy about inequality. In democracies these questions are ultimately answered at the ballot box. For policy analysis the more important question is then to describe the status quo effectively and then compare proposed changes against that status quo. But let's start with a rehash of some basic microeconomics first, and then return to policy analysis.

We start by defining a welfare function with
**inequality aversion**. Welfare \(W\) is a function
of individual incomes \(x_i\) of the \(\{1,..,n\}\) members
of society. Let \(y\) denote mean income, and if we assume that
the welfare function is homogeneous of degree one, then it is
possible to write the welfare function as
\[ W = y \cdot V(x_1/y,...,x_n/y) = y \cdot (1-\lambda)\]
where \(V\) is normalized to equal 1 under perfect equality and where \(0\le\lambda<1\) is a measure
of inequality. Then \(\lambda\cdot y\) represents the cost of
inequality in society. The inequality measure \(\lambda\)
captures the amount of utility to be gained by complete redistribution
of income. There are a number of different ways to capture
the measure \(\lambda\) based on the particular welfare function
that is chosen as a starting point.

The Atkinson welfare function starts with a functional form
\[W=\frac{1}{n}\sum_i \frac{x_i^{1-\eta}}{1-\eta}
\quad\mathrm{if}\quad \eta\ne1\]
and
\[W=\frac{1}{n}\sum_i \log(x_i)
\quad\mathrm{if}\quad \eta=1\]
where the parameter \(\eta\) captures the degree of inequality
aversion. The higher the value of \(\eta\), the more society
prefers an equal income distribution. Realistic values are somewhere
between zero and two.
From this welfare function it is possible to derive the **Atkinson
inequality measure**
\[\lambda_A = 1-\left[ \frac{1}{n}\sum_i(x_i/y)^{1-\eta}\right]^{1/(1-\eta)}
\quad\mathrm{if}\quad \eta\ne1\]
and
\[\lambda_A = 1-\prod_i (x_i/y)^{1/n}
\quad\mathrm{if}\quad \eta=1\]
where \(y\) is again mean income.
The key element in the derivation of this measure is the
level of **equally distributed equivalent income**, or EDE
for short. EDE is the level of income \(y^\ast\) obtained by everyone
in the income distribution that allows society to reach the
same level of welfare as with the actual income distribution.
So \(y^\ast=y(1-\lambda_A)\), which is the welfare level \(W\) in the degree-one form \(W=y\cdot(1-\lambda)\) given earlier.
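
As a minimal sketch of the Atkinson measure and the EDE income, assume incomes arrive as a plain list of positive numbers. The function name `atkinson` and the five-person example society are illustrative choices, not part of the derivation itself.

```python
import math

def atkinson(incomes, eta):
    """Return (lambda_A, EDE income) for a list of positive incomes."""
    n = len(incomes)
    y = sum(incomes) / n  # mean income
    if math.isclose(eta, 1.0):
        # eta = 1: geometric mean of relative incomes x_i / y
        ratio = math.exp(sum(math.log(x / y) for x in incomes) / n)
    else:
        # eta != 1: power mean of order (1 - eta) of relative incomes
        ratio = (sum((x / y) ** (1 - eta) for x in incomes) / n) ** (1 / (1 - eta))
    return 1 - ratio, y * ratio

# Illustrative society: four people earn 1 and one person earns 2.
incomes = [1, 1, 1, 1, 2]
lam, ede = atkinson(incomes, eta=1.0)
print(f"lambda_A = {lam:.4f}, EDE = {ede:.4f}")  # lambda_A = 0.0428, EDE = 1.1487
```

With \(\eta=0\) the same function returns \(\lambda_A=0\) for any distribution, since society is then indifferent to how income is divided.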

Another important inequality measure is based on the notion of
entropy and is known as the **Theil index**:
\[\lambda_T=\frac{1}{n}\sum_i \left(\frac{x_i}{y}\right)\ln\left(\frac{x_i}{y}\right)\]
There is a fair bit of economic literature on these
types of measures, and their relative merits.
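
The Theil index is just as easy to compute. The sketch below assumes strictly positive incomes in a list; the helper name `theil` and the sample incomes are hypothetical.

```python
import math

def theil(incomes):
    """Theil index: mean of (x_i / y) * ln(x_i / y) over all individuals."""
    n = len(incomes)
    y = sum(incomes) / n  # mean income
    return sum((x / y) * math.log(x / y) for x in incomes) / n

# Same illustrative society: four people earn 1 and one person earns 2.
theil_index = theil([1, 1, 1, 1, 2])
print(f"Theil index = {theil_index:.4f}")
```

The index is zero when everyone has the same income and grows as income becomes more concentrated.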

How can we use these measures to analyze different policy proposals? Policies have two welfare implications. A specific policy is meant to increase economic efficiency, a welfare improvement. We could stop there if policies had no impact on the income distribution. However, most economic policies create winners and losers. If a policy increases efficiency at the cost of making the income distribution less equal, there is a trade-off. Now we can quantify this trade-off.

Consider the simple case of an income tax so that after-tax income is \(\tilde{x}_i= \tau y + (1-\tau) x_i\). So individual \(i\) receives benefit \(\tau y\) and pays \(\tau x_i\) in taxes. This scheme is revenue neutral for the government because the average tax payment \(\tau y\) exactly equals the benefit paid to each person. Let us also construct a very simple society with a share \(\mu\) of rich people with income \(1+\nu\) and a share \(1-\mu\) of poor people with income \(1\). The average income in this society is therefore \(y=1+\mu\nu\), and the total welfare with \(\eta=1\) is
\[ W= (1+\mu\nu)\exp\left[ \mu\ln\left(\frac{1+\tau\mu\nu+(1-\tau)\nu}{1+\mu\nu}\right) +(1-\mu)\ln\left(\frac{1+\tau\mu\nu}{1+\mu\nu}\right)\right]\]
Average income does not change with this tax policy. The tax revenue raised on the income premium of the rich and redistributed equally is \(\tau\mu\nu\) per capita. If there was no hidden cost to redistribution, we would simply set the tax rate at \(\tau=1\) and everyone would end up with the same after-tax income.

Taxation incurs a cost, however. There is administrative overhead and various other economic distortions, and income distributions also reflect legitimate skill differentials: more productive workers receive a higher wage rate. Whatever the reasons, let us assume that the more tax we collect, the larger the distortion. The effective welfare is then \(W-\zeta\tau\mu\nu\), where \(\zeta\) captures the cost of taxation. Determining the optimal level of taxation requires solving the first-order condition of the welfare function with respect to \(\tau\). The factor \(1+\mu\nu\) cancels, and the net welfare function thus becomes
\[W=[1+\tau\mu\nu+(1-\tau)\nu]^\mu[1+\tau\mu\nu]^{1-\mu}-\zeta\tau\mu\nu\]
Differentiating this function with respect to \(\tau\) and setting the result to zero unfortunately yields an expression that cannot be solved algebraically for \(\tau\). However, the expression can be solved numerically.
For example, assume that a share \(\mu=1/5\) of the population is rich, with income twice as high as that of poor people so that \(\nu=1\), and further assume that the cost of the tax is \(\zeta=1/3\). Then the optimal tax rate is \(\tau\approx 35\%\).
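
One way to do the numerical step is bisection on the first-order condition. The sketch below is one possible approach rather than the method used for the figure quoted here; the function names are hypothetical, and it relies on the net welfare function being concave in \(\tau\) so that the marginal welfare crosses zero at most once on \([0,1]\). The optimum is sensitive to \(\zeta\): in this sketch \(\zeta=1/3\) gives \(\tau\approx 0.35\), while \(\zeta=1/8\) gives \(\tau\approx 0.79\).

```python
def marginal_welfare(tau, mu, nu, zeta):
    """Derivative of [rich^mu * poor^(1-mu) - zeta*tau*mu*nu] with respect to tau."""
    rich = 1 + tau * mu * nu + (1 - tau) * nu   # after-tax income of the rich
    poor = 1 + tau * mu * nu                    # after-tax income of the poor
    gross = rich ** mu * poor ** (1 - mu)
    # Log-derivative of the gross term simplifies to mu*(1-mu)*nu*(1/poor - 1/rich).
    slope = gross * mu * (1 - mu) * nu * (1 / poor - 1 / rich)
    return slope - zeta * mu * nu

def optimal_tax(mu, nu, zeta, tol=1e-10):
    """Bisection on the first-order condition over tau in [0, 1]."""
    lo, hi = 0.0, 1.0
    if marginal_welfare(lo, mu, nu, zeta) <= 0:
        return 0.0  # taxation is too costly to be worthwhile
    if marginal_welfare(hi, mu, nu, zeta) >= 0:
        return 1.0  # costless redistribution: equalize incomes fully
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal_welfare(mid, mu, nu, zeta) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for zeta in (1 / 3, 1 / 8):
    print(f"zeta = {zeta:.3f}: optimal tau = {optimal_tax(0.2, 1.0, zeta):.3f}")
```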

An important practical measure of income distributions is the
**Gini coefficient** that can be calculated from binned income
data where \(\mu_i\) is the population share of cohort \(i\) with
average income \(x_i\). Then the trapezoidal summation provides
an approximation of the Gini coefficient
\[G=1-\sum_i \mu_i (z_{i}+z_{i-1})\]
where \(z_i=\sum_{k=1}^i \mu_k x_k/\sum_{k=1}^n \mu_k x_k\) is the
cumulative share of income, \(z_0=0\), and cohorts are ordered from lowest
to highest income. This can be rearranged to yield
\[G=\frac{\sum_i \mu_i x_i\sum_{k}\mathrm{sgn}(i-k)\mu_k}{\sum_i \mu_i x_i}\]
(This formula is different from what you find in some textbooks, but
the result is indeed the same.)
In the above case of only two cohorts, the Gini coefficient can be
computed directly as
\[G=\frac{\mu(1-\mu)\nu}{1+\mu\nu}\]
With the parameters \(\mu=1/5,\nu=1\), it follows that \(G=0.133\).
This is a rather low coefficient; many OECD countries have much
higher values (see my blog post from November 2016, "Growing income inequality and its political consequences"). Canada's
Gini coefficient is just over 0.3 now.
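
The equivalence of the two Gini formulas is easy to check numerically. The following sketch assumes cohort data come as two parallel lists of population shares and average incomes; the function names are illustrative, and the cohorts are sorted by income inside the helper because both formulas depend on that ordering.

```python
def _sorted_cohorts(shares, incomes):
    """Return (income, normalized share) pairs ordered by increasing income."""
    pop = sum(shares)
    return sorted((x, m / pop) for x, m in zip(incomes, shares))

def gini_trapezoid(shares, incomes):
    """G = 1 - sum_i mu_i * (z_i + z_{i-1}) with cumulative income shares z_i."""
    cohorts = _sorted_cohorts(shares, incomes)
    total = sum(m * x for x, m in cohorts)
    g, z_prev, cum = 1.0, 0.0, 0.0
    for x, m in cohorts:
        cum += m * x
        z = cum / total
        g -= m * (z + z_prev)
        z_prev = z
    return g

def gini_sign(shares, incomes):
    """G = sum_i mu_i x_i sum_k sgn(i-k) mu_k / sum_i mu_i x_i."""
    cohorts = _sorted_cohorts(shares, incomes)
    total = sum(m * x for x, m in cohorts)
    num = 0.0
    for i, (x_i, m_i) in enumerate(cohorts):
        below = sum(m for _, m in cohorts[:i])       # cohorts with sgn(i-k) = +1
        above = sum(m for _, m in cohorts[i + 1:])   # cohorts with sgn(i-k) = -1
        num += m_i * x_i * (below - above)
    return num / total

mu, nu = 0.2, 1.0
shares, incomes = [1 - mu, mu], [1.0, 1.0 + nu]
closed_form = mu * (1 - mu) * nu / (1 + mu * nu)
print(f"{gini_trapezoid(shares, incomes):.4f} "
      f"{gini_sign(shares, incomes):.4f} {closed_form:.4f}")  # 0.1333 0.1333 0.1333
```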

Let us return to policy analysis, however. Given a particular income distribution in society, we can see how policy changes affect net welfare. Consider a road toll \(\tau\) that is borne by everyone in society, as everybody commutes the same amount in this model society. After-toll income will therefore be \(1-\tau\) for the poor and \(1+\nu-\tau\) for the rich. The road toll will generate benefits \(\beta\tau(2\zeta-\tau)\), where \(\zeta\) now denotes the toll level at which these benefits peak rather than the cost of taxation. Think of these gains as diminished road congestion. The functional form suggests that these gains first increase, but as the toll increases beyond its optimal level, the decreased traffic volume hinders economic activity. This inverse-U relationship is a common occurrence in public policy considerations; there are competing effects and there is an optimal level for a policy. Now the net welfare with distributional considerations included (again with \(\eta=1\)) will be
\[ W=\beta\tau(2\zeta-\tau)+(1-\tau)^{1-\mu}(1+\nu-\tau)^{\mu}\]
If the policy is introduced without consideration of the distributional effects, the optimal policy is simply \(\tau=\zeta\), which produces the gain \(\beta\zeta^2\). Let us assume \(\beta=40\) and \(\zeta=1/40\), keeping \(\mu=1/5\) and \(\nu=1\); then the optimal toll is \(\tau=0.025\) and the welfare gain would be \(\Delta W=0.025\). But now take the distributional effect into account. Differentiating the welfare function with respect to \(\tau\) and solving this first-order condition for a welfare maximum yields \(\tau=0.012\), just about half the toll without distributional considerations. The congestion benefit has shrunk from \(\Delta W=0.025\) to \(\Delta W=0.018\), but the net welfare gain is now positive. Without policy, our welfare stood at 1.1487. With the naive policy (without distributional consideration) our welfare dropped slightly to 1.1478, while with full welfare consideration (and the lower toll) welfare improved to 1.1545.
The naive policy would lead to a 0.08% welfare decrease, while the distributionally aware policy would lead to a 0.5% welfare gain. These sound like small percentage changes, but they are typical in policy considerations. In reality, multiplied by GDP, these small percentages amount to very big dollar figures.
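
The toll figures can be reproduced with a few lines of code. This is a sketch under the assumptions of the example (\(\mu=1/5\), \(\nu=1\), \(\beta=40\), \(\zeta=1/40\), \(\eta=1\), and the rich cohort carrying exponent \(\mu\) in the geometric mean); the golden-section search is simply one convenient maximizer for a single-peaked function, and the function names are hypothetical.

```python
import math

def toll_welfare(tau, mu=0.2, nu=1.0, beta=40.0, zeta=1 / 40):
    """Congestion benefit plus EDE income (eta = 1) after everyone pays toll tau."""
    benefit = beta * tau * (2 * zeta - tau)
    ede = (1 - tau) ** (1 - mu) * (1 + nu - tau) ** mu
    return benefit + ede

def golden_max(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximizer of a single-peaked f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d   # maximum lies in [a, d]
        else:
            a = c   # maximum lies in [c, b]
    return 0.5 * (a + b)

baseline = toll_welfare(0.0)          # no toll
naive = toll_welfare(1 / 40)          # toll set at tau = zeta
tau_star = golden_max(toll_welfare, 0.0, 0.5)
aware = toll_welfare(tau_star)

print(f"baseline {baseline:.4f}, naive {naive:.4f}, "
      f"aware {aware:.4f} at tau = {tau_star:.3f}")
# baseline 1.1487, naive 1.1478, aware 1.1545 at tau = 0.012
```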

If you have endured reading this blog until this point, you may wonder: what are the take-away points from this discussion? Simply put, when economists engage in public policy analysis we are quick to look at the efficiency side of the argument. Whether we like it or not, the design of a policy instrument also has welfare implications. Much of this analysis is done in a partial-equilibrium context without the ability to look at economy-wide general-equilibrium effects. But even without considering spillovers to other markets, many public policies have very distinct distributional effects; taking these into account fully is important. Policies that strive for greater economic efficiency without regard for which parts of society bear the cost are politically toxic.

Welfare analysis is always a bit murky and thus many economists don't like to step into this territory. The murkiness has a simple reason. The policy analyst has to choose a parameter \(\eta\) for society's inequality aversion. However, people with different political beliefs have very different opinions about what level of inequality is tolerable. This debate is as loaded politically as is the debate about the appropriate intertemporal discount rate for climate change policies. The way to deal with this conundrum is to present the results over a range of plausible \(\eta\) values. Leave it to the audience to decide where they stand on equality issues.
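
A sweep over \(\eta\) might look like the sketch below. It assumes the two-cohort society from the examples (80% of people with income 1, 20% with income 2) and uses a share-weighted version of the Atkinson formulas; the function name `ede_income` and the grid of \(\eta\) values are illustrative choices.

```python
import math

def ede_income(shares, incomes, eta):
    """Share-weighted EDE income; shares are assumed to sum to one."""
    if math.isclose(eta, 1.0):
        # eta = 1: weighted geometric mean
        return math.exp(sum(m * math.log(x) for m, x in zip(shares, incomes)))
    # eta != 1: weighted power mean of order (1 - eta)
    return sum(m * x ** (1 - eta) for m, x in zip(shares, incomes)) ** (1 / (1 - eta))

shares, incomes = [0.8, 0.2], [1.0, 2.0]
mean = sum(m * x for m, x in zip(shares, incomes))

results = {}
for eta in (0.0, 0.5, 1.0, 1.5, 2.0):
    ede = ede_income(shares, incomes, eta)
    results[eta] = (ede, 1 - ede / mean)  # (EDE income, Atkinson measure)
    print(f"eta = {eta:.1f}: EDE = {ede:.4f}, lambda_A = {1 - ede / mean:.4f}")
```

Reporting a table like this alongside each policy scenario lets readers see how the ranking of policies changes with their own view of \(\eta\).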