Blog - Mutual Information

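The mutual information of two random variables $X$ and $Y$ is the Kullback-Leibler (KL) divergence between their joint distribution and the product of their marginal distributions:

$$I(X;Y) = D_{KL}(P_{(X, Y)} \,\|\, P_X \cdot P_Y)$$

where, for two distributions $P$ and $Q$ over the same set $\mathcal{X}$,

$$D_{KL}(P \,\|\, Q) = \sum_{x \in \mathcal{X}} P(x) \log \left( \frac{P(x)}{Q(x)}\right)$$

Throughout, $\log$ means $\log_2$, so mutual information is measured in bits.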
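The two definitions translate almost directly into code. The sketch below is a minimal illustration under a few stated assumptions: the variables are discrete, the joint distribution is given as a nested list `joint[y][x]` (rows indexed by $Y$, columns by $X$), logarithms are base 2, and terms with $P(x) = 0$ contribute nothing. The names `kl_divergence` and `mutual_information` are hypothetical helpers written for this example, not functions from any library.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in bits for two distributions given as equal-length lists.

    Terms where p[i] == 0 are skipped (0 * log(0 / q) is taken to be 0).
    Assumes q[i] > 0 wherever p[i] > 0; otherwise the divergence is infinite.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            total += pi * math.log2(pi / qi)
    return total

def mutual_information(joint):
    """I(X;Y) = D_KL(P_(X,Y) || P_X * P_Y) for a hypothetical table layout joint[y][x]."""
    p_y = [sum(row) for row in joint]        # marginal of Y: row sums
    p_x = [sum(col) for col in zip(*joint)]  # marginal of X: column sums
    # Flatten the joint table and the product-of-marginals table in the same (y, x) order.
    flat_joint = [joint[y][x] for y in range(len(p_y)) for x in range(len(p_x))]
    flat_product = [p_y[y] * p_x[x] for y in range(len(p_y)) for x in range(len(p_x))]
    return kl_divergence(flat_joint, flat_product)
```

For instance, `mutual_information([[0.5, 0.0], [0.0, 0.5]])` returns `1.0`, while the uniform table `[[0.25, 0.25], [0.25, 0.25]]` gives `0.0`. Flattening both tables in the same order keeps the computation a plain KL divergence over the pairs $(x, y)$, which mirrors the first formula directly. Whenever a joint cell is nonzero, the matching product cell is also nonzero, since each marginal is at least as large as any cell in its row or column.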

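If $X$ and $Y$ are independent, then $P_{(X, Y)} = P_X \cdot P_Y$. Every ratio inside the logarithm therefore equals 1, so every term of the sum is $\log 1 = 0$. This makes the KL divergence zero, and so the mutual information is zero as well.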
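To make the independence argument concrete, the sketch below builds a joint table as the product of two marginals and evaluates the KL sum. The marginal values are arbitrary illustrative numbers chosen here, not values taken from this post.

```python
import math

# Arbitrary illustrative marginals (assumed values, not from the post).
p_x = [0.2, 0.8]
p_y = [0.7, 0.3]

# Under independence the joint is the product of the marginals: P(x, y) = P(x) P(y).
joint = [[py * px for px in p_x] for py in p_y]

# Each term is P(x, y) * log2(P(x, y) / (P(x) P(y))) = P(x, y) * log2(1) = 0.
mi = sum(
    joint[y][x] * math.log2(joint[y][x] / (p_y[y] * p_x[x]))
    for y in range(len(p_y))
    for x in range(len(p_x))
    if joint[y][x] > 0
)
print(mi)  # 0.0
```

The numerator and denominator are computed with the same floating-point operations, so each ratio here is exactly 1 and the sum is exactly zero.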

At the opposite extreme, suppose $Y$ is an exact copy of $X$ (that is, $Y = X$), where $X$ takes the values 0 and 1 with probability .5 each.

Then this table represents $P_{(X, Y)}$:

$$\begin{array}{c|cc} & X=0 & X=1 \\ \hline Y=0 & .5 & 0 \\ Y=1 & 0 & .5 \end{array}$$

And this table represents $P_X \cdot P_Y$:

$$\begin{array}{c|cc} & X=0 & X=1 \\ \hline Y=0 & .25 & .25 \\ Y=1 & .25 & .25 \end{array}$$
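The second table can be derived mechanically from the first. Sum each column to get $P_X$, sum each row to get $P_Y$, and then multiply the two marginals cell by cell. Here is a minimal sketch in Python. It stores the table as a nested list `joint[y][x]`, which is a layout chosen for illustration.

```python
# Joint table from the post, indexed as joint[y][x].
joint = [[0.5, 0.0],
         [0.0, 0.5]]

p_y = [sum(row) for row in joint]        # row sums: P(Y=0), P(Y=1)
p_x = [sum(col) for col in zip(*joint)]  # column sums: P(X=0), P(X=1)

# Product of the marginals: the table we would expect if X and Y were independent.
product = [[py * px for px in p_x] for py in p_y]
print(p_x, p_y)  # [0.5, 0.5] [0.5, 0.5]
print(product)   # [[0.25, 0.25], [0.25, 0.25]]
```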

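We only need to sum over the cells where the joint distribution $P_{(X, Y)}$ is nonzero, because cells with $P(x, y) = 0$ contribute nothing (by the usual convention $0 \log 0 = 0$). With base-2 logarithms this gives $.5 \log_2 \frac{.5}{.25} + .5 \log_2 \frac{.5}{.25} = .5 \log_2 2 + .5 \log_2 2 = 1$ bit, which is the largest value possible for two binary variables. So mutual information can be thought of as a measure of how (in)dependent the two variables are. It captures how far the frequencies with which their values co-occur depart from what independence would predict. Equivalently, it measures how much learning one variable tells you about the other. Here, knowing $X$ tells you $Y$ exactly.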
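The final sum is easy to check numerically. The sketch below evaluates the two nonzero terms with both base-2 and natural logarithms. The value of 1 depends on measuring information in bits. With the natural log, the same quantity is $\ln 2 \approx 0.693$ nats.

```python
import math

# Nonzero cells of the joint table from the post, as pairs (P(x, y), P(x) * P(y)).
cells = [(0.5, 0.25), (0.5, 0.25)]

# Zero-probability cells are omitted, following the convention 0 * log 0 = 0.
mi_bits = sum(p * math.log2(p / q) for p, q in cells)
mi_nats = sum(p * math.log(p / q) for p, q in cells)

print(mi_bits)  # 1.0
print(mi_nats)  # ln 2, about 0.693
```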

Citations

• https://en.wikipedia.org/wiki/Joint_probability_distribution
• https://en.wikipedia.org/wiki/Mutual_information
• https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence