Blog - Mutual Information

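Mutual information is the KL divergence between the joint distribution and the product of the marginals:

$I(X;Y) = D_{KL}(P_{(X, Y)} \vert\vert P_X \cdot P_Y)$

where

$D_{KL}(P \vert\vert Q) = \sum_{x \in \mathcal{X}} P(x) \log \left( \frac{P(x)}{Q(x)}\right)$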
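
Substituting the KL definition into the first formula gives an explicit double sum over pairs of values. Here $\mathcal{Y}$ (the set of values $Y$ can take) is notation assumed for this expansion and does not appear in the definitions above:

$I(X;Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_{(X, Y)}(x, y) \log \left( \frac{P_{(X, Y)}(x, y)}{P_X(x) \, P_Y(y)} \right)$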

If the variables are independent, then $P_{(X, Y)} = P_X \cdot P_Y$. Every ratio inside the logarithm is then 1 and $\log 1 = 0$, so the KL divergence is zero and the mutual information is zero.
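
The independent case can be checked numerically. This is a minimal sketch, not a reference implementation: the marginals `p_x` and `p_y` are hypothetical values chosen for illustration, the helper name `kl_divergence` is made up here, and base-2 logarithms are assumed.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in bits, for distributions given as equal-length lists."""
    # Terms with P(x) = 0 are skipped, following the convention 0 * log 0 = 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical marginals (an assumption, not taken from the post).
p_x = [0.3, 0.7]
p_y = [0.6, 0.4]

# Under independence, the joint distribution equals the product of the marginals.
joint = [px * py for px in p_x for py in p_y]
product = [px * py for px in p_x for py in p_y]

# Every ratio joint/product is 1, so every log term vanishes.
independent_mi = kl_divergence(joint, product)
print(independent_mi)
```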

At the other extreme, suppose the two variables are perfectly dependent: $X = Y$, with the values 0 and 1 equally likely.

Then this table represents the joint distribution $P_{(X, Y)}$:

	X=0	X=1
Y=0	.5	0
Y=1	0	.5

And this table represents $P_X \cdot P_Y$:

	X=0	X=1
Y=0	.25	.25
Y=1	.25	.25

Only the cells where the joint distribution $P_{(X, Y)}$ is nonzero contribute to the sum, by the convention $0 \log 0 = 0$. Taking logarithms base 2, we get $.5 \log_2 \frac{.5}{.25} + .5 \log_2 \frac{.5}{.25} = .5 \log_2 2 + .5 \log_2 2 = 1$ bit. So mutual information can be thought of as a measure of how strongly the two variables depend on each other: it is zero when they are independent, and it grows as knowing one variable tells you more about the other.
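
The hand calculation above can be reproduced in a few lines. This is a sketch under stated assumptions: the function name `mutual_information` and the `joint[y][x]` table layout are choices made here, and the logarithm is taken base 2, which is the base under which the sum comes out to 1.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits from a joint table, where joint[y][x] = P(X=x, Y=y)."""
    p_y = [sum(row) for row in joint]        # marginal of Y: row sums
    p_x = [sum(col) for col in zip(*joint)]  # marginal of X: column sums
    total = 0.0
    for y, row in enumerate(joint):
        for x, p_xy in enumerate(row):
            if p_xy > 0:  # cells with zero joint probability contribute nothing
                total += p_xy * math.log2(p_xy / (p_x[x] * p_y[y]))
    return total

# The perfectly dependent table from the text.
dependent = [[0.5, 0.0],
             [0.0, 0.5]]

# The product-of-marginals table, which is also the joint of two independent fair bits.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

mi_dependent = mutual_information(dependent)      # 1.0
mi_independent = mutual_information(independent)  # 0.0
print(mi_dependent, mi_independent)
```

The marginals are computed from the joint table itself, so the same function covers both extremes without passing $P_X$ and $P_Y$ separately.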

Citations

https://en.wikipedia.org/wiki/Joint_probability_distribution
https://en.wikipedia.org/wiki/Mutual_information
https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence