Blog - Mutual Information
\(I(X;Y) = D_{KL}(P_{(X, Y)} \vert\vert P_X \cdot P_Y)\)

\(D_{KL}(P \vert\vert Q) = \sum_{x \in \mathcal{X}} P(x) \log \left( \frac{P(x)}{Q(x)}\right)\)
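As a quick illustration (not part of the definitions above), here is a minimal Python sketch that computes \(I(X;Y)\) directly from this formula, assuming the joint distribution is passed in as a nested dict `{x: {y: probability}}`; the name `mutual_information` is just for this example. It uses log base 2, so the result is in bits.

```python
import math

def mutual_information(joint):
    """I(X;Y) = D_KL(P_(X,Y) || P_X * P_Y), with `joint` given as {x: {y: prob}}."""
    # Marginals P_X and P_Y, obtained by summing the joint over the other variable.
    p_x = {x: sum(row.values()) for x, row in joint.items()}
    p_y = {}
    for row in joint.values():
        for y, p in row.items():
            p_y[y] = p_y.get(y, 0.0) + p

    # KL divergence between the joint and the product of the marginals.
    # By convention, terms with zero joint probability contribute nothing.
    mi = 0.0
    for x, row in joint.items():
        for y, p_xy in row.items():
            if p_xy > 0:
                mi += p_xy * math.log2(p_xy / (p_x[x] * p_y[y]))
    return mi
```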
If the variables are independent, then $P_{(X, Y)} = P_X \cdot P_Y$, so the KL divergence, and therefore the mutual information, is zero.
On the other hand, consider the opposite extreme: the variables are perfectly dependent, with $X = Y$ always and each outcome equally likely.
Then this table represents $P_{(X, Y)}$:
|     | X=0 | X=1 |
| --- | --- | --- |
| Y=0 | .5  | 0   |
| Y=1 | 0   | .5  |
And this table represents $P_X \cdot P_Y$ (each marginal is $.5$, so every entry is $.5 \cdot .5 = .25$):
|     | X=0 | X=1 |
| --- | --- | --- |
| Y=0 | .25 | .25 |
| Y=1 | .25 | .25 |
Since terms where the joint distribution $P_{(X, Y)}$ is zero contribute nothing, we only need the two nonzero entries. Using log base 2, we get $.5 \log_2 \frac{.5}{.25} + .5 \log_2 \frac{.5}{.25} = .5 \log_2 2 + .5 \log_2 2 = 1$ bit. So mutual information can be thought of as a measure of how much the values of the two variables co-occur, or equivalently how (in)dependent the two variables are.
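Plugging the two tables above into the sketch from earlier (again, `mutual_information` is the hypothetical helper defined above, not a library function) reproduces these numbers:

```python
# Joint distribution where X = Y always (the first table).
dependent = {0: {0: 0.5, 1: 0.0}, 1: {0: 0.0, 1: 0.5}}
# Joint distribution equal to the product of the marginals (the second table),
# i.e. the independent case.
independent = {0: {0: 0.25, 1: 0.25}, 1: {0: 0.25, 1: 0.25}}

print(mutual_information(dependent))    # 1.0 bit
print(mutual_information(independent))  # 0.0 bits
```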
Citations
- https://en.wikipedia.org/wiki/Joint_probability_distribution
- https://en.wikipedia.org/wiki/Mutual_information
- https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence