# Blog - Mutual Information

\(I(X;Y) = D_{KL}(P_{(X, Y)} \vert\vert P_X \cdot P_Y)\)

\(D_{KL}(P \vert\vert Q) = \sum_{x \in \mathcal{X}} P(x) \log \left( \frac{P(x)}{Q(x)}\right)\)
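As a rough illustration of the KL divergence formula, here is a minimal Python sketch. The function name `kl_divergence` and the list-based representation of the distributions are assumptions made for this example. It uses base-2 logarithms (so the result is in bits) and assumes $Q(x) > 0$ wherever $P(x) > 0$.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in bits, for two distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0 (illustrative sketch, no input validation).
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0:  # terms with P(x) = 0 contribute nothing (0 log 0 is taken as 0)
            total += px * math.log2(px / qx)
    return total

# Identical distributions: every log term is log2(1) = 0.
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0

# Different distributions: the divergence is positive.
print(kl_divergence([0.5, 0.5], [0.25, 0.75]))
```

Note that the divergence is zero exactly when the two distributions agree, which is the property the rest of the post relies on.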

If the variables are independent, then $P_{(X, Y)} = P_X \cdot P_Y$, so the two arguments of the KL divergence are the same distribution. The KL divergence is therefore zero, and so is the mutual information.
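To spell out that step, substitute $P_{(X, Y)}(x, y) = P_X(x) P_Y(y)$ into the definition. The sum here runs over all pairs $(x, y)$, and every logarithm becomes $\log 1 = 0$:

$$I(X;Y) = \sum_{x, y} P_X(x) P_Y(y) \log \left( \frac{P_X(x) P_Y(y)}{P_X(x) P_Y(y)} \right) = \sum_{x, y} P_X(x) P_Y(y) \log 1 = 0$$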

On the other hand, consider the opposite extreme, where the two variables are completely dependent: $X = Y$ always, and each of the two values is equally likely.

Then this table represents $P(X, Y)$:

| | X=0 | X=1 |
|---|---|---|
| Y=0 | .5 | 0 |
| Y=1 | 0 | .5 |

And this table represents $P_X \cdot P_Y$:

| | X=0 | X=1 |
|---|---|---|
| Y=0 | .25 | .25 |
| Y=1 | .25 | .25 |

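Then we only need to consider the cells where the joint distribution ($P_{(X, Y)}$) is not equal to zero, since terms with zero probability contribute nothing to the sum (by the convention $0 \log 0 = 0$). Taking logarithms base 2, we get $.5 \log_2 \frac{.5}{.25} + .5 \log_2 \frac{.5}{.25} = .5 \log_2 2 + .5 \log_2 2 = 1$ bit. So mutual information can be thought of as a measure of how dependent the two variables are: it is zero when they are independent, and it grows as knowing one variable tells you more about the other.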
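The arithmetic above can be checked with a short Python sketch. The function name `mutual_information` and the nested-list layout (`joint[y][x]` holds $P(X=x, Y=y)$) are assumptions made for this example. It computes the marginals from the joint table and uses base-2 logarithms, so the result is in bits.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits, where joint[y][x] = P(X=x, Y=y).

    Illustrative sketch: assumes the entries are non-negative and sum to 1.
    """
    p_y = [sum(row) for row in joint]        # marginal of Y: sum across each row
    p_x = [sum(col) for col in zip(*joint)]  # marginal of X: sum down each column
    mi = 0.0
    for y, row in enumerate(joint):
        for x, pxy in enumerate(row):
            if pxy > 0:  # skip zero cells (0 log 0 is taken as 0)
                mi += pxy * math.log2(pxy / (p_x[x] * p_y[y]))
    return mi

# Joint table where X and Y always take the same value.
dependent = [[0.5, 0.0],
             [0.0, 0.5]]

# Joint table equal to the product of its marginals.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

print(mutual_information(dependent))    # 1.0
print(mutual_information(independent))  # 0.0
```

The two tables reproduce the two extremes discussed in the post: 1 bit for the fully dependent pair and 0 bits for the independent pair.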

#### Citations

- https://en.wikipedia.org/wiki/Joint_probability_distribution
- https://en.wikipedia.org/wiki/Mutual_information
- https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence