ML & Math · also known as: Kullback-Leibler divergence

KL divergence

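D_KL(P‖Q) = Σ_x P(x) log(P(x) / Q(x)). With log base 2, this is the expected number of extra bits needed to encode samples from P using a code optimized for Q instead of one optimized for P.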
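The formula translates directly into code. This is a minimal sketch for discrete distributions. It assumes P and Q are equal-length lists of probabilities over the same outcomes. The function name `kl_divergence` is illustrative, not from any library. Use `base=2` for bits and `base=math.e` for nats.

```python
import math

def kl_divergence(p, q, base=2.0):
    """D_KL(P||Q) for discrete distributions given as equal-length probability lists.

    Terms with p_i == 0 contribute nothing (convention: 0 * log 0 = 0).
    Returns math.inf if some p_i > 0 has q_i == 0, since Q then cannot encode
    an outcome that P actually produces.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi, base)
    return total

# A fair coin P, encoded with a code built for a biased coin Q.
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # ≈ 0.737 bits, i.e. log2(5/3)
```

Here the fair coin costs about 0.737 extra bits per toss when it is encoded with the biased coin's code. The divergence is zero only when the two distributions match.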

In plain terms

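Asymmetric: in general D_KL(P‖Q) ≠ D_KL(Q‖P), so KL divergence is not a distance metric. Cross-entropy decomposes as H(P, Q) = H(P) + D_KL(P‖Q). Cross-entropy loss is therefore the entropy of the target distribution P plus the KL divergence from P to the model's distribution Q.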
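Both properties are easy to check numerically. The sketch below assumes small discrete distributions, with q_i > 0 wherever p_i > 0. The helper names are illustrative, and all quantities are in bits.

```python
import math

def entropy(p):
    """Shannon entropy H(P) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(P, Q) = -sum p log2 q, in bits (assumes q_i > 0 where p_i > 0)."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    """D_KL(P||Q) in bits (same assumption as cross_entropy)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

# Asymmetry: swapping the arguments changes the value.
print(kl(p, q), kl(q, p))  # ≈ 0.737 vs ≈ 0.531

# Decomposition: H(P, Q) = H(P) + D_KL(P||Q).
print(cross_entropy(p, q), entropy(p) + kl(p, q))  # both ≈ 1.737
```

In a classifier, P is fixed by the labels, so H(P) does not depend on the model. Minimizing cross-entropy over Q is therefore equivalent to minimizing D_KL(P‖Q).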

Origin

Introduced by Solomon Kullback and Richard Leibler in "On Information and Sufficiency" (1951). It is fundamental to information theory, variational inference, and the loss functions of modern deep learning.

Where it shows up in production

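Variational autoencoders: the KL divergence between the encoder distribution q(z|x) and the prior p(z) is one of the two terms of the loss. The other term is reconstruction error.
Reinforcement learning (PPO): a KL penalty on policy updates keeps the new policy close to the old one. PPO's widely used clipped-objective variant gets the same effect by clipping the probability ratio instead. This is the "proximal" in PPO.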
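Both production uses reduce to a few lines. This is a minimal pure-Python sketch under stated assumptions; all function names are hypothetical. The VAE part assumes the common setup of a diagonal Gaussian encoder, parameterized by mean and log-variance, and a standard normal prior. For that case the KL term has a closed form. The PPO part assumes discrete actions and the KL-penalty form of the objective. Its fixed `beta` is an illustrative value, since the KL-penalty variant in the PPO paper adapts the coefficient during training.

```python
import math

def vae_kl_term(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims, in nats.

    Closed form per dimension: 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2).
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def categorical_kl(p_old, p_new):
    """D_KL(pi_old || pi_new) over one state's discrete action distribution, in nats."""
    return sum(po * math.log(po / pn) for po, pn in zip(p_old, p_new) if po > 0)

def ppo_kl_penalty_objective(p_old, p_new, action, advantage, beta=0.01):
    """Per-sample KL-penalized surrogate (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s); the penalty term grows as the new
    policy drifts away from the old one. `beta` is a hypothetical fixed value.
    """
    ratio = p_new[action] / p_old[action]
    return ratio * advantage - beta * categorical_kl(p_old, p_new)

# Encoder output identical to the prior: the VAE KL term is zero.
print(vae_kl_term([0.0, 0.0], [0.0, 0.0]))  # 0.0

# A small policy update on one sampled action with positive advantage.
print(ppo_kl_penalty_objective([0.7, 0.3], [0.6, 0.4], action=1, advantage=2.0))
```

When the new policy equals the old one, the ratio is 1 and the penalty vanishes, so the objective reduces to the advantage. Larger `beta` values trade off improvement against staying close to the old policy.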


Sources & further reading

Paper: S. Kullback and R. A. Leibler, "On Information and Sufficiency," Annals of Mathematical Statistics, 1951.
