ML & Math

Information entropy

H(X) = −Σ p(x) log₂ p(x), summed over all outcomes x. Expected bits per sample.
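
A minimal sketch of the formula above, assuming base-2 logarithms (so the result is in bits) and a distribution given as a plain list of probabilities. The function name `entropy_bits` is illustrative and not taken from any library.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution given as probabilities."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 are skipped: p * log2(p) tends to 0 as p tends to 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = entropy_bits([0.5, 0.5])    # 1 bit: maximal uncertainty for two outcomes
biased_coin = entropy_bits([0.9, 0.1])  # about 0.469 bits: more predictable, lower entropy
uniform_8 = entropy_bits([1 / 8] * 8)   # 3 bits: eight equally likely outcomes
```

The biased coin scores lower than the fair one because its outcome is easier to guess, which is the "higher entropy = more uncertainty" reading in numbers.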


In plain terms

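Shannon, 1948. Entropy is the minimum average number of bits per symbol that any lossless code can achieve, so it sets the compression ceiling for any source. Higher entropy = more uncertainty.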
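
To make the compression bound concrete, the sketch below compares a real compressor against the entropy estimate. It assumes a made-up source of independent symbols with a skewed distribution, uses Python's standard `zlib` module as the compressor, and estimates entropy from byte frequencies (an order-0 estimate). The seed, alphabet, and weights are arbitrary choices for illustration.

```python
import math
import random
import zlib
from collections import Counter

def empirical_entropy_bits(data: bytes) -> float:
    """Order-0 entropy in bits per byte, estimated from byte frequencies."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

rng = random.Random(0)  # fixed seed, chosen only for repeatability
# Hypothetical source: 100,000 independent symbols with a skewed distribution.
data = bytes(rng.choices(b"abcd", weights=[70, 15, 10, 5], k=100_000))

# Entropy bound on the compressed size, in bytes.
bound_bytes = empirical_entropy_bits(data) * len(data) / 8
# Actual size achieved by zlib at its highest compression level.
compressed_bytes = len(zlib.compress(data, 9))

print(f"entropy bound: {bound_bytes:.0f} bytes, zlib: {compressed_bytes} bytes")
```

For a source with independent symbols like this one, the compressed size stays above the bound. For data with repeated substrings, a compressor can beat the order-0 figure, because byte frequencies alone overestimate the true entropy of a source with memory.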

Origin

Claude Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal, 1948. The paper founded information theory and gave compression its theoretical ceiling.

Where it shows up in production
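  • gzip / zstd compression: Both finish with an entropy-coding stage (Huffman coding in gzip; Huffman coding and Finite State Entropy in zstd) that pushes the output toward the Shannon entropy of the modeled source. Real-world ratios near the entropy bound = excellent compressor.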
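  • Decision trees: Information gain (entropy reduction) is a standard split criterion. ID3 uses it directly, C4.5 uses the normalized gain ratio, and scikit-learn offers it via criterion="entropy". CART and scikit-learn default to Gini impurity instead, and XGBoost scores splits with a gradient-based gain rather than entropy.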
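
A minimal sketch of information gain for a single binary split, assuming class labels held in plain Python lists. The helper names `label_entropy` and `information_gain` are illustrative and not the API of any tree library.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Entropy in bits of the class distribution in a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * label_entropy(left) \
             + (len(right) / n) * label_entropy(right)
    return label_entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5

# A split that separates the classes completely recovers the full 1 bit.
perfect = information_gain(parent, ["yes"] * 5, ["no"] * 5)

# A split whose children keep the parent's 50/50 mix gains nothing.
useless = information_gain(
    parent,
    ["yes"] * 2 + ["no"] * 2,
    ["yes"] * 3 + ["no"] * 3,
)
```

A tree learner using this criterion evaluates candidate splits and keeps the one with the largest gain.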