Information entropy

H(X) = −Σ p(x) log₂ p(x), summed over the outcomes x of X. With a base-2 logarithm, this is the expected number of bits per sample.
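As a minimal sketch of the formula, the snippet below computes entropy in bits from a list of probabilities and from raw samples. The function names `shannon_entropy` and `empirical_entropy` are illustrative choices, not part of any library.

```python
import math
from collections import Counter


def shannon_entropy(probs):
    """Return H = -sum(p * log2(p)) in bits for a discrete distribution."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 are skipped: p * log2(p) tends to 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def empirical_entropy(samples):
    """Estimate entropy in bits from observed samples via their frequencies."""
    counts = Counter(samples)
    n = sum(counts.values())
    return shannon_entropy([c / n for c in counts.values()])


fair_coin = shannon_entropy([0.5, 0.5])    # maximum uncertainty for two outcomes
biased_coin = shannon_entropy([0.9, 0.1])  # more predictable, so lower entropy
certain = shannon_entropy([1.0])           # no uncertainty at all
```

A fair coin comes out at one bit per flip, a 90/10 coin at a little under half a bit, and a certain outcome at zero.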

In plain terms

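Introduced by Shannon in 1948. Entropy is the limit for lossless compression: no code can average fewer than H(X) bits per symbol from the source. Higher entropy = more uncertainty, and less compressible data.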
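The compression limit can be checked empirically. The sketch below draws independent symbols from a skewed four-symbol distribution (an assumption chosen so the true entropy is exactly 1.75 bits per symbol), compresses them with Python's standard `zlib` module, and compares the compressed size with the entropy of the byte histogram. The helper name `byte_entropy_bits` and the specific weights and seed are illustrative.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy_bits(data: bytes) -> float:
    """Entropy of the byte histogram, in bits per byte."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


rng = random.Random(0)  # fixed seed so the run is repeatable
symbols = b"ABCD"
weights = [0.5, 0.25, 0.125, 0.125]  # true entropy: 1.75 bits per symbol
data = bytes(rng.choices(symbols, weights=weights, k=100_000))

entropy = byte_entropy_bits(data)
compressed = zlib.compress(data, 9)  # DEFLATE at the highest compression level
bits_per_symbol = 8 * len(compressed) / len(data)

print(f"entropy:    {entropy:.3f} bits/symbol")
print(f"compressed: {bits_per_symbol:.3f} bits/symbol")
```

The comparison is only meaningful here because the symbols are independent. On data with repeated structure, a compressor can go below the byte-histogram entropy, because the histogram overstates the true entropy of the source.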

Origin

Claude Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal, 1948. The paper founded information theory and gave lossless compression its theoretical limit.

Where it shows up in production

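gzip / zstd compression: Both finish with an entropy-coding stage (Huffman coding in gzip's DEFLATE; Huffman and finite-state entropy coding in zstd) that aims to approach the Shannon entropy of the symbols it encodes. A real-world compressed size near the entropy bound marks an excellent compressor.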
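Decision trees: Information gain (entropy reduction) is a standard split criterion. C4.5 uses a normalized form of it (gain ratio). CART and scikit-learn default to Gini impurity, with entropy available in scikit-learn via criterion="entropy". XGBoost scores splits with a gain computed from loss gradients rather than from entropy.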
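To make the decision-tree entry concrete, here is a minimal sketch of information gain: the parent node's entropy minus the size-weighted entropy of the child nodes. The function names and the toy "yes"/"no" labels are illustrative assumptions, not taken from any of the libraries named above.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of the class distribution in a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups.

    `children` is a list of label lists that together partition `parent`.
    """
    n = len(parent)
    weighted = sum(len(g) / n * entropy(g) for g in children if g)
    return entropy(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5

# A split that separates the classes completely recovers the full 1 bit.
perfect = information_gain(parent, [["yes"] * 5, ["no"] * 5])

# A split whose children keep the parent's 50/50 mix gains nothing.
useless = information_gain(
    parent, [["yes"] * 2 + ["no"] * 2, ["yes"] * 3 + ["no"] * 3]
)

# A split that mostly separates the classes falls in between.
partial = information_gain(
    parent, [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
)
```

A tree learner using this criterion would evaluate each candidate split this way and keep the one with the largest gain.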

Sources & further reading

Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal.
