How recommendation systems work.
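A user has rated 4 movies out of a million. How do you predict the rest? Matrix factorization approximates the sparse rating matrix as the product of two narrow matrices: one holding a short vector per user, the other a short vector per item. The predicted rating for a user-item pair is the dot product of their two vectors, and the vector dimensions capture latent preferences nobody had to label.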
In a real recommender, the matrix is millions × millions and 99.99% sparse. Most users haven't rated most items. The recommendation problem: given the few cells that are filled, predict the rest.
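As a minimal sketch of the idea above, the following fits the two narrow matrices with stochastic gradient descent on the filled cells only, then multiplies them to predict every cell. The function name `factorize`, the toy ratings, and the hyperparameters (`k`, `lr`, `reg`, `epochs`) are illustrative assumptions, not values from any production system.

```python
import numpy as np

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Fit user and item factors to observed (user, item, rating) triples by SGD."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))  # one row of k latent factors per user
    Q = rng.normal(scale=0.1, size=(n_items, k))  # one row of k latent factors per item
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]          # error on this observed cell
            pu = P[u].copy()               # keep the old user vector for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

# Hypothetical toy data: 4 users, 4 items, only 8 of the 16 cells are filled.
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 5.0), (1, 2, 1.0),
           (2, 1, 4.0), (2, 2, 1.0), (3, 2, 5.0), (3, 3, 4.0)]

P, Q = factorize(ratings, n_users=4, n_items=4)
predicted = P @ Q.T  # dense matrix of predictions, including the unrated cells

# Root-mean-square error on the observed cells only.
train_rmse = np.sqrt(np.mean([(r - predicted[u, i]) ** 2 for u, i, r in ratings]))
```

Only the observed triples drive the updates; the unrated cells are never treated as zeros. Alternating least squares (ALS) is a common alternative to SGD for the same objective.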
What the latent features actually capture
You don't tell the algorithm "feature 1 means sci-fi-ness." It discovers it from the data. After training, factor 1 might correlate with action-vs-drama, factor 2 with old-vs-new, and factor 47 with something that has no obvious human name. The factors are emergent: they're whatever directions best explain the rating patterns. This is why matrix factorization tends to beat hand-coded "users who like X also like Y" rules: the algorithm finds dimensions humans would never enumerate.
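One common way to guess what a learned factor represents is to list the items that score lowest and highest on it. The sketch below assumes an already-trained item factor matrix; the values, the placeholder titles, and the helper name `extremes_of_factor` are all hypothetical.

```python
import numpy as np

def extremes_of_factor(item_factors, titles, factor, n=2):
    """Return the n lowest- and n highest-scoring titles along one latent factor."""
    order = np.argsort(item_factors[:, factor])        # ascending by factor value
    lowest = [titles[i] for i in order[:n]]
    highest = [titles[i] for i in order[::-1][:n]]
    return lowest, highest

# Hypothetical trained item factors: 5 items, 2 latent dimensions.
titles = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"]
item_factors = np.array([
    [ 1.2,  0.1],
    [ 0.9, -0.8],
    [-1.1,  0.7],
    [-0.7, -0.2],
    [ 0.1,  1.0],
])

lowest, highest = extremes_of_factor(item_factors, titles, factor=0)
```

If the titles at one end are all action films and those at the other end are all dramas, that is evidence for the action-vs-drama reading. Many factors will not line up with any label this cleanly.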
The cold-start problem
Pure matrix factorization needs ratings to learn. A brand-new user has none. A brand-new item has none. Real systems hybridize: they use content features (movie genre, actors, director) for new items, and signup-time questionnaires or demographic priors for new users. Two-tower neural networks (one tower for user features, one for item features) mitigate cold-start because each tower can produce an embedding from features alone, with no interaction history required.
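One simple way to hybridize, shown here as a sketch rather than as any particular system's method, is to learn a linear map from content features to the latent vectors of items that already have ratings, then apply that map to a new item. The ridge-regression helper `fit_feature_map`, the two-genre feature encoding, and all numbers are assumptions for illustration.

```python
import numpy as np

def fit_feature_map(features, factors, reg=1e-3):
    """Ridge regression: find W so that features @ W approximates factors."""
    d = features.shape[1]
    gram = features.T @ features + reg * np.eye(d)
    return np.linalg.solve(gram, features.T @ factors)

# Hypothetical existing items with content features [is_scifi, is_drama]
# and latent vectors already learned from ratings.
item_features = np.array([[1.0, 0.0],
                          [1.0, 0.0],
                          [0.0, 1.0]])
item_factors = np.array([[ 0.9,  0.1],
                         [ 1.1, -0.1],
                         [-1.0,  0.5]])

W = fit_feature_map(item_features, item_factors)

# A brand-new sci-fi item with zero ratings still gets a latent vector.
new_item_features = np.array([1.0, 0.0])
new_item_vec = new_item_features @ W

# Score it for a user exactly as for any other item: a dot product.
user_vec = np.array([1.0, 0.0])
score = user_vec @ new_item_vec
```

The new item lands near the average latent vector of the existing sci-fi items. A two-tower network generalizes this by replacing the linear map with a learned neural network on each side.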
Beyond matrix factorization · what 2024 recommenders run
Large platforms such as YouTube, TikTok, Netflix, and Spotify are generally described as using deep two-tower architectures plus a reranker. Tower 1 turns user history into an embedding (for example, 256 dimensions). Tower 2 turns item features into an embedding of the same size. The dot product of the two is the relevance score. A reranker then takes the top-K candidates and applies a second, more expensive model that also considers diversity, freshness, and business objectives (cold items get a boost, monetized items get a boost, etc.). Matrix factorization is now the classroom version; production runs much bigger machinery on the same fundamental idea: turn users and items into vectors, then score with dot products.
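The two-stage flow described above can be sketched in a few lines. Random vectors stand in for the tower outputs here, and the reranker is reduced to a weighted sum of relevance and a freshness signal; the function names, the 16-dimensional embeddings, and the `freshness_weight` value are assumptions, and a real reranker would be a learned model.

```python
import numpy as np

def retrieve(user_vec, item_vecs, k):
    """Stage 1: score every item by dot product and keep the top k."""
    scores = item_vecs @ user_vec
    top = np.argsort(-scores)[:k]          # indices of the k highest scores
    return top, scores[top]

def rerank(candidates, relevance, freshness, freshness_weight=0.5):
    """Stage 2: reorder the candidates with an extra signal mixed in."""
    final = relevance + freshness_weight * freshness[candidates]
    order = np.argsort(-final)
    return candidates[order]

rng = np.random.default_rng(0)
item_vecs = rng.normal(size=(1000, 16))    # stand-in for item-tower embeddings
user_vec = rng.normal(size=16)             # stand-in for a user-tower embedding
freshness = rng.random(1000)               # hypothetical per-item freshness in [0, 1)

candidates, relevance = retrieve(user_vec, item_vecs, k=50)
ranked = rerank(candidates, relevance, freshness)
```

The split keeps the expensive model off the full catalog: stage 1 touches every item but only with a dot product, while stage 2 sees just the 50 survivors. At production scale, the exhaustive dot product in stage 1 is usually replaced by an approximate nearest-neighbor index.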