How recommendation systems work.

A user has rated 4 movies out of a million. How do you predict the rest? Matrix factorization: approximate the sparse rating matrix as the product of two narrow matrices, one holding a short vector per user and one holding a short vector per item. The predicted rating for a user-item pair is the dot product of their two vectors, and those vectors capture latent preferences nobody had to label.

The user-item matrix · 5 users × 6 movies · mostly empty

In a real recommender, the matrix is millions × millions and 99.99% sparse. Most users haven't rated most items. The recommendation problem: given the few cells that are filled, predict the rest.
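To make "predict the rest from the few filled cells" concrete, here is a minimal sketch of matrix factorization trained with stochastic gradient descent on the observed cells only. The 5 × 6 ratings, the rank `k=2`, the learning rate, and the regularization strength are all made-up illustration values, and the function names are hypothetical; SGD is one common way to fit the factors (ALS is another), not necessarily what any particular system uses.

```python
import random

def predict(P, Q, u, i):
    # Predicted rating = dot product of the user vector and the item vector.
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """ratings: list of (user, item, rating) tuples for observed cells only."""
    rng = random.Random(seed)
    # Small random starting vectors: one row per user, one row per item.
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]  # keep old values for a symmetric update
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(P, Q, ratings):
    # Root-mean-square error over the given (observed) cells.
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5

# Hypothetical data: 5 users x 6 movies, 12 of 30 cells filled.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 4, 1),
    (1, 0, 4), (1, 2, 1), (1, 5, 2),
    (2, 1, 5), (2, 3, 2),
    (3, 2, 5), (3, 4, 4),
    (4, 3, 4), (4, 5, 5),
]

P, Q = factorize(ratings, n_users=5, n_items=6)
print("training RMSE:", round(rmse(P, Q, ratings), 3))
print("user 0, unseen movie 2:", round(predict(P, Q, 0, 2), 2))
```

The training loop never touches an empty cell. Predictions for unseen cells come entirely from the learned vectors, which is also why a low training error alone says nothing about how good those predictions are; a real setup would hold out some ratings to check.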

What the latent features actually capture

You don't tell the algorithm "feature 1 means sci-fi-ness." It discovers it from the data. After training, factor 1 might correlate with action-vs-drama, factor 2 with old-vs-new, and factor 47 with a pattern that has no human-readable name. The factors are emergent: they are whatever directions best explain the rating patterns. This is why matrix factorization tends to beat hand-coded "users who like X also like Y" rules: the algorithm finds dimensions humans would never enumerate.
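One common way to guess what a learned factor means is to sort items by their coordinate along it and look at the two extremes. The sketch below does that with hand-written, purely hypothetical item vectors (the placeholder titles and numbers are invented for illustration, not learned from any data); `extremes` is a made-up helper name.

```python
# Hypothetical 2-factor item vectors; in practice these come out of training.
item_factors = {
    "action_new": (0.9, 0.8),
    "action_old": (0.8, -0.7),
    "drama_new": (-0.7, 0.9),
    "drama_old": (-0.9, -0.8),
    "comedy_mid": (0.1, 0.0),
}

def extremes(item_factors, factor, n=2):
    """Return the n lowest and n highest items along one latent factor."""
    ranked = sorted(item_factors, key=lambda title: item_factors[title][factor])
    return ranked[:n], ranked[-n:]

for factor in (0, 1):
    low, high = extremes(item_factors, factor)
    print(f"factor {factor}: low end {low}, high end {high}")
```

With these invented vectors, factor 0 separates the dramas from the action titles and factor 1 separates old from new. On real data the extremes are usually less tidy, and some factors resist any label.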

The cold-start problem

Pure matrix factorization needs ratings to learn. A brand-new user has none. A brand-new item has none. Real systems hybridize: they use content features (movie genre, actors, director) for new items, and signup-time questionnaires or demographic priors for new users. Two-tower neural networks (one tower for user features, one for item features) ease cold-start because each tower can produce an embedding from features alone, so a new user or item gets a usable vector before it has any interactions, provided those features exist.
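As a rough illustration of the content-feature fallback for new items, the sketch below gives an unrated item the average learned vector of existing items in the same genre. This is a deliberately simple stand-in, not a two-tower network: it is what a least-squares linear map from a one-hot genre feature to the learned vectors reduces to when each item has exactly one genre. All vectors, genres, and function names here are hypothetical.

```python
from collections import defaultdict

def genre_centroids(item_vecs, item_genre):
    """Average the learned item vectors within each genre."""
    groups = defaultdict(list)
    for item, vec in item_vecs.items():
        groups[item_genre[item]].append(vec)
    return {
        genre: [sum(col) / len(vecs) for col in zip(*vecs)]
        for genre, vecs in groups.items()
    }

def embed_new_item(genre, centroids, fallback):
    # Unknown genre: fall back to a global default vector.
    return centroids.get(genre, fallback)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical vectors learned from ratings for three existing items.
item_vecs = {"a": [1.0, 0.0], "b": [0.5, 0.5], "c": [-1.0, 1.0]}
item_genre = {"a": "sci-fi", "b": "sci-fi", "c": "drama"}

centroids = genre_centroids(item_vecs, item_genre)
global_mean = [sum(col) / len(item_vecs) for col in zip(*item_vecs.values())]

# A brand-new sci-fi item with zero ratings still gets a vector and a score.
new_vec = embed_new_item("sci-fi", centroids, global_mean)
user_vec = [2.0, 0.0]  # hypothetical learned user vector
print(new_vec, dot(user_vec, new_vec))
```

Once the new item collects real ratings, its vector can be refit from them and the genre prior matters less.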

Beyond matrix factorization · what 2024 recommenders run

YouTube, TikTok, Netflix, and Spotify all use deep two-tower architectures plus a reranker. Tower 1 maps user history to an embedding (256 dimensions, say). Tower 2 maps item features to an embedding of the same size. Their dot product is the relevance score. A ranker then takes the top-K candidates by that score and applies a second, more expensive model that also considers diversity, freshness, and business objectives (cold items get a boost, monetized items get a boost, etc.). Matrix factorization is now the classroom version; production runs much bigger machinery on the same fundamental idea: turn users and items into vectors, then score with dot products.
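The two-stage flow described above can be sketched in a few lines: score every item by dot product, keep the top K, then reorder those K with extra signals. Here the "more expensive model" is replaced by a simple additive freshness boost, which is an assumption made to keep the example short; the embeddings, freshness values, and function names are likewise hypothetical, and 2-dimensional vectors stand in for the 256-dimensional ones.

```python
import heapq

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(user_vec, item_vecs, k):
    """Stage 1: cheap dot-product scoring over all items, keep the top k."""
    scored = ((dot(user_vec, vec), item) for item, vec in item_vecs.items())
    return heapq.nlargest(k, scored)  # list of (score, item), best first

def rerank(candidates, freshness, boost=0.5):
    """Stage 2: reorder the short list using an extra signal.

    Stand-in for a learned ranker: relevance plus a freshness bonus.
    """
    return sorted(
        candidates,
        key=lambda c: c[0] + boost * freshness.get(c[1], 0.0),
        reverse=True,
    )

# Hypothetical tower outputs (2-dim for readability).
user_vec = [1.0, 0.5]
item_vecs = {
    "a": [0.9, 0.1],
    "b": [0.8, 0.4],
    "c": [-0.5, 0.2],
    "d": [0.6, 0.6],
}
freshness = {"a": 0.0, "b": 0.2, "d": 1.0}  # 1.0 = just released

candidates = retrieve(user_vec, item_vecs, k=3)
final = rerank(candidates, freshness)
print("retrieved:", [item for _, item in candidates])
print("reranked: ", [item for _, item in final])
```

At production scale the first stage usually replaces the full scan with an approximate nearest-neighbor index; the exhaustive loop here only works because the catalog has four items.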
