Cloudflare's regex of doom
A new WAF rule with catastrophic backtracking spiked CPU to 100% on every edge worldwide.
Most of Cloudflare's edge, and with it nearly every site it proxies, returned 502s for 27 minutes. That took a substantial share of the web offline.
A routine WAF rule update intended to block a class of XSS attacks.
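A regex containing the adjacent greedy wildcards .*(?:.*=.*) was deployed to the WAF. The wildcards can divide the same input between them in many ways, so on pathological input the engine backtracked catastrophically, with work growing superlinearly in input length. The deploy went to every edge simultaneously.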
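To make the backtracking cost concrete, here is a minimal sketch of a naive backtracking matcher that counts its own steps. It is a toy written for illustration, not the production regex engine: count_steps is a hypothetical helper that supports only literals, '.', and greedy '*', and it anchors the match at both ends. The pattern .*.*=.* is the fragment above with the non-capturing group dropped, which does not change what it matches.

```python
def count_steps(pattern: str, text: str) -> int:
    """Count the match attempts a naive backtracking matcher makes.

    Illustrative toy only: supports literal characters, '.', and greedy '*'.
    The match is anchored, so the whole text must match the whole pattern.
    """
    steps = 0

    def match_here(p: int, t: int) -> bool:
        nonlocal steps
        steps += 1
        if p == len(pattern):
            return t == len(text)
        starred = p + 1 < len(pattern) and pattern[p + 1] == "*"
        if starred:
            # Greedy: consume as much as possible, then give characters back
            # one at a time until the rest of the pattern matches.
            end = t
            while end < len(text) and pattern[p] in (".", text[end]):
                end += 1
            for stop in range(end, t - 1, -1):
                if match_here(p + 2, stop):
                    return True
            return False
        if t < len(text) and pattern[p] in (".", text[t]):
            return match_here(p + 1, t + 1)
        return False

    match_here(0, 0)
    return steps


# Input with no '=' forces the matcher to try every split before giving up.
for n in (100, 200, 400):
    hostile = "x" * n
    print(n, count_steps(".*=.*", hostile), count_steps(".*.*=.*", hostile))
```

In this toy, the single-wildcard pattern .*=.* needs a number of steps that grows linearly with the input, while .*.*=.* needs roughly four times as many steps each time the input doubles. A real engine differs in the details (and may short-circuit some cases), but the shape of the growth is the point.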
- 13:42 UTC Rule deployed globally.
- 13:42 Edge CPUs spike to 100% across all data centers.
- 13:43 Pages start returning 502 errors at scale.
- 13:45 Internal alerting fires; engineers begin investigating and trace the problem to the new WAF rule.
- 14:09 Global kill switch flipped on the WAF; CPU drops, traffic recovers.
Reverted the rule. Added: regex complexity limits, staged deployment (canary regions before global), Lua-based execution timeouts, and "kill switches" for every WAF rule type.
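As a rough illustration of how staged deployment and a kill switch fit together, the sketch below enables a rule one stage at a time and disables it everywhere if CPU crosses a threshold. Everything here is an assumption made for the example, not Cloudflare's tooling: the staged_rollout function, the enable/disable/cpu_of callbacks, the stage names, and the 0.80 limit are all hypothetical.

```python
from typing import Callable, List


def staged_rollout(
    stages: List[List[str]],
    enable: Callable[[str], None],
    disable: Callable[[str], None],
    cpu_of: Callable[[str], float],
    cpu_limit: float = 0.80,  # hypothetical threshold
) -> bool:
    """Enable a rule stage by stage; disable it everywhere if CPU spikes.

    Returns True if every stage went live, False if the rollout was killed.
    """
    live: List[str] = []
    for stage in stages:
        for region in stage:
            enable(region)
            live.append(region)
        # A real system would let each stage soak before sampling metrics.
        if any(cpu_of(region) > cpu_limit for region in live):
            for region in live:  # the kill switch: turn the rule off again
                disable(region)
            return False
    return True


# Simulate a bad rule: any region where it is active pins the CPU.
active = set()
ok = staged_rollout(
    stages=[["canary"], ["eu", "us"], ["rest-of-world"]],
    enable=active.add,
    disable=active.discard,
    cpu_of=lambda region: 1.0 if region in active else 0.2,
)
print(ok, sorted(active))
```

With the simulated bad rule, the rollout stops after the canary stage and the rule is switched off again, so the later stages never see it.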
- Global simultaneous deploy of any code path is a tail risk. Always canary.
- Regex engines without timeouts are a denial-of-service waiting to happen.
- Test on adversarial inputs, not just the inputs you expect.
- Have a kill switch for every individually shippable thing.