Lessons Learned
Lessons Learned
System Design
- Avoid hosts failure when one of it’s agents corrupted
- Resilience and Reliability is more important than new features
Testing
- Code Review
Crowdstrike Analysis:
— Zach Vorhies / Google Whistleblower (@Perpetualmaniac) July 19, 2024
It was a NULL pointer from the memory unsafe C++ language.
Since I am a professional C++ programmer, let me decode this stack trace dump for you. pic.twitter.com/uUkXB2A8rm
- QA / Integration Tests
Today the world learned about why it's important to have a robust QA process
— Jem Young (@JemYoung) July 19, 2024
Deployment
- Rolling Deployment
Process
- No Deployment on Friday
Financial
- Stock Price
Will Crowdstrike bricking all these computers have a significant long-term impact on the stock price?
— Dan Luu (@danluu) July 19, 2024
Or will this be like basically every other outage or breach, where "the market" quickly realizes that customers don't care and regulatory action isn't forthcoming?
Public Policy
- Industry Concentration
1. All too often these days, a single glitch results in a system-wide outage, affecting industries from healthcare and airlines to banks and auto-dealers. Millions of people and businesses pay the price.
— Lina Khan (@linakhanFTC) July 19, 2024
These incidents reveal how concentration can create fragile systems.