I really don’t want to be the guy responsible for this fuck-up.
For a company this big, it would also have to have gotten past code review and a QA team, right? … right? …
Of course, of course. This is how these things are always done.
I like how they kept on pushing the update for hours
And who pushes out production updates on a Friday?!
We do.
“If something goes down over the weekend, fewer people see it” - my leadership team.
I guess Asia can report the problem on Sunday and I’ll get a nastygram and fix it that afternoon.
“Security”
Code review, a QA team, hours of baking on an internal test network, and an incremental, exponential rollout to the world that starts slow so any problems can be rolled back immediately. If they didn’t have those basics, they have no business being a tech company, let alone a security company that puts out Windows drivers.
Yeah, something this big is absolutely not one engineer’s fault. Even if that engineer maliciously pushed an update, it’s not their fault; it was a complete failure of the organization, and one person having the ability to wreak havoc like this is the failure.
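A minimal sketch of the "start slow, grow exponentially, roll back on the first bad signal" idea described above. Everything here is hypothetical: `exponential_rollout`, `is_stage_healthy`, the doubling factor and the stage sizes are illustrative assumptions, not how CrowdStrike or any particular vendor actually ships updates. A real pipeline would deploy to actual hosts and watch real crash telemetry during each bake period instead of calling a callback.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RolloutResult:
    stages: List[int] = field(default_factory=list)  # cumulative hosts targeted at each stage attempted
    rolled_back: bool = False  # True if a stage failed its health check and was reverted
    max_exposed: int = 0       # largest number of hosts that ever ran the new version


def exponential_rollout(
    fleet_size: int,
    first_stage: int,
    is_stage_healthy: Callable[[int], bool],
    growth: int = 2,
) -> RolloutResult:
    """Hypothetical simulation of a staged rollout that multiplies its reach each stage.

    `is_stage_healthy(n)` stands in for real monitoring: it reports whether the
    fleet still looks healthy once `n` hosts are running the update.
    """
    if fleet_size < 1 or first_stage < 1 or growth < 2:
        raise ValueError("fleet_size and first_stage must be >= 1, growth >= 2")

    result = RolloutResult()
    target = min(first_stage, fleet_size)
    while True:
        # "Deploy" to the first `target` hosts, then bake and check health.
        result.stages.append(target)
        result.max_exposed = target
        if not is_stage_healthy(target):
            # Halt immediately and revert every host that got the update.
            result.rolled_back = True
            return result
        if target == fleet_size:
            return result  # whole fleet updated, every stage healthy
        # Grow the blast radius exponentially, capped at the fleet size.
        target = min(target * growth, fleet_size)


good = exponential_rollout(1000, 10, lambda n: True)
print(good.stages)  # [10, 20, 40, 80, 160, 320, 640, 1000]

bad = exponential_rollout(1000, 10, lambda n: False)  # an update that breaks every host
print(bad.rolled_back, bad.max_exposed)  # True 10
```

With `first_stage=10`, an update that bricks every machine it touches is caught after 10 hosts instead of the whole fleet. The trade-off is that a full rollout takes several bake periods (eight stages for 1,000 hosts in this sketch).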
And I actually have some amount of hope that, in this case, it is being recognized as such.
I agree but they will still blame it all on that one guy.
No they won’t, not if they’re in the slightest bit competent.
Blameless post-mortem culture is very common at big IT organizations. For a fuck-up of this size, there are going to be dozens of problems identified, from bad QA processes, to bad code review processes, to bad documentation, to bad corner cases in tools.
There will probably be some guy (or gal) who pushed the button, but unless what that person did was utterly reckless (like pushing an update while high or drunk, or pushing a change and then turning off her phone and going dark, or whatever), the person who pushed the button will probably be a legend to their peers. Even if they made a big mistake, if they followed standard procedures while doing it, almost everyone will recognize that they’re not at fault; they just happened to be the unlucky person who pushed the button this time.
This is an industry-wide issue. This is just the first symptom.
What we need is to stop the blind trust.
Yeah and that means they won’t nail some poor schmuck to the wall over this?
He’ll just get fired, apply somewhere else, and they’ll only know the dates he worked at CrowdStrike.
If anybody cared, they would have switched away from M$ by now.