I was sitting at work, in a radio station in Melbourne, Australia, when the CrowdStrike crash happened. Our entire network of stations around Australia was affected, some worse than others. How bad was it in America?
National news tonight reported major disruptions with airlines, hospitals and a host of retailers. They described it as the worst outage in recent history.
I work for a TV station. Fortunately, our on-air automation doesn’t touch the internet so that was fine. However, almost everything else was affected. We couldn’t air bugs or IDs. We couldn’t go on the air with our 4:30am news because the news and production computers weren’t back up. Around 5am, news had enough gear running to get something on the air. By 5:30am everything was more or less back to normal. Engineers had been working on it since 1am.
That's interesting. Considering that what was affected was related to Microsoft cloud services (Azure), how were the local station graphics systems, especially those involving local bug-inserters (Evertz, Chyron, etc.), affected at your station?
Airlines were affected inasmuch as some use Azure cloud services for their various databases.
Plus anytime you throw just a little delay into their system, it multiplies geometrically.
Exactly. We've seen it before from Southworst and American when something in IT hiccups during a high volume of travel. Domino effect.
First of all, anyone running automated updates on a mission-critical system should have their head examined.
That's kind of a general statement that doesn't fit all instances. From what I read in the datacenter newsletters, CrowdStrike regularly pushes security patches in the background to reduce the possibility of being discovered, exploited or circumvented by bad guys. It kind of makes sense, when you think about it.
For radio and TV stations, this means playout systems and the like should be treated the same way utilities treat their control systems or the way banks and brokerages treat their customer-facing systems. Yes, automation is more efficient but, especially with Windows, every environment has its own peculiarities due to the number of hardware and software configurations that are possible. The consequence is that adequate testing requires some time and effort. It costs money, but losing your customer-facing systems (among others) costs more. Otherwise, you're counting on your vendors to perform adequate testing. Did CrowdStrike do that before pushing yesterday's problem-laden code? One suspects they did not. The pressure in so many software development environments is to ship, and ship fast.
Yeah, CrowdStrike was the one who publicly and privately had to eat a giant s*it sandwich. Which is a shame, because CS is one of the better cybersecurity organizations.
It's probably not widely realized that the systems that control power grids, natural gas distribution, etc., are running on Windows. But the management of those systems prioritizes availability. No EDR systems such as CrowdStrike are used, precisely for the reasons we all learned yesterday. To be more precise, EDR mucks about with kernel drivers, a big no-no in operational technology, where specialized vendor software is tightly integrated with the OS. Updates are made on a very slow cycle (6 months in the case of one major utility I worked for), scheduled well in advance, and always with the involvement of the vendors of the control software that sits on top of the OS, including extensive testing and review. So the problem isn't with Windows (mostly) - it's with the management of the systems.
But what about easter eggs or malware that attach to drivers and kernels? Happens all the time.
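For anyone who wants the utility-style update process described above in concrete terms, here is a minimal sketch of a pre-install gate. It is only an illustration: the class name, function name, and the 30-day notice threshold are hypothetical, and the 180-day spacing is just an approximation of the "6 months" cycle mentioned in the post. It is not how any particular utility or vendor actually implements change control.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical thresholds, chosen for illustration only.
MIN_DAYS_BETWEEN_UPDATES = 180  # rough stand-in for a 6-month update cycle
MIN_NOTICE_DAYS = 30            # assumed meaning of "scheduled well in advance"


@dataclass
class UpdateRequest:
    """A proposed OS or driver update for a control-system host (hypothetical)."""
    name: str
    requested_on: date
    install_on: date
    vendor_signed_off: bool  # control-software vendor reviewed the update
    tests_passed: bool       # testing and review completed


def blocking_reasons(req: UpdateRequest, last_update_on: date) -> list[str]:
    """Return every reason the update may not be installed; empty means go."""
    reasons = []
    if req.install_on - last_update_on < timedelta(days=MIN_DAYS_BETWEEN_UPDATES):
        reasons.append("too soon after the previous update")
    if req.install_on - req.requested_on < timedelta(days=MIN_NOTICE_DAYS):
        reasons.append("not scheduled far enough in advance")
    if not req.vendor_signed_off:
        reasons.append("control-software vendor has not signed off")
    if not req.tests_passed:
        reasons.append("testing and review not complete")
    return reasons


if __name__ == "__main__":
    rushed = UpdateRequest(
        name="sensor-driver-patch",
        requested_on=date(2024, 7, 1),
        install_on=date(2024, 7, 19),
        vendor_signed_off=False,
        tests_passed=True,
    )
    for reason in blocking_reasons(rushed, last_update_on=date(2024, 3, 1)):
        print("BLOCKED:", reason)
```

The point of returning a list of reasons instead of a single yes/no is that a blocked update tells the operator exactly which part of the process was skipped.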
My parenthetical of "mostly" relates to that kernel driver. Microsoft has to cryptographically* sign those drivers before they can be installed. This means that Microsoft also had a role in this debacle. Microsoft lately has come under fire for talking a good game about security without actually backing it up; this will just add to it, for it's apparent that Microsoft did very little testing or even checking of its own. (* = for avoidance of doubt, this has nothing to do with cryptocurrency)
I think we've all been burned either by Microsoft security patches or driver signing. I got called to a TV station where the chief had let security patches install overnight without first waiting and checking MS TechNet for anyone having problems. They called me in because the patches deleted the entire AD database for the entire station. Yep, Active Directory defaulted every IP to 192.168 generic IPs. Once I got everything rebuilt, I installed a NAS that backed up AD every 24 hrs in case they did it again.
Still, as with fraud prevention, vulnerability detection, etc., the costs of doing solid, careful engineering, risk management, and quality control are outweighed by the revenue to be gained by just pushing things out there and seeing what happens. This is where liability for saddling the technological ecosystem with inadequately engineered and tested software needs strengthening.
Yeah I know, blah blah, the outrage. Bottom line is: no matter whether it's SolarWinds, Microsoft, Cisco, or CrowdStrike, code and patches are written by humans on a deadline and installed in a way that is convenient to the customer. At least until something wrong is discovered minutes or hours later.
One thing lawyers are really useful for is getting people to do things that they ought to do but otherwise wouldn't do because of cost...because the lawyers would cost them even more.
Oh I'll bet that customer lawyers are talking to MS and CS lawyers over this recent incident.
There's a saying: "security through obscurity" which has the implication that obscurity isn't enough to protect you. Sure, you don't want to tell more than you have to, but for an enterprise that has systems where availability needs to be the top priority, that balance is going to be different. One size doesn't fit all, despite some security pros' (or CISO or CIO) efforts to try to do that. Tools aren't the only solution. They can help, but it's an issue of managing the environment.
As it relates to this particular bug, drive access, both hardware and virtual, was blocked specifically on Windows OS.
Right, it was a kernel driver. The fix, as it stands now, is to boot into safe mode and then manually delete the offending driver. A "devmod remove" probably isn't enough, since the behavior with the file present is to get stuck in a loop, where you can't even get to the command prompt. In any event, I feel for the technical staff who are going to burn up their overnights and weekends for the foreseeable future. I don't think they get paid enough to put up with this nonsense.
They were. Then again, I wouldn't want to overstate the long-term outcomes when reputational risk becomes real. Risk managers overstate that (I know I did once upon a time) but the reality is that the stink of the sandwich en merde wears off fairly quickly and top management will go back to watching the stock price closely.
In the operational technology world (e.g. Schneider Electric, Rockwell) there are significant interfaces with physical devices (sensors, PLCs, and so on). The OS version and the control software versions are tightly controlled. They're also running in environments that, at most, will have heavily mediated network access* and with direct external access only to the vendor. Yes, there is trust of a third party just as there is with CrowdStrike, but the incentives are different. In the utility field, you simply do not screw up, or you're dead. General IT is more forgiving. Maybe it shouldn't be.
In some cases, such conversations will be at the CEO level.
Can't wait for the congressional kabuki, um, hearings.
Much of that will depend on how many Congresspersons or their staff missed their flights. And of course, what human being can we publicly burn at the stake? And this of course won't be the last time some public-facing technology issue will happen. Chances are a much larger incident will happen someday that will actually cripple the entire public Internet for longer than anyone would want.
It was the Friday after the RNC, and there were lots of repubs in Milwaukee. Haven't seen any stories about the situation there.
FlexJet and NetJets were very busy the past couple weeks.
They all love to attack big tech, but there really isn't a famous face they can blame this on.
They aren't attacking tech now that Elon and others have not only announced their endorsements of Trump but are also donating a ton.
Personally, I have spent a layover day in Milwaukee and it can be a really nice place.
I can think of a lot of airports worse than Mitchell. There are likely a lot of available nice hotel rooms at a decent rate now that the RNC is over.
The question from some politician in a hearing will be the same: 'How can we make sure something like this never happens again?'
Well, you don't. But try talking probability or risk to most people. The language risk-management geeks use doesn't help, but look how long it took humankind to develop any sort of concept of probability (the 17th century).