This past week, Cisco posted three security advisories about IOS-based devices (almost all Cisco routers, most of their switches, and a few other types of devices). Each of these advisories basically noted that with a single specially "crafted" packet, one could at least crash their devices, and possibly execute arbitrary code on them. Presumably the issues are buffer overflows in infrequently used parts of the packet parsing logic. One of the advisories is about TCP packets, another about IPv6 routing headers, and another about IP options. These seem to be exactly the kind of problems that Mike Lynn tried to point out in 2005. Since we have many IOS-based devices (more than 100), we have to upgrade them to non-vulnerable versions as soon as possible.
In order to find out which of our devices need to be upgraded, I first started to write a Perl script that gets the installed IOS version of each device from our RANCID repository, and then checks whether the version is vulnerable. When I wrote the function that checks an IOS version for vulnerability (starting from the table of vulnerable/fixed IOS versions in one of the advisories), I noticed that this was a lot of work, and would have to be repeated for each of the three advisories.
So I decided to go one step further in automation, and write another Perl script that parses a Cisco security advisory in HTML format, and extracts the IOS-version vulnerability information from that. Being a Lisp programmer, I thought that the most natural output for this parser would be a function that takes an IOS version as input and returns either "this version is safe" or an upgrade recommendation.
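To make the idea of such a checking function concrete, here is a minimal sketch in Python (not the Perl of the actual scripts). It builds the checker as a closure over a table instead of generating source code, which is a simplification. The names `parse_version`, `make_checker` and `EXAMPLE_TABLE` are made up for illustration, the table entries are placeholders rather than data from any advisory, and the version pattern only covers simple forms such as 12.4(12) or 12.2(18)SXF7.

```python
import re
from typing import Callable, Dict, NamedTuple, Tuple


class IOSVersion(NamedTuple):
    major: int
    minor: int
    release: int
    train: str    # "" for mainline, otherwise letters such as "T" or "SXF"
    rebuild: int  # 0 when the version has no trailing rebuild number


# Simplified pattern: real IOS version strings have more variants than this.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)\)([A-Z]*)(\d*)$")


def parse_version(text: str) -> IOSVersion:
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized IOS version: {text!r}")
    major, minor, release, train, rebuild = match.groups()
    return IOSVersion(int(major), int(minor), int(release), train, int(rebuild or 0))


def make_checker(first_fixed: Dict[Tuple[int, int, str], str]) -> Callable[[str], str]:
    """Return a function that maps an IOS version string to a verdict.

    first_fixed maps (major, minor, train) to the first fixed version string.
    """
    parsed = {key: parse_version(value) for key, value in first_fixed.items()}

    def check(version_text: str) -> str:
        v = parse_version(version_text)
        key = (v.major, v.minor, v.train)
        fixed = parsed.get(key)
        if fixed is None:
            return "unknown train, check manually"
        if (v.release, v.rebuild) >= (fixed.release, fixed.rebuild):
            return "this version is safe"
        return "upgrade to " + first_fixed[key] + " or later"

    return check


# Placeholder entries for illustration only, NOT taken from a real advisory.
EXAMPLE_TABLE = {(12, 4, ""): "12.4(12)", (12, 2, "SXF"): "12.2(18)SXF7"}

check = make_checker(EXAMPLE_TABLE)
print(check("12.4(10)"))
print(check("12.2(18)SXF8"))
```

Comparing only within the same (major, minor, train) key is a deliberate restriction in this sketch: advisories often recommend migrating to a different train, and that case would need an extra table column.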
Then the original Perl script would call out to this generated function for each router. It took me almost two days (with lots of interruptions from other work) to get this to a state I was happy with. I now have a crontab entry that runs the script several times per working day and e-mails me a report with a summary (N routers vulnerable, M routers should be upgraded) and a detailed entry for each router. Current status: 31 routers are safe, 80 should be upgraded. Oh well...
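The report itself is simple to assemble once every router has a verdict. The following Python sketch shows one possible shape; the function name `build_report`, the verdict strings and the router names are assumptions for illustration, and the real report format may differ.

```python
from typing import Dict

SAFE = "this version is safe"  # assumed verdict string for a non-vulnerable router


def build_report(results: Dict[str, str]) -> str:
    """Build a plain-text report: one summary line, then one line per router."""
    safe_count = sum(1 for verdict in results.values() if verdict == SAFE)
    upgrade_count = len(results) - safe_count
    lines = [f"safe: {safe_count}, should be upgraded: {upgrade_count}", ""]
    for name in sorted(results):
        lines.append(f"{name}: {results[name]}")
    return "\n".join(lines)


# Hypothetical router names and verdicts.
report = build_report({
    "rtr-b": "upgrade to 12.4(12) or later",
    "rtr-a": SAFE,
    "rtr-c": "upgrade to 12.2(18)SXF7 or later",
})
print(report)
```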
The scripts work OK, but aren't documented and have to be adapted for other RANCID installations. If you want them anyway, just send me mail (simon "at" switch "dot" ch).
The advisory-parsing script was the more interesting to write. I decided to use HTML::TreeBuilder to turn the HTML page into a DOM-like tree. That turned out to be a good decision, because that module makes it easy to grovel around in those trees. Outputting the Perl code was quite ugly, and made me want to use Lisp, where this is much more natural. I incrementally added parsers for all cases of vulnerable/fixed version information, and found that this worked quite well, with very few exceptions remaining unhandled. I found a few bugs (inconsistencies or missing information about specific versions) in the advisories as well, but I'm too lazy to find out where to send the reports.
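For readers who want to try the same approach without Perl, the table extraction step can be sketched with Python's standard `html.parser` module. This is a stand-in for HTML::TreeBuilder, not a description of the actual script: it collects table rows in a streaming fashion instead of building a tree, and the sample HTML layout and cell contents are invented for illustration.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRows(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside of a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the text fragments and normalize internal whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html: str) -> List[List[str]]:
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample; real advisory pages are laid out differently.
SAMPLE = """
<table>
<tr><th>Affected release</th><th>First fixed release</th></tr>
<tr><td>12.4</td><td>12.4(12)</td></tr>
</table>
"""
for row in extract_rows(SAMPLE):
    print(row)
```

A tree-based parser is more convenient when the interesting cells have to be located relative to other elements, which is presumably why a DOM-like tree worked well here; the streaming version above is only enough for flat tables.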
Now we have to decide which IOS versions to upgrade to, schedule maintenance for rebooting the routers, open tickets, and perform the actual upgrades. Each of these tasks could benefit from some automation as well.
One difficulty is deciding which IOS image is suitable for a device that has to be upgraded. This can depend on many factors, including device type, installed line cards, chassis type, and available memory (RAM, flash/disk). It cannot be completely automated, because the set of usable images depends on factors that are hard to formalize, such as experience with these versions in testing and operations, weighed against the amount of risk one can take on a given router, depending on whether it's a redundant router or a single point of failure for a customer, etc. It would be great if one could extract an image's installation requirements from Web pages, but I don't even know how to get from image version to CCO Web page location.
Another part of the job is to actually copy the desired IOS image to a suitable flash/disk partition on the router, and change the boot configuration so that the new version comes up. This is quite a bit of work (especially if you have to do it 80 times), and should be easy enough to automate.
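As a starting point for that automation, the per-router command sequence could be generated from a few parameters. The Python sketch below only produces the text of the commands and does not talk to any device. The function name `upgrade_commands`, the image name and the server address are hypothetical, and the exact commands, file system names (flash:, disk0:, ...) and interactive prompts vary by platform, so the output should be checked against each device type before use.

```python
from typing import List


def upgrade_commands(image: str, server: str, filesystem: str = "flash:") -> List[str]:
    """Return the IOS commands to fetch an image and boot from it.

    Assumes a TFTP server and a single boot entry; the copy command
    normally asks for confirmation, which this sketch does not handle.
    """
    target = f"{filesystem}{image}"
    return [
        f"copy tftp://{server}/{image} {target}",
        "configure terminal",
        "no boot system",          # drop existing boot entries
        f"boot system {target}",   # boot the new image on next reload
        "end",
        "copy running-config startup-config",
    ]


# Hypothetical image name and a documentation-range server address.
for line in upgrade_commands("example-image.bin", "192.0.2.10"):
    print(line)
```

Keeping the reload out of the generated sequence is intentional, since the reboot has to be scheduled separately.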
Scheduling good times for the reboots is something where I'm not sure whether automation makes a lot of sense. Many of our routers are completely protected by redundant routers, so they can basically be rebooted at any time. On the other hand, there are a few routers that lack out-of-band access, and one might want to defer upgrading those routers until one has collected experience with upgrades on similar routers that do have out-of-band, just to minimize the risk that something goes wrong.
Opening the tickets will be a lot of work too. A colleague from our team recently rewrote the ticket system in PHP, and I should ask her whether we can somehow write scripts to create tickets in bulk.