Thursday, January 3, 2008

Plan your Updates carefully

Here is an excerpt from a chat I had with a few colleagues. Names and details may have been changed.

[18:16] Fred> I just got my nose rubbed in an important systems update rule of thumb
[18:17] tk> rtfm?
[18:17] tk> Or not before a holiday weekend?
[18:17] Fred> yep you hit it with your 2nd guess
[18:18] Fred> fallout not complete yet, but fortunately I'm not the one who actually made the mistake
[18:18] tk> I avoid upgrading anything, except when I've got time
[18:19] tk> what did you upgrade?
[18:19] Fred> we have about a dozen checkpoint edge devices and push policy out to them from a central server
[18:19] Fred> on one of them, the policy did not install correctly
[18:19] tk> let me guess, not all in 1 location
[18:20] Fred> oh no, they're scattered all over the county, physically
[18:20] Fred> when this happens instead of using the previous policy they behave as though no policy is present at all
[18:20] tk> Joy
[18:20] Fred> yep
[18:20] Fred> so, after determining the problem, I pushed the policy out to that one box, and things started behaving normally
[18:21] tk> nice
[18:21] Fred> but, the device had been down for 4 days over the holidays
[18:21] tk> eek
[18:21] KL> good thing you have a layered defense model....
[18:21] Fred> I know - the PD being (ahem) protected was not happy
[18:22] Fred> they should have a fallback
[18:22] Fred> like, EVDO cards
[18:23] tk> or Dial up :)
[18:23] Fred> well, several things happened to make this last longer than it should have
[18:23] Fred> one, the on-call pager person did not follow up on the initial
[18:24] Fred> this was a new guy, doing it for the first time, so he's probably going to get off lightly
[18:25] Fred> but, zero, the policy should not have been pushed just before a major holiday weekend
[18:25] Fred> that's the fundamental rule that was broken

A few "Rules" for Updating systems

  1. Do not update before a holiday weekend, vacation or business trip, unless you plan on working
  2. Communicate with remote locations; make sure they are aware of the upgrade.
  3. Plan for stuff to go wrong; it probably will.
  4. If your sites are spread over multiple locations, have a plan to remedy the situation in a timely manner.
  5. Make sure all on-site techs know that an upgrade is planned and that issues related to the update need to be addressed promptly.
-- Tim Krabec


KL said...

and: never rely on just one layer of defense. Assume it will fail and have compensating controls in place.

Tim Krabec said...

Also, test the deployment as thoroughly as possible.
(10:33 PM) Bob> I just noticed yesterday that websense mysteriously stopped filtering web traffic after a maintenance window in November.
(10:33 PM) tk> oops
(10:33 PM) Bob> speaking of controls....
(10:33 PM) tk> and testing
(10:33 PM) Bob> mmmhmmm
(10:34 PM) Bob> and lack of staff to ensure that everything is running as it should
(10:34 PM) tk> *GASP* you're not properly staffed
(10:34 PM) Bob> i know.....*shock*