Read on for ways to help reduce the chance that your vacation will be interrupted by sysadmin issues.
Every year or two my family and I like to take a vacation abroad. Normally,
vacation is a time to unplug, and if you are a sysadmin who’s on an on-call
rotation, someone else on the team typically takes over your on-call duties. Yet as you
progress in your career, you start to gain more expertise and responsibilities
over systems, and even with someone else on-call, there’s a certain class of
emergency where the team might need to reach out to you for help even when you’re on vacation.
I recently took a vacation abroad, and before I left, I went through a set of tasks
to reduce the chance that I would need to jump on an emergency while I was away.
So in this article, I describe some of the steps I take to prepare for a
vacation, steps that should help you unplug on your next trip.
Preparing Your Computer
One of the first questions you should answer before going on vacation is whether you will
need to take your work laptop with you. Depending on your organization and its
security controls, you might be able to perform basic emergency administrative
tasks from your personal computer, tablet or phone, or you may be able
to connect to production only from your work computer. In other cases, you may not need
a computer, because you can just serve an advisory role over the phone or chat with
other people on the team and walk them through what to do in the event of an
emergency.
If you do need to take your computer, I highly recommend making a full backup
before the trip. Your computer is more likely to be lost, stolen or broken while
traveling than when sitting safely at the office, so I always take a backup of my work
machine before a trip. Even better than taking a backup, leave your expensive work
computer behind and use a cheaper, more disposable machine for travel. Restore your
important files and settings for work on it before you leave, and wipe it when you
return. If you decide to go the disposable computer route, I recommend
working one or two full work days on this computer before the vacation to make
sure all of your files and settings are in place.
Documentation
Good documentation is the best way to reduce or eliminate how much you have to
step in when you aren’t on call, whether you’re on vacation or not. Everything from
routine procedures to emergency response should be documented and kept up to date.
Honestly, this falls under standard best practices as a sysadmin, so it’s something
you should have whether or not you are about to go on vacation.
First, all routine procedures should be documented in a clear step-by-step way:
how you deploy code and configuration changes, how you manage tickets, how you
perform security patches, how you add and remove users, and how the overall
environment is structured. If you use automation tools for routine procedures, whether
it’s as simple as a few scripts or as complex as full orchestration tools, you
should make sure you document not only how to use the automation tools, but also
how to perform the same tasks manually should the automation tools fail.
If you are on call, that means you have a monitoring system in place that
scans your infrastructure for problems and pages you when it finds any. Every
single system check in your monitoring tool should have a corresponding playbook
that a sysadmin can follow to troubleshoot and fix the problem. If your monitoring tool
allows you to customize the alerts it sends, create corresponding wiki entries for
each alert name, and then customize the alert so that it provides a direct link to
the playbook in the wiki.
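As a rough illustration of tying alerts to playbooks, the following Python sketch builds a wiki URL from an alert name, appends it to the alert text, and lists any alerts that still lack a playbook page. The wiki address, the slug convention and every function name here are assumptions made for the example, not features of any particular monitoring tool.

```python
from urllib.parse import quote

# Hypothetical wiki location; substitute wherever your team keeps playbooks.
WIKI_BASE = "https://wiki.example.com/playbooks"


def alert_slug(alert_name):
    """Turn an alert name such as 'Disk Space Low' into 'disk-space-low'."""
    return "-".join(alert_name.strip().lower().split())


def playbook_url(alert_name):
    """Return the wiki page where this alert's playbook is assumed to live."""
    return f"{WIKI_BASE}/{quote(alert_slug(alert_name))}"


def format_alert(alert_name, host, detail):
    """Build the alert text, ending with a direct link to the playbook."""
    return (
        f"[{alert_name}] on {host}: {detail}\n"
        f"Playbook: {playbook_url(alert_name)}"
    )


def missing_playbooks(alert_names, documented_slugs):
    """List alerts that have no matching playbook page yet."""
    return [name for name in alert_names
            if alert_slug(name) not in documented_slugs]
```

Running something like `missing_playbooks` against the full list of checks before a trip gives a quick to-do list of playbooks that still need to be written.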
If you happen to be the subject-matter expert on a particular system, make sure
that documentation in particular is well fleshed out and understandable. These are
the systems that will pull you out of your vacation, so look through those
documents for any assumptions you may have made when writing them that a junior
member of the team might not understand. Have other members of the team review the
documentation and ask you questions.
One saying about documentation is that if something is documented in two places,
one of them will be out of date. Even if you document something only in one place,
there’s a good chance it is out of date unless you perform routine maintenance.
It’s a good practice to review your documentation from time to time and update it
where necessary, and before a vacation is a particularly good time to do it. If you
are the only person who knows about the new way to perform a procedure, you
should make sure your documentation covers it.
Finally, have your team maintain a page to capture anything that happens while
you are gone that they want to tell you about when you get back. If you are the
main maintainer of a particular system, but they had to perform some emergency
maintenance of it while you were gone, that’s the kind of thing you’d like to know
about when you get back. If there’s a central place for the team to capture these
notes, they will be more likely to write things down as they happen and less likely to
forget about things when you get back.
Stable State
The more stable your infrastructure is before you leave and the more stable it stays while
you are gone, the less likely you’ll be disturbed on your vacation. Right before a
vacation is a terrible time to make a major change to critical systems. If you
can, freeze changes in the weeks leading up to your vacation. Try to encourage
other teams to push off any major changes until after you get back.
Before a vacation is also a great time to perform any preventative maintenance on
your systems. Check for any systems about to hit a disk warning threshold and
clear out space. In general, if you collect trending data, skim through it for any
resources that are trending upward that might go past thresholds while you are
gone. If you have any tasks that might add extra load to your systems while you
are gone, pause or postpone them if you can. Make sure all of your backup scripts
are working and all of your backups are up to date.
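One way to skim trending data is to project each resource forward and flag anything likely to cross its threshold while you are away. The Python sketch below does this with a simple straight-line fit over daily disk-usage percentages. The function names, the sample format (one reading per day, oldest first) and the assumption of linear growth are all simplifications for illustration.

```python
def days_until_threshold(daily_usage_pct, threshold_pct):
    """Estimate days until usage reaches the threshold.

    Fits a least-squares line through one-reading-per-day samples.
    Returns 0.0 if already at or over the threshold, and None if
    usage is flat or shrinking (or there are too few samples).
    """
    n = len(daily_usage_pct)
    if n < 2:
        return None
    current = daily_usage_pct[-1]
    if current >= threshold_pct:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage_pct) / n
    num = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(daily_usage_pct))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den  # percentage points per day
    if slope <= 0:
        return None
    return (threshold_pct - current) / slope


def at_risk(usage_by_host, threshold_pct, vacation_days):
    """Return hosts projected to hit the threshold during the vacation."""
    flagged = []
    for host, samples in sorted(usage_by_host.items()):
        remaining = days_until_threshold(samples, threshold_pct)
        if remaining is not None and remaining <= vacation_days:
            flagged.append(host)
    return flagged
```

For example, a host that grew from 70% to 74% over five daily readings is projected to reach an 80% warning threshold in about six days, so it would be flagged for a ten-day trip and is worth cleaning up before you go.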
Emergency Contact Methods
Although it would be great to unplug completely while on vacation, there’s a chance
that someone from work might want to reach you in an emergency. Depending on where you
plan to travel, some contact options may work better than others. For instance, some
cell-phone plans that work while traveling might charge high rates for calls, but
text messages and data bill at the same rates as at home. If you plan to get a
local SIM card, text messages sent over the cell network from home might cost more
than those sent over the data plan. If you do use a local SIM card, you will have
to work out some way to communicate that new number to your team.
Discuss with your team what escalation path they should use to contact you in an
emergency. For instance, in my case, I knew my cell-phone plan would provide me with
unlimited text messages and the same data plan as at home, but I also didn’t want
work email to distract me. This presented a problem, as email is the primary
way I’m paged. In my case, I disabled email syncing while I was on vacation and
instructed everyone to contact me via text message in the case of emergency.
I also needed to be on the secondary escalation path for any alerts that
weren’t resolved within a certain amount of time, so I configured my monitoring
tool to use an email-to-SMS gateway as my email address for alerts.
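The escalation arrangement described above can be sketched in a few lines of Python. This is only an illustration: the gateway domain, the 30-minute delay, the contact details and the function names are all made up, and real email-to-SMS gateways use carrier-specific address formats that you would need to look up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    name: str
    address: str        # where the monitoring tool sends the alert
    after_minutes: int  # how long the alert must be unresolved first


def sms_gateway_address(phone_number, gateway_domain):
    """Build an email-to-SMS address such as 15550100199@sms.example.com."""
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    return f"{digits}@{gateway_domain}"


def recipients(path, minutes_unresolved):
    """Return every address that should have been alerted by now."""
    return [c.address for c in path if minutes_unresolved >= c.after_minutes]


# Hypothetical path: the on-call sysadmin first, then the person on vacation
# by text message if the alert is still open after 30 minutes.
ESCALATION_PATH = [
    Contact("on-call", "oncall@example.com", 0),
    Contact("vacationer",
            sms_gateway_address("+1 (555) 010-0199", "sms.example.com"), 30),
]
```

With this path, an alert that has been open for 10 minutes goes only to the on-call address, while one still unresolved after 45 minutes also reaches the traveler as a text message, without any need to sync work email.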
If there are certain days when you know you (or your on-call counterpart at home)
might be in areas with limited cell coverage, work out those dates ahead of time
and put them in your calendar. If nothing else, it might encourage others to wait
on making a risky change if they know they absolutely will not be able to reach
you for the next two days. In general, set expectations on your availability, and
also make sure everyone takes any time zone differences into account.
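To make those expectations concrete, a team could keep a small helper that answers "is it reasonable to contact the traveler right now?" The Python sketch below is one hypothetical way to do that. It uses a fixed UTC offset for the destination, which ignores daylight saving changes, and the contact hours, function name and dates are assumptions for the example.

```python
from datetime import date, datetime, time, timedelta, timezone


def can_reach(moment_utc, utc_offset_hours, unreachable_dates,
              earliest=time(8, 0), latest=time(22, 0)):
    """Decide whether the traveler is likely reachable at a given moment.

    moment_utc must be a timezone-aware datetime. unreachable_dates holds
    local dates on which no cell coverage is expected.
    Returns a (reachable, reason) pair.
    """
    destination = timezone(timedelta(hours=utc_offset_hours))
    local = moment_utc.astimezone(destination)
    if local.date() in unreachable_dates:
        return False, f"no cell coverage expected on {local.date().isoformat()}"
    if not (earliest <= local.time() <= latest):
        return False, (f"local time is {local:%H:%M}, "
                       f"outside {earliest:%H:%M}-{latest:%H:%M}")
    return True, f"local time is {local:%H:%M}"
```

For instance, 18:00 UTC is 03:00 the next morning for someone nine hours ahead of UTC, so the helper reports that person as unreachable unless the emergency justifies waking them.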
Conclusion
Overall, a vacation should be a time for you to be completely removed from your
work’s on-call process. Whether that’s possible or not, the more you prepare ahead
of time, the less likely your vacation will be interrupted. Finally, when you get
back, do a postmortem with your team about anything that went wrong and any
documentation that was confusing or incomplete, so you can make improvements for
your next vacation.