Generate enterprise-grade SSH keys and load them into an agent for control
of all kinds of Linux hosts. Script the agent with the Parallel Distributed
Shell (pdsh) to effect rapid changes over your server farm.
Advice for Buying and Setting Up Laptops When You’re Traveling or On-Call
Sun, 01/28/2018 – 08:08
Why stress over losing that expensive personal or work laptop? Buy a cheap
one for risky situations.
In a previous article, I wrote about how to prepare for a vacation so you aren’t
disturbed by a work emergency. As part of that article, I described how to
prepare your computer:
Even better than taking a backup, leave your expensive work computer behind
and use a cheaper more disposable machine for travel, and just restore your
important files and settings for work on it before you leave and wipe it when
you return. If you decide to go the disposable computer route, I recommend
working one or two full work days on this computer before the vacation to
make sure all of your files and settings are in place.
It turns out that this advice works not just for travel but also for a laptop you
take with you while on call. So in this article, I elaborate on the above advice
and describe some strategies for choosing and setting up an appropriate laptop
to take with you while on call or traveling.
I was faced with the dilemma of choosing a travel laptop when I went on vacation a
few months ago. I needed to be reachable while on vacation, just in case, but I
knew I didn’t want to lug around and cross borders with an expensive company
laptop. There are a number of reasons why this is a good idea, and most of the
reasons you would want to use a separate, cheap laptop for travel also apply
for an on-call laptop.
Although it’s true that your laptop might get lost, stolen or damaged while you
commute to work, it’s much more likely to happen outside your normal work
routine. While you are on call, you might take your laptop to restaurants, bars,
events or a friend’s house, and because you are outside your normal routine,
it’s more likely that it will be stolen or that you might accidentally leave it
behind. Also, when you are commuting to work, you likely have some kind of backpack
or case for your laptop, but outside work, you may be more likely just to throw
your laptop in the trunk of your car.
While traveling, especially traveling abroad, you are most definitely outside
your normal routine, and a laptop is even more likely to get lost, damaged or
stolen. The more expensive the laptop you have with you, the more enticing a
target it is, and the more you have to lose. Also, with increased security around
airports and customs these days, laptops are more likely to be inspected,
confiscated or forced into checked luggage. Plus, if you do have to put
your laptop in checked luggage, these days, you must lock your luggage with locks
that security agents can unlock. Unfortunately, there are many stories of
unscrupulous airport employees who have taken advantage of this fact to steal
high-value items from luggage while it’s out of its owner’s possession.
Having a second laptop that’s ready at any moment to take over work duties adds an
extra backup in case your work laptop itself breaks. Instead of being out of
commission while you are waiting for a replacement, you immediately can resume
work on your backup. It also provides you with a backup in case you leave your
work laptop at the office.
The key to a good on-call or travel laptop is to get
something cheap. As computers
have continued to get faster, the fact is that many people can get their general
work done (especially in a pinch) with laptops that are many years old. This is
especially true on a Linux desktop, even if you aren’t someone who spends a decent
amount of time on a terminal.
Used Thinkpads are a great choice for travel laptops, because they have good Linux
compatibility and are rugged and easy to repair with replacement parts that are
easy to find. Because so many organizations have used them as company laptops,
you almost always can find a used one cheap on an auction site. Keep an eye out
for a model that is listed as having no OS. Those laptops tend to be cheaper
because people want to avoid having to install an OS, but as Linux users, we would
just overwrite the OS anyway! I’ve consistently found that if I’m patient, I can
get a Thinkpad with reasonable specs for less than $50 on auction sites. If you are
willing to splurge on extra RAM or an SSD, these old machines can be surprisingly fast.
Another option, especially if you want a more portable laptop, is a Chromebook.
Although these machines normally are designed to run a limited, secured OS that
centers on Google services, they also can run Linux well once you switch into
developer mode. Some people use cheap Chromebooks as their default travel
computers since they just want to check Gmail and browse the web while traveling.
I found a used Acer C710 for $40 and was able to add RAM and an SSD from a spare
Thinkpad, and it turned out to be a rather capable Qubes-compatible machine.
I use Qubes both on my work and personal laptops, and I’ve long used its
built-in backup and restore tool whenever I travel to make sure I have a fresh
backup in case my laptop is lost or stolen. Now that I rely on a separate laptop
for travel, I just restore that fresh backup onto my travel machine and test it
by working on it for a day before the trip. This also means I can selectively
restore only the files and settings (appVMs in my case) that are relevant for the
situation. In the case of its use as an on-call computer, I don’t have to worry as
much about fresh backups as long as all of my VPN, SSH and other credentials are
kept up to date.
Since most people don’t use Qubes, just take advantage of whatever tool you prefer
to back up your laptop (you do back up your laptop regularly, don’t you?) and restore
onto your spare computer as regularly as you need to keep important files up to
date. Given that you are doing this to protect against the laptop being lost or stolen,
be sure to enable full disk encryption when you install the OS to help protect
your sensitive files just in case. For those of you who are extra
security-conscious, you can take the additional step of wiping and re-installing your OS
whenever you return from a long trip, just in case you are worried about any
malware that found its way onto your computer while you were on untrusted networks.
In general, I highly recommend selecting a cheap laptop for your on-call and travel
computer. You will find you have extra peace of mind knowing that not only
will it be inexpensive to replace your laptop if it’s lost, broken or
stolen, but also that when you return home, you can get on your regular computer
and get right back to work.
Avoiding Server Disaster
Reuven M. Lerner
Wed, 01/17/2018 – 08:46
Worried that your server will go down? You should be. Here are some
disaster-planning tips for server owners.
If you own a car or a house, you almost certainly have
insurance. Insurance seems like a huge waste of money. You pay it
every year and make sure that you get the best possible price for the
best possible coverage, and then you hope you never need to use the
insurance. Insurance seems like a really bad deal—until you have a
disaster and realize that had it not been for the insurance, you
might have been in financial ruin.
Unfortunately, disasters and mishaps are a fact of life in the
computer industry. And so, just as you pay insurance and hope never to
have to use it, you also need to take time to ensure the safety and
reliability of your systems—not because you want disasters to happen,
or even expect them to occur, but rather because you have to.
If your website is an online brochure for your company and it goes
down for a few hours or even days, it’ll be embarrassing and
annoying, but not financially painful. But, if your website is your
business, when your site goes down, you’re losing money. If
that’s the case, it’s crucial to ensure that your server and
software are not only unlikely to go down, but also easily recoverable if
and when that happens.
Why am I writing about this subject? Well, let’s just say that this
particular problem hit close to home for me, just before I started to
write this article. After years of helping clients around the world to
ensure the reliability of their systems, I made the mistake of not
being as thorough with my own. (“The shoemaker’s children go
barefoot”, as the saying goes.) This means that just after launching
my new online product for Python developers, a seemingly trivial
upgrade turned into a disaster. The precautions I put in place,
it turns out, weren’t quite enough—and as I write this, I’m still
putting my web server together. I’ll survive, as will my server and
business, but this has been a painful and important lesson—one that
I’ll do almost anything to avoid repeating in the future.
So in this article, I describe a number of techniques I’ve used to keep
servers safe and sound through the years, and to reduce the chances of a
complete meltdown. You can think of these techniques as insurance for
your server, so that even if something does go wrong, you’ll be
able to recover fairly quickly.
I should note that most of the advice here assumes no redundancy in
your architecture—that is, a single web server and (at most) a
single database server. If you can afford to have a bunch of servers
of each type, these sorts of problems tend to be much less frequent.
However, that doesn’t mean they go away entirely. Besides, although
people like to talk about heavy-duty web applications that require massive
iron in order to run, the fact is that many businesses run on small,
one- and two-computer servers. Moreover, those businesses don’t need
more than that; the ROI (return on investment) they’ll get from
additional servers cannot be justified. However, the ROI from a good
backup and recovery plan is huge, and thus worth the investment.
Before I can talk about disaster preparation and recovery, it’s
important to consider the different parts of a web application and
what those various parts mean for your planning.
For many years, my website was trivially small and simple. Even if it
contained some simple programs, those generally were used for sending
email or for dynamically displaying different assets to visitors. Mostly, the
site consisted of static HTML and CSS. No database or other excitement was necessary.
At the other end of the spectrum, many people have full-blown web
applications, sitting on multiple servers, with one or more databases
and caches, as well as HTTP servers with extensively edited configuration files.
But even when considering those two extremes, you can see that a web
application consists of only a few parts:
The application software itself.
Static assets for that application.
Configuration file(s) for the HTTP server(s).
Database configuration files.
Database schema and contents.
Assuming that you’re using a high-level language, such as Python, Ruby
or JavaScript, everything in this list either is a text file or can be
turned into one. (All databases make it possible to “dump” their
contents onto disk, into a format that then can be loaded back into
the database server.)
Consider a site containing only application software, static assets
and configuration files. (In other words, no database is involved.)
In many cases, such a site can be backed up reliably in Git. Indeed,
I prefer to keep my sites in Git, backed up on a commercial hosting
service, such as GitHub or Bitbucket, and then deployed using a system such as Capistrano.
In other words, you develop the site on your own development machine.
Whenever you are happy with a change that you’ve made, you commit the
change to Git (on your local machine) and then do a git push to your
central repository. In order to deploy your application, you then use
Capistrano to do a
cap deploy, which reads the data from the central
repository, puts it into the appropriate place on the server’s
filesystem, and you’re good to go.
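To see the flow end to end without touching a real server, here is a self-contained sketch you can run locally; a bare repository in a temporary directory stands in for the central GitHub/Bitbucket host, and all paths and names are examples:

```shell
# A bare repository plays the role of the central hosting service.
central=$(mktemp -d)
git init -q --bare "$central"

# A second scratch directory plays the role of your development machine.
site=$(mktemp -d)
cd "$site"
git init -q
git config user.email "you@example.com"
git config user.name "Example Author"
echo '<h1>Home</h1>' > index.html
git add index.html
git commit -q -m 'Initial site content'

# Push the change to the "central" repository; on the real server,
# "cap deploy" would then check this commit out into place.
git push -q "$central" HEAD:master
```

The point of the exercise is that once a commit reaches the central repository, every machine that can reach it has a full copy of the site's history.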
This system keeps you safe in a few different ways. The code itself is
located in at least three locations: your development machine, the
server and the repository. And those central repositories tend to be
fairly reliable, if only because it’s in the financial interest of the
hosting company to ensure that things are reliable.
I should add that in such a case, you also should include the HTTP
server’s configuration files in your Git repository. Those files
aren’t likely to change very often, but I can tell you from experience,
if you’re recovering from a crisis, the last thing you want
to think about is how your Apache configuration files should look.
Copying those files into your Git repository will work just fine.
You could argue that the difference between a “website” and a “web
application” is a database. Databases long have powered the back ends
of many web applications and for good reason—they allow you to
store and retrieve data reliably and flexibly. The power that modern
open-source databases provides was unthinkable just a decade or two
ago, and there’s no reason to think that they’ll be any less reliable
in the future.
And yet, just because your database is pretty reliable doesn’t mean
that it won’t have problems. This means you’re going to want to
keep a snapshot (“dump”) of the database’s contents around, in case
the database server corrupts information, and you need to roll back to
a previous version.
My favorite solution for such a problem is to dump the database on a
regular basis, preferably hourly. Here’s a shell script I’ve
used, in one form or another, for creating such regular database backups:

#!/bin/sh

BACKUP_ROOT="/home/database-backups/"
YEAR=`/bin/date +'%Y'`
MONTH=`/bin/date +'%m'`
DAY=`/bin/date +'%d'`

DIRECTORY="$BACKUP_ROOT/$YEAR/$MONTH/$DAY"
USERNAME=dbuser
DATABASE=dbname
HOST=localhost
PORT=3306

/bin/mkdir -p $DIRECTORY

/usr/bin/mysqldump -h $HOST --databases $DATABASE -u $USERNAME \
    | /bin/gzip --best --verbose > $DIRECTORY/$DATABASE-dump.gz
The above shell script starts off by defining a bunch of variables,
from the directory in which I want to store the backups, to the parts
of the date (stored in $YEAR, $MONTH and $DAY). This is so I
can have a separate directory for each day of the month. I could, of
course, go further and have separate directories for each hour, but
I’ve found that I rarely need more than one backup from a day.
Once I have defined those variables, I then use the
mkdir command to
create a new directory. The
-p option tells
mkdir that if
necessary, it should create all of the directories it needs such that
the entire path will exist.
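For example, with BACKUP_ROOT pointed at a scratch directory, a single call builds the whole year/month/day hierarchy, parents included:

```shell
# mkdir -p creates every missing directory along the path in one step.
BACKUP_ROOT=$(mktemp -d)
DIRECTORY="$BACKUP_ROOT/2018/01/17"
/bin/mkdir -p "$DIRECTORY"
```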
Finally, I then run the database’s “dump” command. In this
particular case, I’m using MySQL, so I’m using the mysqldump
command. The output from this command is a stream of SQL that can be
used to re-create the database. I thus take the output from
mysqldump and pipe it into
gzip, which compresses the output
file. Finally, the resulting dumpfile is placed, in compressed form,
inside the daily backup directory.
Depending on the size of your database and the amount of disk space
you have on hand, you’ll have to decide just how often you want to run
dumps and how often you want to clean out old ones. I know from
experience that dumping every hour can cause some load problems. On
one virtual machine I’ve used, the overall administration team was
unhappy that I was dumping and compressing every hour, which they saw
as an unnecessary use of system resources.
If you’re worried your system will run out of disk space,
you might well want to run a space-checking program that’ll alert you
when the filesystem is low on free space. In addition, you can run a
cron job that uses
find to erase all dumpfiles from before a certain
cutoff date. I’m always a bit nervous about programs that
automatically erase backups, so I generally prefer not to do this.
Rather, I run a program that warns me if the disk usage is going above
85% (which is usually low enough to ensure that I can fix the problem
in time, even if I’m on a long flight). Then I can go in and remove
the problematic files by hand.
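If you do decide to automate the cleanup, the cron job's core is a single find command. Here's a sketch against a scratch directory (the 30-day cutoff and filenames are examples):

```shell
# Scratch-directory sketch of the cleanup cron job: one "old" and one
# "new" dumpfile, then delete anything past a 30-day cutoff.
BACKUP_ROOT=$(mktemp -d)
touch -d '40 days ago' "$BACKUP_ROOT/olddb-dump.gz"
touch "$BACKUP_ROOT/newdb-dump.gz"
# -mtime +30 matches files last modified more than 30 days ago
find "$BACKUP_ROOT" -name '*-dump.gz' -mtime +30 -delete
```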
When you back up your database, you should be sure to back up the
configuration for that database as well. The database schema and
data, which are part of the dumpfile, are certainly important.
However, if you find yourself having to re-create your server from
scratch, you’ll want to know precisely how you configured the database
server, with a particular emphasis on the filesystem configuration and
memory allocations. I tend to use PostgreSQL for most of my work, and
although postgresql.conf is simple to understand and configure, I still
like to keep it around with my dumpfiles.
Another crucial thing to do is to check your database
dumps occasionally to be sure that they are working the way you want. It turns
out that the backups I thought I was making weren’t actually
happening, in no small part because I had modified the shell script
and hadn’t double-checked that it was creating useful backups.
Occasionally pulling out one of your dumpfiles and restoring it to a
separate (and offline!) database to check its integrity is a good
practice, both to ensure that the dump is working and that you
remember how to restore it in the case of an emergency.
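Short of a full restore, you can at least confirm a dump is readable. This sketch uses a one-line stand-in SQL file (all paths are temporary examples) to check that the compressed file is intact and round-trips unchanged:

```shell
# Stand-in dump: a one-line SQL file, compressed the same way the
# backup script compresses real dumps.
dump=$(mktemp)
printf 'CREATE TABLE t (id INT);\n' > "$dump"
gzip --best -c "$dump" > "$dump.gz"
# gzip -t verifies the compressed file's internal integrity...
gzip -t "$dump.gz"
# ...and a round-trip comparison confirms the contents are unchanged.
gunzip -c "$dump.gz" | cmp -s - "$dump"
```

A passing check here only proves the file decompresses cleanly; restoring into a scratch database is still the real test.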
But wait. It might be great to have these backups, but what if the
server goes down entirely? In the case of the code, I described using Git
to ensure that it was located on more than one machine, preserving its
integrity. By contrast, your database dumps are now on the server,
such that if the server fails, your database dumps will be lost along with it.
This means you’ll want to have your database dumps stored
elsewhere, preferably automatically. How can you do that?
There are a few relatively easy and inexpensive solutions to this
problem. If you have two servers—ideally in separate physical
locations—you can use
rsync to copy the files from one to the
other. Don’t rsync the database’s actual files, since those might get
corrupted in transfer and aren’t designed to be copied when the
server is running. By contrast, the dumpfiles that you have created
are more than able to go elsewhere. Setting up a remote server, with
a user specifically for handling these backup transfers, shouldn’t be
too hard and will go a long way toward ensuring the safety of your data.
I should note that using
rsync in this way basically requires that you
set up passwordless SSH, so that you can transfer without having to be
physically present to enter the password.
Another possible solution is Amazon’s Simple Storage Service (S3),
which offers astonishing amounts of disk space at very low prices. I
know that many companies use S3 as a simple (albeit slow) backup
system. You can set up a cron job to run a program that copies the
contents of a particular database dumpfile directory into a particular
S3 bucket. The assumption here is that you’ll rarely, if ever, need to
retrieve these backups, meaning that S3’s slow searching and access
will not be an issue when you do.
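For instance, assuming the AWS command-line tools are installed and a bucket (here, the hypothetical example-db-backups) already exists, a single crontab line can push the dump directory to S3 every hour:

```
# m  h  dom mon dow   command   (bucket name is an example)
15 *  *   *   *   /usr/bin/aws s3 sync /home/database-backups s3://example-db-backups/
```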
Similarly, you might consider using Dropbox. Dropbox is best known
for its desktop client, but it has a “headless”, text-based client
that can be used on Linux servers without a GUI connected. One nice
advantage of Dropbox is that you can share a folder with any number of
people, which means you can have Dropbox
distribute your backup databases everywhere automatically, including to a number of
people on your team. The backups arrive in their Dropbox folders, and
you can be sure that copies exist in several locations at once.
Finally, if you’re running a WordPress site, you might want to
consider VaultPress, a for-pay backup system. I must admit that in
the weeks before I took my server down with a database backup error, I
kept seeing ads in WordPress for VaultPress. “Who would buy that?” I
asked myself, thinking that I’m smart enough to do backups myself. Of
course, after disaster occurred and my database was ruined, I realized
that $30/year to back up all of my data is cheap, and I should have
done it before.
When it comes to your servers, think less like an optimistic
programmer and more like an insurance agent. Perhaps disaster won’t
strike, but if it does, will you be able to recover? Making sure that
even if your server is completely unavailable, you’ll be able to bring
up your program and any associated database is crucial.
My preferred solution involves combining a Git repository for code and
configuration files, distributed across several machines and services.
For the databases, however, it’s not enough to dump your database;
you’ll need to get that dump onto a separate machine, and preferably
test the backup file on a regular basis. That way, even if things go
wrong, you’ll be able to get back up in no time.
Sysadmin Tips on Preparing for Vacation
Thu, 01/11/2018 – 08:35
Read on for ways to help reduce the chance that your vacation will be interrupted by sysadmin issues.
Every year or two my family and I like to take a vacation abroad. Normally,
vacation is a time to unplug, and if you are a sysadmin who’s on an on-call
rotation, someone else on the team typically takes over your on-call duties. Yet as you
progress in your career, you start to gain more expertise and responsibilities
over systems, and even with someone else on-call, there’s a certain class of
emergency where the team might need to reach out to you for help even when you’re on vacation.
I recently took a vacation abroad, and before I left, I went through a set of tasks
to reduce the chance that I would need to jump on an emergency while I was away.
So in this article, I describe some of the steps I take to prepare for a
vacation that will help you unplug on your next trip.
One of the first questions you should answer before going on vacation is whether you will
need to take your work laptop with you. Depending on your organization and its
security controls, you might be able to perform basic emergency administrative
tasks from your personal computer, tablet or phone, or you may be able
to connect to production only from your work computer. In other cases, you may not need
a computer, because you can just serve an advisory role over the phone or chat with
other people on the team and walk them through what to do in the event of an emergency.
If you do need to take your computer, I highly recommend making a full backup
before the trip. Your computer is more likely to be lost, stolen or broken while
traveling than when sitting safely at the office, so I always take a backup of my work
machine before a trip. Even better than taking a backup, leave your expensive work
computer behind and use a cheaper more disposable machine for travel and just
restore your important files and settings for work on it before you leave and wipe
it when you return. If you decide to go the disposable computer route, I recommend
working one or two full work days on this computer before the vacation to make
sure all of your files and settings are in place.
Good documentation is the best way to reduce or eliminate how much you have to
step in when you aren’t on call, whether you’re on vacation or not. Everything from
routine procedures to emergency response should be documented and kept up to date.
Honestly, this falls under standard best practices as a sysadmin, so it’s something
you should have whether or not you are about to go on vacation.
First, all routine procedures should be documented in a clear, step-by-step way:
how you deploy code and configuration changes, how you manage tickets, how you
perform security patches, how you add and remove users, and how the overall
environment is structured. If you use automation
tools for routine procedures, whether
it’s as simple as a few scripts or as complex as full orchestration tools, you
should make sure you document not only how to use the automation tools, but also
how to perform the same tasks manually should the automation tools fail.
If you are on call, that means you have a monitoring system in place that
scans your infrastructure for problems and pages you when it finds any. Every
single system check in your monitoring tool should have a corresponding playbook
a sysadmin can follow to troubleshoot and fix the problem. If your monitoring tool
allows you to customize the alerts it sends, create corresponding wiki entries for
each alert name, and then customize the alert so that it provides a direct link to
the playbook in the wiki.
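For example, in a Nagios-style configuration (the host name, check command and wiki URL here are all hypothetical), the notes_url directive attaches the playbook link directly to the check:

```
define service {
    use                  generic-service
    host_name            db01
    service_description  disk_usage_database
    check_command        check_disk!20%!10%!/var/lib/mysql
    notes_url            https://wiki.example.com/playbooks/disk_usage_database
}
```

With a setup like this, the page your on-call counterpart receives leads straight to the troubleshooting steps.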
If you happen to be the subject-matter expert on a particular system, make sure
that documentation in particular is well fleshed out and understandable. These are
the systems that will pull you out of your vacation, so look through those
documents for any assumptions you may have made when writing them that a junior
member of the team might not understand. Have other members of the team review the
documentation and ask you questions.
One saying about documentation is that if something is documented in two places,
one of them will be out of date. Even if you document something only in one place,
there’s a good chance it is out of date unless you perform routine maintenance.
It’s a good practice to review your documentation from time to time and update it
where necessary, and right before a vacation is a particularly good time to do it.
If you are the only person who knows about the new way to perform a procedure, you
should make sure your documentation covers it.
Finally, have your team maintain a page to capture anything that happens while
you are gone that they want to tell you about when you get back. If you are the
main maintainer of a particular system, but they had to perform some emergency
maintenance of it while you were gone, that’s the kind of thing you’d like to know
about when you get back. If there’s a central place for the team to capture these
notes, they will be more likely to write things down as they happen and less likely to
forget about things when you get back.
The more stable your infrastructure is before you leave and the more stable it stays while
you are gone, the less likely you’ll be disturbed on your vacation. Right before a
vacation is a terrible time to make a major change to critical systems. If you
can, freeze changes in the weeks leading up to your vacation. Try to encourage
other teams to push off any major changes until after you get back.
Before a vacation is also a great time to perform any preventative maintenance on
your systems. Check for any systems about to hit a disk warning threshold and
clear out space. In general, if you collect trending data, skim through it for any
resources that are trending upward that might go past thresholds while you are
gone. If you have any tasks that might add extra load to your systems while you
are gone, pause or postpone them if you can. Make sure all of your backup scripts
are working and all of your backups are up to date.
Although it would be great to unplug completely while on vacation, there’s a chance
that someone from work might want to reach you in an emergency. Depending on where you
plan to travel, some contact options may work better than others. For instance, some
cell-phone plans that work while traveling might charge high rates for calls, but
text messages and data bill at the same rates as at home. If you plan to get a
local SIM card, text messages sent over the cell network from home might cost more
than those sent over the data plan. If you do get a local SIM card, you will have
to work out some way to communicate that new number to your team.
Discuss with your team what escalation path they should use to contact you in an
emergency. For instance, in my case, I knew my cell-phone plan would provide me with
unlimited text messages and the same data plan as at home, but I also didn’t want
work email to distract me. This presented a problem, as email is the primary
way I’m paged. In my case, I disabled email syncing while I was on vacation and
instructed everyone to contact me via text message in the case of emergency.
I also needed to be on the secondary escalation path for any alerts that
weren’t resolved within a certain amount of time, so I configured my monitoring
tool to use an email-to-SMS gateway as my email address for alerts.
If there are certain days when you know you (or your on-call counterpart at home)
might be in areas with limited cell coverage, work out those dates ahead of time
and put them in your calendar. If nothing else, it might encourage others to wait
on making a risky change if they know they absolutely will not be able to reach
you for the next two days. In general, set expectations on your availability, and
also make sure everyone takes any time zone differences into account.
Overall, a vacation should be a time for you to be completely removed from your
work’s on-call process. Whether that’s possible or not, the more you prepare ahead
of time, the less likely your vacation will be interrupted. Finally, when you get
back, do a post mortem with your team about anything that went wrong and any
documentation that was confusing or incomplete, so you can make improvements for
your next vacation.
Ansible: the Automation Framework That Thinks Like a Sysadmin
Fri, 01/05/2018 – 07:02
I’ve written about and trained folks on various DevOps tools through the years, and
although they’re awesome, it’s obvious that most of them are designed from the
mind of a developer. There’s nothing wrong with that, because approaching
configuration management programmatically is the whole point. Still,
it wasn’t until I started playing with Ansible that I felt like it was
something a sysadmin quickly would appreciate.
Part of that appreciation comes from the way Ansible communicates with its
client computers—namely, via SSH. As sysadmins, you’re all very familiar
with connecting to computers via SSH, so right from the word “go”, you
have a better understanding of Ansible than the other alternatives.
With that in mind, I’m planning
to write a few articles exploring how to take advantage of
Ansible. It’s a great system, but when I was first exposed to it, it wasn’t
clear how to start. It’s not that the learning curve is steep. In fact,
if anything, the problem was that I didn’t really have that much to learn
before starting to use Ansible, and that made it confusing. For example,
if you don’t have to install an agent program (Ansible doesn’t have any
software installed on the client computers), how do you start?
The reason Ansible was so difficult for me at first is that it’s so
flexible with how to configure the server/client relationship that I
didn’t know what I was supposed to do. The truth is that Ansible doesn’t
really care how you set up the SSH system; it will utilize whatever
configuration you have. There are just a couple things to consider:
Ansible needs to connect to the client computer via SSH.
Once connected, Ansible needs to elevate privilege so it can configure
the system, install packages and so on.
Unfortunately, those two considerations really open a can of
worms. Connecting to a remote computer and elevating privilege is a
scary thing to allow. For some reason, it feels less vulnerable when you
simply install an agent on the remote computer and let Chef or Puppet
handle privilege escalation. It’s not that Ansible is any less secure,
but rather, it puts the security decisions in your hands.
Next I’m going to
list a bunch of potential configurations, along with the pros and cons
of each. This isn’t an exhaustive list, but it should get you thinking
along the right lines for what will be ideal in your environment. I
also should note that I’m not going to mention systems like Vagrant,
because although Vagrant is wonderful for building a quick infrastructure
for testing and developing, it’s so very different from a bunch of
servers that the considerations are too dissimilar really to compare.
1) SSHing into remote computer as root with password in Ansible
I started with a terrible idea. The “pros” of this setup are that it
eliminates the need for privilege escalation, and there are no other user
accounts required on the remote server. But, the cost for such convenience
isn’t worth it. First, most systems won’t let you SSH in as root without
changing the default configuration. Those default configurations are
there because, quite frankly, it’s just a bad idea to allow the root
user to connect remotely. Second, putting a root password in a plain-text
configuration file on the Ansible machine is mortifying. Really, I
mention this possibility only because it is a possibility, and it's one
that should be avoided. Remember, Ansible allows you to configure the
connection yourself, and it will let you do really dumb things. Please don't.
2) SSHing into a remote computer as a regular user, using a password
stored in the Ansible config.
An advantage of this scenario is that it doesn’t require much
configuration of the clients. Most users are able to SSH in by default,
so Ansible should be able to use credentials and log in fine. I
personally dislike the idea of a password being stored in plain text
in a configuration file, but at least it isn’t the root password. If
you use this method, be sure to consider how privilege escalation
will take place on the remote server. I know I haven’t talked about
escalating privilege yet, but if you have a password in the config file,
that same password likely will be used to gain sudo access. So with one
slip, you’ve compromised not only the remote user’s account, but also
potentially the entire system.
3) SSHing into a remote computer as a regular user, authenticating with
a key pair that has an empty passphrase.
This eliminates storing passwords in a configuration file, at least for
the logging in part of the process. Key pairs without passphrases aren’t
ideal, but it’s something I often do in an environment like my house. On
my internal network, I typically use a key pair without a passphrase to
automate many things like cron jobs that require authentication. This
isn’t the most secure option, because a compromised private key means
unrestricted access to the remote user’s account, but I like it better
than a password in a config file.
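As a sketch of this approach (the key type and file path here are just examples), creating a key pair with an empty passphrase looks like this:

```shell
# Generate an ed25519 key pair with an empty passphrase (-N "") so no
# prompt is needed at connection time. The path is illustrative;
# ~/.ssh/id_ed25519 is the usual location.
ssh-keygen -t ed25519 -N "" -f /tmp/ansible_demo_key -q

# The .pub half is what gets installed on each client, typically with
# ssh-copy-id (shown later in this article).
cat /tmp/ansible_demo_key.pub
```

Keep in mind that anything able to read the private key file can now log in as that user, which is exactly the trade-off described above.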
4) SSHing into a remote computer as a regular user, authenticating with
a key pair that is secured by a passphrase.
This is a very secure way of handling remote access, because it
requires two different authentication factors: 1) the private key
and 2) the passphrase to decrypt it. If you’re just running Ansible
interactively, this might be the ideal setup. When you run a command,
Ansible should prompt you for the private key’s passphrase, and then it’ll use
the key pair to log in to the remote system. Yes, the same could be done
by just using a standard password login and not specifying the password
in the configuration file, but if you’re going to be typing a password
on the command line anyway, why not add the layer of protection a key
pair offers?
5) SSHing with a passphrase-protected key pair, but using ssh-agent to
“unlock” the private key.
This doesn’t perfectly answer the question of unattended, automated
Ansible commands, but it does make a fairly secure setup convenient
as well. The ssh-agent program authenticates the passphrase one time
and then uses that authentication to make future connections. When
I’m using Ansible, this is what I think I’d like to be doing. If I’m
completely honest, I still usually use key pairs without passphrases,
but that’s typically because I’m working on my home servers, not something
prone to attack.
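The agent workflow is only a couple commands. In this sketch, a throwaway key with an empty passphrase stands in for your real passphrase-protected key so the example runs unattended; with a real key, ssh-add would prompt for the passphrase once:

```shell
# Start an agent and export the environment variables that ssh (and
# therefore Ansible) uses to find it.
eval "$(ssh-agent -s)"

# Stand-in key so this example runs unattended; with your real
# passphrase-protected key, ssh-add prompts for the passphrase once.
ssh-keygen -t ed25519 -N "" -f /tmp/agent_demo_key -q
ssh-add /tmp/agent_demo_key

# List what the agent is holding; later ssh/Ansible runs reuse it.
ssh-add -l

# Kill the agent when you're done.
ssh-agent -k > /dev/null
```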
There are some other considerations to keep in mind when configuring your SSH
environment. Perhaps you’re able to restrict the Ansible user (which
is often your local user name) so it can log in only from a specific IP
address. Perhaps your Ansible server can live in a different subnet,
behind a strong firewall so its private keys are more difficult to access
remotely. Maybe the Ansible server doesn’t have an SSH server installed on
itself so there’s no incoming access at all. Again, one of the strengths
of Ansible is that it uses the SSH protocol for communication, and it’s
a protocol you’ve all had years to tweak into a system that works best in
your environment. I'm not a big fan of proclaiming what the "best
practice" is, because in reality, the best practice is to consider your environment
and choose the setup that fits your situation the best.
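One concrete way to do the IP restriction mentioned above is the from= option in the client's authorized_keys file. This is only a sketch; the address is a placeholder and the key material is elided:

```
# ~/.ssh/authorized_keys on the client: accept this key only when the
# connection comes from the Ansible server's address.
from="192.168.1.2" ssh-ed25519 AAAAC3...restOfPublicKey ansible@ansibleserver
```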
Once your Ansible server connects to its clients via SSH, it needs
to be able to escalate privilege. If you chose option 1 above,
you’re already root, and this is a moot point. But since no one
chose option 1 (right?), you need to consider how a regular user on
the client computer gains access. Ansible supports a wide variety of
escalation systems, but in Linux, the most common options are sudo and
su. As with SSH, there are a few situations to consider, although
there are certainly other options.
1) Escalate privilege with su.
For Red Hat/CentOS users, the instinct might be to use su in order
to gain system access. By default, those systems configure the root
password during install, and to gain privileged access, you need to
type it in. The problem with using su is that although it gives you total
access to the remote system, it also gives you total access to the remote
system. (Yes, that was sarcasm.) Also, the su program doesn’t have the
ability to authenticate with key pairs, so the password either must be
interactively typed or stored in the configuration file. And since it’s
literally the root password, storing it in the config file should sound
like a horrible idea, because it is.
2) Escalate privilege with sudo.
This is how Debian/Ubuntu systems are configured. A user in the correct
group can use sudo to execute a command with root
privileges. Out of the box, this still has the problem of password
storage or interactive typing. Since storing the user’s password in the
configuration file seems a little less horrible, I guess this is a step
up from using su, but it still gives complete access to a system if the
password is compromised. (After all, typing sudo su will allow
users to become root just as if they had the root password.)
3) Escalate privilege with sudo and configure NOPASSWD in the sudoers file.
Again, in my local environment, this is what I do. It’s not perfect,
because it gives unrestricted root access to the user account and doesn’t
require any passwords. But when I do this, and use SSH key pairs without
passphrases, it allows me to automate Ansible commands easily. I'll note
again that although it is convenient, it is not a terribly secure idea.
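For reference, the sudoers entry looks like the line below, assuming the remote account is named ansible (edit the file with visudo, as described later in this article):

```
# Passwordless sudo for every command: convenient, but as noted,
# not terribly secure.
ansible ALL=(ALL) NOPASSWD: ALL
```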
4) Escalate privilege with sudo and configure NOPASSWD on specific
executables.
This idea might be the best compromise of security and
convenience. Basically, if you know what you plan to do with Ansible,
you can give NOPASSWD privilege to the remote user for just those
applications it will need to use. It might get a little confusing,
since Ansible uses Python for lots of things, but with enough trial
and error, you should be able to figure things out. It is more work,
but does eliminate some of the glaring security holes.
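A sketch of such an entry; the user name and command paths are examples you'd adjust (with that trial and error) to match what your playbooks actually run:

```
# Passwordless sudo only for specific executables.
ansible ALL=(ALL) NOPASSWD: /usr/bin/yum, /usr/bin/systemctl
```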
Once you decide how you’re going to handle Ansible authentication and
privilege escalation, you need to set it up. After you become well versed
at Ansible, you might be able to use the tool itself to help configure
new clients, but at first, it’s important to configure clients manually so
you know what’s happening. It’s far better to automate a process you’re
familiar with than to start with automation from the beginning.
I’ve written about SSH key pairs in the past, and there are countless
articles online for setting them up. The short version, from your Ansible
computer, looks something like this:
# ssh-keygen
# ssh-copy-id -i .ssh/id_rsa.pub remoteuser@remote.host
# ssh remoteuser@remote.host
If you’ve chosen to use no passphrase when creating your key pairs,
that last step should get you into the remote computer without typing
a password or passphrase.
In order to set up privilege escalation in sudo, you’ll need to edit
the sudoers file. You shouldn’t edit the file directly, but rather use:
# sudo visudo
This will open the sudoers file and allow you to make changes safely
(it error-checks when you save, so you don’t accidentally lock yourself
out with a typo). There are examples in the file, so you should be able
to figure out how to assign the exact privileges you want.
Once it’s all configured, you should test it manually before bringing
Ansible into the picture. Try SSHing to the remote client, and then try
escalating privilege using whatever methods you’ve chosen. Once you have
configured the way you’ll connect, it’s time to install Ansible.
Since the Ansible program gets installed only on the single computer,
it’s not a big chore to get going. Red Hat/Ubuntu systems do package
installs a bit differently, but neither is difficult.
In Red Hat/CentOS, first
enable the EPEL repository:
sudo yum install epel-release
Then install Ansible:
sudo yum install ansible
In Ubuntu, first
enable the Ansible PPA:
sudo apt-add-repository ppa:ansible/ansible
(press ENTER to accept the key and add the repo)
Then install Ansible:
sudo apt-get update
sudo apt-get install ansible
The Ansible system has no way of knowing which clients you want it to
control unless you give it a list of computers. That list is very simple,
and it looks something like this:
# file /etc/ansible/hosts
[webservers]
blogserver ansible_host=192.168.1.5
wikiserver ansible_host=192.168.1.10
[dbservers]
mysql_1 ansible_host=192.168.1.22
pgsql_1 ansible_host=192.168.1.23
The bracketed sections are specifying groups. Individual hosts can be
listed in multiple groups, and Ansible can refer either to individual
hosts or groups. This is also the configuration file where things like
plain-text passwords would be stored, if that’s the sort of setup you’ve
planned. Each line in the configuration file configures a single host, and
you can add multiple declarations after the
ansible_host statement. Some
useful options are:
ansible_ssh_pass
ansible_become
ansible_become_method
ansible_become_user
ansible_become_pass
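These go right on the host line. The example below is illustrative only; the user name and passwords are placeholders, and storing them in plain text carries the risks discussed earlier:

```
[webservers]
blogserver ansible_host=192.168.1.5 ansible_user=admin ansible_ssh_pass=changeme ansible_become=yes ansible_become_method=sudo ansible_become_pass=changeme
```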
I also should note that although the setup is more complex, and not something
you’ll likely do during your first foray into the world of Ansible,
the program does offer a way to encrypt passwords in a vault. Once
you’re familiar with Ansible and you want to put it into production,
storing those passwords in an encrypted Ansible vault is ideal. But in
the spirit of learning to crawl before you walk, I recommend starting in
a non-production environment and using passwordless methods at first.
Finally, you should test your system to make sure your clients are
connecting. The ping test will make sure the Ansible computer can ping
each client:
ansible -m ping all
After running, you should see a message for each defined host showing
ping: pong if the connection succeeded. Note that Ansible's ping module
isn't an ICMP ping; it logs in over SSH and runs on each client, so a
successful pong means authentication is working as well. Try this to run
a real command on your clients:
ansible -m shell -a 'uptime' webservers
You should see the results of the
uptime command for each host in the
webservers group.
In a future article, I plan to start digging in to Ansible's ability to manage the
remote computers. I’ll look at various modules and how you can use
the ad-hoc mode to accomplish in a few keystrokes what would take a
long time to handle individually on the command line. If you didn’t
get the results you expected from the sample Ansible commands above,
take this time to make sure authentication is working. Check
out the Ansible docs for more help if you get stuck.