Category Archives: Linux Journal Sysadmin

Ansible, Part IV: Putting It All Together

Shawn Powers
Fri, 03/02/2018 – 09:31


Roles are the most complicated and yet simplest aspect of
Ansible to learn.

I’ve mentioned before that Ansible’s ad-hoc mode often is overlooked
as just a way to learn how to use Ansible. I couldn’t disagree with
that mentality any more fervently than I already do. Ad-hoc mode is
actually what I tend to use most often on a day-to-day basis. That said,
using playbooks and roles are very powerful ways to utilize Ansible’s
abilities. In fact, when most people think of Ansible, they tend to
think of the roles feature, because it’s the way most Ansible code is
shared. So first, it’s important to understand the relationship
between ad-hoc mode, playbooks and roles.

Ad-hoc Mode

This is a bit of a review, but it’s easy to forget once you start creating
playbooks. Ad-hoc mode is simply a one-liner that uses an Ansible module
to accomplish a given task on a set of computers. Something like:


ansible cadlab -b -m yum -a "name=vim state=latest"

will install vim on every computer in the cadlab group. The
-b
signals to elevate privilege (“become” root), the
-m means to use the
yum module, and the -a says
what actions to take. In this case,
it’s installing the latest version of vim.

Usually when I use ad-hoc mode to install packages, I’ll follow up with
something like this:


ansible cadlab -b -m service -a "name=httpd state=started
 ↪enabled=yes"

That one-liner will make sure that the httpd service is running and set
to start on boot automatically (the latter is what “enabled” means). Like
I said at the beginning, I most often use Ansible’s ad-hoc mode on a
day-to-day basis. When a new rollout or upgrade needs to happen though,
that’s when it makes sense to create a playbook, which is a text file
that contains a bunch of Ansible commands.

Playbook Mode

I described playbooks in my last article. They are YAML-formatted (“YAML
Ain’t Markup Language”) text files that contain a list of things for Ansible
to accomplish. For example, to install Apache on a lab full of computers, you’d create
a file something like this:


---

- hosts: cadlab
  tasks:
  - name: install apache2 on CentOS
    yum: name=httpd state=latest
    notify: start httpd
    ignore_errors: yes

  - name: install apache2 on Ubuntu
    apt: update_cache=yes name=apache2 state=latest
    notify: start apache2
    ignore_errors: yes

  handlers:
  - name: start httpd
    service: name=httpd enabled=yes state=started

  - name: start apache2
    service: name=apache2 enabled=yes state=started

Mind you, this isn’t the most elegant playbook. It contains a single
play that tries to install httpd with yum and apache2 with apt. If
the lab is a mix of CentOS and Ubuntu machines, one or the other of the
installation methods will fail. That’s why the
ignore_errors directive
is in each task. Otherwise, Ansible would quit when it encountered an
error. Again, this method works, but it’s not pretty. It would be much
better to create conditional statements that would allow for a graceful
exit on incompatible platforms. In fact, playbooks that are more complex
and do more things tend to evolve into a “role” in Ansible.

Roles

Roles aren’t really a mode of operation. Actually, roles are an integral
part of playbooks. Just like a playbook can have tasks, variables and
handlers, it can also have roles. Quite simply, roles are just a way
to organize the various components referenced in playbooks. It starts
with a folder layout:


roles/
  webserver/
    tasks/
      main.yml
    handlers/
      main.yml
    vars/
      main.yml
    templates/
      index.html.j2
      httpd.conf.j2
    files/
      ntp.conf

Ansible looks for a roles folder in the current directory, but also
in a system-wide location like /etc/ansible/roles, so you can store your
roles to keep them organized and out of your home folder. The advantage
of using roles is that your playbooks can look as simple as this:


---

- hosts: cadlab
  roles:
    - webserver

And then the “webserver” role will be applied to the group
“cadlab”
without needing to type any more information inside your playbook. When
a role is specified, Ansible looks for a folder matching the name
“webserver” inside your roles folder (in the current directory or
the system-wide directory). It then will execute the tasks inside
webserver/tasks/main.yml. Any handlers mentioned in that playbook will
be searched for automatically in webserver/handlers/main.yml. Also,
any time files are referenced by a template module or file/copy module,
the path doesn’t need to be specified. Ansible automatically will look
inside webserver/files/ or webserver/templates/ for the files.

Basically, using roles will save you lots of path declarations and include
statements. That might seem like a simple thing, but the organization
creates a standard that not only makes it easy to figure out what a
role does, but also makes it easy to share your code with others. If you always
know any files must be stored in roles/rolename/files/, it means you can
share a “role” with others and they’ll know exactly what to do with
it—namely, just plop it in their own roles folder and start using it.
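
To make that concrete, here’s a rough sketch of what the tasks and handlers
files inside the webserver role might contain (trimmed down from the earlier
playbook; the task and handler names are just illustrative):


# roles/webserver/tasks/main.yml
---
- name: install apache2
  apt: update_cache=yes name=apache2 state=latest
  notify: start apache2

# roles/webserver/handlers/main.yml
---
- name: start apache2
  service: name=apache2 enabled=yes state=started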

Sharing Roles: Ansible Galaxy

One of the best aspects of current DevOps tools like Chef, Puppet and
Ansible is that there is a community of people willing to share their
hard work. On a small scale, roles are a great way to share with your
coworkers, especially if you have roles that are customized specifically
for your environment. Since many environments are similar, roles
can be shared with an even wider audience—and that’s where Ansible
Galaxy comes into play.

I’ll be honest, part of the draw for me with Ansible is the sci-fi theme
in the naming convention. I know I’m a bit silly in that regard, but
just naming something Ansible or Ansible Galaxy gets my attention. This
might be one of those “built by nerds, for nerds” sort of things. I’m
completely okay with that. If you head over to the Galaxy site,
you’ll find the online repository for shared roles—and there are a ton.

For simply downloading and using other people’s roles, you don’t need any
sort of account on Ansible Galaxy. You can search on the website by going
to Galaxy and clicking “Browse
Roles” on the left
side of the page (Figure 1). There are more than 13,000 roles currently uploaded
to Ansible Galaxy, so I highly recommend taking advantage of the search
feature! In Figure 2, you’ll see I’ve searched for “apache” and sorted by
“downloads” in order to find the most popular roles.

Figure 1. Click that link to browse and search for roles.

Figure 2. Jeff Geerling’s roles are always worth checking out.

Many of the standard roles you’ll find that are very popular are written
by Jeff Geerling, whose user name is geerlingguy. He’s an Ansible developer
who has written at least one Ansible book that I’ve read and possibly
others. He shares his roles, and I encourage you to check them out—not
only for using them, but also for seeing how he codes around issues like
conditionally choosing the correct module for a given distribution and
things like that. You can click on the role name and see all the code
involved. You might notice that if you want to examine the code, you need
to click on the GitHub link. That’s one of the genius moves of Ansible
Galaxy—all roles are stored on a user’s GitHub page as opposed
to an Ansible Galaxy server. Since most developers keep their code on
GitHub, they don’t need to remember to upload to Ansible Galaxy as well.

Incidentally, if you ever desire to share your own Ansible roles,
you’ll need to use a GitHub user name to upload them, because again,
roles are all stored on GitHub. But that’s getting ahead of things;
first you need to learn how to use roles in your environment.

Using ansible-galaxy to Install Roles

It’s certainly possible to download an entire repository and then unzip
the contents into your roles folder. Since they’re just text files
and structured folders, there’s not really anything wrong with doing
it that way. It’s just far less convenient than using the tools built in
to Ansible.

There is a search mechanism on the Ansible command line for searching
the Ansible Galaxy site, but in order to find a role I want to use, I
generally go to the website and find it, then use the command-line tools
to download and install it. Here’s an example of Jeff Geerling’s
“apache” role. In order to use Ansible to download a role, you
need to do this:


sudo ansible-galaxy install geerlingguy.apache

Notice two things. First, you need to execute this command with root
privilege. That’s because the ansible-galaxy command will install roles
in your system-wide roles folder, which isn’t writable (by default)
by your regular user account. Second, take note of the format of roles
named on Ansible Galaxy. The format is username.rolename, so in this
case, geerlingguy.apache, which is also how you reference the role
inside your playbooks.

If you want to see roles listed with the correct format, you can use
ansible-galaxy’s search command, but like I said, I find it less than
useful because it doesn’t sort by popularity. In fact, I can’t figure
out what it sorts by at all. The only time I use the command-line
search feature is if I also use grep to narrow down roles by a single
person. Anyway, Figure 3 shows what the results of
ansible-galaxy search
look like. Notice the username.rolename format.

Figure 3. I love the command line, but these search results are
frustrating.
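
For example, to narrow a search down to a single author, piping through
grep does the trick (assuming you’re looking for geerlingguy’s roles):


ansible-galaxy search apache | grep geerlingguy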

Once you install a role, it is immediately available for you to use
in your own playbooks, because it’s installed in the system-wide roles
folder. In my case, that’s /etc/ansible/roles (Figure 4). So now,
if I create a playbook like this:


---
- hosts: cadlab
  roles:
    - geerlingguy.apache

Apache will be installed on all my cadlab computers, regardless of
what distribution they’re using. If you want to see how the role (which
is just a bunch of tasks, handlers and so forth) works, just pick through the
folder structure inside /etc/ansible/roles/geerlingguy.apache/. It’s
all right there for you to use or modify.

Figure 4. Easy Peasy, Lemon Squeezy

Creating Your Own Roles

There’s really no magic here, since you easily can create a roles folder
and then create your own roles manually inside it, but
ansible-galaxy
does give you a shortcut by creating a skeleton role for you. Make sure
you have a roles folder, then just type:


ansible-galaxy init roles/rolename

and you’ll end up with a nicely created folder structure for your new
role. It doesn’t do anything magical, but as someone who has misspelled
“Templates” before, I can tell you it will save you a lot of frustration
if you have clumsy fingers like me.

Sharing Your Roles

If you get to the point where you want to share your roles on Ansible Galaxy,
it’s fairly easy to do. Make sure you have your role on GitHub (using
git is beyond the scope of this article, but using git and GitHub is a
great way to keep track of your code anyway). Once you have your roles
on GitHub, you can use ansible-galaxy to “import” them into the publicly
searchable Ansible Galaxy site. You first need to authenticate:


ansible-galaxy login

Before you try to log in with the command-line tool, be sure you’ve
visited the Ansible Galaxy website and logged in with your GitHub
account. You can see in Figure 5 that at first I was unable to log in. Then I
logged in on the website, and after that, I was able to log
in with the command-line tool successfully.

Figure 5. It drove me nuts trying to figure out why I couldn’t
authenticate.

Once you’re logged in, you can add your role by typing:


ansible-galaxy import githubusername githubreponame

The process takes a while, so you can add the
--no-wait option if you
want, and the role will be imported in the background. I really don’t
recommend doing this until you have created roles worth sharing. Keep
in mind, there are more than 13,000 roles on Ansible Galaxy, so there are many
“re-inventions of the wheel” happening.

From Here?

Well, it’s taken me four articles, but I think if you’ve been following
along, you should be to the point where you can
take it from here. Playbooks and roles are usually where people focus
their attention in Ansible, but I also encourage you to take advantage
of ad-hoc mode for day-to-day maintenance tasks. Ansible in some ways is
just another DevOps configuration management tool, but for me, it feels
the most like the traditional problem-solving solution that I used Bash
scripts to accomplish for decades. Perhaps I just like Ansible because
it thinks the same way I do. Regardless of your motivation, I encourage
you to learn Ansible enough so you can determine whether it fits into your
workflow as well as it fits into mine.

Resources

Here are links to the first three articles in this series by Shawn Powers:

Part I: “Ansible: the Automation Framework That Thinks Like a Sysadmin”

Part II: “Ansible: Making Things Happen”

Part III: “Ansible, Part III: Playbooks”

Ansible, Part III: Playbooks

Shawn Powers
Mon, 02/19/2018 – 10:50

Playbooks make Ansible even more powerful than before.

To be quite honest, if Ansible had nothing but its ad-hoc mode, it
still would be a powerful and useful tool for automating large numbers of
computers. In fact, if it weren’t for a few features, I might consider
sticking with ad-hoc mode and adding a bunch of those ad-hoc commands to
a Bash script and be done with learning. Those few additional features,
however, make the continued effort well worth it.

Tame the Beast with YAML

Ansible goes out of its way to use an easy-to-read configuration file for
making “playbooks”, which are files full of separate Ansible
“tasks”. A
task is basically an ad-hoc command written out in a configuration file
that makes it more organized and easy to expand. The configuration
files use YAML, which stands for “YAML Ain’t Markup Language”.
It’s an easy-to-read markup language, but it does rely on whitespace,
which isn’t terribly common with most config files. A simple playbook
looks something like this:


---

- hosts: webservers
  become: yes
  tasks:
    - name: this installs a package
      apt: name=apache2 update_cache=yes state=latest

    - name: this restarts the apache service
      service: name=apache2 enabled=yes state=restarted

The contents should be fairly easy to identify. It’s basically two
ad-hoc commands broken up into a YAML configuration file. There are a
few important things to notice. First, every filename ends with .yaml,
and every YAML file must begin with three hyphen characters. Also, as
mentioned above, whitespace matters. Finally, when a hyphen should
precede a section and when it should just be spaced appropriately often
is confusing. Basically every new section needs to start with a “-” symbol,
but it’s often hard to tell what should be its own section. Nevertheless,
it starts to feel natural as you create more and more playbooks.

The above playbook would be executed by typing:


ansible-playbook filename.yaml

And that is the equivalent of these two commands:


ansible webservers -b -m apt -a "name=apache2
 ↪update_cache=yes state=latest"
ansible webservers -b -m service -a "name=apache2
 ↪enabled=yes state=restarted"

Handling Your Handlers

But a bit of organization is really only the beginning of why playbooks
are so powerful. First off, there’s the idea of “Handlers”, which are
tasks that are executed only when “notified” that a task has made a
change. How does that work exactly? Let’s rewrite the above YAML file
to make the second task a handler:


---

- hosts: webservers
  become: yes
  tasks:
    - name: this installs a package
      apt: name=apache2 update_cache=yes state=latest
      notify: enable apache

  handlers:
    - name: enable apache
      service: name=apache2 enabled=yes state=started

On the surface, this looks very similar to just executing multiple
tasks. When the first task (installing Apache) executes, if a change
is made, it notifies the “enable apache” handler, which makes sure
Apache is enabled on boot and currently running. The significance is that
if Apache is already installed, and no changes are made, the handler
never is called. That makes the code much more efficient, but it also
means no unnecessary interruption of the already running Apache process.

There are other subtle time-saving issues with handlers too—for example, multiple
tasks can call a handler, but it executes only a single time regardless of
how many times it’s called. But the really significant thing to remember
is that handlers are executed (notified) only when an Ansible task makes
a change on the remote system.
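
Here’s a small sketch of that behavior (the config filename is made up):
both tasks notify the same handler, but Apache is restarted at most once
per run, and only if something actually changed:


---

- hosts: webservers
  become: yes
  tasks:
    - name: install apache2
      apt: name=apache2 update_cache=yes state=latest
      notify: restart apache2

    - name: install the site config
      copy: src=mysite.conf dest=/etc/apache2/sites-enabled/mysite.conf
      notify: restart apache2

  handlers:
    - name: restart apache2
      service: name=apache2 state=restarted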

Just the Facts, Ma’am

Variable substitution works quite simply inside a playbook. Here’s a simple
example:


---

- hosts: webservers
  become: yes
  vars:
    package_name: apache2
  tasks:
    - name: this installs a package
      apt: "name={{ package_name }} update_cache=yes state=latest"
      notify: enable apache

  handlers:
    - name: enable apache
      service: "name={{ package_name }} enabled=yes state=started"

It should be fairly easy to understand what’s happening above. Note
that I did put the entire module action section in quotes. It’s not
always required, but sometimes Ansible is funny about unquoted variable
substitutions, so I always try to put things in quotes when variables
are involved.
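
Variables also can be supplied (or overridden) at runtime with the
-e flag, so the same playbook could, for instance, install a different
package without editing the file:


ansible-playbook filename.yaml -e "package_name=nginx"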

The really interesting thing about variables, however, are the “Gathered
Facts” about every host. You might notice when executing a playbook that
the first thing Ansible does is “Gathering Facts…”, which completes
without error, but doesn’t actually seem to do anything. What’s really
happening is that system information is getting populated into variables
that can be used inside a playbook. To see the entire list of “Gathered
Facts”, you can execute an ad-hoc command:


ansible webservers -m setup

You’ll get a huge list of facts generated from the individual hosts. Some
of them are particularly useful. For example,
ansible_os_family will
return something like “RedHat” or “Debian” depending on which distribution
you’re using. Ubuntu and Debian systems both return “Debian”, while
Red Hat
and CentOS will return “RedHat”. Although that’s certainly interesting
information, it’s really useful when different distros use different
tools—for example, apt vs. yum.

Getting Verbose

One of the frustrations of moving from Ansible ad-hoc commands to playbooks
is that in playbook mode, Ansible tends to keep fairly quiet with regard
to output. With ad-hoc mode, you often can see what is going on,
but with a playbook, you know only if it finished okay, and if a change
was made. There are two easy ways to change that. The first is just
to add the -v flag when executing
ansible-playbook. That adds verbosity
and provides lots of feedback when things are executed. Unfortunately,
it often gives so much information that its usefulness gets lost in the
mix. Still, in a pinch, just adding the -v flag helps.
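
The flag also can be stacked for progressively more detail, which is mostly
useful when you’re troubleshooting connection or module failures:


ansible-playbook filename.yaml -vvv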

If you’re creating a playbook and want to be notified of things along
the way, the debug module is really your friend. In ad-hoc mode, the
debug module doesn’t make much sense to use, but in a playbook, it can
act as a “reporting” tool about what is going on. For example:


---

- hosts: webservers
  tasks:
   - name: describe hosts
     debug: msg="Computer {{ ansible_hostname }} is running
      ↪{{ ansible_os_family }} or equivalent"

The above will show you something like Figure 1, which is incredibly useful when
you’re trying to figure out the sort of systems you’re using. The nice
thing about the debug module is that it can display anything you want,
so if a value changes, you can have it displayed on the screen so you can
troubleshoot a playbook that isn’t working like you expect it to work. It
is important to note that the debug module doesn’t do anything other than
display information on the screen for you. It’s not a logging system;
rather, it’s just a way to have information (customized information,
unlike the verbose flag) displayed during execution. Still, it can be
invaluable as your playbooks become more complex.

Figure 1. Debug mode is the best way to get some information on what’s
happening inside your playbooks.
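
The debug module also pairs nicely with registered variables (a feature I
haven’t covered here). Here’s a small sketch, with made-up task names, that
captures a command’s output and prints it back during the run:


---

- hosts: webservers
  tasks:
    - name: check the running kernel
      command: uname -r
      register: kernel

    - name: report the running kernel
      debug: var=kernel.stdout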

If This Then That

Conditionals are a part of pretty much every programming language. Ansible
YAML files also can take advantage of conditional execution, but the
format is a little wacky. Normally the condition comes
first, and then if it evaluates as true, the following code executes. With
Ansible, it’s a little backward. The task is completely spelled out, then
a when statement is added at the end. It makes the code very readable,
but as someone who’s been using an if/then mentality his entire career,
it feels funny. Here’s a slightly more complicated playbook. See if you
can parse out what would happen in an environment with both Debian/Ubuntu
and Red Hat/CentOS systems:


---

- hosts: webservers
  become: yes
  tasks:
    - name: install apache this way
      apt: name=apache2 update_cache=yes state=latest
      notify: start apache2
      when: ansible_os_family == "Debian"

    - name: install apache that way
      yum: name=httpd state=latest
      notify: start httpd
      when: ansible_os_family == "RedHat"

  handlers:
    - name: start apache2
      service: name=apache2 enabled=yes state=started

    - name: start httpd
      service: name=httpd enabled=yes state=started

Hopefully the YAML format makes that fairly easy to read. Basically,
it’s a playbook that will install Apache on hosts using either yum or
apt based on which type of distro they have installed. Then handlers
make sure the newly installed packages are enabled and running.

It’s easy to see how useful a combination of gathered facts and
conditional statements can be. Thankfully, Ansible doesn’t stop there. As
with other configuration management systems, it includes most features
of programming and scripting languages. For example, there are loops.

Play It Again, Sam

If there is one thing Ansible does well, it’s
loops. Quite frankly,
it supports so many different sorts of loops, I can’t cover them
all here. The best way to figure out the perfect sort of loop for
your situation is to read the Ansible
documentation
directly.

For simple lists, playbooks use a familiar, easy-to-read method for
doing multiple tasks. For example:


---

- hosts: webservers
  become: yes

  tasks:
    - name: install a bunch of stuff
      apt: "name={{ item }} state=latest update_cache=yes"
      with_items:
        - apache2
        - vim
        - chromium-browser

This simple playbook will install multiple packages using the apt
module. Note the special variable named item, which is replaced with
the items one at a time in the with_items section. Again, this is
pretty easy to understand and utilize in your own playbooks. Other
loops work in similar ways, but they’re formatted differently. Just check
out the documentation for the wide variety of ways Ansible can repeat
similar tasks.
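
As one more taste, each item in a with_items list can itself be a small
collection of values, which is handy when a module needs more than one piece
of information per iteration. The user names below are purely illustrative:


---

- hosts: webservers
  become: yes

  tasks:
    - name: create a bunch of users
      user: "name={{ item.name }} groups={{ item.groups }}"
      with_items:
        - { name: alice, groups: sudo }
        - { name: bob, groups: www-data }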

Templates

One last module I find myself using often is the template module. If
you’ve ever used mail merge in a word processor, templating works
similarly. Basically, you create a text file and then use variable
substitution to create a custom version on the fly. I most often do
this for creating HTML files or config files. Ansible uses the Jinja2
templating language, which is conveniently similar to standard variable
substitution in playbooks themselves. The example I almost always use
is a custom HTML file that can be installed on a remote batch of web
servers. Let’s look at a fairly complex playbook and an accompanying HTML
template file.

Here’s the playbook:


---

- hosts: webservers
  become: yes

  tasks:
   - name: install apache2
     apt: name=apache2 state=latest update_cache=yes
     when: ansible_os_family == "Debian"

   - name: install httpd
     yum: name=httpd state=latest
     when: ansible_os_family == "RedHat"

   - name: start apache2
     service: name=apache2 state=started enabled=yes
     when: ansible_os_family == "Debian"

   - name: start httpd
     service: name=httpd state=started enabled=yes
     when: ansible_os_family == "RedHat"

   - name: install index
     template:
       src: index.html.j2
       dest: /var/www/html/index.html

Here’s the template file, which must end in .j2 (it’s the file referenced
in the last task above):



This computer is running {{ ansible_os_family }}, and its hostname is:

{{ ansible_hostname }}

{# this is a comment, which won't be copied to the index.html file #}

This also should be fairly easy to understand. The playbook takes a
few different things it learned and installs Apache on the remote
systems, regardless of whether they are Red Hat- or Debian-based. Then, it
starts the web servers and makes sure the web server starts on system
boot. Finally, the playbook takes the template file, index.html.j2,
and substitutes the variables while copying the file to the remote
system. Note the {# #} format for making comments. Those comments are
completely erased on the remote system and are visible only in the .j2
file on the Ansible machine.
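
Jinja2 also supports conditionals and loops inside templates using {% %}
blocks, so a template can do more than straight substitution. Here’s a small
sketch of what that might look like in index.html.j2:


{% if ansible_os_family == "Debian" %}
This host uses apt for package management.
{% else %}
This host uses yum for package management.
{% endif %}

{% for address in ansible_all_ipv4_addresses %}
IPv4 address: {{ address }}
{% endfor %}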

The Sky Is the Limit!

I’ll finish up this series in my next article, where I plan to cover how to build
on your
playbook knowledge to create entire roles and take advantage of the
community contributions available. Ansible is a very powerful tool that
is surprisingly simple to understand and use. If you’ve been experimenting
with ad-hoc commands, I encourage you to create playbooks
that will allow you to do multiple tasks on a multitude of computers with
minimal effort. At the very least, play around with the “Facts” gathered
by the ansible-playbook app, because those are things unavailable to
the ad-hoc mode of Ansible. Until next time, learn, experiment, play
and have fun!

Resources

Part I: “Ansible: the Automation Framework That Thinks Like a Sysadmin” by Shawn Powers

Part II: “Ansible: Making Things Happen” by Shawn Powers

Learning IT Fundamentals

Kyle Rankin
Sat, 02/17/2018 – 08:42

Where do IT fundamentals fit in our modern, cloud- and abstraction-driven
engineering culture?

I was recently discussing the Sysadmin/DevOps/IT industry with a colleague,
and we started marveling at just how few of the skills we learned when we
were starting out are actually needed today. It seems like every year a
tool, abstraction layer or service makes it so you no longer need to know
how this or that technology works. Why compile from source when all of the
software you could want is prepackaged, tested and ready to install? Why
figure out how a database works when you can just point to a pre-configured
database service? Why troubleshoot a malfunctioning Linux server when you
can nuke it from orbit and spawn a new one and hope the problem goes away?

This is not to say that automation is bad or that abstractions are bad. When
you automate repetitive tasks and make complex tasks easier, you end up
being able to accomplish more with a smaller and more junior team. I’m
perfectly happy to take a tested and validated upstream kernel from my
distribution instead of spending hours making the same thing and hoping I
remembered to include all of the right modules. Have you ever compiled a
modern web browser? It’s not fun. It’s handy being able to automate myself
out of jobs using centralized configuration management tools.

As my colleague and I were discussing the good old days, what worried us
wasn’t that modern technology made things easier or rendered past methods
obsolete—learning new things is what drew us to this career in the first
place—but that in many ways, modern technology has obscured so much of
what’s going on under the hood, we found ourselves struggling to think
of how we’d advise someone new to the industry to approach a modern
career in IT. The kind of opportunities for on-the-job training that taught
us the fundamentals of how computers, networks and Linux worked are
becoming rarer and rarer, if they exist at all.

My story into IT mirrors many of my colleagues who started their careers
somewhere between the mid-1990s and early 2000s. I started out in a kind of
hybrid IT and sysadmin jack-of-all-trades position for a small business. I
did everything from installing and troubleshooting Windows desktops to
setting up
Linux file and web servers to running and crimping network wires. I also ran a
Linux desktop, and in those days, it hid very little of the underpinnings from
you, so you were instantly exposed to networking, software and hardware
fundamentals whether you wanted them or not.

Being exposed to and responsible for all of that technology as “the computer
guy”, you learn pretty quickly that you just have to dive in and figure out
how things work to fix them. It was that experience that cemented the Linux
sysadmin and networking skills I continued to develop as I transitioned away
from the help desk into a full-time Linux sysadmin. Yet these days, small
businesses are more likely to farm out most of their IT functions to the cloud,
and sysadmins truly may not need to know almost anything about how Linux or
networking works to manage Linux servers (and might even manage them from a
Mac). So how do they learn what’s going on under the hood?

This phenomenon isn’t limited to IT. Modern artists, writers and musicians
also are often unschooled in the history and unskilled in the fundamentals
of their craft. While careers in science still seem to stress a deep
understanding of everything that has come before, in so many other fields, it
seems we are content to skip that part of the lesson and just focus on
what’s new. The problem when it comes to IT, however, isn’t that you need to
understand the fundamentals to get a good job—you don’t—but when something
goes wrong, without understanding what’s happening behind the scenes at
least to some degree, it’s almost impossible to troubleshoot. When you can’t
fix the problem yourself, you are left rebooting, respawning or calling your
vendor’s support line. Without knowing about the technologies of the past
and their features and failings, you are more likely to repeat their
mistakes when someone new to the industry convinces you they just invented
them.

Fortunately the openness of Linux still provides us with one way out of this
problem. Although you can use modern Linux desktops and servers without knowing
almost anything about how computers, networks or Linux itself works, unlike
with other systems, Linux still will show you everything that’s going on
behind the scenes if you are willing to look. You can set up complex
networks of Linux servers running the same services that power the
internet—all for free (and with the power of virtualization, all from a
single machine). For the budding engineer who is willing to dive deep into
Linux, you will have superior knowledge and an edge over all of your peers.

Ansible: Making Things Happen

Shawn Powers
Tue, 01/30/2018 – 10:07

Finally, an automation framework that thinks like a sysadmin. Ansible,
you’re hired.

In my last article, I described how to configure your server and clients so
you could
connect to each client from the server. Ansible is a push-based automation
tool, so the connection is initiated from your “server”, which is usually
just a workstation or a server you ssh in to from
your workstation. In this article, I explain
how modules work and how you can use Ansible
in ad-hoc mode from the command line.

Ansible is supposed to make your job easier, so the first thing you
need to learn is how to do familiar tasks. For most sysadmins, that
means some simple command-line work. Ansible has a few quirks when it
comes to command-line utilities, but it’s worth learning the nuances,
because it makes for a powerful system.

Command Module

This is the safest module to execute remote commands on the client
machine. As with most Ansible modules, it requires Python to be installed
on the client, but that’s it. When Ansible executes commands using the
Command Module, it does not process those commands through the user’s
shell. This means some variables like $HOME are not available. It also
means stream functions (redirects, pipes) don’t work. If you don’t
need to redirect output or to reference the user’s home directory as
a shell variable, the Command Module is what you want to use. To invoke
the Command Module in ad-hoc mode, do something like this:


ansible host_or_groupname -m command -a "whoami"

Your output should show SUCCESS for each host referenced and then return
the user name that the user used to log in. You’ll notice that the user is not
root, unless that’s the user you used to connect to the client computer.

If
you want to see the elevated user, you’ll add another argument to the
ansible command. You can add
-b in order to “become” the elevated user
(or the sudo user). So, if you were to run the same command as above with a
“-b” flag:


ansible host_or_groupname -b -m command -a "whoami"

you should see a similar result, but the whoami
results should say root
instead of the user you used to connect. That flag is important to use,
especially if you try to run remote commands that require root access!

Shell Module

There’s nothing wrong with using the Shell Module to execute remote
commands. It’s just important to know that since it uses the remote
user’s environment, if there’s something goofy with the user’s account,
it might cause problems that the Command Module avoids. If you use the
Shell Module, however, you’re able to use redirects and pipes. You can
use the whoami example to see the difference. This command:


ansible host_or_groupname -m command -a "whoami > myname.txt"

should result in an error about > not being a valid argument. Since
the Command Module doesn’t run inside any shell, it interprets the
greater-than character as something you’re trying to pass to the
whoami
command. If you use the Shell Module, however, you have no problems:


ansible host_or_groupname -m shell -a "whoami > myname.txt"

This should execute and give you a SUCCESS message for each host,
but there should be nothing returned as output. On the remote machine,
however, there should be a file called myname.txt in the user’s home
directory that contains the name of the user. My personal policy is to
use the Command Module whenever possible and to use the Shell Module
if needed.

The Raw Module

Functionally, the Raw Module works like the Shell Module. The key
difference is that Ansible doesn’t do any error checking, and
STDERR,
STDOUT and the return code are returned. Other than that, Ansible has no idea
what happens, because it just executes the command over SSH directly. So
while the Shell Module will use /bin/sh by default, the Raw Module just
uses whatever the user’s personal default shell might be.

Why would a person decide to use the Raw
Module? It doesn’t require
Python on the remote computer—at all. Although it’s true that most servers
have Python installed by default, or easily could have it installed,
many embedded devices don’t and can’t have Python installed. For most
configuration management tools, not having an agent program installed
means the remote device can’t be managed. With Ansible, if all you have is
SSH, you still can execute remote commands using the Raw Module. I’ve used
the Raw Module to manage Bitcoin miners that have a very minimal embedded
environment. It’s a powerful tool, and when you need it, it’s invaluable!
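
Invoking it looks just like the other modules. For example, checking the
uptime of devices that can’t run Python:


ansible host_or_groupname -m raw -a "uptime"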

Copy Module

Although it’s certainly possible to do file and folder manipulation with
the Command and Shell Modules, Ansible includes a module specifically
for copying files to the server. Even though it requires learning a new
syntax for copying files, I like to use it because Ansible will check
to see whether a file exists, and whether it’s the same file. That means it
copies the file only if it needs to, saving time and bandwidth. It even
will make backups of existing files! I can’t tell you how many times I’ve
used scp and sshpass in a
Bash FOR loop and dumped files on servers,
even if they didn’t need them. Ansible makes it easy and doesn’t
require FOR loops and IP iterations.

The syntax is a little more complicated than with Command, Shell or
Raw. Thankfully, as with most things in the Ansible world, it’s easy to
understand—for example:


ansible host_or_groupname -b -m copy \
    -a "src=./updated.conf dest=/etc/ntp.conf \
        owner=root group=root mode=0644 backup=yes"

This will look in the current directory (on the Ansible
server/workstation) for a file called updated.conf and then copy it to
each host. On the remote system, the file will be put in /etc/ntp.conf,
and if a file already exists, and it’s different, the original will
be backed up with a date extension. If the files are the same, Ansible
won’t make any changes.

I tend to use the Copy Module when updating configuration files. It
would be perfect for updating configuration files on Bitcoin miners,
but unfortunately, the Copy Module does require that the remote machine has
Python installed. Nevertheless, it’s a great way to update common files
on many remote machines with one simple command. It’s also important
to note that the Copy Module supports copying remote files to other
locations on the remote filesystem using the
remote_src=true directive.
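
For example, to keep a backup copy of an existing config elsewhere on the
remote machine before changing it (the destination path here is just for
illustration), something like this should work:


ansible host_or_groupname -b -m copy \
    -a "src=/etc/ntp.conf dest=/root/ntp.conf.bak remote_src=true"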

File Module

The File Module has a lot in common with the Copy Module, but if you try
to use the File Module to copy a file, it doesn’t work as expected. The
File Module does all its actions on the remote machine, so
src and dest
are all references to the remote filesystem. The File Module often
is used for creating directories, creating links or deleting remote files
and folders. The following will simply create a folder named /etc/newfolder
on the remote
servers and set the mode:


ansible host_or_groupname -b -m file \
       -a "path=/etc/newfolder state=directory mode=0755"

You can, of course, set the owner and group,
along with a bunch of other options, which you can learn about on the
Ansible doc site. I find I most often will either create a folder
or symbolically link a file using the File Module. To create a symlink:


ansible host_or_groupname -b -m file \
         -a "src=/etc/ntp.conf dest=/home/user/ntp.conf \
             owner=user group=user state=link"

Notice that the state directive is how you inform
Ansible what you actually want to
do. There are several state options:

  • link — create symlink.

  • directory — create directory.

  • hard — create hardlink.

  • touch — create empty file.

  • absent — delete file or directory recursively.

This might seem a bit complicated, especially when you easily could do
the same with a Command or Shell Module command, but the clarity of using
the appropriate module makes it more difficult to make mistakes. Plus,
learning these commands in ad-hoc mode will make playbooks, which consist
of many commands, easier to understand (I plan to cover this in my next
article).
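
As one quick ad-hoc illustration, removing a file (or an entire directory
tree) on every host takes nothing more than the state=absent directive;
the path below is arbitrary:


ansible host_or_groupname -b -m file -a "path=/tmp/oldfile state=absent"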

Package Management

Anyone who manages multiple distributions knows it can be tricky to
handle the various package managers. Ansible handles this in a couple
ways. There are specific modules for apt and yum, but there’s also a
generic module called “package” that will install on the remote computer
regardless of whether it’s Red Hat- or Debian/Ubuntu-based.
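
For software whose package name is the same everywhere, the generic module
keeps things simple. Using git as an example (its package name happens to
match on both families):


ansible host_or_groupname -b -m package -a "name=git state=present"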

Unfortunately, while Ansible usually can detect the type of package
manager it needs to use, it doesn’t have a way to handle packages with
different names. One prime example is Apache. On Red Hat-based systems,
the package is “httpd”, but on Debian/Ubuntu systems, it’s
“apache2”. That
means some more complex things need to happen in order to
install the correct package automatically. The individual modules, however, are very
easy to use. I find myself just using apt or yum as appropriate, just
like when I manually manage servers. Here’s an apt example:


ansible host_or_groupname -b -m apt \
          -a "update_cache=yes name=apache2 state=latest"

With this one simple line, all the host machines will run
apt-get update
(that’s the update_cache directive at work), then install apache2’s latest
version including any dependencies required. Much like the File Module,
the state directive has a few options:

  • latest — get the latest version, upgrading
    existing if needed.

  • absent — remove package if installed.

  • present — make sure package is installed, but
    don’t upgrade existing.

The Yum Module works similarly to the Apt Module, but I generally
don’t bother with the update_cache directive, because yum updates
automatically. Although very similar, installing Apache on a Red Hat-based
system looks like this:


ansible host_or_groupname -b -m yum \
      -a "name=httpd state=present"

The difference with this example is that if Apache is already installed,
it won’t update, even if an update is available. Sometimes updating to
the latest version isn’t what you want, so this stops that from accidentally
happening.

Just the Facts, Ma’am

One frustrating thing about using Ansible in ad-hoc mode is that you don’t
have access to the “facts” about the remote systems. In my
next article, where I plan to explore
creating playbooks full of various tasks, you’ll see how you can reference the facts Ansible
learns about the systems. It makes Ansible far more powerful, but again,
it can be utilized only in playbook mode. Nevertheless, it’s possible to use
ad-hoc mode to peek at the sorts of information Ansible gathers. If you run
the setup module, it will show you all the details from a remote system:


ansible host_or_groupname -b -m setup

That command will spew a ton of variables on your screen. You can
scroll through them all to see the vast amount of information Ansible pulls
from the host machines. In fact, it shows so much information, it can
be overwhelming. You can filter the results:


ansible host_or_groupname -b -m setup -a "filter=*family*"

That should just return a single variable,
ansible_os_family, which
likely will be Debian or Red Hat. When you start building more complex
Ansible setups with playbooks, it’s possible to insert some logic and
conditionals in order to use yum where appropriate and apt where the
system is Debian-based. Really, the facts variables are incredibly useful
and make building playbooks that much more exciting.

But, that’s for another article, because you’ve come to the end of the second
installment. Your assignment for now is to get comfortable using
Ansible in ad-hoc mode, doing one thing at a time. Most people think
ad-hoc mode is just a stepping stone to more complex Ansible setups,
but I disagree. The ability to configure hundreds of servers consistently
and reliably with a single command is nothing to scoff at. I love making
elaborate playbooks, but just as often, I’ll use an ad-hoc command in
a situation that used to require me to ssh in to a bunch of servers to
do simple tasks. Have fun with Ansible; it just gets more
interesting from here!

Rapid, Secure Patching: Tools and Methods

Charles Fisher
Mon, 01/29/2018 – 10:45


Generate enterprise-grade SSH keys and load them into an agent for control
of all kinds of Linux hosts. Script the agent with the Parallel Distributed
Shell (pdsh) to effect rapid changes over your server farm.

It was with some measure of disbelief that the computer science community
greeted the recent EternalBlue-related exploits that have torn through
massive numbers of vulnerable systems.
The SMB exploits have kept coming
(the most recent being SMBLoris presented at the last DEF CON, which impacts
multiple SMB protocol versions, and for which Microsoft will issue no
corrective patch).
Attacks with these tools incapacitated critical
infrastructure
to the point that patients were even turned away from the British
National Health Service.

It is with considerable sadness that, during this SMB catastrophe, we
also have come to understand that the famous Samba server presented an
exploitable attack surface on the public internet in sufficient numbers for
a worm to propagate successfully. I previously have
discussed SMB security
in Linux Journal, and I am no longer of the opinion that SMB server processes should run on
Linux.

In any case, systems administrators of all architectures must be able to
down vulnerable network servers and patch them quickly. There is often a
need for speed and competence when working with a large collection of Linux
servers. Whether this is due to security situations or other concerns is
immaterial—the hour of greatest need is not the time to begin to build
administration tools. Note that in the event of an active intrusion by
hostile parties, forensic
analysis
may be a legal requirement, and no steps
should be taken on the compromised server without a careful plan and
documentation.
Especially in this new era of the black hats, computer
professionals must step up their game and be able to secure vulnerable
systems quickly.

Secure SSH Keypairs

Tight control of a heterogeneous UNIX environment must begin with
best-practice use of SSH authentication keys. I’m going to open this section with
a simple requirement. SSH private keys must be one of three types: Ed25519,
ECDSA using the E-521 curve or RSA keys of 3072 bits. Any key that does not
meet those requirements should be retired (in particular, DSA keys must be
removed from service immediately).

The Ed25519 key format
is associated with Daniel J. Bernstein, who has such
a preeminent reputation in modern cryptography that the field is becoming a
DJB monoculture.
The Ed25519 format is designed for speed, security and size
economy. If all of your SSH servers are recent enough to support Ed25519,
then use it, and consider nothing else.

Guidance on
creating Ed25519 keys
suggests 100 rounds for a work factor in
the “-o” secure format.
Raising the number of rounds raises the strength of
the encrypted key against brute-force attacks (should a file copy of the
private key fall into hostile hands), at the cost of more work and time in
decrypting the key when ssh-add is executed. Although there always
is controversy and
discussion
with security advances,
I will repeat the
guidance here and suggest that the best format for a newly created SSH key
is this:


ssh-keygen -a 100 -t ed25519

Your systems might be too old to support Ed25519—Oracle/CentOS/Red Hat 7
have this problem (the 7.1 release introduced support). If you cannot
upgrade your old SSH clients and servers, your next best option is
likely E-521, available in the ECDSA key format.

The ECDSA curves came from the US government’s National Institute of
Standards (NIST). The best known and most implemented of all of the NIST
curves are P-256, P-384 and E-521. All three curves are approved for secret
communications by a variety of government entities, but a number of
cryptographers have expressed growing
suspicion
that the P-256 and P-384
curves are tainted.
Well known
cryptographer Bruce Schneier has remarked:
“I
no longer trust the constants. I believe the NSA has manipulated them
through their relationships with industry.” However, DJB has expressed
limited praise of the E-521 curve:
“To be fair I should mention that there’s
one standard NIST curve using a nice prime, namely
2^521 - 1; but the sheer
size of this prime makes it much slower than NIST P-256.” All of the NIST
curves have greater issues with “side channel” attacks than
Ed25519—P-521
is certainly a step down, and many assert that none of the NIST curves are
safe. In summary, there is a slight risk that a powerful adversary exists
with an advantage over the P-256 and P-384 curves, so one is
slightly inclined to avoid them. Note that even if your OpenSSH (source)
release is capable of E-521, it may be disabled by your vendor due to patent
concerns,
so E-521 is not an option in this case. If you cannot use DJB’s
2^255 - 19 curve, this command will generate an E-521 key on a capable
system:


ssh-keygen -o -a 100 -b 521 -t ecdsa

And, then there is the unfortunate circumstance with SSH servers that
support neither ECDSA nor Ed25519. In this case, you must fall back to RSA
with much larger key sizes. An absolute minimum is the modern default of
2048 bits, but 3072 is a wiser choice:


ssh-keygen -o -a 100 -b 3072 -t rsa

Then in the most lamentable case of all, when you must use old SSH clients
that are not able to work with private keys created with the
-o option,
you can remove the password on id_rsa and create a naked key, then use
OpenSSL to encrypt it with AES256 in the PKCS#8 format, as first documented
by Martin Kleppmann.
Provide a blank new password for the keygen utility
below, then supply a new password when OpenSSL reprocesses the key:


$ cd ~/.ssh

$ cp id_rsa id_rsa-orig

$ ssh-keygen -p -t rsa
Enter file in which the key is (/home/cfisher/.ssh/id_rsa):
Enter old passphrase:
Key has comment 'cfisher@localhost.localdomain'
Enter new passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved with the new passphrase.

$ openssl pkcs8 -topk8 -v2 aes256 -in id_rsa -out id_rsa-strong
Enter Encryption Password:
Verifying - Enter Encryption Password:

$ mv id_rsa-strong id_rsa
$ chmod 600 id_rsa

After creating all of these keys on a newer system, you can compare the file
sizes:


$ ll .ssh
total 32
-rw-------. 1 cfisher cfisher  801 Aug 10 21:30 id_ecdsa
-rw-r--r--. 1 cfisher cfisher  283 Aug 10 21:30 id_ecdsa.pub
-rw-------. 1 cfisher cfisher  464 Aug 10 20:49 id_ed25519
-rw-r--r--. 1 cfisher cfisher  111 Aug 10 20:49 id_ed25519.pub
-rw-------. 1 cfisher cfisher 2638 Aug 10 21:45 id_rsa
-rw-------. 1 cfisher cfisher 2675 Aug 10 21:42 id_rsa-orig
-rw-r--r--. 1 cfisher cfisher  583 Aug 10 21:42 id_rsa.pub

Although they are relatively enormous, all versions of OpenSSH that I have used
have been compatible with the RSA private key in PKCS#8 format. The Ed25519
public key is now small enough to fit in 80 columns without word wrap, and
it is as convenient as it is efficient and secure.

Note that PuTTY may have problems using various versions of these keys, and
you may need to remove passwords for a successful import into the PuTTY
agent.

These keys represent the most secure formats available for various OpenSSH
revisions. They really aren’t intended for PuTTY or other general
interactive activity. Although one hopes that all users create strong keys for all
situations, these are enterprise-class keys for major systems activities. It
might be wise, however, to regenerate your system host keys to conform to
these guidelines.

These key formats may soon change. Quantum computers are causing increasing
concern for their ability to run Shor’s Algorithm,
which can be used to find
prime factors to break these keys in reasonable time. The largest
commercially available quantum computer, the D-Wave 2000Q,
effectively
presents
under 200 qubits
for this activity,
which is not (yet) powerful
enough for a successful attack. NIST announced
a competition
for a new
quantum-resistant public key system with a deadline of November 2017. In
In
response, a team including DJB has released source code for NTRU Prime.
It
does appear that we will likely see a post-quantum public key format for
OpenSSH (and potentially TLS 1.3) released within the next two years, so
take steps to ease migration now.

Also, it’s important for SSH servers to restrict their allowed ciphers,
MACs and key exchange lest strong keys be wasted on broken crypto (3DES,
MD5 and arcfour should be long-disabled). My previous
guidance
on the
subject
involved the following (three) lines in the SSH client and server
configuration (note that formatting in the sshd_config file requires all
parameters on the same line with no spaces in the options; line breaks have
been added here for clarity):


Ciphers chacha20-poly1305@openssh.com,
        aes256-gcm@openssh.com,
        aes128-gcm@openssh.com,
        aes256-ctr,
        aes192-ctr,
        aes128-ctr

MACs    hmac-sha2-512-etm@openssh.com,
        hmac-sha2-256-etm@openssh.com,
        hmac-ripemd160-etm@openssh.com,
        umac-128-etm@openssh.com,
        hmac-sha2-512,
        hmac-sha2-256,
        hmac-ripemd160,
        umac-128@openssh.com

KexAlgorithms curve25519-sha256@libssh.org,
              diffie-hellman-group-exchange-sha256

Since the previous publication, RIPEMD160 is likely no longer safe and
should be removed. Older systems, however, may support only SHA1, MD5 and
RIPEMD160. Certainly remove MD5, but users of PuTTY likely will want to
retain SHA1 when newer MACs are not an option. Older servers can present a
challenge in finding a reasonable Cipher/MAC/KEX when working with modern
systems.

At this point, you should have strong keys for secure clients and servers.
Now let’s put them to use.

Scripting the SSH Agent

Modern OpenSSH distributions contain the ssh-copy-id shell script for easy
key distribution. Below is an example of installing a specific, named key in
a remote account:


$ ssh-copy-id -i ~/.ssh/some_key.pub person@yourserver.com
ssh-copy-id: INFO: Source of key(s) to be installed:
   "/home/cfisher/.ssh/some_key.pub"
ssh-copy-id: INFO: attempting to log in with the new key(s),
   to filter out any that are already installed
ssh-copy-id: INFO: 1 key(s) remain to be installed --
   if you are prompted now it is to install the new keys
person@yourserver.com's password:

Number of key(s) added: 1

Now try logging into the machine, with:
   "ssh 'person@yourserver.com'"
and check to make sure that only the key(s) you wanted were added.

If you don’t have the ssh-copy-id script, you can install a key
manually with the following command:


$ ssh person@yourserver.com 'cat >> ~/.ssh/authorized_keys' < \
      ~/.ssh/some_key.pub

If you have SELinux enabled, you might have to mark a newly created
authorized_keys file with a security type; otherwise, the sshd server
dæmon
will be prevented from reading the key (the syslog may report this issue):


$ ssh person@yourserver.com 'chcon -t ssh_home_t
 ↪~/.ssh/authorized_keys'

Once your key is installed, test it in a one-time use with the
-i option
(note that you are entering a local key password, not a remote
authentication password):


$ ssh -i ~/.ssh/some_key person@yourserver.com
Enter passphrase for key '/home/v-fishecj/.ssh/some_key':
Last login: Wed Aug 16 12:20:26 2017 from 10.58.17.14
yourserver $

General, interactive users likely will cache their keys with an agent. In
the example below, the same password is used on all three types of keys that
were created in the previous section:


$ eval $(ssh-agent)
Agent pid 4394

$ ssh-add
Enter passphrase for /home/cfisher/.ssh/id_rsa:
Identity added: ~cfisher/.ssh/id_rsa (~cfisher/.ssh/id_rsa)
Identity added: ~cfisher/.ssh/id_ecdsa (cfisher@init.com)
Identity added: ~cfisher/.ssh/id_ed25519 (cfisher@init.com)

The first command above launches a user agent process, which injects
environment variables (named SSH_AUTH_SOCK and
SSH_AGENT_PID) into the
parent shell (via eval). The shell becomes aware of the agent and passes
these variables to the programs that it runs from that point forward.

When launched, the ssh-agent has no credentials and is unable to facilitate
SSH activity. It must be primed by adding keys, which is done with
ssh-add.
When called with no arguments, all of the default keys will be read. It
also can be called to add a custom key:


$ ssh-add  ~/.ssh/some_key
Enter passphrase for /home/cfisher/.ssh/some_key:
Identity added: /home/cfisher/.ssh/some_key
 ↪(cfisher@localhost.localdomain)

Note that the agent will not retain the password on the key.
ssh-add uses
any and all passwords that you enter while it runs to decrypt keys that it
finds, but the passwords are cleared from memory when
ssh-add terminates
(they are not sent to ssh-agent). This allows you to upgrade to new key
formats with minimal inconvenience, while keeping the keys reasonably safe.

The current cached keys can be listed with ssh-add
-l
(from which you can deduce
that “some_key” is an Ed25519):


$ ssh-add -l
3072 SHA256:cpVFMZ17oO5n/Jfpv2qDNSNcV6ffOVYPV8vVaSm3DDo
     /home/cfisher/.ssh/id_rsa (RSA)
521 SHA256:1L9/CglR7cstr54a600zDrBbcxMj/a3RtcsdjuU61VU
     cfisher@localhost.localdomain (ECDSA)
256 SHA256:Vd21LEM4lixY4rIg3/Ht/w8aoMT+tRzFUR0R32SZIJc
     cfisher@localhost.localdomain (ED25519)
256 SHA256:YsKtUA9Mglas7kqC4RmzO6jd2jxVNCc1OE+usR4bkcc
     cfisher@localhost.localdomain (ED25519)

While a “primed” agent is running, the SSH clients may use (trusting) remote
servers fluidly, with no further prompts for credentials:


$ sftp person@yourserver.com
Connected to yourserver.com.
sftp> quit

$ scp /etc/passwd person@yourserver.com:/tmp
passwd                              100% 2269    65.8KB/s   00:00

$ ssh person@yourserver.com
   (motd for yourserver.com)
$ ls -l /tmp/passwd
-rw-r--r--  1 root  wheel  2269 Aug 16 09:07 /tmp/passwd
$ rm /tmp/passwd
$ exit
Connection to yourserver.com closed.

The OpenSSH agent can be locked, preventing any further use of the
credentials that it holds (this might be appropriate when suspending a
laptop):


$ ssh-add -x
Enter lock password:
Again:
Agent locked.

$ ssh yourserver.com
Enter passphrase for key '/home/cfisher/.ssh/id_rsa': ^C

It will provide credentials again when it is unlocked:


$ ssh-add -X
Enter lock password:
Agent unlocked.

You also can set ssh-agent to expire keys after a
time limit with the -t
option, which may be useful for long-lived agents that must clear keys after
a set daily shift.
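
For example, an agent started as shown below will discard its keys after
eight hours; the same limit can be applied to individual keys with
ssh-add -t:


$ eval $(ssh-agent -t 8h)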

General shell users may cache many types of keys with a number of differing
agent implementations. In addition to the standard OpenSSH agent, users may
rely upon PuTTY’s pageant.exe, GNOME Keyring or KDE KWallet, among others
(the use of the PuTTY agent could likely fill an article of its own).

However, the goal here is to create “enterprise” keys for critical server
controls. To limit the risk of exposure, you likely do not want long-lived
agents. When scripting with “enterprise” keys, run an agent only for the
duration of the activity, then kill it at completion.

There are special options for accessing the root account with OpenSSH. The
PermitRootLogin parameter can be added to the sshd_config file (usually
found in /etc/ssh). It can be set to a simple yes or
no;
to forced-commands-only, which allows only explicitly authorized programs to
be executed; or to the equivalent options
prohibit-password or
without-password,
both of which will allow access with the keys generated here.

Many hold that root should not be allowed any access. Michael W. Lucas
addresses the question in SSH Mastery:

Sometimes, it seems that you need to allow users to SSH in to the system as
root. This is a colossally bad idea in almost all environments. When users
must log in as a regular user and then change to root, the system logs
record the user account, providing accountability. Logging in as root
destroys that audit trail….It is possible to override the security
precautions and make sshd permit a login directly as root. It’s such a bad
idea that I’d consider myself guilty of malpractice if I told you how to do
it. Logging in as root via SSH almost always means you’re solving the wrong
problem. Step back and look for other ways to accomplish your goal.

When root action is required quickly on more than a few servers, the above
advice can impose painful delays. Lucas’ direct criticism can be addressed
by allowing only a limited set of “bastion” servers to issue root commands
over SSH. Administrators should be forced to log in to the bastions with
unprivileged accounts to establish accountability.

However, one problem with remotely “changing to root” is the statistical
use of the Viterbi algorithm. Short passwords, the su -
command and remote SSH calls that use passwords across such a trinary
network configuration are all uniquely vulnerable to timing attacks on a
user’s keyboard movements.
Those with the highest security concerns will need to compensate.

For the rest of us, I recommend that PermitRootLogin
without-password
be set
for all target machines.
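
On each target machine, this is a one-line setting in sshd_config,
followed by a reload of the dæmon (the unit name may differ by
distribution):


# grep -i ^permitrootlogin /etc/ssh/sshd_config
PermitRootLogin without-password

# systemctl reload sshd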

Finally, you can easily terminate ssh-agent
interactively with the -k option:


$ eval $(ssh-agent -k)
Agent pid 4394 killed

With these tools and the intended use of them in mind, here is a
complete script that runs an agent for the duration of a set of commands
over a list of servers for a common named user (which is not necessarily
root):


# cat artano

#!/bin/bash
# Note: bash or ksh is needed for the [[ ]], typeset and function syntax below.

if [[ $# -lt 1 ]]; then echo "$0 - requires commands"; exit; fi

R="-R5865:127.0.0.1:5865" # set to "-2" if you don't want
 ↪port forwarding

eval $(ssh-agent -s)

function cleanup { eval $(ssh-agent -s -k); }

trap cleanup EXIT

function remsh { typeset F="/tmp/${1}" h="$1" p="$2";
 ↪shift 2; echo "#$h"
 if [[ "$ARTANO" == "PARALLEL" ]]
 then ssh "$R" -p "$p" "$h" "$@" >"${F}.out"
  ↪2>>"${F}.err" &
 else ssh "$R" -p "$p" "$h" "$@"
 fi }    # HOST                                          PORT CMD

if ssh-add ~/.ssh/master_key
then remsh yourserver.com                                  22 "$@"
     remsh container.yourserver.com                      2200 "$@"
     remsh anotherserver.com                               22 "$@"
     # Add more hosts here.
else echo Bad password - killing agent. Try again.
fi

wait

#######################################################################
# Examples:           # Artano is an epithet of a famous mythical being
# artano 'mount /patchdir'      # you will need an fstab entry for this
# artano 'umount /patchdir'
# artano 'yum update -y 2>&1'
# artano 'rpm -Fvh /patchdir/*.rpm'
#######################################################################

This script runs all commands in sequence on a collection of hosts by
default. If the ARTANO environment variable is set
to PARALLEL, it
instead will launch them all as background processes simultaneously and append
their STDOUT and STDERR to files in /tmp (this should be no problem when
dealing with fewer than a hundred hosts on a reasonable server). The
PARALLEL setting is useful not only for pushing changes faster, but also for
collecting audit results.
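
A parallel run and the collection of its results might look like the
following (the output file names come from the host names in the script):


# ARTANO=PARALLEL ./artano 'uptime'
# cat /tmp/yourserver.com.out /tmp/yourserver.com.err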

Below is an example running the yum update command. This particular
invocation had to traverse a firewall, so it relied on a proxy setting in
the remote /etc/yum.conf files that pointed at the port forwarded by the
-R option
above:


# ./artano 'yum update -y 2>&1'
Agent pid 3458
Enter passphrase for /root/.ssh/master_key:
Identity added: /root/.ssh/master_key (/root/.ssh/master_key)
#yourserver.com
Loaded plugins: langpacks, ulninfo
No packages marked for update
#container.yourserver.com
Loaded plugins: langpacks, ulninfo
No packages marked for update
#anotherserver.com
Loaded plugins: langpacks, ulninfo
No packages marked for update
Agent pid 3458 killed

The script can be used for more general maintenance functions. Linux
installations running the XFS filesystem should “defrag”
periodically.
Although this normally would be done with cron, it can be a centralized
activity, stored in a separate script that includes only the appropriate
hosts:


# ./artano 'xfs_fsr 2>&1'
Agent pid 7897
Enter passphrase for /root/.ssh/master_key:
Identity added: /root/.ssh/master_key (/root/.ssh/master_key)
#yourserver.com
#container.yourserver.com
#anotherserver.com
Agent pid 7897 killed

An easy method to collect the contents of all authorized_keys files for all
users is the following artano invocation (this is useful for system
auditing; the sort -u removes duplicate paths):


artano 'awk -F: '\''{print $6 "/.ssh/authorized_keys"}'\'' /etc/passwd |
 ↪sort -u | xargs grep . 2> /dev/null'

It is convenient to configure NFS mounts for file distribution to remote
nodes. Bear in mind that NFS is clear text, and sensitive content
should not traverse untrusted networks while unencrypted. After configuring
an NFS server on host 1.2.3.4, I add the following line to the /etc/fstab
file on all the clients and create the /patchdir directory. After the
change, the artano script can be used to mass-mount the directory if the
network configuration is correct:


# tail -1 /etc/fstab
1.2.3.4:/var/cache/yum/x86_64/7Server/ol7_latest/packages
 ↪/patchdir nfs4 noauto,proto=tcp,port=2049 0 0
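
With the fstab entry in place, the directory can be mass-mounted exactly
as in the example comments at the end of the script:


# ./artano 'mount /patchdir'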

Assuming that the NFS server is mounted, RPMs can be upgraded from images
stored upon it (note that Oracle Spacewalk or Red Hat Satellite might be a
more capable patch method):


# ./artano 'rpm -Fvh /patchdir/*.rpm'
Agent pid 3203
Enter passphrase for /root/.ssh/master_key:
Identity added: /root/.ssh/master_key (/root/.ssh/master_key)
#yourserver.com
Preparing...                          ########################
Updating / installing...
xmlsec1-1.2.20-7.el7_4                ########################
xmlsec1-openssl-1.2.20-7.el7_4        ########################
Cleaning up / removing...
xmlsec1-openssl-1.2.20-5.el7          ########################
xmlsec1-1.2.20-5.el7                  ########################
#container.yourserver.com
Preparing...                          ########################
Updating / installing...
xmlsec1-1.2.20-7.el7_4                ########################
xmlsec1-openssl-1.2.20-7.el7_4        ########################
Cleaning up / removing...
xmlsec1-openssl-1.2.20-5.el7          ########################
xmlsec1-1.2.20-5.el7                  ########################
#anotherserver.com
Preparing...                          ########################
Updating / installing...
xmlsec1-1.2.20-7.el7_4                ########################
xmlsec1-openssl-1.2.20-7.el7_4        ########################
Cleaning up / removing...
xmlsec1-openssl-1.2.20-5.el7          ########################
xmlsec1-1.2.20-5.el7                  ########################
Agent pid 3203 killed

I am assuming that my audience is already experienced with package tools for
their preferred platforms. However, to avoid criticism that I’ve included
little actual discussion of patch tools, the following is a quick reference
of commands for manipulating RPM, the most common package format on
enterprise systems:

  • rpm -Uvh package.i686.rpm — install or upgrade a package file.

  • rpm -Fvh package.i686.rpm — upgrade a package file, if an older version is installed.

  • rpm -e package — remove an
    installed package.

  • rpm -q package — list installed package name and version.

  • rpm -q --changelog package
    — print full changelog for installed package
    (including CVEs).

  • rpm -qa — list all installed packages on the
    system.

  • rpm -ql package — list all
    files in an installed package.

  • rpm -qpl package.i686.rpm — list files
    included in a package file.

  • rpm -qi package — print
    detailed description of installed package.

  • rpm -qpi package — print
    detailed description of package file.

  • rpm -qf /path/to/file — list package
    that installed a particular file.

  • rpmbuild --rebuild package.src.rpm
    — build binary RPMs from a source package (under ~/rpmbuild
    on recent systems; historically /usr/src/redhat).

  • rpm2cpio package.src.rpm | cpio -icduv
    — unpack all package files into the current directory.
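
As a quick illustration, the following queries identify the package that
owns the sshd binary and search its changelog for security fixes (package
names and versions will vary by system):


# rpm -qf /usr/sbin/sshd
# rpm -q --changelog openssh-server | grep CVE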

Another important consideration for scripting the SSH agent is limiting the
capability of an authorized key. There is a specific syntax for such
limitations. Of
particular interest is the from="" clause, which will
restrict logins on a key to a limited set of hosts. It is likely wise to
declare a set of “bastion” servers that will record non-root logins, which
then escalate into the controlled users that hold the enterprise keys.

An example entry might be the following (note that the line is broken here
for clarity; in the actual file, the entry must be a single line):


from="*.c2.security.yourcompany.com,4.3.2.1" ssh-ed25519
 ↪AAAAC3NzaC1lZDI1NTE5AAAAIJSSazJz6A5x6fTcDFIji1X+
↪svesidBonQvuDKsxo1Mx

A number of other useful restrictions can be placed upon
authorized_keys entries. The
command="" clause will restrict a key to a single
program or script and will set the
SSH_ORIGINAL_COMMAND environment
variable to the client’s attempted call; scripts can raise alarms if the
variable does not contain approved contents. The
restrict option also
is worth consideration, as it disables a large set of SSH features that can be
both superfluous and dangerous.
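
A hypothetical entry combining these options might look like the
following (the key material and the script path are placeholders, and the
entry must again be a single line in the actual file):


# hypothetical example - key and command path are placeholders
restrict,command="/usr/local/bin/patch-audit.sh",
 ↪from="*.c2.security.yourcompany.com" ssh-ed25519 AAAA...placeholder
 ↪auditor@bastion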

Although it is possible to set server identification keys in the known_hosts
file to a @revoked status, this cannot be done with the contents of
authorized_keys. However, a system-wide file for forbidden keys can be set
in the sshd_config with RevokedKeys. This file overrides any user’s
authorized_keys. If set, this file must exist and be readable by the sshd
server process; otherwise, no keys will be accepted at all (so use care if
you configure it on a machine where there are obstacles to physical access).
When this option is set, use the artano script to append forbidden
keys to the file quickly when they should be disallowed from the network. A clear
and convenient file location would be /etc/ssh/revoked_keys.
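
A minimal configuration, and a hypothetical push of a newly banned public
key to every host, might look like this (the key text below is a
placeholder):


# grep -i ^revokedkeys /etc/ssh/sshd_config
RevokedKeys /etc/ssh/revoked_keys

# ./artano 'echo "ssh-ed25519 AAAA...placeholder baduser" >>
 ↪/etc/ssh/revoked_keys'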

It is also possible to establish a local Certificate Authority (CA) for
OpenSSH that will allow keys to be registered with expiration dates. These
CAs can become quite elaborate in their control over an enterprise.
Although the maintenance of an SSH CA is beyond the scope of this
article, keys issued by such CAs should remain strong by adhering to the
requirements for Ed25519/E-521/RSA-3072.
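
As a brief sketch (the identity and principal names here are
placeholders), a CA key can be generated and then used to sign a user’s
public key with a one-year expiration:


$ ssh-keygen -t ed25519 -f ./ssh_ca_key
$ ssh-keygen -s ./ssh_ca_key -I cfisher -n cfisher -V +52w
 ↪~/.ssh/id_ed25519.pub

The servers must then be configured to trust the CA, typically with the
TrustedUserCAKeys option in sshd_config.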

pdsh

Many higher-level tools for the control of collections of servers exist
that are much more sophisticated than the script I’ve presented here.
The most famous is likely Puppet,
which is a Ruby-based configuration
management system for enterprise control. Puppet has a somewhat short list
of supported operating systems. If you are looking for low-level control of
Android, Tomato, Linux smart terminals or other “exotic” POSIX platforms, Puppet
is likely not the appropriate tool. Another popular Ruby-based tool is Chef,
which is known for its complexity. Both Puppet and Chef require Ruby
installations on both clients and servers, and they both will catalog any
SSH keys that they find, so this key strength discussion is completely
applicable to them.

There are several similar Python-based tools, including Ansible,
Bcfg2,
Fabric
and SaltStack.
Of these, only Ansible can run
“agentless” over a
bare SSH connection; the rest will require agents that run on target nodes
(and this likely includes a Python runtime).

Another popular configuration management tool is CFEngine,
which is coded in
C and claims very high performance. Rudder has evolved from portions of
CFEngine and has a small but growing user community.

Most of the previously mentioned packages are licensed commercially and
some are closed source.

The closest low-level tool to the activities presented here is the Parallel
Distributed Shell (pdsh), which can be found in the EPEL repository.
The
pdsh utilities grew out of an IBM-developed package named dsh designed for
the control of compute clusters. Install the following packages from the
repository to use pdsh:


# rpm -qa | grep pdsh
pdsh-2.31-1.el7.x86_64
pdsh-rcmd-ssh-2.31-1.el7.x86_64

An SSH agent must be running while using pdsh with encrypted keys, and there
is no obvious way to control the destination port on a per-host basis as was
done with the artano script. Below is an example using pdsh to run a command
on three remote servers:


# eval $(ssh-agent)
Agent pid 17106

# ssh-add  ~/.ssh/master_key
Enter passphrase for /root/.ssh/master_key:
Identity added: /root/.ssh/master_key (/root/.ssh/master_key)

# pdsh -w hosta.com,hostb.com,hostc.com uptime
hosta: 13:24:49 up 13 days,  2:13, 6 users, load avg: 0.00, 0.01, 0.05
hostb: 13:24:49 up  7 days, 21:15, 5 users, load avg: 0.05, 0.04, 0.05
hostc: 13:24:49 up  9 days,  3:26, 3 users, load avg: 0.00, 0.01, 0.05

# eval $(ssh-agent -k)
Agent pid 17106 killed

The -w option above defines a host list. It allows for limited range
expansion (for example, host[01-09].com) and can take the list of hosts
from standard input if the
argument is a dash (-). The PDSH_SSH_ARGS and
PDSH_SSH_ARGS_APPEND
environment variables can be used to pass custom options to the SSH call. By
default, 32 sessions will be launched in parallel, and this “fanout/sliding
window” will be maintained by launching new host invocations as existing
connections complete and close. You can adjust the size of the “fanout”
either with the -f option or the
FANOUT environment variable. It’s interesting to
note that
there are two file copy commands: pdcp and
rpdcp, which are analogous to
scp.
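
For example, the fanout can be narrowed and a file can be copied to the
same host list as follows (note that pdcp must also be installed on the
remote hosts):


# pdsh -f 2 -w hosta.com,hostb.com,hostc.com 'uname -r'

# pdcp -w hosta.com,hostb.com,hostc.com /etc/hosts /tmp/hosts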

Even a low-level utility like pdsh lacks some flexibility that is available
by scripting OpenSSH, so prepare to feel even greater constraints as more
complicated tools are introduced.

Conclusion

Modern Linux touches us in many ways on diverse platforms. When the security
of these systems is not maintained, others also may touch our platforms
and turn them against us. It is important to realize the maintenance
obligations when you add any Linux platform to your environment. This
obligation always exists, and there are consequences when it is not met.

In a security emergency, simple, open and well understood tools are best.
As tool complexity increases, platform portability certainly declines, the
number of competent administrators also falls, and this likely impacts speed
of execution. This may be a reasonable trade in many other aspects, but in a
security context, it demands a much more careful analysis. Emergency measures
must be documented and understood by a wider audience than is required for
normal operations, and using more general tools facilitates that discussion.

I hope the techniques presented here will prompt that discussion for
those who have not yet faced it.

Disclaimer

The views and opinions expressed in this article are those of
the author and do not necessarily reflect those of Linux Journal.

Note:

An exploit compromising Ed25519 was recently demonstrated that relies upon
custom hardware changes to derive a usable portion of a secret key.
Physical hardware security is a basic requirement for encryption integrity,
and many common algorithms are further vulnerable to cache timing or other
side channel attacks that can be performed by the unprivileged processes of
other users. Use caution when granting access to systems that process
sensitive data.
