Ansible : the basics - Simple IT Automation


Ansible best practices


no manual change

If you change anything without letting Ansible know (for example editing a configuration file manually), it is very likely that this change will be overwritten by the next Ansible execution.

idempotence

There are 2 ways to consider idempotence. Neither is good or bad (both have their pros and cons), but you must make your choice early in the project and stick to it.

method 1 : idempotence is paramount

  • you can (almost) read the PLAY RECAP only :
    • nothing changes if it doesn't have to
    • you know what to expect (where + when changes occur) when altering the code of your playbooks
  • gives you confidence that you're actually controlling machines with Ansible, not only "doing actions" on them
  • this comes at the cost of added complexity (especially with shell and command modules)
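
To give an idea of the added complexity mentioned above, here is a minimal sketch of two common ways to keep command tasks idempotent. The script, its marker file and myTool are hypothetical names used for illustration only :

```yaml
---
# Sketch only : initMyApp.sh, the marker file and myTool are hypothetical.
- hosts: groupSlaves
  tasks:
    - name: initialize the application (skipped once the marker file exists)
      command: /usr/local/bin/initMyApp.sh
      args:
        creates: /var/lib/myApp/.initialized

    - name: apply settings with a tool that reports what it did
      command: myTool --apply
      register: myToolResult
      # assumption : myTool prints "no change" when there was nothing to do
      changed_when: "'no change' not in myToolResult.stdout"
```

With such guards, a second run should report changed=0 in the PLAY RECAP, provided the assumptions about the commands hold.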

method 2 : we don't care much about idempotence (i.e. every playbook execution causes changes)

  • when used like this, Ansible is a distributed scripting language (like shell scripting on steroids which handles the SSH part for you). This is still great and extremely powerful
  • on the cons side, you never know whether everything is going extremely well (i.e. as described in your code) since all executions cause changes. This means you have to analyze the execution log to find out

the check mode

The execution in check mode :
  • must NOT make any change to the slaves
    • except if a very specific change is mandatory to let a very specific task be executed afterwards during the check mode execution. In such case, use check_mode: no.
    • if the exception above causes a truckload of check_mode: no, read these rules again : you may be trying to do too much in check mode
  • must NOT be used to setup / check / validate commands (especially shell and command ones) or tasks. These are expected to be :
    • skipped safely during the check mode
    • executed normally otherwise
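
A possible illustration of these rules, as a sketch rather than a reference : the first task is read-only and is forced to run in check mode so that its registered result exists for the next task, whereas the last one states explicitly that it must not run in check mode. The choice of Apache commands is just an example :

```yaml
---
- hosts: groupSlaves
  tasks:
    - name: read the Apache version (read-only, needed by the next task)
      command: apache2ctl -v
      register: apacheVersion
      check_mode: no
      changed_when: false

    - name: show what was found
      debug:
        var: apacheVersion.stdout_lines

    # command tasks without creates / removes are skipped in check mode
    # anyway : the condition below only makes the intent visible
    - name: reload Apache (normal runs only)
      command: apache2ctl graceful
      when: not ansible_check_mode
```

Run it with ansible-playbook --check to compare both behaviors.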

modules

Use modules as much as possible instead of re-coding things yourself :
  • this is quicker + cleaner + more readable + more maintainable
  • gives you idempotence in most cases (shell, command and raw being the notable exceptions)
  • avoids questionable hacks
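
As an example, the "define Apache FQDN" shell task of the full playbook shown later in this document could be written with the copy module instead. This is a sketch : owner, group and mode are assumptions, adjust them to your needs :

```yaml
---
- hosts: groupSlaves
  tasks:
    # instead of : shell: echo "ServerName localhost" > /etc/apache2/conf.d/fqdn
    - name: define Apache FQDN
      copy:
        content: "ServerName localhost\n"
        dest: /etc/apache2/conf.d/fqdn
        owner: root      # assumption
        group: root      # assumption
        mode: "0644"     # assumption
```

Unlike the shell version, this task reports a change only when the file content or attributes actually differ.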

variables

Don't over-use set_fact to declare numerous internal / intermediary variables :
  • a set_fact is a regular task, which implies :
    • 1 more task in a play
    • some execution time + CPU + RAM
    • a log entry
  • it is possible to pipe data along : at the expense of readability + complexity
  • if you need to set some variables, use :

includes

  • include_role + tasks_from or include_tasks :
    • is similar to a goto (i.e. : ugly and cheating)
    • is often the consequence of trying to loop over a group of tasks (which can't be done with a block)
    • _may_ be avoided (sometimes). Check if :
      • there's a module already doing what you're about to code (have I already said to use modules ?)
      • one of the tasks / commands you use in your included code accepts lists / dicts as input : this can save an explicit loop and make the whole include unnecessary
  • regarding includes in general (this is not Ansible-specific), there are different approaches :
    • not-my-approach : some people consider code should be placed in a dedicated file and included only if it's included at least twice. Otherwise :
      • it "costs" 1 include during execution
      • the developer has to open 1 more file
      • _may_ decrease readability
    • my-approach :
      • I think it is worth grouping tasks
        • that belong together
        • for which you don't need to know the very detail
        while working on the calling file. Indeed, code stating :
        include configureDatabase
        include addDbUsers
        is explicit enough and doesn't require opening extra files
      • too many nested blocks of code (hence too many indentation levels) decrease readability, which is why grouping + moving code into distinct files helps
      • the logic about including / not including code should be in the calling file :
        DON'T :
        callingFile :
        include includedFile
        includedFile :
        if condition do things
        instead DO :
        callingFile :
        if condition include includedFile
        includedFile :
        do things
        This way :
        • the code logic appears clearly without leaving callingFile
        • the include is processed only if condition is true, not every time
      • with modern IDEs and decent editors (Vi, Emacs) opening 1 extra file shouldn't be that much of a hassle
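
In Ansible terms, the DO variant above could look like the sketch below. The file names reuse the configureDatabase / addDbUsers idea; the variables (installDatabase, dbUsers, dbUser) are hypothetical :

```yaml
---
# callingFile.yml : the logic stays here, the included files only "do things"
- hosts: databases
  vars:
    installDatabase: true   # hypothetical switch
    dbUsers:                # hypothetical list
      - alice
      - bob
  tasks:
    - name: configure the database
      include_tasks: configureDatabase.yml
      when: installDatabase | bool

    - name: add database users (the included file runs once per user)
      include_tasks: addDbUsers.yml
      loop: "{{ dbUsers }}"
      loop_control:
        loop_var: dbUser
```

The second task also shows the "loop over a group of tasks" case, which a block can't handle.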

comments

  • why do people do this :
    # I explain what the code below does
    - moduleName :
        arg1: foo
        arg2: bar
    instead of :
    - name: "Description of what's happening below"
      moduleName :
        arg1: foo
        arg2: bar
    ⇒ use name:, this is what it's for.
  • a comment must explain things while still being generic : it must stay relevant when the context changes (things added / removed / moved). Indeed, code changes but people "forget" to update comments and delete irrelevant ones. Those obsolete comments may give WRONG information about what's going on. This is subtle, but as a general rule, remember that a comment should not explicitly mention :
    • the data being processed (configuration values, path, ...)
    • a machine name

What's the difference between the vars and defaults role directories ?

Variables declared in vars have a higher precedence than those from defaults.

Many articles describe the purpose of these directories explaining that they are :

IMHO, this is not completely wrong, but not totally right either. It makes sense to discriminate variables on criteria like those above, but the Ansible documentation does not enforce such rules.
Regarding the vars and defaults directories, it's only about variable precedence.
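
A small sketch of what this precedence means in practice. Three files are shown in a single block, and httpPort / configDir are hypothetical variable names :

```yaml
# roles/myRole1/defaults/main.yml : lowest precedence, meant to be overridden
httpPort: 80
---
# roles/myRole1/vars/main.yml : wins over inventory and play variables
configDir: /etc/myApp
---
# the playbook
- hosts: webservers
  vars:
    httpPort: 8080          # overrides the default : the role sees 8080
    configDir: /tmp/myApp   # ignored : the role still sees /etc/myApp
  roles:
    - myRole1
```

Extra variables passed on the command line with -e override both.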

As a summary :

Typical Ansible project file tree

[root]    ...well, this is the playbook root directory
  ansible.cfg    Ansible main configuration file
  playbooks/
    main.yml    the main playbook file
    myPlaybook1.yml    other playbook (if needed)
    myPlaybook2.yml    other playbook (if needed)
  hosts    inventory file
  databases.yml    variables for all hosts that are members of the databases group
  webservers.yml    variables for all hosts that are members of the webservers group
  vault.yml    Ansible-vault encrypted variables file
  host_vars/
    mysql1.yml    variables for the mysql1 host
    mysql2.yml    variables for the mysql2 host
    apache1.yml    variables for the apache1 host
    apache2.yml    variables for the apache2 host
  myRole1
    defaults/
      main.yml    default variables of myRole1 (read more about variable precedence)
    files/    files that will be copied to the slaves (is it supposed to mimic the destination file tree ?)
      myFile1
      myFile2
      myFile3
    handlers/    Changing a configuration file may involve restarting the corresponding daemon. However, multiple changes to the same file must restart the daemon only once. Handlers, triggered by the notify directive, implement this event-driven behavior.
      main.yml
    meta/    this is where role dependencies are described. More about Conditional role dependencies.
      main.yml    list of roles (and related parameters) to play before playing myRole1
    tasks/
      main.yml    tasks of myRole1
    templates/    Files or snippets that will be used to generate files on the slaves via the templating engine (is it supposed to mimic the destination file tree ?)
      myTemplate1.j2
      myTemplate2.j2
      myTemplate3.j2
    vars/
      main.yml    variables of myRole1 (read more about variable precedence)
      myRole1_variables1.yml    other variables (if needed)
      myRole1_variables2.yml    other variables (if needed)
  myRole2    another role, with a similar structure
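
To illustrate the handlers / notify mechanism described in the tree above, here is a minimal sketch of two files of myRole1 shown in a single block. The myApp service and the destination paths are hypothetical :

```yaml
# roles/myRole1/tasks/main.yml
- name: deploy the main configuration file
  template:
    src: myTemplate1.j2
    dest: /etc/myApp/main.conf      # hypothetical path
  notify: restart myApp

- name: deploy the logging configuration file
  template:
    src: myTemplate2.j2
    dest: /etc/myApp/logging.conf   # hypothetical path
  notify: restart myApp
---
# roles/myRole1/handlers/main.yml
- name: restart myApp
  service:
    name: myApp                     # hypothetical service
    state: restarted
```

Even if both templates change during the same run, the handler is executed only once, at the end of the play.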

Make Ansible role file tree easily : makeAnsibleRoleSkeleton.sh

#!/usr/bin/env bash
######################################### makeAnsibleRoleSkeleton.sh ################################
# Create the directory structure and some files for an Ansible role
########################################## ##########################################################

rolesDirectory='/opt/ansible/roles'
newRoleName=$1

[ -z "$newRoleName" ] && {
	echo "Usage : $0 <new role name>"
	exit 1
	}

mkdir -p "$rolesDirectory/$newRoleName/"{tasks,handlers,templates,files,vars,meta}
echo '---' | tee "$rolesDirectory/$newRoleName/"{tasks,handlers,vars,meta}/main.yml > /dev/null

  1. enter the "new role" directory
  2. for subDir in files handlers meta tasks templates vars; do mkdir -p "$subDir"; newFile="$subDir/main.yml"; echo '---' > "$newFile"; git add "$newFile"; done
  3. don't forget to commit

Ansible glossary

play
block of code associating tasks to one or more hosts. Example :
- hosts: 127.0.0.1
  connection: local
  gather_facts: no
  tasks:

  - debug:
      msg: "hello, world"

  - debug:
      msg: "farewell, world"
playbook
group of one or more plays

Ansible playbooks

Definitions :

As seen in the Introduction to Ansible article, Ansible can be used to perform ad-hoc tasks. But it can also execute procedures called playbooks (examples).

Playbooks :
  • are written in YAML.
  • are composed of one or more plays. A play is a list of hosts with associated roles and tasks. A task is, basically speaking, calling an Ansible module, as seen in the Introduction to Ansible article.

Important things about playbooks :

  • When running the playbook, which runs top to bottom, hosts with failed tasks are taken out of the rotation for the entire playbook. If things fail, simply correct the playbook file and rerun.
  • Modules (hence playbooks) are idempotent : if you run them again, they will make only the changes they must in order to bring the system to the desired state. Each module checks the current state of the target before taking any action; and playbooks must be designed :
    • NOT as a list of actions to do
    • but as the description of a desired state
    For example, if you instruct Ansible to install a package, the package module will first detect whether this package is already installed or not (when the task runs, not during the facts gathering step), then act accordingly. This makes it very safe to rerun the same playbook multiple times : it won't change things unless it has to do so.
    The exceptions are the shell and command modules, where idempotence cannot be guaranteed automatically (details).
  • Each task has a name parameter which is included in the output from running the playbook. This is for humans only and should be as descriptive as possible.
  • It is usually wise to track playbooks in SCM tools such as Git.

My first playbook :

  1. Save this as playbook.yml :
    ---
    # this is my 1st playbook
    
    - hosts: groupSlaves
      tasks:
      - name: test connection
        ping:
    groupSlaves is the group name defined in the inventory file.
  2. Launch it : ansible-playbook playbook.yml
  3. It may return :
    PLAY [groupSlaves] *****************************************************************
    
    GATHERING FACTS ***************************************************************
    ok: [192.168.105.80]
    ok: [192.168.105.114]
    
    TASK: [test connection] *******************************************************
    ok: [192.168.105.80]
    ok: [192.168.105.114]
    
    PLAY RECAP ********************************************************************
    192.168.105.114		: ok=2	changed=0	unreachable=0	failed=0
    192.168.105.80		: ok=2	changed=0	unreachable=0	failed=0
    

A full playbook :

To know what's performed by this playbook, just read the name lines.
---
# playbook_web.yml

- hosts: myGroup1:myGroup2

  vars:
    apacheUser:        'www-data'
    apacheGroup:       'www-data'
    documentRoot:      '/var/www/test/' # final '/' expected
    websiteLocalPath:  '/root/'
    websiteConfigFile: 'test.conf'

  tasks:
  - name: install Apache
    apt: name=apache2 state=present

  - name: disable default Apache website
    shell: a2dissite default

  - name: define Apache FQDN
    shell: echo "ServerName localhost" > /etc/apache2/conf.d/fqdn

  - name: create docRoot
    file: state=directory path={{ documentRoot }} owner={{ apacheUser }} group={{ apacheGroup }}

  - name: deploy website
    copy: src={{ websiteLocalPath }}index.html dest={{ documentRoot }} owner={{ apacheUser }} group={{ apacheGroup }}

  - name: deploy website conf
    copy: src={{ websiteLocalPath }}{{ websiteConfigFile }} dest=/etc/apache2/sites-available/

  - name: enable website
    shell: a2ensite {{ websiteConfigFile }}

  - name: reload Apache
    service: name=apache2 state=reloaded enabled=yes

- hosts: 127.0.0.1
  connection: local
  tasks:
  - name: check everything is ok on webserver 'myGroup1'
    shell: wget -S -O - -Y off http://192.168.105.114/index.html

  - name: check everything is ok on webserver 'myGroup2'
    shell: wget -S -O - -Y off http://192.168.105.80/index.html

How to run actions on the master in a playbook (source) :

Let's say you just deployed a new web server + a web application. Wouldn't it be great if you could run some checks at the end of the playbook, just to make sure everything's responding as expected ? To do so, you'd have to run some commands from the master host : use this code as the last play of your playbook :

- hosts: 127.0.0.1
  connection: local
  tasks:
  - name: make sure blah is blah.
    shell: 'myCheckCommand'

If myCheckCommand returns a Unix success :

Testing with myCheckCommand being true, execution of this specific play returns :

PLAY [127.0.0.1] **************************************************************

GATHERING FACTS ****************************************************************
ok: [127.0.0.1]

TASK: [make sure blah is blah.] ***********************************************
changed: [127.0.0.1]

PLAY RECAP *********************************************************************
127.0.0.1	: ok=2	changed=1	unreachable=0	failed=0

If myCheckCommand returns a Unix failure :

Now with false, output becomes :

PLAY [127.0.0.1] **************************************************************

GATHERING FACTS ****************************************************************
ok: [127.0.0.1]

TASK: [make sure blah is blah.] ***********************************************
failed: [127.0.0.1] => {"changed": true, "cmd": "false", "delta": "0:00:00.015746",
	"end": "2014-10-16 15:11:49.153606", "rc": 1, "start": "2014-10-16 15:11:49.137860"}

FATAL: all hosts have already failed -- aborting

PLAY RECAP *********************************************************************
	to retry, use: --limit @/root/fileNameOfMyPlaybook.retry

127.0.0.1	: ok=1	changed=0	unreachable=0	failed=1
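
Note that in the successful run above, the check is reported as changed although it changed nothing. A possible variant, as a sketch : the uri module replaces a wget-style check, and changed_when keeps an arbitrary check command from counting as a change. The URL reuses an address from the examples of this article :

```yaml
- hosts: 127.0.0.1
  connection: local
  gather_facts: no
  tasks:
    - name: check the web server answers with HTTP 200
      uri:
        url: http://192.168.105.114/index.html
        status_code: 200

    - name: make sure blah is blah.
      command: myCheckCommand
      # a check never changes anything : report ok / failed only
      changed_when: false
```

A failing check still fails the play, exactly like in the output above.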

Playbooks with roles (source, role file tree) :

Initial Ansible syntax (early versions) :

- hosts: webservers
  roles:
    - role_X
    - role_Y
These are processed as static imports.

Updated syntax (for Ansible 2.4+) :

- hosts: webservers
  tasks:
    - import_role:
        name: role_X
    - include_role:
        name: role_Y
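You may choose between import_role and include_role depending on whether you need a static or a dynamic import : import_role is resolved when the playbook is parsed (like roles:), whereas include_role is processed at run time, when the task is reached, which is what allows it to be used with loops.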
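
A sketch of a case where the difference matters. The websiteName loop variable and the site names are hypothetical, and role_Y is assumed to make use of websiteName :

```yaml
- hosts: webservers
  tasks:
    - name: run role_Y once per website (loops require include_role)
      include_role:
        name: role_Y
      loop:
        - site1
        - site2
      loop_control:
        loop_var: websiteName

    - name: import role_X statically (the condition is applied to each of its tasks)
      import_role:
        name: role_X
      when: ansible_facts['os_family'] == 'Debian'
```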

Old vs new syntax :

"Old" syntax (compact mode) :
- hosts: webservers
  roles:
  - { role: role_X, myVariable: "42", tags: "tag1, tag2" }
"Old" syntax (verbose mode) :
- hosts: webservers
  roles:
    - role: role_X
      vars:
        myVariable: "42"
      tags:
        - tag1
        - tag2
"New" syntax :
- hosts: webservers
  tasks:
    - import_role:
        name: role_X
      vars:
        myVariable: "42"
      tags:
        - tag1
        - tag2

The inventory file

The inventory file :

Default groups

  • The implicit group all includes all slaves.
  • There is also another group named ungrouped. The logic behind Ansible is that all slaves must belong to at least 2 groups : all and "another one". If there is no such "other one", ungrouped will be that one.
Both groups will always exist and don't need to be explicitly declared.
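
As an illustration, here is a sketch of an inventory written in the YAML format. The groups and the first two addresses come from the examples of this article, while 192.168.105.50 is a hypothetical host added to show the ungrouped case :

```yaml
# hosts.yml
all:
  hosts:
    192.168.105.50:       # hypothetical, in no explicit group : lands in "ungrouped"
  children:
    myGroup1:
      hosts:
        192.168.105.114:
    myGroup2:
      hosts:
        192.168.105.80:
```

ansible-inventory -i hosts.yml --graph displays the resulting groups, including all and ungrouped.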

Introduction to Ansible

Usage

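Ansible is installed on a master host to rule them all. There's no agent to install on slaves : they only need to be reachable over SSH (hence the SSH keys) and to have a Python interpreter, which most modules rely on.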

Setup Ansible master on a Debian Buster (inspired by) :

  1. as root :
    apt install python3-pip
  2. as a non-root user : setup + activate a Python virtual environment
  3. still as a non-root user, and from within the virtual environment (if present) :
    pip3 install -U ansible

Setup SSH on the master (source) :

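  1. Create a new key pair : ssh-keygen -t rsa will generate the /root/.ssh/id_rsa RSA private key (2048-bit by default with OpenSSH releases older than 8.0, 3072-bit since) and the matching /root/.ssh/id_rsa.pub public key.
  2. Deploy the public key to the slave(s)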
  3. Configure SSH accordingly (/root/.ssh/config) :
    Host slave1
    	hostname	192.168.105.114
    	user		root
    	IdentityFile	~/.ssh/id_rsa
    
    Host slave2
    	hostname	192.168.105.80
    	user		root
    	IdentityFile	~/.ssh/id_rsa
  4. List slave(s) into the inventory file :
    192.168.105.114	# slave1
    192.168.105.80	# slave2
  5. Check communication between master and slave(s) :
    ansible all -m ping -u root
    192.168.105.114 | success >> {
    	"changed": false,
    	"ping": "pong"
    }
    
    192.168.105.80 | success >> {
    	"changed": false,
    	"ping": "pong"
    }
  6. It works !!!
  7. Define groups of hosts in the inventory file

Flags

CLI flags are common to several Ansible commands / tools. See this dedicated article.

Example

Get information about slaves :

ansible all -m setup
This will output a VERY long list of inventory information (aka facts) about the target(s). To get detailed information on a specific topic, you can apply a filter :
ansible myGroup2 -m setup -a 'filter=ansible_processor*'
192.168.105.80 | success >> {
	"ansible_facts": {
		"ansible_processor": [
			"Intel(R) Core(TM)2 Duo CPU	 E8400 @ 3.00GHz"
		],
		"ansible_processor_cores": 1,
		"ansible_processor_count": 1,
		"ansible_processor_threads_per_core": 1,
		"ansible_processor_vcpus": 1
	},
	"changed": false
}
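
The same facts are available in playbooks through ansible_facts, without the ansible_ prefix. A minimal sketch, where the second task is only a placeholder :

```yaml
- hosts: myGroup2
  tasks:
    - name: show the number of vCPUs found during fact gathering
      debug:
        msg: "{{ inventory_hostname }} has {{ ansible_facts['processor_vcpus'] }} vCPU(s)"

    - name: only runs on hosts with several vCPUs
      debug:
        msg: "parallel settings could be enabled here"
      when: ansible_facts['processor_vcpus'] | int > 1
```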

Run shell commands on slaves :

ansible all -a "hostname"
run a basic command on all slaves :
192.168.105.114 | success | rc=0 >>
ansibleSlave
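This relies on the default command module, which runs a single binary with its arguments but without a shell : no pipes, redirections or $(...) substitutions.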
ansible myGroup1 -a "echo $(hostname)"
The hostname returned is the master's : because of the double quotes, $(hostname) is expanded by the local shell before Ansible even starts :
192.168.105.114 | success | rc=0 >>
ansibleMaster
ansible myGroup1 -a 'echo $(hostname)'
This is sent to the right slave, but $(hostname) is passed literally to echo, because the command module doesn't go through a shell :
192.168.105.114 | success | rc=0 >>
$(hostname)
ansible myGroup1 -m shell -a 'echo $(hostname)'
Thanks to the shell module, this command is executed as expected :
192.168.105.114 | success | rc=0 >>
ansibleSlave
ansible all -m shell -a 'echo $(hostname) | grep -e "[a-z]"'
It's possible to run "complex" shell commands now :
192.168.105.80 | success | rc=0 >>
ansibleSlave2

192.168.105.114 | success | rc=0 >>
ansibleSlave
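
The same distinction applies in playbooks. Here is a sketch of the equivalent tasks; the register names are arbitrary, and changed_when: false is added because these commands only read information :

```yaml
- hosts: myGroup1
  tasks:
    - name: plain binary, no shell needed
      command: hostname
      register: plainResult
      changed_when: false

    - name: pipes and substitutions require the shell module
      shell: echo $(hostname) | grep -e "[a-z]"
      register: shellResult
      changed_when: false

    - name: display both results
      debug:
        msg: "{{ plainResult.stdout }} / {{ shellResult.stdout }}"
```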

File transfer :

Ansible can scp files from the master to its slaves :

ansible all -m copy -a "src=/home/test.txt dest=/home/"

It's possible to rename the file during the copy by specifying a different destination name : ... "src=/home/test.txt dest=/home/otherName"

Manage packages :

Ansible can manage software on its slaves using dedicated modules :

  • apt for Debianoids. This module is part of the default install.
  • yum for Red Hatoids.

Possible values of the state parameter : present, absent, latest. Older Ansible releases also accepted installed and removed as aliases of present and absent.

Make sure the package openssh-server is installed :
ansible all -m apt -a "name=openssh-server state=present"
192.168.105.114 | success >> {
	"changed": false
}

192.168.105.80 | success >> {
	"changed": false
}
If the specified package was not already installed, this will install it. The FULL command output (install procedure) will be reported by Ansible.
Make sure the package apache2 is absent :
ansible all -m apt -a "name=apache2 state=absent"
192.168.105.80 | success >> {
	"changed": false
}

192.168.105.114 | success >> {
	"changed": false
}
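
In a playbook, the same checks could be written as below. This is a sketch : curl is a hypothetical extra package, only there to show that name accepts a list, which saves an explicit loop :

```yaml
- hosts: all
  tasks:
    - name: make sure the SSH server and curl are installed
      apt:
        name:
          - openssh-server
          - curl          # hypothetical extra package
        state: present

    - name: make sure Apache is absent
      apt:
        name: apache2
        state: absent
```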

Users and groups, the user module :

Create a user account for Bob :

ansible myGroup1 -m user -a "name=bob state=present"
192.168.105.114 | success >> {
	"changed": true,
	"comment": "",
	"createhome": true,
	"group": 1001,
	"home": "/home/bob",
	"name": "bob",
	"shell": "/bin/sh",
	"state": "present",
	"system": false,
	"uid": 1001
}
And if I run the same command again, now that Bob's account exists :
192.168.105.114 | success >> {
	"append": false,
	"changed": false,
	"comment": "",
	"group": 1001,
	"home": "/home/bob",
	"move_home": false,
	"name": "bob",
	"shell": "/bin/sh",
	"state": "present",
	"uid": 1001
}

Delete Bob's user account :

ansible myGroup1 -m user -a "name=bob state=absent remove=yes"
192.168.105.114 | success >> {
	"changed": true,
	"force": false,
	"name": "bob",
	"remove": true,
	"state": "absent",
	"stderr": "userdel: bob mail spool (/var/mail/bob) not found\n"
}
Running the same command again (no user account named Bob anymore) :
192.168.105.114 | success >> {
	"changed": false,
	"name": "bob",
	"state": "absent"
}
remove=yes instructs Ansible to delete the homedir as well.
remove=no is equivalent to not using the "remove" option at all (defaults to no), and leaves the homedir untouched.
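
The playbook counterpart of these ad-hoc commands might look like this sketch, where alice and the /bin/bash shell are hypothetical :

```yaml
- hosts: myGroup1
  tasks:
    - name: make sure bob's account exists
      user:
        name: bob
        shell: /bin/bash    # hypothetical choice
        state: present

    - name: make sure alice's account and homedir are gone
      user:
        name: alice         # hypothetical account
        state: absent
        remove: yes
```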

List existing user accounts :

  • ansible myGroup1 -m shell -a 'cut -d ":" -f 1 /etc/passwd'
  • ansible myGroup1 -m shell -a 'sed -r "s/^([^:]+).*/\1/" /etc/passwd'
  • ansible myGroup1 -m shell -a 'awk -F ":" "{print \$1}" /etc/passwd'
This gets complex because of escaping quotes and some special characters
192.168.105.114 | success | rc=0 >>
root
daemon
bin

nobody
libuuid
messagebus
bob
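
To avoid the quoting problem altogether, the getent module can read the passwd database directly. A minimal sketch :

```yaml
- hosts: myGroup1
  gather_facts: no
  tasks:
    - name: read the passwd database
      getent:
        database: passwd

    - name: list the account names
      debug:
        msg: "{{ ansible_facts['getent_passwd'].keys() | list }}"
```

The module stores its result in the getent_passwd fact, a dictionary indexed by account name.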

Deploying from Git : the git module :

ansible webservers -m git -a "repo=git://foo.example.org/repo.git dest=/srv/myapp version=HEAD"

Managing services, the service module :

Start a service : ansible all -m service -a "name=ssh state=started"

192.168.105.114 | success >> {
	"changed": false,
	"name": "ssh",
	"state": "started"
}

192.168.105.80 | success >> {
	"changed": false,
	"name": "ssh",
	"state": "started"
}
Accepted states :
  • started : start service if not running
  • stopped : stop service if running
  • restarted : always restart
  • reloaded : always reload
  • running : legacy alias of started, accepted by old Ansible releases only
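
In a playbook, this could be written as the sketch below. enabled: yes is an addition of mine, it makes the service start at boot :

```yaml
- hosts: all
  tasks:
    - name: make sure SSH is running and starts at boot
      service:
        name: ssh
        state: started
        enabled: yes
```

Since restarted and reloaded always report a change, they are usually better placed in handlers than in regular tasks.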