Ansible rollback on multiple hosts when one of the roles fails. How to pass variables between different plays?


I need to run a set of roles on multiple servers defined by "all". I won't know the names of the servers, as these are derived from the dynamic inventory of Ansible Tower.

If any of the tasks in these roles fail, I would need to do a rollback on all the hosts on which the plays have been executed - successful or not.

For example, given host1, host2 and host3, I execute my roles on these hosts. On host1, all roles pass. On host2, one task fails and the host is marked "failed". Host3 is still executed, even though host2 failed.

  • I would ideally not want the play to proceed to execute the roles on Host 3, as Host 2 has failed.
  • I would want to execute a rollback role on Host 1 and Host 2 alone, which would restore the system to the previous state.

I created two plays:

The first play executes the roles and records the host that failed. Ideally it would stop the play on the remaining hosts when one of the roles fails, but this does not seem to happen.

I would want to pass the value to the second play without using hostvars (as I won't know the names of the hosts on which execution has happened).

The second play is where I would expect the rollback to happen, only on the hosts on which the execution has happened.

- hosts: all
  serial: 1
  max_failure_percentage: 0
  any_errors_fatal: true
  vars:
    successful_hosts: []
  tasks:
    - name: Record whether this host has failed
      set_fact:
        host_failed: "{{ false if ansible_failed is not defined else ansible_failed }}"
 
    - name: Record successfully executed hosts
      set_fact:
        successful_hosts: "{{ successful_hosts|default([]) + [inventory_hostname] }}"

    - block:              
      - name: Check if previous roles succeeded
        assert:
          that:
            - "{{ host_failed is true }}"
        ignore_errors: true

      - name: Import multiple roles
        include_role:
          name: "{{ roles }}"
        loop:
          - roles/role_download_artifact
          - roles/role_create_config
          - roles/role_deploy_app
        loop_control:
              loop_var: roles
        when: not host_failed
        
      rescue:
        - name: "Host failed: {{ inventory_hostname }}"
          set_fact:
            host_failed: true
            successful_hosts: "{{ successful_hosts|default([]) + [inventory_hostname] }}"
          delegate_to: "{{ inventory_hostname }}"
          delegate_facts: true
  vars_files:
    - ../roles/default/vars.yml

- hosts: all
  serial: 1
  any_errors_fatal: true
  vars_files:
    - ../roles/default/vars.yml
  tasks:
    - name: run rollback
      include_role: 
        name: ../roles/role_rollback
      when: host_failed

There is 1 answer

Matt Blaha

I think you've got some tunnel vision and are in anti-pattern territory. While I hate to answer a question that wasn't asked, I really think you should step back and try to think about making your plays/roles more Ansible-ish.

If you want a cleanup role to execute when something fails, don't try to make some magic happen on the controller; the role should handle its own failures and call the cleanup role. Good roles are idempotent and self-contained.
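One way to have the role handle its own failure is a block with a rescue section. This is only a minimal sketch, not the asker's actual setup: it assumes the three deployment roles and role_rollback from the question can be found on the roles path under those bare names, and the file location and the loop variable deploy_role are made up for illustration.

```yaml
# tasks/main.yml of a hypothetical wrapper role (location and names are assumptions)
- name: Deploy, and roll back this host if any step fails
  block:
    - name: Run the deployment roles in order
      ansible.builtin.include_role:
        name: "{{ deploy_role }}"
      loop:
        - role_download_artifact
        - role_create_config
        - role_deploy_app
      loop_control:
        loop_var: deploy_role   # hypothetical name; avoids reusing the word "roles"

  rescue:
    - name: Restore the previous state on this host
      ansible.builtin.include_role:
        name: role_rollback

    - name: Keep the host marked as failed after the rollback
      ansible.builtin.fail:
        msg: "Deployment failed on {{ inventory_hostname }}; rollback was run."
```

The final fail task is optional. Without it the host counts as rescued rather than failed, so any_errors_fatal would not stop the remaining hosts.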

Then Ansible and Tower will use all of their builtin stuff to report the status and changes of each host nice and cleanly.

As for the other part you seem to be looking for, bailing on the rest of the run once one host fails: that is what the meta module is for.

There are various options, such as end_host, end_play and end_batch.

From your description, I believe you want end_play.
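As a rough sketch of how end_play could fit in, the rescue section below rolls back the failing host and then ends the play so that later hosts are not processed. The role names come from the question and are assumed to be on the roles path. How end_play interacts with serial batches has changed between Ansible releases, so treat this as something to test on your version rather than as guaranteed behavior.

```yaml
- hosts: all
  serial: 1
  tasks:
    - name: Run the deployment, rolling back this host on failure
      block:
        - name: Deploy the application
          ansible.builtin.include_role:
            name: role_deploy_app

      rescue:
        - name: Restore the previous state on this host
          ansible.builtin.include_role:
            name: role_rollback

        # Assumption: on recent ansible-core this ends the whole play,
        # not only the current serial batch.
        - name: Stop processing the remaining hosts without failing them
          ansible.builtin.meta: end_play
```

Note that a host that goes through rescue is reported as rescued, not failed. Hosts that already completed successfully in earlier batches are not rolled back by this sketch.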

https://docs.ansible.com/ansible/latest/collections/ansible/builtin/meta_module.html