Terraform AWS ALB HTTP/HTTPS listener creation/destruction is unstable and causes errors for dependent resources


This was originally reported in terraform-providers/terraform-provider-aws, but provider issues don't seem to get much attention (I know the Terraform team is quite small, so they can't take care of every issue, especially provider ones). Still, this is a real blocking issue for the AWS provider and it is very easy to reproduce.

Below is our test configuration

variable "domain_name" {
  default = "mytest.com"
}

variable "ssl_policy" {
  default = "ELBSecurityPolicy-2016-08"
}

data "aws_acm_certificate" "mytest_certificate" {
  domain = "*.${var.domain_name}"
}

resource "aws_alb" "alb" {
  name = "khiem-test-alb"
  internal = false
  security_groups = ["sg-35482152"]
  subnets = ["subnet-04c29a60", "subnet-d05915a6"]

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_alb_target_group" "author_target_group" {
  name = "khiem-author-target-group"
  port = 8080
  protocol = "HTTP"
  vpc_id   = "vpc-32c75856"

  health_check {
    protocol = "HTTP"
    path = "/.healthcheck/"
    port = 8080
    healthy_threshold = 5
    unhealthy_threshold = 2
    timeout = 5
    interval = 30
    matcher = "200"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_alb_target_group_attachment" "author_target_group_att" {
  target_group_arn = "${aws_alb_target_group.author_target_group.arn}"
  target_id = "i-0b305d179d6aacf57"
  port = 8080

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_alb_target_group" "public_target_group" {
  name = "khiem-public-target-group"
  port = 8080
  protocol = "HTTP"
  vpc_id   = "vpc-32c75856"

  health_check {
    protocol = "HTTP"
    path = "/.healthcheck/"
    port = 8080
    healthy_threshold = 5
    unhealthy_threshold = 2
    timeout = 5
    interval = 30
    matcher = "200"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_alb_target_group_attachment" "public_target_group_att" {
  target_group_arn = "${aws_alb_target_group.public_target_group.arn}"
  target_id = "i-0b305d179d6aacf57"
  port = 8080

  lifecycle {
    create_before_destroy = true
  }
}

# http listener
resource "aws_alb_listener" "alb_http_listener" {
  load_balancer_arn = "${aws_alb.alb.arn}"
  port = "80"
  protocol = "HTTP"

  default_action {
    target_group_arn = "${aws_alb_target_group.public_target_group.arn}"
    type             = "forward"
  }

  lifecycle {
    create_before_destroy = true
  }
}

# http listener rules
resource "aws_alb_listener_rule" "alb_http_public_rule" {
  listener_arn = "${aws_alb_listener.alb_http_listener.arn}"
  priority = 100

  action {
    type = "forward"
    target_group_arn = "${aws_alb_target_group.public_target_group.arn}"
  }

  condition {
    field = "host-header"
    values = ["public-khiem.${var.domain_name}"]
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_alb_listener_rule" "alb_http_author_rule" {
  listener_arn = "${aws_alb_listener.alb_http_listener.arn}"
  priority = 99

  action {
    type = "forward"
    target_group_arn = "${aws_alb_target_group.author_target_group.arn}"
  }

  condition {
    field = "host-header"
    values = ["author-khiem.${var.domain_name}"]
  }

  lifecycle {
    create_before_destroy = true
  }
}

# https listener
resource "aws_alb_listener" "alb_https_listener" {
  load_balancer_arn = "${aws_alb.alb.arn}"
  port = "443"
  protocol = "HTTPS"

  ssl_policy        = "${var.ssl_policy}"
  certificate_arn   = "${data.aws_acm_certificate.mytest_certificate.arn}"

  default_action {
    target_group_arn = "${aws_alb_target_group.public_target_group.arn}"
    type             = "forward"
  }

  lifecycle {
    create_before_destroy = true
  }
}

# https listener rules
resource "aws_alb_listener_rule" "alb_https_public_rule" {
  listener_arn = "${aws_alb_listener.alb_https_listener.arn}"
  priority = 100

  action {
    type = "forward"
    target_group_arn = "${aws_alb_target_group.public_target_group.arn}"
  }

  condition {
    field = "host-header"
    values = ["public-khiem.${var.domain_name}"]
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_alb_listener_rule" "alb_https_author_rule" {
  listener_arn = "${aws_alb_listener.alb_https_listener.arn}"
  priority = 99

  action {
    type = "forward"
    target_group_arn = "${aws_alb_target_group.author_target_group.arn}"
  }

  condition {
    field = "host-header"
    values = ["author-khiem.${var.domain_name}"]
  }

  lifecycle {
    create_before_destroy = true
  }
}

Basically, the configuration just creates an Application Load Balancer, two target groups, and HTTP/HTTPS listeners whose rules route requests to each target group based on the host header (domain).

This simple setup should work properly (and it did in the past). Just recently we found that it has become unstable for both creation and destruction: the HTTP or HTTPS listener resources are somehow not recorded properly in the Terraform state, which causes errors for other resources that depend on them (such as aws_alb_listener_rule). Below is the error when creating:

Error applying plan:

2 error(s) occurred:

* aws_alb_listener_rule.alb_http_public_rule: Resource 'aws_alb_listener.alb_http_listener' does not have attribute 'arn' for variable 'aws_alb_listener.alb_http_listener.arn'
* aws_alb_listener_rule.alb_http_author_rule: Resource 'aws_alb_listener.alb_http_listener' does not have attribute 'arn' for variable 'aws_alb_listener.alb_http_listener.arn'

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

This does not happen every time, but it has been getting more frequent recently. It can be reproduced more easily by running a sequence of commands like:

terraform apply && terraform destroy -force && terraform apply && terraform destroy -force
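
For reference, here is the same sequence wrapped in a loop, which makes the failure easier to trigger. This is just a sketch (the iteration count is arbitrary, and the loop simply stops at the first command that fails):

#!/usr/bin/env bash
# Repro sketch: repeat the apply/destroy cycle until one of the commands
# fails, e.g. with the "does not have attribute 'arn'" error shown above.
set -u

for i in $(seq 1 10); do
  echo "=== iteration $i: apply ==="
  terraform apply || { echo "apply failed on iteration $i"; exit 1; }

  echo "=== iteration $i: destroy ==="
  terraform destroy -force || { echo "destroy failed on iteration $i"; exit 1; }
done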

We tested with Terraform 0.9.8 and 0.10.7 and got the same instability with both. When the error occurs, running the same command again usually succeeds, but this is a blocker for our automation process.
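
The only stopgap we can think of for the pipeline is to retry the apply once when it fails, which merely papers over the problem. A rough sketch of that workaround (assumption: the apply runs non-interactively, as it does on these Terraform versions):

#!/usr/bin/env bash
# Stopgap sketch only: retry the apply once if the first attempt fails.
# This works around the symptom, not the underlying state issue.
set -u

if ! terraform apply; then
  echo "first apply failed, retrying once..."
  terraform apply
fi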
