I have Terraform code in which I am trying to add a number of AD users to a Databricks workspace from certain AD groups. The ad_group_names
variable is a simple list of AD group names.
# Make sure the names are correct and groups are present in AD
data "azuread_groups" "this" {
  display_names = var.ad_group_names
}

# Get all of the AD groups details
data "azuread_group" "this" {
  for_each     = toset(data.azuread_groups.this.display_names)
  display_name = each.key
}

# Get all users from all groups
data "azuread_users" "this" {
  object_ids = flatten([for group in data.azuread_group.this : [group.members]])
}

# Add all distinct users to databricks
resource "databricks_user" "this" {
  for_each              = { for user in distinct(data.azuread_users.this.users) : lower(user.mail) => user }
  user_name             = lower(each.value.mail)
  external_id           = each.value.object_id
  display_name          = each.value.display_name
  databricks_sql_access = true
  workspace_access      = true
}
When running the code I am getting the following error:
Error: Invalid for_each argument
│
│ on modules/module-databricks-ad-sync/main.tf line 43, in resource "databricks_user" "this":
│ 43: for_each = { for user in distinct(data.azuread_users.this.users) : lower(user.mail) => user }
│ ├────────────────
│ │ data.azuread_users.this.users is a list of object, known only after apply
│
│ The "for_each" map includes keys derived from resource attributes that
│ cannot be determined until apply, and so Terraform cannot determine the
│ full set of keys that will identify the instances of this resource.
│
│ When working with unknown values in for_each, it's better to define the map
│ keys statically in your configuration and place apply-time results only in
│ the map values.
│
│ Alternatively, you could use the -target planning option to first apply
│ only the resources that the for_each value depends on, and then apply a
│ second time to fully converge.
╵
Now I understand that, according to the documentation, there are limits on what can serve as the keys of a for_each map: the keys must be known values. However, I have an environment where this code has already been deployed and continues to work whenever I run it, and it does add new users from these groups. The plan of the working code looks like so:
What I am failing to understand is why this code works in the other environment but refuses to work in the new one. There are no differences in terms of the service principal or AD groups used. All versions of providers and Terraform are the same (although most likely, when the working environment was originally deployed, the versions were slightly different from what they are now). How do I make this work in the new environment?
EDIT:
I've decided to avoid this weird and unclear Terraform behaviour and have created a separate Python script that queries Azure AD and creates a JSON file with all the required groups and users. This file is then supplied to Terraform and it does its magic.
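A minimal sketch of how such a file could be consumed, assuming a hypothetical file named ad_users.json that sits next to the module and contains a JSON object keyed by lowercase email address, with object_id and display_name fields on each entry (the file name and layout are assumptions, not the actual script output):

```hcl
locals {
  # Hypothetical file name and layout, for illustration only, e.g.
  # { "jane@example.com": { "object_id": "...", "display_name": "Jane" } }
  ad_users = jsondecode(file("${path.module}/ad_users.json"))
}

resource "databricks_user" "from_file" {
  # The file is read while the configuration is evaluated, so the keys
  # are known at plan time.
  for_each              = local.ad_users
  user_name             = each.key
  external_id           = each.value.object_id
  display_name          = each.value.display_name
  databricks_sql_access = true
  workspace_access      = true
}
```

Because file() is evaluated before planning, the file has to exist on disk before terraform plan is run.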
In the databricks_user resource block, I see a for_each argument which is attempting to iterate over a dynamic set of values derived from data.azuread_users.this.users: they are not known until after apply.
In this line, you are trying to create a map for for_each using keys derived from resource attributes (user.mail) that are not yet known to Terraform at the planning phase, hence the error message you received about an invalid for_each argument.
The Terraform documentation ("Limitations on values used in for_each") mentions that the map of keys used in for_each must be made of known values.
To address this, you would need to restructure your code such that the keys of the map used in for_each are determined statically from your configuration, or use the -target option as a workaround.
For instance:
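The snippet that belongs at this point is not preserved in this copy. A minimal sketch of a locals-based layout, assuming a hypothetical local named users_by_mail and the data sources from the question, might look like this:

```hcl
locals {
  # Hypothetical local name. Builds a map of users keyed by lowercase email
  # from the azuread_users data source declared in the question.
  users_by_mail = {
    for user in distinct(data.azuread_users.this.users) : lower(user.mail) => user
  }
}

resource "databricks_user" "this" {
  for_each              = local.users_by_mail
  user_name             = each.key
  external_id           = each.value.object_id
  display_name          = each.value.display_name
  databricks_sql_access = true
  workspace_access      = true
}
```

A local value carries the same known or unknown status as the expression it is built from, so this layout mainly improves readability; the keys still have to be resolvable at plan time.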
Here, a locals block is used to create a map of users keyed by their email addresses, which is then used in the for_each argument of the databricks_user resource. Note that a local value is only as known as the expression it is built from, so the map keys are usable in for_each only when data.azuread_users.this.users can be read at plan time.
You get the workflow:
The IDs are known, and the emails are not, because of how Terraform's dependency graph works and how it evaluates and fetches data from resources and data sources.
Resource IDs are immediate: When you fetch a list of resources, like AD groups, the identifiers for these resources are typically available immediately. These are static, unchanging identifiers that Terraform can know at plan time because they are how the resource is indexed in the provider's API.
Attributes depend on Read/Apply: Other resource attributes, like emails in the AD users' case, are not known until the read operation is performed against the provider's API. Terraform performs that read while planning when it can, but defers it to the apply phase when the data source's arguments depend on values that are not yet known. That is because these attributes could change and are not used by Terraform to uniquely identify the resource during planning.
When you chain data blocks, you introduce a dependency from one block to the next. Terraform plans out the read operations for data sources in a way that respects these dependencies. If a data source depends on the output of another data source or resource, and that output is not known until the apply phase, Terraform cannot resolve this dependency during the planning phase.
Terraform's planning stage is designed to predict and outline what changes will be made before any real changes are applied. While the information might be available in AD, Terraform operates conservatively to make sure it does not make any changes or external API calls that could alter the state of your resources until it is in the apply stage. That is to avoid any side effects or unexpected changes.
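As an illustration of how a data source read can end up deferred, here is a sketch of one documented mechanism: a depends_on that points at a managed resource with pending changes. The resource and module names below are hypothetical and are not taken from the question's root configuration; only the module path comes from the error message.

```hcl
# Hypothetical workspace resource; it does not exist yet in a fresh environment.
resource "azurerm_databricks_workspace" "example" {
  name                = "example-workspace"
  resource_group_name = "example-rg"
  location            = "westeurope"
  sku                 = "premium"
}

module "ad_sync" {
  source         = "./modules/module-databricks-ad-sync"
  ad_group_names = ["example-group"]

  # While the workspace above still has pending changes, every data source
  # inside the module is read during apply, so its results are unknown in
  # the plan. Once the workspace exists and is unchanged, the same data
  # sources are read at plan time.
  depends_on = [azurerm_databricks_workspace.example]
}
```

Under this assumption, identical code would plan cleanly against an environment that is already deployed and fail with the for_each error on a first run against an empty one. Whether that applies to the configuration in the question cannot be confirmed from what is shown.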
To use the user's mail attribute in your for_each, you would need to first make sure Terraform can resolve these emails during the planning phase. However, because data.azuread_users fetches user details that might not be statically known ahead of time, Terraform errs on the side of caution and requires these details to be known before it can proceed with planning.
That is why, when you switch to using user IDs which are known ahead of time, your code starts to work; Terraform can plan out the resource creation because it is not dependent on any values that need to be fetched at apply time.
To work around this, you can use user IDs as keys for the for_each and then look up the email addresses inside the resource block, where Terraform can handle the unknown values during the apply phase.
You can adjust the Terraform code to use the user IDs as the keys in your for_each map, using azuread_user data sources to resolve the user IDs into their respective mail addresses properly.
Each azuread_user is fetched using its ID, which is known during the planning phase.
A local map (users_map) is created to associate each user ID with the corresponding azuread_user data object.
The databricks_user resource uses the users_map for its for_each, ensuring that the iteration is based on known user IDs.
User properties, such as mail and display_name, are resolved at apply time and used to configure each databricks_user.
This should work around the issue of Terraform not knowing the user emails during the planning stage, by using the user IDs which are known.
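The steps described above can be sketched as follows. This is a reconstruction under assumptions rather than the original snippet: the local name member_ids is hypothetical, every group member is assumed to be a user (not a nested group or service principal), and the members attribute of data.azuread_group.this is assumed to be readable at plan time.

```hcl
locals {
  # Hypothetical local: the distinct object IDs of all group members.
  member_ids = toset(flatten([
    for group in data.azuread_group.this : group.members
  ]))
}

# Fetch each user by object ID. Assumes every member is a user object.
data "azuread_user" "this" {
  for_each  = local.member_ids
  object_id = each.value
}

locals {
  # Associates each user ID with its azuread_user data object.
  users_map = { for id, user in data.azuread_user.this : id => user }
}

resource "databricks_user" "this" {
  # Keys are object IDs; mail and display_name appear only in the values.
  for_each              = local.users_map
  user_name             = lower(each.value.mail)
  external_id           = each.key
  display_name          = each.value.display_name
  databricks_sql_access = true
  workspace_access      = true
}
```

Keying by object ID changes the resource addresses, so in an environment where databricks_user.this is already keyed by email, Terraform would plan to destroy and recreate the users unless the state entries are moved to the new keys first.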
The original success of the Terraform configuration in a different environment might have been due to: