bash compile output into a csv using jq?

79 views Asked by At

I am writing a bash script to create a csv from a sample inconsistent json data but running into a bunch of issues :

My json data looks like this :

json_data='[
  {
    "ResourceARN": "arn",
    "Tags": [
      {
        "Key": "STAGE",
        "Value": "st"
      }
    ]
  },
  {
    "ResourceARN": "arn",
    "Tags": [
      {
        "Key": "STAGE",
        "Value": "st"
      }
    ]
  },
  {
    "ResourceARN": "arn",
    "Tags": [
      {
        "Key": "aws:cloudformation:stack-name",
        "Value": "aud"
      },
      {
        "Key": "CostCenter",
        "Value": "Development"
      },
      {
        "Key": "aws:cloudformation:stack-id",
        "Value": "arn"
      },
      {
        "Key": "STAGE",
        "Value": "ec"
      },
      {
        "Key": "Environment",
        "Value": "development"
      },
      {
        "Key": "Region",
        "Value": "us-west-2"
      },
      {
        "Key": "Service",
        "Value": "MP"
      },
      {
        "Key": "aws:cloudformation:logical-id",
        "Value": "ApiGatewayRestApi"
      },
      {
        "Key": "Team",
        "Value": "VNP"
      }
    ]
  } and so on
]'

I am trying to create a csv with the following column names : ARN, Stage, CostCenter, Service, Domain, Team and the values should be what is found in the key value pairs of the Tags array and whatever key value pairs do not exist then just leave the column as blank.

My code looks like this :

# Define the columns
columns=("ARN" "STAGE" "CostCenter" "Service" "Domain" "Team")

# Create CSV header
header=$(IFS=','; echo "${columns[*]}")

# Function to get value for a specific key in the Tags array
get_value() {
  key="$1"
  tags="$2"
  value=$(echo "$tags" | jq -r --arg key "$key" '.[] | select(.Key == $key) | .Value')
  if [ "$value" == "null" ]; then
    echo ""
  else
    echo "$value"
  fi
}

# Create the CSV data
csv_data=""
for item in "${json_data[@]}"; do
  resource_arn=$(echo "$item" | jq -r '.ResourceARN')
  row="$resource_arn"
  for col in "${columns[@]}"; do
    value=$(get_value "$col" "$(echo "$item" | jq -c '.Tags')")
    row="$row,$value"
  done
  csv_data="$csv_data$row\n"
done

# Output the CSV to a file
echo -e "$header\n$csv_data" > output.csv

echo "CSV data has been written to output.csv"

But i keep running into

jq: error (at <stdin>:57): Cannot index array with string "ResourceARN"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"
jq: error (at <stdin>:57): Cannot index array with string "Tags"

Basically the final CSV should look like

ARN STAGE CostCenter Service Domain Team
arn st.   ""          ""      ""     ""
arn st    ""          ""      ""     "" 
arn ec    Development MP.     ""     VNP
....

Can someone please point out what i'm doing wrong or if there is a simple way of doing it? i would imagine jq would simply leave the values blank and output the ones having values? Thank you.

1

There are 1 answers

1
peak On BEST ANSWER

Since you state you wish to produce CSV, let me start with a filter that does just that:

["ARN", "STAGE", "CostCenter", "Service", "Domain", "Team"] as $columns
| $columns, 
  (.[]
   | .ResourceARN as $ARN
   |  .Tags | from_entries
   | [$ARN, .[ $columns[1:][]]] )
| @csv

Your illustrative output seems to correspond more closely to TSV, and you indicate you want missing fields to be represented by "", so we could tweak the above accordingly:

["ARN", "STAGE", "CostCenter", "Service", "Domain", "Team"] as $columns
| "\"\"" as $q
| $columns, 
  (.[]
   | .ResourceARN as $ARN
   |  .Tags | from_entries
   | [$ARN, (.[ $columns[1:][]] | (. // $q ) ) ] )
| @tsv

(The trick here is using STREAM | (. // $q) instead of STREAM // $q.)

With your sample input, this produces:

ARN STAGE   CostCenter  Service Domain  Team
arn st  ""  ""  ""  ""
arn st  ""  ""  ""  ""
arn ec  Development MP  ""  VNP

You might wish to consider other variants, e.g. using join(",") instead of @csv or @tsv.