I run a Glue Crawler over a nested JSON data-source on S3 and I tried to query nested fields as per documentation via Redshift Spectrum:
select c.id , c.my_nested_column.MyField
from my_external_schema.my_table c;
But as per title I was getting the error message
[42703] ERROR: column "my_nested_column" does not exist
which doesn't really make sense as from metadata I can see the field exists. But because of this I'm unable to unnest fields from "my_nested_column".
How to fix this?
After some investigations, I noticed that one of the fields of the JSONs being parsed contained a colon within the field name and it was something like
my:field.This clashes with the JSON logic and doesn't work really well. I removed this field from the Glue Catalog and afterwards I was able to query the field correctly.
The initial error message I was getting really didn't help but the problem was caused by malformed JSON field name.