Kinesis Firehose demo has no delimiters; Redshift can't parse without them. Which is wrong?


When I went through first learning steps with Kinesis, Firehose, and Redshift today, I was pleased to discover that Amazon had a "try our demo data producer" setup.

I was frustrated to learn that it does not seem to actually work.

So, I went digging. I found that STL_LOAD_ERRORS contained errors suggesting that a delimiter was expected, and record data that began like {field:val,field:val}{field:val,field:val}.

...{"TICKER_SYMBOL": |     1214 | Delimiter not found 

"Must be stripping newlines somewhere," I thought.

After more digging, I found that the records the producer wrote to the relevant S3 bucket are in a surprising format:

{field:val,field:val}{field:val,field:val}...

That is, there are no delimiters between the apparent records, which arrive as single-line files of several dozen kilobytes each.

Other SO posts suggest that this is actually the expected data format.

Why does Redshift need data in a format the data demo doesn't use? Which do I reconfigure?


There are 2 answers

John Haugeland On BEST ANSWER

Okay. There were three problems.

  1. The AWS example data producer produces data in a format that requires alterations to the Redshift COPY command, but the walkthrough doesn't tell you that.
  2. You need to add FORMAT AS JSON 's3://yourbucketname/aJsonPathFile.txt' to the COPY command.
  3. You need to create a JSONPaths file, because the default data producer emits upper-case column names, which Redshift cannot match to the lower-case columns in the example table.
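To make the steps above concrete, here is a sketch of the two files involved. The column names come from the demo's stock-ticker records (TICKER_SYMBOL appears in the error above; the remaining field names and the bucket, table, and IAM role names are assumptions for illustration):

```json
{
  "jsonpaths": [
    "$['TICKER_SYMBOL']",
    "$['SECTOR']",
    "$['CHANGE']",
    "$['PRICE']"
  ]
}
```

Upload that as, e.g., s3://yourbucketname/aJsonPathFile.txt, then point the COPY command at it:

```sql
COPY firehose_test_table
FROM 's3://yourbucketname/'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-role'
FORMAT AS JSON 's3://yourbucketname/aJsonPathFile.txt';
```

Because the JSONPaths entries are matched positionally to the table's columns, the upper-case field names in the data no longer have to match the lower-case column names, and the concatenated {..}{..} records are parsed without needing line delimiters.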
Brian Blakely On

I also just ran into this issue. The root of the issue is, as you pointed out, that the generated data has all upper-case column names, while the table-creation code snippet uses all lower-case column names.

The easiest solution I found was to change the COPY command suggested in Step 2 from json 'auto' to include 'ignorecase':

json 'auto ignorecase'

This should resolve the load error without needing to create a custom JSONPaths file.
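Put together, the full COPY command would look something like this (bucket, table, and IAM role names are placeholders, not from the original tutorial):

```sql
COPY firehose_test_table
FROM 's3://yourbucketname/'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-role'
FORMAT AS JSON 'auto ignorecase';
```

With 'auto ignorecase', Redshift matches JSON field names to table columns case-insensitively, so TICKER_SYMBOL in the data maps to the ticker_symbol column automatically.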

See the Amazon Redshift COPY documentation on JSON format parameters for details on this syntax.