How do I create a Glue table schema for streaming data that Firehose converts to Parquet?


I need to export data from DynamoDB to S3 using a Kinesis data stream and a Firehose stream, converting the records to Parquet. I'm having trouble setting up the Parquet conversion in the Firehose stream, because it requires me to select a Glue table that holds the schema of the data coming into Firehose. I've tried many things, and the only results I've managed to get are output where every value is simply None, or output that is not empty but is not what I need. Another option is to use Lambda to convert the records to Parquet, but I would still need to set up this schema.

These are my data coming into Firehose:

{"awsRegion":"us-west-2","eventID":"ebb59b0b-247d-49cc-83fc-ccc1988481ed","eventName":"INSERT","userIdentity":null,"recordFormat":"application/json","tableName":"table-oleg","dynamodb":{"ApproximateCreationDateTime":1710435429661487,"Keys":{"user_id":{"S":"nvbssdfbdfbsdvsdv"}},"NewImage":{"autonomy":{"S":"ssnvbnbdfbdfsdvsdv"},"user_id":{"S":"nvbssdfbdfbsdvsdv"}},"SizeBytes":74,"ApproximateCreationDateTimePrecision":"MICROSECOND"},"eventSource":"aws:dynamodb"}{"awsRegion":"us-west-2","eventID":"0f30f744-2d94-4494-bd55-e0278942cccb","eventName":"INSERT","userIdentity":null,"recordFormat":"application/json","tableName":"table-oleg","dynamodb":{"ApproximateCreationDateTime":1710435451018854,"Keys":{"user_id":{"S":"nvbssdfbdfbSVSV"}},"NewImage":{"autonomy":{"S":"ssnvbnbdfbdfsEV"},"user_id":{"S":"nvbssdfbdfbSVSV"}},"SizeBytes":67,"ApproximateCreationDateTimePrecision":"MICROSECOND"},"eventSource":"aws:dynamodb"}

I need these two columns with their values to be uploaded to S3:

autonomy: ssnvbnbdfbdfsEV
user_id: nvbssdfbdfbSVSV
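
One possible way to get just these two columns is a Firehose transformation Lambda that flattens `dynamodb.NewImage` before the Parquet conversion step. Below is a minimal sketch, assuming the standard Firehose transformation event shape (`records`, each with a `recordId` and base64-encoded `data`). `flatten_new_image` is a hypothetical helper name, and the code has not been run against a live stream.

```python
import base64
import json


def flatten_new_image(change_event):
    """Return {attribute: value} built from NewImage, or None if it is absent."""
    new_image = change_event.get("dynamodb", {}).get("NewImage")
    if not new_image:
        return None
    flat = {}
    for name, typed_value in new_image.items():
        # typed_value looks like {"S": "abc"}; keep only the inner value.
        flat[name] = next(iter(typed_value.values()))
    return flat


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        change_event = json.loads(base64.b64decode(record["data"]))
        flat = flatten_new_image(change_event)
        if flat is None:
            # No NewImage (for example a REMOVE event): drop the record.
            output.append({
                "recordId": record["recordId"],
                "result": "Dropped",
                "data": record["data"],
            })
            continue
        payload = json.dumps(flat).encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload).decode("utf-8"),
        })
    return {"records": output}
```

With this in place, each record reaching the conversion step would look like `{"autonomy": "...", "user_id": "..."}`, so the Glue table would only need those two string columns. Note that the sketch keeps values exactly as DynamoDB serializes them, so numeric (`N`) attributes would still arrive as strings.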

These are my Firehose stream settings for conversion to Parquet

This is my JSON schema in the Glue table

These are my Glue table settings

Thank you in advance for your help!

I've tried using many schema examples found on the internet. I also attempted the example from the official AWS website:

{
    "$id": "https://example.com/person.schema.json",
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Person",
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string",
            "description": "The person's first name."
        },
        "lastName": {
            "type": "string",
            "description": "The person's last name."
        },
        "age": {
            "description": "Age in years which must be equal to or greater than zero.",
            "type": "integer",
            "minimum": 0
        }
    }
}

But so far without success...
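
For comparison, the JSON Schema above resembles a Glue Schema Registry example, whereas, as far as I understand it, Firehose record format conversion reads the column list of a Glue Data Catalog table. The sketch below builds such a table definition for the two desired columns, assuming the incoming records have already been flattened to `autonomy` and `user_id`. The table name, database name, and S3 location are hypothetical placeholders. The dictionary follows the `TableInput` shape accepted by boto3's Glue `create_table`.

```python
import json


def build_table_input(table_name, s3_location):
    """Build a Glue TableInput dict describing a two-column Parquet table."""
    return {
        "Name": table_name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "StorageDescriptor": {
            # Column names must match the keys of the JSON records Firehose receives.
            "Columns": [
                {"Name": "autonomy", "Type": "string"},
                {"Name": "user_id", "Type": "string"},
            ],
            "Location": s3_location,
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    }


# Hypothetical names, used only for illustration.
table_input = build_table_input("firehose_output", "s3://example-bucket/prefix/")
print(json.dumps(table_input, indent=2))

# With boto3 installed and credentials configured, the table could be created with:
# boto3.client("glue").create_table(DatabaseName="example_db", TableInput=table_input)
```

The same two columns can also be entered by hand in the Glue console when defining the table schema; the dictionary form is shown only because it makes the expected structure explicit.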
