AWS Glue error: "Invalid field: . The dataType 'double' is invalid for 'org.bson.BsonUndefined@0'."

I am trying to use AWS Glue for the first time to build a data pipeline that copies data from MongoDB (Amazon DocumentDB) to S3. The data has multiple nested fields with different data types. I was able to create the DynamicFrame without any error, but when I try to show the DynamicFrame or write it to S3, I get the error "Invalid field: <field name>. The dataType 'double' is invalid for 'org.bson.BsonUndefined@0'." I am using the following code to connect to MongoDB:

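from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

read_mongo_options = {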
    "connection.uri": mongo_uri,
    "database": "",
    "collection": "",
    "username": "",
    "password": "",
    "partitioner": "com.mongodb.spark.sql.connector.read.partitioner.PaginateIntoPartitionsPartitioner",
    "partitionerOptions.partitionSizeMB": "10",
    "partitionerOptions.partitionKey": "_id"
}

# Get DynamicFrame from MongoDB
dynamic_frame = glueContext.create_dynamic_frame.from_options(
    connection_type="Mongodb",
    connection_options=read_mongo_options,
)

dynamic_frame.toDF().count()

# glueContext.write_dynamic_frame.from_options(frame = dynamic_frame,
#           connection_type = "s3",
#           connection_options = {"path": output_path, "partitionKeys": []},
#           format = "glueparquet",
#           format_options={"compression": "uncompressed"}) 
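One plausible reading of the message, assuming the connector infers the schema from a sample of documents (the MongoDB Spark connector, whose partitioner class names appear in the options above, samples documents by default): a field gets typed as `double` from the sampled documents, and some document outside the sample stores the deprecated BSON `undefined` value in that field, which cannot be cast. The pure-Python sketch below only simulates that failure; `BsonUndefined`, `infer_schema`, `convert` and the documents are hypothetical, and only top-level fields are handled.

```python
# Hypothetical stand-in for the Java class org.bson.BsonUndefined; it exists only for this sketch.
class BsonUndefined:
    def __repr__(self):
        return "org.bson.BsonUndefined@0"


UNDEFINED = BsonUndefined()

# Assumed mapping from Python types to Spark-style type names (bool is deliberately left out).
TYPE_NAMES = {float: "double", int: "int", str: "string"}


def infer_schema(sample):
    """Give each top-level field the type of the first usable value seen in the sample."""
    schema = {}
    for doc in sample:
        for field, value in doc.items():
            if field in schema or value is None or isinstance(value, BsonUndefined):
                continue
            schema[field] = TYPE_NAMES.get(type(value), "unknown")
    return schema


def convert(doc, schema):
    """Cast one document to the inferred schema, raising on a BSON undefined value."""
    row = {}
    for field, data_type in schema.items():
        value = doc.get(field)
        if isinstance(value, BsonUndefined):
            raise ValueError(
                f"Invalid field: {field}. The dataType '{data_type}' "
                f"is invalid for '{value!r}'."
            )
        row[field] = value
    return row


# Hypothetical documents: the third one is not part of the inference sample.
docs = [
    {"_id": 1, "price": 9.5},
    {"_id": 2, "price": 12.0},
    {"_id": 3, "price": UNDEFINED},
]
schema = infer_schema(docs[:2])
print(schema)  # {'_id': 'int', 'price': 'double'}
try:
    rows = [convert(doc, schema) for doc in docs]
except ValueError as err:
    print(err)  # Invalid field: price. The dataType 'double' is invalid for 'org.bson.BsonUndefined@0'.
```

If this is the cause, it would also explain why creating the DynamicFrame succeeds: Spark evaluates lazily, so documents are only converted when an action such as `count()`, `show()` or a write runs.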

If I change the partitioner, I get the same error on a different column (one with a timestamp data type). What is causing this error? Do I need to apply some transformation to the DynamicFrame before using or saving it? Any help would be appreciated. I took the partitioner names from here: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect-mongodb-home.html
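If the cause is BSON `undefined` values, a different partitioner could move the error simply because a different bad document gets converted first, so chasing one column per error is slow. A sketch of a helper that builds a single MongoDB query filter covering every suspect field path is shown below; the function name and the field paths are hypothetical, and it assumes your DocumentDB version accepts the `"undefined"` alias of the `$type` operator.

```python
def undefined_filter(paths):
    """Build a MongoDB filter matching documents where any of the dotted paths holds BSON undefined."""
    if not paths:
        # MongoDB rejects an empty $or array, so fail early instead.
        raise ValueError("at least one field path is required")
    return {"$or": [{path: {"$type": "undefined"}} for path in paths]}


# Hypothetical field paths; substitute the columns named in the Glue error messages.
query = undefined_filter(["price", "meta.updated_at"])
print(query)
```

With PyMongo, the resulting dict could be passed to `collection.count_documents(query)` or `collection.find(query)` to locate the offending documents before re-running the Glue job.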
