I am trying to load the data from all partitions in S3 using Spark, but I am not able to do so. The data is stored in S3 with statusts as the partition column.
I am using the Java code below to load the data.
Dataset<Row> archivalDS = session.read().format("avro")
        .option("basePath", folder_path)
        .load(folder_path);
where the folder_path variable value is s3a://bucket-name/ArchiveTables/log/60a89657-47df-4df4-8aa3-4c53f43782b9/. The application throws an error saying the statusts column is not available in the schema. After going through a few threads on Stack Overflow, I added the statusts column to the dataset so that the column would be present in the stored Avro files, but I see that the column is still missing from the Avro files.
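For reference, my write attempt looked roughly like the sketch below. I am assuming the files were written with partitionBy (the lit("2024-03-01") value is just a placeholder for however the real value was derived per row); my understanding is that partitionBy moves the column value into the directory name (statusts=2024-03-01/), which may be why the column is absent from the Avro files themselves.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import static org.apache.spark.sql.functions.lit;

// Roughly what the "add statusts to the dataset" attempt looked like;
// lit("2024-03-01") is a placeholder for the real per-row value.
Dataset<Row> withStatusTs = archivalDS.withColumn("statusts", lit("2024-03-01"));

// Writing partitioned by statusts encodes the value in the directory name,
// so it does not appear inside the Avro data files.
withStatusTs.write().mode(SaveMode.Append).format("avro")
        .partitionBy("statusts")
        .save(folder_path);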
If I change the folder_path variable to a specific partition, s3a://bucket-name/ArchiveTables/log/60a89657-47df-4df4-8aa3-4c53f43782b9/statusts=2024-03-01/, it works, but then I can only load the data from one partition at a time.
Is there a way to load all partitions in a single load, rather than having to specify the partition value in the path as well? My current fallback is sketched below.
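This is a sketch of the per-partition fallback I have now, not something I want to keep: one load() per partition directory, re-attaching the partition value (which lives in the path, not the files) and unioning the results. The partitionDates list is hypothetical, standing in for the statusts values I know exist in S3.

import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.lit;

// Hypothetical list of known partition values.
List<String> partitionDates = List.of("2024-03-01", "2024-03-02");

Dataset<Row> allPartitions = null;
for (String date : partitionDates) {
    Dataset<Row> part = session.read().format("avro")
            .load(folder_path + "statusts=" + date + "/")
            // The partition value is encoded in the path, so re-attach it as a column.
            .withColumn("statusts", lit(date));
    allPartitions = (allPartitions == null) ? part : allPartitions.unionByName(part);
}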
Other question: I am using the code below to load the data into the database after loading it from Avro. I set the lower bound to the earliest date for which we have data stored, and the upper bound to an arbitrary far-future date, but I also have rows where statusts is null. In that case, how can I choose the lower or upper bound so that I do not miss that data while loading?
transactionIdNullDS.repartition(6).write().mode(SaveMode.Append).format("jdbc")
        .option("stringtype", "unspecified")
        .option("url", url)
        .option("driver", driver)
        .option("dbtable", tableName)
        .option("user", "username")
        .option("password", "password")
        .option("partitionColumn", "statusts")
        .option("lowerBound", "2022-01-01")
        .option("upperBound", "2999-12-31")
        .option("numPartitions", Integer.parseInt(System.getProperty("numPartitions.number", "8")))
        .save();
