Sparklyr: Reading multiple Parquet files from S3 runs indefinitely


I'm working with Sparklyr to read Parquet files from an S3 bucket, and I'm facing an issue when trying to read multiple files. Reading a specific file works fine, but when attempting to read all files in a directory, the operation runs indefinitely. Here's a simplified version of the code I'm using:

library(sparklyr)

# Initialize the config list before assigning options
config <- spark_config()
config$sparklyr.connect.enablehivesupport <- FALSE

sc <- spark_connect(master = "local", config = config)

sparklyr::spark_read_parquet(
  sc,
  name = 'test',
  #path = 's3a://.../../data_01_04.parquet', # works fine
  path = 's3a://.../../'                     # does not work
  #path = 's3a://.../../*.parquet'           # does not work
)

Am I missing something in the way I'm specifying the path for reading multiple files? Any insights or suggestions would be greatly appreciated.

1 Answer

Answered by Gin

Have you tried enabling recursive file lookup (the `recursiveFileLookup` reader option)? Also, specify the path as the folder name without a trailing `/`.
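A minimal sketch of that suggestion, assuming Spark 3.0+ (where the `recursiveFileLookup` data source option was introduced) and a hypothetical bucket path (`s3a://my-bucket/data` is a placeholder, not the asker's actual path). Options passed in the `options` list of `spark_read_parquet()` are forwarded to Spark's DataFrameReader:

```r
library(sparklyr)

config <- spark_config()
config$sparklyr.connect.enablehivesupport <- FALSE
sc <- spark_connect(master = "local", config = config)

# Point at the directory itself (no trailing slash, no glob) and let
# Spark descend into subdirectories via recursiveFileLookup.
# "s3a://my-bucket/data" is a placeholder path.
df <- spark_read_parquet(
  sc,
  name = "test",
  path = "s3a://my-bucket/data",
  options = list(recursiveFileLookup = "true")
)
```

Note that `recursiveFileLookup` disables partition discovery, so use it only if the files are not laid out as Hive-style partitions (`key=value` subdirectories).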