Sparklyr: Reading multiple Parquet files from S3 runs indefinitely


I'm working with Sparklyr to read Parquet files from an S3 bucket, and I'm facing an issue when trying to read multiple files. Reading a specific file works fine, but when attempting to read all files in a directory, the operation runs indefinitely. Here's a simplified version of the code I'm using:

library(sparklyr)

# Initialize the config list before assigning options
config <- spark_config()
config$sparklyr.connect.enablehivesupport <- FALSE

sc <- spark_connect(master = "local", config = config)

sparklyr::spark_read_parquet(
  sc,
  name = 'test',
  #path = 's3a://.../../data_01_04.parquet', # works fine
  path = 's3a://.../../'                     # does not work
  #path = 's3a://.../../*.parquet'           # does not work
)

Am I missing something in the way I'm specifying the path for reading multiple files? Any insights or suggestions would be greatly appreciated.

1 Answer

Answered by Gin

Have you tried enabling recursive file lookup (the `recursiveFileLookup` reader option)? Also, specify the path as the folder name without a trailing `/`.
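A minimal sketch of that suggestion, assuming Spark 3.0+ (where the `recursiveFileLookup` data source option was introduced) and a hypothetical bucket path (`s3a://my-bucket/data` is a placeholder, not the asker's actual path). Options passed in the `options` list of `spark_read_parquet()` are forwarded to Spark's DataFrameReader:

```r
library(sparklyr)

config <- spark_config()
config$sparklyr.connect.enablehivesupport <- FALSE
sc <- spark_connect(master = "local", config = config)

# Point at the directory itself (no trailing slash, no glob) and let
# Spark descend into subdirectories via recursiveFileLookup.
# "s3a://my-bucket/data" is a placeholder path.
df <- spark_read_parquet(
  sc,
  name = "test",
  path = "s3a://my-bucket/data",
  options = list(recursiveFileLookup = "true")
)
```

Note that `recursiveFileLookup` disables partition discovery, so use it only if the files are not laid out as Hive-style partitions (`key=value` subdirectories).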