Find and replace in an aws.s3 object during JSON streaming

Question

101 views · Asked by Andi on 04 December 2024 at 22:43

I have a fairly practical question for which it is hard to provide a reprex, sorry for that. I will try to explain it properly.

A script connects to an AWS S3 bucket with the aws.s3 package. The bucket contains .gz files that hold JSON. Unfortunately, some lines (not all) are malformed JSON: they end with }]]} instead of }]}.

So I am looking for a way in R to find and replace that pattern before parsing the JSON fails. The code below already contains a non-working, commented-out line (# gsub()), which is my best guess at a fix.

What would be your solution?

    data_i <- aws.s3::get_object(
      object = objectname_i,
      bucket = bucketname_i,
      region = "eu-central-1",
      as = "raw"
    ) |>
      rawConnection() |>
      gzcon() |>
      # gsub("}]]}", "}]]}") |>
      jsonlite::stream_in()

Original Q&A

There is 1 answer

**Andi** · Accepted Answer · 2023-06-28T09:15:18+00:00

I found the following solution: after setting up the connection, I use gzcon() for decompression, as before. Then I read the lines over the connection with readLines(), so the data is available in R as a character vector.

Now I can operate on the R object with gsub().
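A minimal sketch of that replacement step on made-up sample lines (the JSON content is invented for illustration, not taken from the real bucket). It compares a literal match via `fixed = TRUE` with a regex anchored to the end of the line. Both assume that `}]]}` never occurs in a valid line, which would not hold for data with legitimately nested arrays.

```r
# Made-up sample lines: the second one carries the stray "]" before the final "}".
lines <- c(
  '{"id":1,"items":[{"a":1}]}',
  '{"id":2,"items":[{"a":2}]]}'
)

# fixed = TRUE treats the pattern as a literal string, so no regex escaping is needed.
fixed_literal <- gsub("}]]}", "}]}", lines, fixed = TRUE)

# Alternative: an escaped regex anchored with "$", which only touches the line ending.
fixed_anchored <- sub("\\}\\]\\]\\}$", "}]}", lines)

print(fixed_literal)
```

The anchored variant is the more conservative choice when the bug is known to sit only at the end of a line, since it leaves any earlier occurrence untouched.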

After that I want to use stream_in() again, so I open a textConnection() on the corrected lines. The result is the data.frame s3ObjectDataframe.

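    # Download the object as a raw vector and wrap it in a gzip-decompressing connection
    s3ObjectUnpacked <- aws.s3::get_object(
      object = objectname_i,
      bucket = bucketname_i,
      region = "eu-central-1",
      as = "raw"
    ) |>
      rawConnection() |>
      gzcon()

    # Read the decompressed text line by line, then release the connection
    s3ObjectPerLine <- readLines(s3ObjectUnpacked)
    close(s3ObjectUnpacked)

    # Replace the malformed ending; fixed = TRUE matches the pattern literally
    s3ObjectCorrected <- gsub("}]]}", "}]}", s3ObjectPerLine, fixed = TRUE)

    # Parse the corrected lines as newline-delimited JSON
    # (readLines() already drops the line terminators, so no newline cleanup is needed)
    s3ObjectDataframe <- jsonlite::stream_in(textConnection(s3ObjectCorrected))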
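To try the approach without access to a bucket, the following sketch replaces the S3 download with a locally created gzip file that is read back as a raw vector, which is assumed here to resemble what `get_object(..., as = "raw")` returns. The sample lines and all variable names are made up for illustration.

```r
library(jsonlite)

# Made-up newline-delimited JSON; the second line has the stray "]".
ndjson <- c(
  '{"id":1,"items":[{"a":1}]}',
  '{"id":2,"items":[{"a":2}]]}'
)

# Write the lines gzip-compressed to a temp file and read them back as raw bytes,
# standing in for the raw vector downloaded from S3.
tmp <- tempfile(fileext = ".gz")
con_out <- gzfile(tmp, "w")
writeLines(ndjson, con_out)
close(con_out)
raw_bytes <- readBin(tmp, what = "raw", n = file.size(tmp))

# Same steps as in the answer: decompress, read lines, repair, parse.
con_in <- gzcon(rawConnection(raw_bytes))
lines <- readLines(con_in)
close(con_in)

repaired <- gsub("}]]}", "}]}", lines, fixed = TRUE)

# textConnection() is already open, so stream_in() leaves it open; close it afterwards.
txt <- textConnection(repaired)
df <- jsonlite::stream_in(txt, verbose = FALSE)
close(txt)

str(df)
```

Note that this reads the whole object into memory before parsing, so it gives up the incremental processing that streaming directly from the connection would offer. For files that fit in memory that trade-off is usually acceptable.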
