What is the equivalent of BlobstoreLineInputReader for targeting Google Cloud Storage?


This is a Python App Engine question, using mapreduce library 1.9.21.

I have code writing lines to a blob in the local blobstore, then processing that using mapreduce BlobstoreLineInputReader.

Given that the Files API is going away, I thought I'd retarget all my processing to Cloud Storage.

I would expect to find a class called GoogleCloudStorageLineInputReader, but there isn't anything like that. Is it hiding somewhere?

Is there some way I can use GoogleCloudStorageInputReader to read lines?

Another possibility is using GoogleCloudStorageRecordInputReader, but for that my input file needs to be in LevelDB format and I don't know how to create that except with a GoogleCloudStorageConsistentRecordOutputWriter, which I don't know how to use outside a mapreduce context. How might I do that?

Or am I doing this all wrong? Is there some other possibility I've missed?


There is 1 answer

reikani (best answer)

At first, I attempted thinkjson's CloudStorageLineInputReader but had no success.

Then I found this pull request, which led me to rbruyere's fork. It has some linting issues (such as the misspelled class name GoolgeCloudStorageLineInputReader), but a comment at the bottom of the pull request reports that it works fine and asks whether the project needs to be taken over.
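If you would rather not depend on a fork, another option is to split lines yourself inside the mapper. The sketch below assumes, as I understand the library, that GoogleCloudStorageInputReader hands the mapper one file-like object per Cloud Storage object. The names `iter_lines` and `map_file` are my own illustrations and are not part of the mapreduce library. The (offset, line) pairs are meant to resemble what BlobstoreLineInputReader yields, but treat that as an assumption too.

```python
import io


def iter_lines(file_obj, chunk_size=64 * 1024):
    """Yield (byte_offset, line) pairs from a binary file-like object.

    Hypothetical helper, not part of the mapreduce library. It reads the
    object in chunks so a large file is never loaded into memory at once.
    """
    offset = 0      # byte offset of the next line to be yielded
    pending = b""   # bytes read so far that do not yet end in a newline
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        lines = pending.split(b"\n")
        # The last element is an incomplete line (or empty); keep it for later.
        pending = lines.pop()
        for line in lines:
            yield offset, line
            offset += len(line) + 1  # +1 for the newline that was stripped
    if pending:
        # Final line without a trailing newline.
        yield offset, pending


def map_file(file_obj):
    """Hypothetical mapper body: process one file, line by line.

    Assumes the input reader passes in a readable file-like object.
    """
    for offset, line in iter_lines(file_obj):
        # Replace this with the real per-line processing.
        yield offset, line.rstrip(b"\r")


if __name__ == "__main__":
    # Local check with an in-memory file standing in for a GCS object.
    sample = io.BytesIO(b"first\nsecond\r\nthird")
    for offset, line in map_file(sample):
        print(offset, line)
```

One trade-off to keep in mind: with this approach the work is divided per file rather than per byte range within a file, so a single huge input file would be handled by one shard. Splitting the input into several objects up front may help if that matters.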

Hope that helps!