Merging files in Google Cloud Storage using Google Cloud Dataflow


Nathan Marz, in his book "Big Data", describes how to maintain data files in HDFS and how to keep file sizes as close as possible to the native HDFS block size, using his Pail library running on top of MapReduce.

  1. Is it possible to achieve the same result in Google Cloud Storage?
  2. Can I use Google Cloud Dataflow instead of MapReduce for this purpose?

There is 1 answer

Darren Olivier (best answer)

Google Cloud Storage supports composite objects: you can store an object in multiple parts and combine them later with a compose operation. A single compose request accepts up to 32 parts, and a composite object can have at most 1024 constituent parts in total. This functionality is available in the API.

Composite Objects and Parallel Uploads - Google Cloud Platform Developer's Guide
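
To make the limits above concrete, here is a minimal Python sketch of how a merge of many small objects could be planned so that no single compose request exceeds 32 sources. It is an illustration under stated assumptions, not the method from the book or from the guide: the names `plan_compose` and `run_plan` are hypothetical, the two constants are simply the figures quoted in this answer, and the actual storage call is passed in as a callable so that the planning logic can be run without any cloud access. It also assumes the destination object may appear among its own sources (appending to the object being built).

```python
from typing import Callable, List, Sequence, Tuple

# Limits as quoted in the answer above (assumptions, check current docs).
MAX_SOURCES_PER_COMPOSE = 32
MAX_TOTAL_COMPONENTS = 1024

# One step = (destination object name, list of source object names).
Step = Tuple[str, List[str]]


def plan_compose(names: Sequence[str], target: str) -> List[Step]:
    """Plan compose requests that merge `names`, in order, into `target`.

    The first request composes up to 32 parts into `target`; each later
    request composes `target` itself plus up to 31 further parts.
    """
    if not names:
        raise ValueError("need at least one source object")
    if len(names) > MAX_TOTAL_COMPONENTS:
        raise ValueError("too many constituent parts for one composite object")

    steps: List[Step] = [(target, list(names[:MAX_SOURCES_PER_COMPOSE]))]
    pos = MAX_SOURCES_PER_COMPOSE
    while pos < len(names):
        # Reserve one source slot for the partially built target.
        batch = list(names[pos:pos + MAX_SOURCES_PER_COMPOSE - 1])
        steps.append((target, [target] + batch))
        pos += len(batch)
    return steps


def run_plan(steps: Sequence[Step],
             compose: Callable[[str, List[str]], None]) -> None:
    """Execute each planned step with a caller-supplied compose function."""
    for destination, sources in steps:
        compose(destination, sources)
```

With the `google-cloud-storage` Python client, the `compose` callable could plausibly wrap `bucket.blob(destination).compose([bucket.blob(s) for s in sources])`; the bucket and object names would be your own. Whether this step runs inside a Dataflow pipeline or as a separate job is a design choice the answer does not address.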