I am playing around with Kubeflow Pipelines. What I want to achieve is to have one step (a Python function) that creates an Iterator (generator), from which I then want to build a tf.data.Dataset.

Connections between Kubeflow steps can only carry primitive-type inputs/outputs, so I am not able to pass the iterator (or the dataset initialized from it) into the next step.
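To make the constraint concrete, here is a minimal sketch of the pattern I think is required: step 1 writes the records somewhere durable and returns only the *path* (a string, i.e. a primitive), and step 2 rebuilds a lazy iterator from that path. Plain local functions and a temp directory stand in for Kubeflow components and GCS here; the function names are my own, not Kubeflow API.

```python
import json
import os
import tempfile

def ingest_step(out_dir: str) -> str:
    """Stand-in for the 'Data Ingest' step: write sharded records to
    files and return only the directory path (a primitive string)."""
    for shard in range(3):
        with open(os.path.join(out_dir, f"part-{shard}.jsonl"), "w") as f:
            for i in range(4):
                f.write(json.dumps({"shard": shard, "i": i}) + "\n")
    return out_dir  # only this string crosses the step boundary

def dataset_step(data_dir: str):
    """Stand-in for the 'Create TF.Dataset' step: rebuild a lazy
    iterator from the path; no record is held in memory up front."""
    for name in sorted(os.listdir(data_dir)):
        with open(os.path.join(data_dir, name)) as f:
            for line in f:
                yield json.loads(line)

with tempfile.TemporaryDirectory() as d:
    path = ingest_step(d)          # step 1 output: a string
    records = list(dataset_step(path))  # step 2 rebuilds the iterator
    print(len(records))            # 12 records, streamed shard by shard
```

In a real pipeline, each function would run in its own container and the returned string would be the component's declared output; the consuming step would wrap the same lazy read in `tf.data.Dataset.from_generator` or read the files directly with `tf.data`.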

This is an overview of the pipeline:

+-------------+   +-------------------+   +------------------------------+
| Data Ingest +---> Create TF.Dataset +---> Consume TF.Dataset in Model  |
+-------------+   +-------------------+   +------------------------------+

Since I can only pass primitive types around, is there any way to persist the iterator-initialized dataset between steps?

The data lives on Google Cloud Storage, and it is big enough not to fit into memory. How would anyone achieve this?
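Since the data does not fit in memory, the consuming step has to stream it. A sketch of the constant-memory pattern I have in mind, using plain generators (the shard reader below simulates reading one GCS file; with TensorFlow the equivalent would be listing the `gs://` shard files and reading them lazily with `tf.data`):

```python
import itertools

def read_shard(shard_id):
    """Stand-in for lazily reading one shard file from GCS."""
    for i in range(5):
        yield {"shard": shard_id, "value": i}

def batched(iterable, batch_size):
    """Group a lazy stream into fixed-size batches without
    materializing the whole stream."""
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch

# Chain the shards end to end, then batch; only one batch
# is ever in memory at a time.
stream = itertools.chain.from_iterable(read_shard(s) for s in range(4))
batches = list(batched(stream, 3))
print(len(batches))  # 20 records in batches of 3 -> 7 batches
```

The point is that the dataset is never serialized between steps; only the shard locations are, and the lazy read pipeline is reconstructed inside the step that trains the model.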

I know this is a fairly broad question, but since Kubeflow is pretty new, I cannot find any helpful resources anywhere.
