Unable to use Sentence embeddings in Transform component (TFX)

I am working with a review dataset where the columns are numerical or categorical. The last column, though, is a text review (a paragraph of English sentences), so I used the Universal Sentence Encoder (https://tfhub.dev/google/universal-sentence-encoder/4) to get a sentence embedding. The goal of this project is to assign a sentiment label to each review example. All of this was relatively straightforward with a TF 2.0 Keras model.

But I'm unable to figure out how to get this to work in a TFX pipeline. Specifically, I can't work out how to use the ExampleGen and Transform components together with the pretrained embedding model. The ExampleGen component is fed the original data, so the Transform component eventually gets a sparse tensor for the review text. This is where I want to apply the Universal Sentence Encoder and get the sentence embedding tensor instead (the encoder accepts a string, a list of strings, or a list of eager string tensors to generate the embeddings). The review text is the dominant signal for the sentiment classification, so in essence I'm unable to proceed further.

There are two possible things I can do:

  1. In the preprocessing_fn for the Transform component, somehow convert the review sparse tensor into a string tensor that can then be fed into the Universal Sentence Encoder model. I've attempted this, but I ran into eager execution issues (a "'Tensor' object has no attribute 'numpy'" error).
  2. Discard the ExampleGen component and instead use Apache Beam + TensorFlow Transform to read the data and perform the necessary transformations, then have the Trainer component pick it up from there (I'm not sure of this part yet), followed by Evaluator, Pusher, etc. But this feels like more effort, and I'd like to understand whether there's a trick I'm missing before I go down this path.
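
For reference, here is a minimal sketch of what option 1 might look like without calling .numpy(), so that it stays valid in graph mode. It assumes each example carries at most one review string, and it uses only core TensorFlow ops. The names fill_in_missing and embed_reviews are illustrative, and embed_fn is a hypothetical stand-in for whatever callable wraps the encoder (for instance, the object returned by hub.load). Whether the encoder itself can be traced inside the Transform component is not something this sketch confirms.

```python
import tensorflow as tf


def fill_in_missing(x, default_value=""):
    """Convert a rank-2 SparseTensor with at most one value per row
    into a dense rank-1 tensor, filling empty rows with default_value.

    Uses only graph-compatible ops, so no .numpy() call is needed.
    """
    if not isinstance(x, tf.sparse.SparseTensor):
        return x
    # Force the dense shape to [batch, 1] before densifying.
    one_column = tf.SparseTensor(x.indices, x.values, [x.dense_shape[0], 1])
    dense = tf.sparse.to_dense(one_column, default_value=default_value)
    # Drop the trailing dimension: [batch, 1] -> [batch].
    return tf.squeeze(dense, axis=1)


def embed_reviews(sparse_text, embed_fn):
    """Densify the review column, then apply embed_fn to the string tensor.

    embed_fn is an assumption: any callable that maps a [batch] string
    tensor to a [batch, dim] (or [batch]) numeric tensor.
    """
    dense_text = fill_in_missing(sparse_text)
    return embed_fn(dense_text)
```

Inside a preprocessing_fn, the idea would be to call embed_reviews on the review feature and store the result under a new output key, leaving the numerical and categorical columns to the usual Transform ops.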

I'm relatively new to TF so any help on this would be appreciated, thanks!

Edit: I did find this link discussing something similar (https://github.com/tensorflow/tfx/issues/2517) but was unable to derive anything concrete out of it either.

There are 0 answers