I want to create a tokenizer from a pretrained TF Hub BERT module. Here is my code:
import tensorflow as tf
import tensorflow_hub as hub
from bert.tokenization import FullTokenizer  # bert-tensorflow package

sess = tf.compat.v1.Session()
bert_path = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"

def create_tokenizer_from_hub_module():
    """Get the vocab file and casing info from the Hub module."""
    bert_module = hub.load(bert_path)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    vocab_file, do_lower_case = sess.run(
        [
            tokenization_info["vocab_file"],
            tokenization_info["do_lower_case"],
        ]
    )
    return FullTokenizer(vocab_file=vocab_file, do_lower_case=do_lower_case)
This function call...
tokenizer = create_tokenizer_from_hub_module()
...gives the following error:
--------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-74-97952ec55966> in <cell line: 2>()
1 # Instantiate tokenizer
----> 2 tokenizer = create_tokenizer_from_hub_module()
<ipython-input-73-ed88ad053485> in create_tokenizer_from_hub_module()
31 """Get the vocab file and casing info from the Hub module."""
32 bert_module = hub.load(bert_path)
---> 33 tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
34 vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],tokenization_info["do_lower_case"]])
35
TypeError: 'AutoTrackable' object is not callable
I am working in Google Colab with TF version 2.12.0; the code was originally written and tested against TF 1.x.
I'm pretty new to TensorFlow, so I'm not sure what to do.
I followed these questions and answers: SO1, SO2, SO3, but I still don't seem to understand.
I also tried downgrading the TF version (and, consequently, the Python version), even though that is not the best option. Still no success.
It is good that you are aware of the versions being used, because the version change is exactly why the error occurs. In TF1, hub.Module(...) returned a callable object that accepted signature= and as_dict= arguments and produced graph tensors for sess.run(). In TF2, hub.load(...) instead restores the SavedModel as a trackable object (the AutoTrackable in your traceback), which cannot be called that way. As mentioned in the documentation, the loaded object's signatures are instead exposed through its .signatures attribute:
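As a quick sanity check you can run in Colab (illustrative; the exact names come from the module itself, but they should include the one your code uses):

import tensorflow_hub as hub

bert_module = hub.load("https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1")
print(list(bert_module.signatures.keys()))  # should include 'tokenization_info'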
You may also want to check these excerpts from the migration guide for TensorFlow Hub, which show how the old calling convention maps onto the new one:
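Paraphrasing the guide's example, the TF1-style signature call

m = hub.Module(handle)
out = m(dict(x1=..., x2=...), signature="sig", as_dict=True)

becomes, in TF2,

m = hub.load(handle)
out = m.signatures["sig"](x1=..., x2=...)

and the result is a dict of eager tensors, so there is no Session or sess.run() step anymore.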
To sum it up, here is the code you can refer to for tokenization. It is a minimal TF2 rewrite of your function; it assumes FullTokenizer comes from the bert-tensorflow package, as in the original TF1 tutorials:
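import tensorflow_hub as hub
from bert.tokenization import FullTokenizer  # bert-tensorflow package (assumed)

bert_path = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"

def create_tokenizer_from_hub_module():
    """Get the vocab file and casing info from the Hub module."""
    bert_module = hub.load(bert_path)
    # In TF2 the signature is called through .signatures and runs eagerly,
    # so no tf.compat.v1.Session / sess.run is needed.
    tokenization_info = bert_module.signatures["tokenization_info"]()
    vocab_file = tokenization_info["vocab_file"].numpy().decode("utf-8")
    do_lower_case = bool(tokenization_info["do_lower_case"].numpy())
    return FullTokenizer(vocab_file=vocab_file, do_lower_case=do_lower_case)

tokenizer = create_tokenizer_from_hub_module()
print(tokenizer.tokenize("TensorFlow Hub makes BERT easy to use"))

Here, vocab_file holds the path to the vocabulary asset bundled with the module, and do_lower_case tells the tokenizer whether the checkpoint was trained on lowercased text.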
You can find the complete gist in this notebook.