How to predict a masked word in a given sentence

FitBERT is a useful package, but I have a question about using a custom BERT model for masked word prediction. I trained a BERT model on a custom corpus using Google's scripts (create_pretraining_data.py, run_pretraining.py, extract_features.py, etc.). As a result I have a vocab file, a .tfrecord file, a .json file, and checkpoint files.

How can I use those files with FitBERT to predict a masked word in a given sentence?

There is 1 answer

Elidor00

From the TensorFlow documentation:

A TFRecord file stores your data as a sequence of binary strings. This means you need to specify the structure of your data before you write it to the file. Tensorflow provides two components for this purpose: tf.train.Example and tf.train.SequenceExample. You have to store each sample of your data in one of these structures, then serialize it and use a tf.python_io.TFRecordWriter to write it to disk.
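To make "a sequence of binary strings" concrete, here is a minimal pure-Python sketch of how each serialized sample is framed on disk: an 8-byte little-endian length, a masked CRC-32C of that length, the payload bytes, and a masked CRC-32C of the payload. The helper names (crc32c, masked_crc, frame_record, iter_records) are made up for illustration; in practice you would let TensorFlow's writer do this for you (tf.python_io.TFRecordWriter in TensorFlow 1.x, tf.io.TFRecordWriter in 2.x).

```python
import struct

# CRC-32C (Castagnoli) lookup table, reflected polynomial 0x82F63B78.
_TABLE = []
for _i in range(256):
    _crc = _i
    for _ in range(8):
        _crc = (_crc >> 1) ^ 0x82F63B78 if _crc & 1 else _crc >> 1
    _TABLE.append(_crc)


def crc32c(data: bytes) -> int:
    """Plain table-driven CRC-32C over a byte string."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """CRC-32C rotated right by 15 bits plus a constant, as TFRecord stores it."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def frame_record(payload: bytes) -> bytes:
    """Wrap one serialized sample (hypothetical helper) in the record framing."""
    length = struct.pack("<Q", len(payload))
    return (length + struct.pack("<I", masked_crc(length))
            + payload + struct.pack("<I", masked_crc(payload)))


def iter_records(blob: bytes):
    """Yield the payloads back out of a concatenation of framed records."""
    offset = 0
    while offset < len(blob):
        (length,) = struct.unpack_from("<Q", blob, offset)
        start = offset + 12  # skip the 8-byte length and its 4-byte CRC
        payload = blob[start:start + length]
        (stored,) = struct.unpack_from("<I", blob, start + length)
        if stored != masked_crc(payload):
            raise ValueError("corrupt record")
        yield payload
        offset = start + length + 4


if __name__ == "__main__":
    blob = frame_record(b"first sample") + frame_record(b"second sample")
    print(list(iter_records(blob)))
```

The payloads here are arbitrary bytes; in a real file each one would be a serialized tf.train.Example or tf.train.SequenceExample.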

This document, together with the TensorFlow documentation, explains quite well how to work with those file types.

If instead you want to use FitBERT directly through the library, you can follow the examples in the project's GitHub repository.
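As a rough sketch of what that usage looks like, based on the pattern in the FitBERT README (FitBert(), the ***mask*** marker, and rank(..., options=...)): the helper names mask_word and rank_candidates and the model_dir parameter are hypothetical. Passing your own model and tokenizer to FitBert is an assumption about its constructor, and it also assumes the TensorFlow checkpoint has already been converted into a directory that Hugging Face transformers can load.

```python
import re

MASK_TOKEN = "***mask***"  # mask marker used in the FitBERT README examples


def mask_word(sentence: str, word: str, mask_token: str = MASK_TOKEN) -> str:
    """Replace the first whole-word occurrence of `word` with the mask marker."""
    pattern = r"\b" + re.escape(word) + r"\b"
    masked, count = re.subn(pattern, lambda _match: mask_token, sentence, count=1)
    if count == 0:
        raise ValueError(f"{word!r} not found in sentence")
    return masked


def rank_candidates(masked_sentence, options, model_dir=None):
    """Rank candidate words for the masked slot with FitBERT.

    Imports are done lazily so the masking helper above works without
    fitbert/transformers installed. `model_dir` is a hypothetical path to a
    custom model already converted for Hugging Face transformers.
    """
    from fitbert import FitBert

    if model_dir is None:
        fb = FitBert()  # downloads and loads the default pretrained BERT
    else:
        from transformers import BertForMaskedLM, BertTokenizer

        model = BertForMaskedLM.from_pretrained(model_dir)
        tokenizer = BertTokenizer.from_pretrained(model_dir)
        # Assumption: FitBert accepts a preloaded model and tokenizer.
        fb = FitBert(model=model, tokenizer=tokenizer)
    return fb.rank(masked_sentence, options=options)


if __name__ == "__main__":
    masked = mask_word("Why Bert, you're looking handsome today!", "handsome")
    print(masked)
    # Requires fitbert to be installed and downloads a large model:
    # print(rank_candidates(masked, ["buff", "handsome", "strong"]))
```

Note that rank only orders the candidate words you supply; it does not search the whole vocabulary for the missing word.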