Out-of-memory error while training with rasa/LaBSE


I want to train a model that uses rasa/LaBSE through the LanguageModelFeaturizer. I followed the steps in the docs and did not change the default training data.

My config file looks like:

# The config recipe.
# https://rasa.com/docs/rasa/model-configuration/
recipe: default.v1

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
# # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
# # If you'd like to customize it, uncomment and adjust the pipeline.
# # See https://rasa.com/docs/rasa/tuning-your-model for more information.
   - name: WhitespaceTokenizer
#   - name: RegexFeaturizer
#   - name: LexicalSyntacticFeaturizer
   - name: LanguageModelFeaturizer
     # Name of the language model to use
     model_name: "bert"
     # Pre-Trained weights to be loaded
     model_weights: "rasa/LaBSE"
     cache_dir: null
   - name: CountVectorsFeaturizer
   - name: CountVectorsFeaturizer
     analyzer: char_wb
     min_ngram: 1
     max_ngram: 4
   - name: DIETClassifier
     epochs: 100
     constrain_similarities: true
     batch_size: 8
   - name: EntitySynonymMapper
   - name: ResponseSelector
     epochs: 100
     constrain_similarities: true
   - name: FallbackClassifier
     threshold: 0.3
     ambiguity_threshold: 0.1

After running rasa train I get:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: failed to allocate memory [Op:AddV2]

I am using a GTX 1660 Ti with 6 GB of memory. My system specifications are:

Rasa
----------------------
rasa                    3.0.8
rasa-sdk                3.0.5

System
----------------------
OS: Ubuntu 18.04.6 LTS x86_64
Kernel: 5.4.0-113-generic
CUDA Version: 11.4
Driver Version: 470.57.02

Tensorflow
----------------------
tensorboard             2.8.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit  1.8.1
tensorflow              2.6.1
tensorflow-addons       0.14.0
tensorflow-estimator    2.6.0
tensorflow-hub          0.12.0
tensorflow-probability  0.13.0
tensorflow-text         2.6.0

Regular training (without the LanguageModelFeaturizer) works fine, and I can run the resulting model. I tried reducing batch_size, but the error persists.


There are 3 answers

Answer by aamra (best answer)

Running the same configuration on Google Colab (with a 16 GB GPU) works fine. The model uses around 6.5-7 GB of memory, which exceeds the 6 GB available on the GTX 1660 Ti.
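
To check whether a local GPU has enough headroom before training, one option is to query nvidia-smi. The sketch below is only an illustration and not part of Rasa: parse_gpu_memory and fits are hypothetical helper names, the 7000 MiB requirement is an assumption based on the figure reported above, and the query flags assume a reasonably recent NVIDIA driver.

```python
import subprocess

# nvidia-smi query that prints "total, used" in MiB, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.total,memory.used",
    "--format=csv,noheader,nounits",
]

# Assumption: rough requirement taken from the 6.5-7 GB reported above.
REQUIRED_MIB = 7000


def parse_gpu_memory(output):
    """Parse nvidia-smi CSV output into a list of (total_mib, used_mib)."""
    gpus = []
    for line in output.strip().splitlines():
        total, used = (int(field.strip()) for field in line.split(","))
        gpus.append((total, used))
    return gpus


def fits(required_mib, total_mib, used_mib):
    """Return True if the free memory on one GPU covers the requirement."""
    return total_mib - used_mib >= required_mib


if __name__ == "__main__":
    try:
        raw = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        raw = ""  # no NVIDIA driver or no GPU visible
    for index, (total, used) in enumerate(parse_gpu_memory(raw)):
        verdict = "enough" if fits(REQUIRED_MIB, total, used) else "not enough"
        print(f"GPU {index}: {total - used} MiB free of {total} MiB ({verdict})")
```

On a 6 GB card the check would be expected to report "not enough", which matches the error in the question.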

Answer by Himanshu Teotia

If your system RAM fills up at some point during training, you can create swap space. Note that swap extends system RAM only; it does not add GPU memory.
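
Before adding swap, it can help to see how much RAM and swap are currently free. The sketch below reads /proc/meminfo, so it assumes a Linux system such as the Ubuntu machine in the question; parse_meminfo and summarize are hypothetical helper names.

```python
def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to their numeric value.

    Most fields are reported in kB; counters such as HugePages_Total have no unit.
    """
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def summarize(info):
    """Return (available RAM, free swap) in GiB from parsed meminfo."""
    kib_per_gib = 1024 * 1024
    return (
        info.get("MemAvailable", 0) / kib_per_gib,
        info.get("SwapFree", 0) / kib_per_gib,
    )


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            ram_gib, swap_gib = summarize(parse_meminfo(handle.read()))
        print(f"available RAM: {ram_gib:.1f} GiB, free swap: {swap_gib:.1f} GiB")
    except OSError:
        print("/proc/meminfo not found (not a Linux system?)")
```

Running this in a second terminal while rasa train is active shows whether system RAM is actually the resource that runs out.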

Answer by Enoch Levandovsky

I am assuming the OOM occurs in the DIETClassifier.

Try decreasing some of the parameters below (check the DIETClassifier documentation for the exact defaults in your Rasa version):

- name: DIETClassifier
  epochs: 100
  batch_size: [16, 32]
  number_of_transformer_layers: 2
  embedding_dimension: 20
  hidden_layers_sizes:
    text: [256, 128]
  ...
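
To get a feel for how much the hidden layer sizes contribute, the sketch below counts the weights of a plain stack of fully connected layers. It is rough arithmetic under stated assumptions, not a model of DIET itself: the 768-dimensional input (a common BERT-style embedding width) is assumed, dense_param_count and float32_mib are hypothetical helpers, and neither the transformer layers nor the language model are included.

```python
def dense_param_count(input_dim, layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    previous = input_dim
    for size in layer_sizes:
        total += previous * size + size  # weight matrix plus bias vector
        previous = size
    return total


def float32_mib(param_count):
    """Memory for the parameters alone at 4 bytes each, in MiB."""
    return param_count * 4 / (1024 * 1024)


if __name__ == "__main__":
    # Assumption: 768-dimensional input features per token.
    for sizes in ([256, 128], [128, 64]):
        count = dense_param_count(768, sizes)
        print(sizes, count, f"{float32_mib(count):.2f} MiB")
```

Training also stores activations, gradients and optimizer state, which grow with batch size and sequence length, so the real footprint is larger than this parameter count suggests.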