Training spaCy NER models on multiple GPUs (not just one)


I am training my NER model using the following code.

Start of Code:

import random
import spacy
from spacy.training import Example
from spacy.util import minibatch, compounding
from pynvml import (nvmlInit, nvmlDeviceGetCount,
                    nvmlDeviceGetHandleByIndex, nvmlDeviceGetUtilizationRates)

nvmlInit()
deviceCount = nvmlDeviceGetCount()

def train_spacy(nlp, training_data, iterations):

    if "ner" not in nlp.pipe_names:
        ner = nlp.add_pipe("ner", last=True)  # in spaCy v3, add_pipe takes the name and returns the component
    else:
        ner = nlp.get_pipe("ner")
    
    training_examples = []
    faulty_dataset = []
    
    for text, annotations in training_data:
        doc = nlp.make_doc(text)
        try:
            training_examples.append(Example.from_dict(doc, annotations))  # creating Examples for training, as per spaCy v3
        except Exception:  # skip records whose annotations cannot be aligned to the text
            faulty_dataset.append([doc, annotations])
        for ent in annotations['entities']:
            ner.add_label(ent[2])
    
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
    
    with nlp.select_pipes(disable=other_pipes):  # nlp.disable_pipes is deprecated in spaCy v3
        optimizer = nlp.initialize()  # nlp.begin_training() is deprecated in spaCy v3
    
        for itn in range(iterations):  # renamed to avoid shadowing the built-in iter

            print('Starting iteration: ' + str(itn))
            random.shuffle(training_examples)
            losses = {}
            batches = minibatch(training_examples, size=compounding(4.0, 32.0, 1.001))
            for batch in batches:
                nlp.update(
                    batch,
                    drop=0.2,
                    sgd=optimizer,
                    losses=losses,
                )
            print(losses)
    
            for i in range(deviceCount):  # to see how many GPUs I am using
                handle = nvmlDeviceGetHandleByIndex(i)
                util = nvmlDeviceGetUtilizationRates(handle)
                print(util.gpu)
    
    return nlp, faulty_dataset, training_examples

spacy.require_gpu()  # this returns True

nlp = spacy.blank('en')
word_vectors = 'w2v_model.txt'
model_name = "nlp"
load_word_vectors(model_name, word_vectors)  # my own helper that loads my pretrained word vectors

test = train_spacy(nlp, training_data, 30)  # training for 30 iterations

End of Code.
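As an aside on the batching in the code above: compounding(4.0, 32.0, 1.001) produces an infinite series of batch sizes that grows geometrically from 4 toward 32, so early batches are small and later ones larger. A minimal sketch of that schedule (this is an illustrative stand-in, not spaCy's own implementation):

```python
def compounding_schedule(start, stop, compound):
    """Yield batch sizes growing geometrically from start, capped at stop."""
    size = start
    while True:
        yield min(size, stop)
        size *= compound

gen = compounding_schedule(4.0, 32.0, 1.001)
first = [round(next(gen), 3) for _ in range(4)]
print(first)  # [4.0, 4.004, 4.008, 4.012]
```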

The problem:

The issue is that each iteration takes about 30 minutes - I have 8,000 training records, which include very long texts, and 6 labels.

So I was hoping to reduce this by using more GPU cores, but it seems that only one is being used - when I execute print(util.gpu) in the code above, only the first device returns a non-zero value.

Question 1: Is there any way I could use more GPU cores in the training process to make it faster? I would appreciate any leads.

After some more research, it seems that spacy-ray is intended to enable parallel training. But I cannot find documentation on using Ray with nlp.update; all I can find is about running "python -m spacy ray train config.cfg --n-workers 2."
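For reference, the config-driven workflow those docs describe looks roughly like this (a sketch; the paths and output directory are illustrative, and spacy-ray must be installed separately with pip install spacy-ray):

```shell
# Generate a starter config for an English NER pipeline
python -m spacy init config config.cfg --lang en --pipeline ner

# Single-GPU training with the built-in CLI
python -m spacy train config.cfg --output ./output --gpu-id 0

# Parallel training via the spacy-ray extension
python -m spacy ray train config.cfg --n-workers 2
```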

Question 2: Does Ray enable parallel processing on GPUs, or is it only for CPU cores?
Question 3: How could I integrate Ray into the Python code I have, using nlp.update, as opposed to running "python -m spacy ray train config.cfg --n-workers 2"?

Thank you!

Environment:

All of the code above runs in a single conda_python3 notebook on AWS SageMaker, on an ml.p3.2xlarge EC2 instance.
Python version used: 3
spaCy version used: 3.0.6
