Is there a faster method to process pandas list of string values


There are approximately 13,000 values in the column. The function below takes a list of strings as input and performs NER tagging on each word in the list. On average there are about 300 words per list across the 13,000 rows. The function currently takes more than an hour to process the column, so I am looking for a way to make it faster. I am running on an Azure ML notebook with a standard CPU compute.

Function :

def perform_ner_batch(texts):
    if not texts:  # Check if texts is empty
        return []
    # Perform NER on each word in the list
    list_entity = []
    for text in texts:
        ner_result = ner_pipeline(text)
        if not ner_result:
            # No entity detected for this word
            list_entity.append('O')
        for result in ner_result:
            list_entity.append(result['entity_group'])
    return list_entity

Calling the function:

df['entities'] = df['Tokenized_Abstract_list'].apply(perform_ner_batch)
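For context, the main cost here is that the pipeline is invoked once per word. Hugging Face pipelines accept a list of strings and a `batch_size` argument, so one common speed-up is to flatten all rows into a single list, run the pipeline once, and regroup the results per row. The sketch below illustrates that restructuring with a stub `fake_pipeline` standing in for the real `ner_pipeline` (hypothetical; it also assumes one entity tag per word, whereas the original appends every entity group):

```python
def perform_ner_flat(list_of_token_lists, pipeline_fn, batch_size=32):
    # Flatten all rows into one list so the pipeline is called once,
    # letting it batch inputs internally instead of once per word.
    flat = [tok for row in list_of_token_lists for tok in row]
    if not flat:
        return []
    results = pipeline_fn(flat, batch_size=batch_size)
    # One tag per token: first entity group, or 'O' if nothing was found.
    tags = [r[0]['entity_group'] if r else 'O' for r in results]
    # Regroup the flat tags to match the original row boundaries.
    out, i = [], 0
    for row in list_of_token_lists:
        out.append(tags[i:i + len(row)])
        i += len(row)
    return out

# Stub standing in for the real ner_pipeline (hypothetical behavior):
def fake_pipeline(texts, batch_size=32):
    return [[{'entity_group': 'PER'}] if t.istitle() else [] for t in texts]

rows = [['Alice', 'went', 'home'], ['see', 'Bob']]
print(perform_ner_flat(rows, fake_pipeline))
# → [['PER', 'O', 'O'], ['O', 'PER']]
```

The result could then be assigned back with `df['entities'] = perform_ner_flat(df['Tokenized_Abstract_list'].tolist(), ner_pipeline)`, replacing the row-by-row `apply`.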

There are 0 answers