How to use pyspellchecker to autocorrect spelling errors in a pandas column?


I have the following dataframe:

df = pd.DataFrame({'id':[1,2,3],'text':['a foox juumped ovr the gate','teh car wsa bllue','why so srious']})

I would like to generate a new column with the spelling errors fixed, using the pyspellchecker library.

I have tried the following but it did not correct any spelling errors:

import pandas as pd
from spellchecker import SpellChecker

spell = SpellChecker()

def correct_spelling(word):
    corrected_word = spell.correction(word)
    if corrected_word is not None:
        return corrected_word
    else:
        return word

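# Note: .apply() passes each whole sentence to correct_spelling, so spell.correction()
# treats e.g. 'a foox juumped ovr the gate' as one single unknown "word"
df['corrected_text'] = df['text'].apply(correct_spelling)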

Below is what the expected output should look like:

pd.DataFrame({'id':[1,2,3],'text':['a foox juumped ovr the gate','teh car wsa bllue','why so srious'],
              'corrected_text':['a fox jumped over the gate','the car was blue','why so serious']})

There are 2 answers

Jason Baker

I don't know enough about this package to improve its accuracy, but you can split the string in each row into a list of words and correct each word separately. This example uses a nested list comprehension:

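# Split each sentence into words and correct each one; newer pyspellchecker releases
# return None when no correction is found, so fall back to the original word
df["text"] = [[spell.correction(word) or word for word in row] for row in df["text"].str.split(" ").to_list()]
df["text"] = df["text"].apply(lambda x: " ".join(x))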

Output (as you can see, the accuracy still needs work):

   id                       text
0   1  a food jumped or the gate
1   2           the car was blue
2   3             why so serious
Joep

The accuracy is okay. pyspellchecker can't read: it has no notion of context and only detects words that aren't spelled correctly. To pick the 'correct' word it uses Levenshtein distance, i.e. the number of single-character edits needed to turn the misspelled word into a known word. Foox is one edit away from fox, but also from food. To break such ties, pyspellchecker consults a word-frequency list: if food occurs more often than fox, which is probably the case, it autocorrects to food. Building your own spellchecker dictionary from words common to your use case should noticeably improve the results.
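To make the frequency tie-break concrete, here is a minimal pure-Python sketch in the style of Peter Norvig's spelling corrector, which pyspellchecker's README names as its basis. It is not pyspellchecker's actual code: the counts in `WORD_COUNTS` are invented for illustration, and `edits1`/`correction` are hypothetical names. Like the approach described above, it prefers known words at edit distance 1 over distance 2 and breaks ties by frequency.

```python
import string

# Invented word counts standing in for a real frequency list (illustration only)
WORD_COUNTS = {"a": 900, "the": 1000, "food": 120, "fox": 40, "over": 80,
               "or": 300, "jumped": 30, "gate": 20}


def edits1(word: str) -> set[str]:
    """All strings one edit away: deletion, transposition, replacement, insertion."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correction(word: str, counts: dict[str, int]) -> str:
    """Return the word if known, else the most frequent known word at distance 1, then 2."""
    if word in counts:
        return word
    one = {w for w in edits1(word) if w in counts}
    if one:
        return max(one, key=counts.get)
    two = {w2 for w1 in edits1(word) for w2 in edits1(w1) if w2 in counts}
    if two:
        return max(two, key=counts.get)
    return word  # no known word within two edits: leave it unchanged
```

With these made-up counts, `correction("foox", WORD_COUNTS)` returns `"food"` and `correction("ovr", WORD_COUNTS)` returns `"or"`, the same mistakes seen in the first answer's output. Raising the count for `fox`, as a domain-specific dictionary would, flips the first result to `"fox"`. In pyspellchecker itself, `spell.word_frequency.load_words(...)` and `spell.word_frequency.load_text_file(...)` are the calls for adding your own words to the frequency list.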