How can I "replace" NoneType objects in Pandas if Python package raises error?

834 views Asked by At

I am using the eldar package in Python to write boolean queries. The dataframes on which I am running my queries contain a lot of empty cells, which are read as NoneType objects. Eldar, however, raises an error as one of the operations it performs is to set all cases to "lower", so it expects a string.

The error notification reads:

'NoneType' object has no attribute 'lower'

I have thus tried the following:

  1. Make sure that the input dataframe df is read as string:

df = pd.read_excel(file, sheet_name='FactoidList', axis=1, ignore_index=False, sort=False, dtype=str)

  1. Replace empty fields for None (two options):

df = df.where((pd.notnull(df)), None) df = df.replace({np.nan: None})

  1. Drop "na":

dropna(inplace = True)

These solutions are based on similar issues I found in various forums, but none of them do the trick. My empty cells are either still read as NoneTpye, or the entire data frame is empty.

If I was only running eldar on one column at a time, I could easily exclude empty cells that are empty, but I want to integrate eldar into a numpy condition list to check several columns per row.

My latest attempt to work with try and except unfortunately excludes the entire column from being considered if one cell is empty.

The relevant section of my much longer script looks like this:

# read all excel files in directory as one data frame

frame_list=[]
for item in os.listdir(filenames):
    file = os.path.join(filenames, item)
    df = pd.read_excel(file, sheet_name='FactoidList', axis=1, ignore_index=False, sort=False, dtype=str)
    df = df.replace({np.nan: None})
    #df = df.where((pd.notnull(df)), None) # replace empty fields for None
    frame_list.append(df)

f = pd.concat(frame_list, axis=0, ignore_index=False, sort=False)
    
# read factoids from data frame

pers_f=(f[['pers_name']]) # retrieve data from selected column
pers_list=pers_f.values.tolist() # convert data frame to sorted list
pers_list_flat=[item for sublist in pers_list for item in sublist] # flatten list
pers_unique=pers_f.drop_duplicates() # remove duplicates
pers_unique_list=pers_unique.values.tolist() # write unique values to list
    
print("\n\nYour factoid list contains", len(pers_f), "entries.") # count data in selected column

#for i in [item for sublist in pers_unique_list for item in sublist]: # count person occurrences
    #print("\n", i, " / ", "Häufigkeit:", pers_list_flat.count(i), "\n") # print name and occurrences
    
### STEP 2: LET USER SELECT SEARCH CRITERIA

print("Query format :", '("gandalf" OR "frodo") AND NOT ("movie" OR "adaptation")')
# queried names
print("Enter person names or wildcard *.")
qn=input()
# queried year  
print("Date(s):")
ex_year=input() 
# select type of time processing
print("No date selected (0), exact dates (1), data range (2), BEFORE date (3) or AFTER date (4)?")
z=input() 
# queried institution
print("Enter institutions or wildcard *:")
qi=input() 
# queried title
print("Enter person titles or wildcard *:")
qt=input() 
# queried function
print("Enter person functions or wildcard *:")
qf=input() 
# queried related person
print("Enter related persons or wildcard *:")
qr=input() 

# Eldar Queries for boolean search

eldar_n = Query(qn, ignore_case=True, ignore_accent=False, match_word=True)
print(eldar_n) # <class 'eldar.query.Query'>
print(f['pers_name'].apply(eldar_n))

try:
    eldar_i = Query(qi, ignore_case=True, ignore_accent=False, match_word=True)
    print(f['inst_name'].apply(eldar_i))
except AttributeError as er:
    print(er.args)
    pass
    
try:    
    eldar_t = Query(qt, ignore_case=True, ignore_accent=False, match_word=True)
    print(f['pers_title'].apply(eldar_t))
except AttributeError as er:
    print(er.args)
    pass

try:
    eldar_f = Query(qf, ignore_case=True, ignore_accent=False, match_word=True)
    print(f['pers_function'].apply(eldar_f))
except AttributeError as er:
    print(er.args)
    pass

try:
    eldar_r = Query(qr, ignore_case=True, ignore_accent=False, match_word=True)
    print(f['rel_pers'].apply(eldar_r))
except AttributeError as er:
    print(er.args)
    pass

I would appreciate ideas for solving the problem!

0

There are 0 answers