I am using the eldar package in Python to write boolean queries. The dataframes on which I am running my queries contain a lot of empty cells, which are read as NoneType
objects. Eldar
, however, raises an error as one of the operations it performs is to set all cases to "lower", so it expects a string.
The error notification reads:
'NoneType' object has no attribute 'lower'
I have thus tried the following:
- Make sure that the input dataframe
df
is read as string:
df = pd.read_excel(file, sheet_name='FactoidList', axis=1, ignore_index=False, sort=False, dtype=str)
- Replace empty fields for None (two options):
df = df.where((pd.notnull(df)), None)
df = df.replace({np.nan: None})
- Drop "na":
dropna(inplace = True)
These solutions are based on similar issues I found in various forums, but none of them do the trick. My empty cells are either still read as NoneTpye
, or the entire data frame is empty.
If I was only running eldar
on one column at a time, I could easily exclude empty cells that are empty, but I want to integrate eldar into a numpy condition list to check several columns per row.
My latest attempt to work with try
and except
unfortunately excludes the entire column from being considered if one cell is empty.
The relevant section of my much longer script looks like this:
# read all excel files in directory as one data frame
frame_list=[]
for item in os.listdir(filenames):
file = os.path.join(filenames, item)
df = pd.read_excel(file, sheet_name='FactoidList', axis=1, ignore_index=False, sort=False, dtype=str)
df = df.replace({np.nan: None})
#df = df.where((pd.notnull(df)), None) # replace empty fields for None
frame_list.append(df)
f = pd.concat(frame_list, axis=0, ignore_index=False, sort=False)
# read factoids from data frame
pers_f=(f[['pers_name']]) # retrieve data from selected column
pers_list=pers_f.values.tolist() # convert data frame to sorted list
pers_list_flat=[item for sublist in pers_list for item in sublist] # flatten list
pers_unique=pers_f.drop_duplicates() # remove duplicates
pers_unique_list=pers_unique.values.tolist() # write unique values to list
print("\n\nYour factoid list contains", len(pers_f), "entries.") # count data in selected column
#for i in [item for sublist in pers_unique_list for item in sublist]: # count person occurrences
#print("\n", i, " / ", "Häufigkeit:", pers_list_flat.count(i), "\n") # print name and occurrences
### STEP 2: LET USER SELECT SEARCH CRITERIA
print("Query format :", '("gandalf" OR "frodo") AND NOT ("movie" OR "adaptation")')
# queried names
print("Enter person names or wildcard *.")
qn=input()
# queried year
print("Date(s):")
ex_year=input()
# select type of time processing
print("No date selected (0), exact dates (1), data range (2), BEFORE date (3) or AFTER date (4)?")
z=input()
# queried institution
print("Enter institutions or wildcard *:")
qi=input()
# queried title
print("Enter person titles or wildcard *:")
qt=input()
# queried function
print("Enter person functions or wildcard *:")
qf=input()
# queried related person
print("Enter related persons or wildcard *:")
qr=input()
# Eldar Queries for boolean search
eldar_n = Query(qn, ignore_case=True, ignore_accent=False, match_word=True)
print(eldar_n) # <class 'eldar.query.Query'>
print(f['pers_name'].apply(eldar_n))
try:
eldar_i = Query(qi, ignore_case=True, ignore_accent=False, match_word=True)
print(f['inst_name'].apply(eldar_i))
except AttributeError as er:
print(er.args)
pass
try:
eldar_t = Query(qt, ignore_case=True, ignore_accent=False, match_word=True)
print(f['pers_title'].apply(eldar_t))
except AttributeError as er:
print(er.args)
pass
try:
eldar_f = Query(qf, ignore_case=True, ignore_accent=False, match_word=True)
print(f['pers_function'].apply(eldar_f))
except AttributeError as er:
print(er.args)
pass
try:
eldar_r = Query(qr, ignore_case=True, ignore_accent=False, match_word=True)
print(f['rel_pers'].apply(eldar_r))
except AttributeError as er:
print(er.args)
pass
I would appreciate ideas for solving the problem!