My script below takes a sample from an Excel file, calculates a sample size based on some criteria, and writes out a CSV file. My issue is with the part of the script that checks whether a certain column is empty. I have tried .empty and isnull. isnull doesn't throw an error, but it doesn't do what I want, and .empty gives me a keyword error. How can I combine an if statement with a check for an empty column?
if df2['Subcategory'].isnull:
    def sample_per(df2):
        if len(df2) >= 15000:
            return (df2.groupby('Category').apply(lambda x: x.sample(frac=0.01)))
        elif len(df2) < 15000 and len(df2) > 10000:
            return (df2.groupby('Category').apply(lambda x: x.sample(frac=0.03)))
        else:
            return (df2.groupby('Category').apply(lambda x: x.sample(frac=0.05)))
else:
    def sample_per(df2):
        if len(df2) >= 15000:
            return (df2.groupby('Subcategory').apply(lambda x: x.sample(frac=0.01)))
        elif len(df2) < 15000 and len(df2) > 10000:
            return (df2.groupby('Subcategory').apply(lambda x: x.sample(frac=0.03)))
        else:
            return (df2.groupby('Subcategory').apply(lambda x: x.sample(frac=0.05)))
.isnull() is used to check for missing values such as NaN (Not a Number) or None!
If by empty column you mean a column of NaN...
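You can use either the .isna() or the .isnull() method of the Series object; the two are aliases.
Watch it! In if df2['Subcategory'].isnull you didn't call .isnull(), meaning you left out the parentheses. Without them the expression is just the method object itself, which is always truthy, so the first branch runs every time.
With the parentheses, you get back a Series of Boolean values, one per row.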
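To see what the parentheses change, here is a minimal sketch. The three-row Series is made up for illustration and only stands in for df2['Subcategory'].

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2['Subcategory']: every value is missing.
s = pd.Series([np.nan, np.nan, np.nan], name="Subcategory")

# Without parentheses this is a bound method object. Method objects are
# always truthy, so `if s.isnull:` takes the first branch no matter what.
method_is_truthy = bool(s.isnull)

# With parentheses the method runs and returns one Boolean per row.
mask = s.isnull()

# A multi-row Boolean Series cannot be used directly as an `if` condition:
# pandas raises ValueError because its truth value is ambiguous.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True
```

So the mask still has to be reduced to a single Boolean before it can drive an if statement.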
If you want to know whether all of the rows in that column are NaN, you can reduce that Series to a single True or False with .all():
if df2['Subcategory'].isnull().all():
    ...  # rest of the code
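As a rough illustration of that reduction, the sketch below uses a tiny made-up frame. The "Notes" column is hypothetical and is only there to contrast a partly missing column with a fully missing one.

```python
import numpy as np
import pandas as pd

# Made-up data: 'Subcategory' is entirely missing, 'Notes' only partly.
df2 = pd.DataFrame({
    "Category": ["a", "a", "b"],
    "Subcategory": [np.nan, np.nan, np.nan],
    "Notes": ["x", np.nan, "z"],  # hypothetical column for contrast
})

# .all() collapses the per-row mask into one Boolean.
subcategory_all_missing = df2["Subcategory"].isnull().all()
notes_all_missing = df2["Notes"].isnull().all()

if subcategory_all_missing:
    chosen = "Category"      # nothing to group by in 'Subcategory'
else:
    chosen = "Subcategory"
```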
If by empty you mean filled with "" (empty strings),
then you could do this:
df2['Subcategory'].apply(lambda x: not x).all()
which evaluates to True if every row in "Subcategory" is an empty string (or another falsy value such as 0 or None).
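One thing to keep in mind with the truthiness check: NaN is a float that counts as truthy, so a column mixing "" and NaN will not pass it. The sketch below, on made-up Series, shows that and one possible way to treat both as empty. Comparing against "" explicitly is a variation on the check above, not something the question requires.

```python
import numpy as np
import pandas as pd

# Made-up columns: one with only empty strings, one mixing "" and NaN.
blank = pd.Series(["", "", ""])
mixed = pd.Series(["", np.nan, ""])

# Truthiness check: "" is falsy, so `not x` is True for every row here.
blank_by_truthiness = blank.apply(lambda x: not x).all()

# NaN is truthy, so `not x` is False for the NaN row and .all() fails.
mixed_by_truthiness = mixed.apply(lambda x: not x).all()

# A more explicit alternative: compare against "" directly.
blank_by_equality = (blank == "").all()

# Treat a row as empty if it is either missing or an empty string.
mixed_empty = (mixed.isnull() | (mixed == "")).all()
```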
P.S. Use .any() instead of .all() to check whether at least one value is True!
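Putting it together, one way to combine the check with the if statement is to pick the grouping column inside a single function instead of defining sample_per twice. This is only a sketch: the helper name sample_fraction and the demo frame are made up, the thresholds are copied from the question, and it swaps the groupby(...).apply(lambda x: x.sample(...)) pattern for GroupBy.sample (available in pandas 1.1 and later), which keeps the original row index instead of adding the group key to it.

```python
import numpy as np
import pandas as pd

def sample_fraction(n_rows):
    """Sampling fraction by table size (thresholds taken from the question)."""
    if n_rows >= 15000:
        return 0.01
    elif n_rows > 10000:
        return 0.03
    return 0.05

def sample_per(df2):
    # Fall back to 'Category' when 'Subcategory' holds no values at all.
    if df2["Subcategory"].isnull().all():
        group_col = "Category"
    else:
        group_col = "Subcategory"
    frac = sample_fraction(len(df2))
    # Draw the same fraction of rows from every group.
    return df2.groupby(group_col).sample(frac=frac)

# Made-up demo data: 200 rows, two categories, no subcategories.
demo = pd.DataFrame({
    "Category": ["a", "b"] * 100,
    "Subcategory": [np.nan] * 200,
})
sampled = sample_per(demo)
```

With 100 rows per category and a fraction of 0.05, this draws 5 rows from each category. If a column can also contain empty strings, the condition would need to account for those as well.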