Checking for empty column in sample data?


My script below takes a sample from an Excel file, calculates a sample size based on some criteria, and writes out a CSV file. My issue is with the part of the script that checks whether a certain column is empty. I have tried .empty and isnull. isnull doesn't throw an error, but it doesn't do what I want, and .empty gives me a keyword error. How can I combine an if statement with a check for an empty column?

if df2['Subcategory'].isnull:  # <- the check in question
    def sample_per(df2):
        if len(df2) >= 15000:
            return (df2.groupby('Category').apply(lambda x: x.sample(frac=0.01)))
        elif len(df2) < 15000 and len(df2) > 10000:
            return (df2.groupby('Category').apply(lambda x: x.sample(frac=0.03)))
        else:
            return (df2.groupby('Category').apply(lambda x: x.sample(frac=0.05)))

else:
    def sample_per(df2):
        if len(df2) >= 15000:
            return (df2.groupby('Subcategory').apply(lambda x: x.sample(frac=0.01)))
        elif len(df2) < 15000 and len(df2) > 10000:
            return (df2.groupby('Subcategory').apply(lambda x: x.sample(frac=0.03)))
        else:
            return (df2.groupby('Subcategory').apply(lambda x: x.sample(frac=0.05)))

There is 1 answer

Masso On

.isnull() checks for missing values: NaN (Not a Number), None, or NaT.

If by empty column you mean a column of NaN...

You can use either the .isna() or the .isnull() method of a Series; the two are aliases. (A Series has no .isnan() method.)

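Watch out: in if df2['Subcategory'].isnull you never actually call .isnull(), because the parentheses are missing. Without them the expression is just the method object, which is always truthy, so the first branch always runs.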

Once you do call it, you get back a Series of Boolean values, one per row, which can't be used directly as the condition of an if statement.
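To make the difference concrete, here is a minimal sketch. The Series s is a hypothetical stand-in for df2['Subcategory'], not the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2['Subcategory']: a column with no values at all.
s = pd.Series([np.nan, np.nan, np.nan], name="Subcategory")

# Without parentheses this is the method object itself, which is always truthy.
method_is_truthy = bool(s.isnull)

# With parentheses it is an element-wise Boolean Series.
mask = s.isnull()

# A multi-element Series has no single truth value, so `if mask:` raises ValueError.
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True
```

That ValueError is why the Boolean Series has to be reduced to a single True or False before it goes into an if.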

If you wanna know if all of the rows in that column are NaN you can just do this (to obtain a single True or False):

    if df2['Subcategory'].isnull().all():
        ...  # rest of the code
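Applied to the script in the question, the check could pick the grouping column once instead of defining sample_per twice. This is only a sketch under assumptions: pick_group_column and the demo frame are made up for illustration, and it swaps the original groupby(...).apply(lambda x: x.sample(...)) for GroupBy.sample (available in pandas 1.1 and later), which samples each group separately but keeps the original row index rather than adding the group key as an index level. The size thresholds are copied from the question.

```python
import numpy as np
import pandas as pd

def pick_group_column(df):
    """Return 'Category' if 'Subcategory' is entirely NaN, else 'Subcategory'.

    Hypothetical helper, not part of the original script.
    """
    return "Category" if df["Subcategory"].isnull().all() else "Subcategory"

def sample_per(df, random_state=None):
    # Same size thresholds as the script in the question.
    n = len(df)
    if n >= 15000:
        frac = 0.01
    elif n > 10000:
        frac = 0.03
    else:
        frac = 0.05
    # GroupBy.sample draws the given fraction from each group separately.
    return df.groupby(pick_group_column(df)).sample(frac=frac, random_state=random_state)

# Hypothetical data: two categories, Subcategory completely empty.
demo = pd.DataFrame({
    "Category": ["A"] * 100 + ["B"] * 100,
    "Subcategory": [np.nan] * 200,
})
sampled = sample_per(demo, random_state=0)
```

Note that groupby drops rows whose key is NaN by default, so a Subcategory column that is only partly empty would lose those rows when it is used as the grouping column.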

If by empty you mean filled with “” (empty strings), then you could do this:

df2['Subcategory'].apply(lambda x: not x).all()

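This evaluates to True if every row in “Subcategory” is an empty string. Strictly speaking it tests for falsy values, so 0 and None would also count as empty, while NaN is truthy and would not.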
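If the column might hold a mix of NaN, empty strings, and whitespace, a single helper can treat all of them as empty. A rough sketch, where is_blank_column and the sample Series are hypothetical names for illustration:

```python
import numpy as np
import pandas as pd

blank = pd.Series(["", "", ""])
mixed = pd.Series(["", np.nan, "  "])

def is_blank_column(s):
    """True if every entry is NaN, an empty string, or whitespace only."""
    # Turn missing values into "", then strip whitespace before comparing.
    as_text = s.fillna("").astype(str).str.strip()
    return bool((as_text == "").all())

# The truthiness check from the answer, for comparison.
truthy_check_blank = bool(blank.apply(lambda x: not x).all())
truthy_check_mixed = bool(mixed.apply(lambda x: not x).all())
```

Here the truthiness check accepts blank but rejects mixed, because NaN and "  " are both truthy, while is_blank_column accepts both.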

P.S. Use .any() instead of .all() to check whether at least one value is True!
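A quick illustration of the difference, using a made-up column that is only partly missing:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value out of three.
partly_missing = pd.Series(["x", np.nan, "y"], name="Subcategory")
mask = partly_missing.isnull()

has_any_missing = bool(mask.any())  # at least one row is NaN
all_missing = bool(mask.all())      # every row is NaN
missing_count = int(mask.sum())     # number of NaN rows
```

Summing the mask is handy when you want to know how many rows are missing rather than just whether any are.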