I am trying to return the rows of a dataframe in pandas that correspond to the label I choose. For example, in my function Female, it returns all the rows in which the patient is female. For AgeRange, I have run into issues satisfying both conditions without getting an error.

import numpy as np
import pandas as pd

dataset = pd.read_csv('insurance.csv')

def Female(self):
    rows = dataset[dataset.sex == 1]
    print(rows)

def AgeRange(self):
    rows = dataset[dataset.age > 0] & dataset[dataset.age < 20]
    print(rows)

Using the bitwise operator gets me the error below: TypeError: unsupported operand type(s) for &: 'float' and 'bool'

def AgeRange(self):
    rows = dataset[dataset.age > 0] and dataset[dataset.age < 20]
    print(rows)

Using the boolean and operator gets me the error below: ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

def AgeRange(self):
    rows = np.logical_and(dataset[dataset.age > 0],dataset[dataset.age < 20])
    print(rows)

Using the numpy logical and gets me the error: ValueError: operands could not be broadcast together with shapes (1309,7) (135,7).

I'm honestly not sure what that leaves me with, or what is causing the issue in the first place. Can anyone help point out where I'm going wrong?

There are 2 answers

Oleg O On BEST ANSWER

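The standard syntax is to build one boolean mask, wrapping each comparison in parentheses, and index the DataFrame once: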

df[(df['a'] > X) & (df['a'] < Y)]

or using query():

df.query('X < a < Y') 
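
To make this concrete for the question's AgeRange, here is a minimal sketch. The small DataFrame is made-up stand-in data for insurance.csv, and the function name age_range with its lo/hi parameters is my own, for illustration only. The attempts in the question fail because dataset[dataset.age > 0] is already a filtered DataFrame, so &, and, and np.logical_and were being applied to two DataFrames instead of two boolean masks.

```python
import pandas as pd

# Made-up stand-in for insurance.csv (hypothetical values)
dataset = pd.DataFrame({
    "age": [5, 19, 20, 34, 61],
    "sex": [1, 0, 1, 1, 0],
})

def age_range(df, lo, hi):
    """Return the rows with lo < age < hi (both bounds exclusive)."""
    # Combine the two boolean Series first, then index once.
    # The parentheses are required: & binds tighter than > and <.
    mask = (df["age"] > lo) & (df["age"] < hi)
    return df[mask]

rows = age_range(dataset, 0, 20)
print(rows)

# Same filter with query(); @ refers to local Python variables.
lo, hi = 0, 20
rows_q = dataset.query("@lo < age < @hi")
print(rows_q)
```

If X and Y in the query() form above are Python variables rather than literal numbers, they need the @ prefix as shown here.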
eimarin On

This syntax is easier for me! If you have 3 different conditions that must all be met at the same time:

cond1 = df["id"] == id
cond2 = df["date"] > date_min
cond3 = df["date"] < date_max

result = df[cond1 & cond2 & cond3]
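
A self-contained version of the same idea, in case it helps to run it. The data is made up, and target_id, date_min and date_max are placeholder names I chose (target_id instead of id, to avoid shadowing the Python builtin).

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "id": [1, 1, 2, 1],
    "date": pd.to_datetime(
        ["2021-01-05", "2021-02-10", "2021-02-15", "2021-03-20"]
    ),
})

target_id = 1
date_min = pd.Timestamp("2021-01-31")
date_max = pd.Timestamp("2021-03-01")

# Each condition is a boolean Series aligned with df's index
cond1 = df["id"] == target_id
cond2 = df["date"] > date_min
cond3 = df["date"] < date_max

# & combines the masks element-wise; only rows where all three hold remain
result = df[cond1 & cond2 & cond3]
print(result)
```

Because each mask is stored in its own variable, no extra parentheses are needed when combining them.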