Filter on multiple columns in pandas data frame using a loop

Context: I have data in Excel that we process with pandas to clean it up and then use in an ML model. During the cleanup, I'm trying to filter the data on multiple columns combined with an OR condition. Each of these columns has a week-start date as its header, so these 7 columns represent 7 weeks. The header names change every week, so I can't hard-code them in a way that picks up the right columns automatically.

Logic I have tried: I wrote a code chunk that prints the OR condition for these date columns, and then I copy-paste the printed statement into the DataFrame indexing expression. Below is how it looks:

For now I'm copy-pasting the column names. I guess I could build logic that identifies the date columns by applying a type-based condition to the column names.
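A minimal sketch of the header-based idea, assuming the weekly headers are either strings in month/day/year form such as '1/20/2019' or values that pd.read_excel has already turned into datetimes. The helper name find_date_columns and the example frame are hypothetical.

```python
import datetime

import pandas as pd


def find_date_columns(df: pd.DataFrame) -> list:
    """Return the columns whose header is a date (hypothetical helper)."""
    date_cols = []
    for col in df.columns:
        # Headers that were already parsed into datetime/Timestamp objects
        if isinstance(col, datetime.datetime):
            date_cols.append(col)
            continue
        # String headers like '1/20/2019'; errors='coerce' yields NaT for non-dates
        parsed = pd.to_datetime(str(col), format="%m/%d/%Y", errors="coerce")
        if not pd.isna(parsed):
            date_cols.append(col)
    return date_cols


# Hypothetical frame mixing a non-date column with weekly date headers
df = pd.DataFrame({
    "item": ["a", "b"],
    "1/20/2019": ["0(80CS,8H)", "12(25CS,8H)"],
    "1/27/2019": ["0(80CS)", "15(25CS)"],
})
print(find_date_columns(df))  # ['1/20/2019', '1/27/2019']
```

Since the list is rebuilt from whatever headers the file has, nothing needs editing when the weeks roll forward.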

Sample Data:

 1/20/2019 1/27/2019  2/3/2019 2/10/2019    2/17/2019 2/24/2019  3/3/2019  \
0   0(80CS,8H)   0(80CS)   0(80CS)   0(80CS)      0(80CS)   0(80CS)   0(80CS)   
1   0(50CS,8H)   0(50CS)   0(50CS)   0(50CS)      0(50CS)   0(50CS)   0(50CS)   
2   0(40CS,8H)   0(40CS)   0(40CS)   0(40CS)      0(40CS)   0(40CS)   0(40CS)   
3   0(40CS,8H)   0(40CS)   0(40CS)   0(40CS)      0(40CS)   0(40CS)   0(40CS)   
4   0(40CS,8H)   0(40CS)   0(40CS)   0(40CS)      0(40CS)   0(40CS)   0(40CS)   
5   0(40CS,8H)   0(40CS)   0(40CS)   0(40CS)      0(40CS)   0(40CS)   0(40CS)   
6  12(25CS,8H)  15(25CS)  15(25CS)  15(25CS)     15(25CS)  15(25CS)  15(25CS)   
7  11(28CS,8H)  12(28CS)  12(28CS)  12(28CS)     12(28CS)  12(28CS)  12(28CS)   
8   8(30CS,8H)  10(30CS)  10(30CS)  10(30CS)  2(30CS,32T)  10(30CS)  10(30CS)   
9   0(40CS,8H)   0(40CS)   0(40CS)   0(40CS)      0(40CS)   0(40CS)   0(40CS)   

  3/10/2019 3/17/2019 3/24/2019 3/31/2019  4/7/2019  
0   0(80CS)   0(80CS)   0(80CS)   0(80CS)   0(80CS)  
1   0(50CS)   0(50CS)   0(50CS)   0(50CS)   0(50CS)  
2   0(40CS)   0(40CS)   0(40CS)   0(40CS)   0(40CS)  
3   0(40CS)   0(40CS)   0(40CS)   0(40CS)   0(40CS)  
4   0(40CS)   0(40CS)   0(40CS)   0(40CS)   0(40CS)  
5   0(40CS)   0(40CS)   0(40CS)   0(40CS)   0(40CS)  
6  15(25CS)  15(25CS)  15(25CS)  20(20CS)  20(20CS)  
7  12(28CS)  12(28CS)  12(28CS)  12(28CS)  12(28CS)  
8  10(30CS)  10(30CS)  10(30CS)  10(30CS)  10(30CS)  
9   0(40CS)   0(40CS)   0(40CS)   0(40CS)   0(40CS)
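Each cell in the sample above is a count followed by annotations in parentheses. As a sketch, assuming that format holds in every date column, the count can be extracted with pandas string methods applied to whole columns. The helper leading_number and the tiny frame are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical miniature of the sample data: each cell is "<count>(<annotations>)"
df = pd.DataFrame({
    "1/20/2019": ["0(80CS,8H)", "12(25CS,8H)", "-"],
    "1/27/2019": ["0(80CS)", "15(25CS)", "8(30CS)"],
})


def leading_number(col: pd.Series) -> pd.Series:
    """Keep the part before '(' and convert it to float; a bare '-' becomes 0."""
    head = col.astype(str).str.split("(").str[0].str.strip()
    head = head.replace("-", "0")  # exact-value match on cells that are just '-'
    return pd.to_numeric(head, errors="coerce").astype(float)


# DataFrame.apply calls leading_number once per column
numeric = df.apply(leading_number)
print(numeric)
```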


# avail_dat is the DataFrame read from the Excel file
avail_col = ['1/20/2019',
   '1/27/2019', '2/3/2019', '2/10/2019', '2/17/2019', '2/24/2019',
   '3/3/2019', '3/10/2019', '3/17/2019', '3/24/2019', '3/31/2019',
   '4/7/2019']

## changing the data type of selected columns:
## keep the number before '(', turn '-' into '0', then cast to float
for i in avail_col:
    avail_dat[i] = avail_dat[i].astype(str).apply(lambda x: x.split('(')[0])
    avail_dat[i] = avail_dat[i].str.replace('-', '0')
    avail_dat[i] = avail_dat[i].astype(float)


## print one "(condition) |" line per column, to be pasted into the filter below
or_str = ''
for i in avail_col:
    or_str = "(avail_dat['" + i + "'] >= 24) | "
    print(or_str)
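Rather than printing a string of conditions and pasting it back, one option is to build a boolean Series per column and OR them together with functools.reduce and operator.or_. This is a sketch with a made-up three-column frame standing in for the cleaned avail_dat.

```python
import functools
import operator

import pandas as pd

# Hypothetical numeric frame, i.e. what the cleanup loop would produce
avail_dat = pd.DataFrame({
    "1/20/2019": [0.0, 12.0, 30.0],
    "1/27/2019": [0.0, 25.0, 10.0],
    "2/3/2019": [5.0, 10.0, 10.0],
})
avail_col = ["1/20/2019", "1/27/2019", "2/3/2019"]

# One boolean Series per column instead of one string per column
conditions = [avail_dat[c] >= 24 for c in avail_col]

# Combine them with |, the same operator the hand-written filter uses
mask = functools.reduce(operator.or_, conditions)

filtered = avail_dat[mask]
print(filtered.index.tolist())  # [1, 2]
```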

Apparently I can't pass the variable to the DataFrame to filter it, or I don't know how to do that yet, so I copy-paste the printed statements into the code below to filter the DataFrame:

avail_dat = avail_dat[(avail_dat['1/20/2019'] >= 24) |
                      (avail_dat['1/27/2019'] >= 24) |
                      (avail_dat['2/3/2019'] >= 24) |
                      (avail_dat['2/10/2019'] >= 24) |
                      (avail_dat['2/17/2019'] >= 24) |
                      (avail_dat['2/24/2019'] >= 24) |
                      (avail_dat['3/3/2019'] >= 24) |
                      (avail_dat['3/10/2019'] >= 24) |
                      (avail_dat['3/17/2019'] >= 24) |
                      (avail_dat['3/24/2019'] >= 24) |
                      (avail_dat['3/31/2019'] >= 24) |
                      (avail_dat['4/7/2019'] >= 24)]

Is there a way to pass a variable (the list of column names) instead of copy-pasting every time?
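One way to do this, sketched under the assumption that avail_col holds the current week columns and those columns are already numeric: compare all of them at once and keep the rows where any comparison is True, using DataFrame.any(axis=1). The example data is hypothetical.

```python
import pandas as pd

# Hypothetical data: one non-date column plus two already-cleaned week columns
avail_dat = pd.DataFrame({
    "item": ["a", "b", "c"],
    "1/20/2019": [0.0, 12.0, 30.0],
    "1/27/2019": [0.0, 25.0, 10.0],
})
avail_col = ["1/20/2019", "1/27/2019"]

# Compare only the selected columns, then keep rows where at least one passes
mask = (avail_dat[avail_col] >= 24).any(axis=1)
avail_dat = avail_dat[mask]
print(avail_dat["item"].tolist())  # ['b', 'c']
```

Because avail_col is an ordinary list, it can be rebuilt from the headers each week and the filter line itself never has to change. Selecting avail_dat[avail_col] first also keeps non-numeric columns such as "item" out of the comparison.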
