Let's say I'm starting with the following input table below:
date | id | val1 | val2 | val3 | val4 | val5 | val_final |
---|---|---|---|---|---|---|---|
2023-08-29 | B3241 | 496C | |||||
2023-09-08 | A3290 | 349C | 078F | 274F | |||
2023-09-12 | D2903 | 349C | 072F | 307C | 170F | 201D | |
2023-09-14 | I13490 | 497C | 0349 | 303F | 101A |
The code to create the initial input table is below:
import pandas as pd
df = pd.DataFrame({'date':["2023-08-29","2023-09-08","2023-09-12", "2023-09-14"],'id':["B3241","A3290","D2903", "I13490"],'val1':["496C","349C","349C", "497C"], 'val2':["","078F","072F", "0349"], 'val3':["","274F","307C", "303F"], 'val4':["","","170F", "101A"], 'val5':["","","201D",""]})
I want to look at columns "val1" through "val5" and see which rows contain a value from my code list. I want to populate the "val_final" column accordingly (if the value is in the list).
code_list = ['349C', '303F', '201D', '497C']
If multiple columns contain values from code_list I want to pick the one that's on the right most column.
Given the above logic, my desired output table would look like this:
date | id | val1 | val2 | val3 | val4 | val5 | val_final |
---|---|---|---|---|---|---|---|
2023-08-29 | B3241 | 496C | |||||
2023-09-08 | A3290 | 349C | 078F | 274F | 349C | ||
2023-09-12 | D2903 | 349C | 072F | 307C | 170F | 201D | 201D |
2023-09-14 | I13490 | 497C | 349C | 303F | 101A | 303F |
I searched all over StackOverflow to try and solve this problem but to no avail. I assume doing if then else is an option starting with column "val5" and going down to column "val1" but I want to learn a more efficient way to do this.
IIUC you can use apply for each row with a function that returns the first value that exists in the code_list starting with the right-most columns:
Output: