I have the following dataframe:
Out[117]: mydata
author email ri oi
0 X1 NaN NaN 0000-0001-8437-498X
1 X2 NaN NaN NaN
2 X3 [email protected] K-5448-2012 0000-0001-8437-498X
3 X4 [email protected] NaN 0000-0001-8437-498X
4 X5 [email protected] NaN 0000-0001-8437-498X
where column ri represents an author's ResearcherID, and oi the ORCID. One author may has more than one email address, so column email has duplicates.
Firstly, I'm trying to fill na in ri if the corresponding rows in oi share the same value, using a non-NaN value in ri. The result I want is:
author email ri oi
0 X1 NaN K-5448-2012 0000-0001-8437-498X
1 X2 NaN NaN NaN
2 X3 [email protected] K-5448-2012 0000-0001-8437-498X
3 X4 [email protected] K-5448-2012 0000-0001-8437-498X
4 X5 [email protected] K-5448-2012 0000-0001-8437-498X
Secondly, merging emails and using the merged value to fill na in column email, if the values in ri (or oi) are identical. I want to get a dataframe like the following one:
author email ri oi
0 X1 [email protected];[email protected] K-5448-2012 0000-0001-8437-498X
1 X2 NaN NaN NaN
2 X3 [email protected];[email protected] K-5448-2012 0000-0001-8437-498X
3 X4 [email protected];[email protected] K-5448-2012 0000-0001-8437-498X
4 X5 [email protected];[email protected] K-5448-2012 0000-0001-8437-498X
I've tried the following code:
final_df = pd.DataFrame()
na_df = mydata[mydata.oi.isna()]
for i in set(mydata.oi.dropna()):
fill_df = mydata[mydata.oi == i]
fill_df.ri = fill_df.ri.fillna(method='ffill')
fill_df.ri = fill_df.ri.fillna(method='bfill')
null_df = pd.concat([null_df, fill_df])
final_df = pd.concat([final_df, na_df])
This code returned the one I want in the the frist step, but is there an elegent way to approach this? Furthermore, how to get the merged value in email and then use the merged value as an input in the process of filling na?
Try 2
transform. One for each column. Onri, usefirst. Onemail, use combination ofdropna,unique, andjoin