Suppose I have two DataFrames, d1 and d2, which can be generated using the code below.
import pandas as pd

d1 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'c2': ['G', 'H', 'I', 'J', 'K', 'L'],
                   'val': [10, 20, 30, 40, 50, 60]})
d2 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'c2': ['H', 'H', 'I', 'J', 'L', 'K'],
                   'c1_found': [1, 1, 1, 1, 1, 1],
                   'c2_found': [1, 1, 1, 1, 1, 1]})
I want to create a column c1_c2_found in d2 that indicates whether each row's combination of c1 and c2 exists in d1.
I can achieve that using the code below. Is there a more efficient (vectorized) approach to this problem?
# Check whether each (c1, c2) pair in d2 also exists in d1
merged_data = pd.merge(d2, d1, on=['c1', 'c2'], how='inner')
d2['c1_c2_found'] = d2.apply(lambda row: 1 if (row['c1'], row['c2']) in zip(merged_data['c1'], merged_data['c2']) else 0, axis=1)
IIUC, you can do a left merge on d2 against d1 and flag the rows whose (c1, c2) pair was matched.
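The answer's original code and output did not survive here, so the following is only a minimal sketch of one way a left merge could produce the flag. It assumes pandas' `indicator=True` option (which adds a `_merge` column) is an acceptable way to detect matches. The names `keys` and `merged` are illustrative and do not come from the original post.

```python
import pandas as pd

d1 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'c2': ['G', 'H', 'I', 'J', 'K', 'L'],
                   'val': [10, 20, 30, 40, 50, 60]})
d2 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'c2': ['H', 'H', 'I', 'J', 'L', 'K'],
                   'c1_found': [1, 1, 1, 1, 1, 1],
                   'c2_found': [1, 1, 1, 1, 1, 1]})

# Only the key columns of d1 are needed; dropping duplicate pairs ensures
# the left merge cannot multiply rows of d2.
keys = d1[['c1', 'c2']].drop_duplicates()

# A left merge keeps every row of d2 in its original order; indicator=True
# adds a '_merge' column that is 'both' where the pair was found in d1
# and 'left_only' otherwise.
merged = d2.merge(keys, on=['c1', 'c2'], how='left', indicator=True)

# Assign positionally so the result does not depend on d2's index labels.
d2['c1_c2_found'] = (merged['_merge'] == 'both').astype(int).to_numpy()

print(d2)
# c1_c2_found is [0, 1, 1, 1, 0, 0] for the sample data above
```

This replaces the row-wise `apply` with a single merge, so the pair lookup is done by pandas' join machinery rather than by rebuilding and scanning a `zip` for every row.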