I have a pandas DataFrame with several columns that together make up a unique identifier. I want to write a generic test case that concatenates those columns into a single column (uid) and tests that column for uniqueness. I have the following code as a non-generic test case:
import pandas as pd
import pytest
df = pd.DataFrame(columns=['one', 'two', 'three'])
df.one = 'abc', 'def', 'ghi'
df.two = 'jkl', 'mno', 'pqr'
df.three = 'stu', 'vwx', 'yzz'
# Test one
df['uid'] = df.one + df.two
assert len(df.index) == len(df.drop_duplicates(['uid']).index)
# Test two
df['uid'] = df.one + df.three
assert len(df.index) == len(df.drop_duplicates(['uid']).index)
Since I will be reusing this, I need a solution that lets me pick an arbitrary selection of columns to concatenate, as is done by hand in lines 8 and 11 of the code above (the two df['uid'] = ... assignments).
Suppose you want to select columns two and three to concatenate. Use sum(axis=1) on that selection to concatenate the columns row by row.
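A minimal sketch of how this could look as a reusable check, assuming all selected columns hold strings. The helper names make_uid and has_unique_uid are hypothetical, and the astype(object) cast is my own defensive addition rather than part of the original suggestion:

```python
import pandas as pd

df = pd.DataFrame({
    'one': ['abc', 'def', 'ghi'],
    'two': ['jkl', 'mno', 'pqr'],
    'three': ['stu', 'vwx', 'yzz'],
})


def make_uid(frame, columns):
    """Concatenate the given string columns row by row (hypothetical helper)."""
    # Assumption: casting to object keeps `+` as plain Python string
    # concatenation, whatever string dtype the columns currently use.
    return frame[columns].astype(object).sum(axis=1)


def has_unique_uid(frame, columns):
    """Return True if the concatenated identifier has no duplicates."""
    return bool(make_uid(frame, columns).is_unique)


# Pick any list of column names; nothing is hard-coded in the helpers.
cols = ['two', 'three']
df['uid'] = make_uid(df, cols)

assert has_unique_uid(df, cols)
assert has_unique_uid(df, ['one', 'two'])
```

If the uid column itself is not needed, df.duplicated(subset=cols).any() tests the same thing without concatenating, and it avoids false collisions such as 'ab' + 'c' versus 'a' + 'bc'.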