Concatenate a list of series into a uid

I have a pandas DataFrame in which several columns together make up a unique identifier. I want to write a generic test that concatenates those columns into a single column (uid) and checks that column for uniqueness. This is my current, non-generic version:

import pandas as pd
import pytest
df = pd.DataFrame(columns=['one', 'two', 'three'])
df['one'] = ['abc', 'def', 'ghi']
df['two'] = ['jkl', 'mno', 'pqr']
df['three'] = ['stu', 'vwx', 'yzz']
# Test one
df['uid'] = df.one + df.two
assert len(df.index) == len(df.drop_duplicates(['uid']).index)
# Test two
df['uid'] = df.one + df.three
assert len(df.index) == len(df.drop_duplicates(['uid']).index)

Since I will be reusing this, I need a solution that lets me pick an arbitrary set of columns to concatenate, rather than hard-coding them as in the two df['uid'] = ... lines above.

Answer by Vidhya G (accepted best answer):

Suppose you want to concatenate columns two and three. Put their names in a list:

col_to_add = ['two', 'three']

Because both columns hold strings, sum(axis=1) adds them row by row, which concatenates them:

df['uid'] = df[col_to_add].sum(axis=1)
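A minimal sketch of how the answer could be wrapped into the reusable check the question asks for. The helper names make_uid and assert_unique and the "|" separator are hypothetical choices, not pandas APIs. It swaps sum(axis=1) for a row-wise string join with a separator, assuming "|" never occurs in the data: without a separator, rows such as ("ab", "c") and ("a", "bc") both become "abc", and the test would report a duplicate that is not really there. astype(str) is included so that non-string columns can take part.

```python
import pandas as pd

# Sample data from the question, built from a dict.
df = pd.DataFrame({
    "one": ["abc", "def", "ghi"],
    "two": ["jkl", "mno", "pqr"],
    "three": ["stu", "vwx", "yzz"],
})


def make_uid(frame, cols, sep="|"):
    """Hypothetical helper: join the chosen columns row-wise into one string.

    astype(str) converts every value to text; `sep` (assumed never to appear
    in the data) keeps ("ab", "c") and ("a", "bc") from both becoming "abc".
    """
    return frame[cols].astype(str).apply(sep.join, axis=1)


def assert_unique(frame, cols, sep="|"):
    """Hypothetical helper: fail if the uid built from `cols` repeats."""
    uid = make_uid(frame, cols, sep)
    assert uid.is_unique, f"duplicate uid for columns {cols}"


# The two hand-written tests from the question, driven by a list of column sets.
for cols in (["one", "two"], ["one", "three"]):
    assert_unique(df, cols)

# Why the separator matters: plain concatenation makes these two rows collide.
clash = pd.DataFrame({"a": ["ab", "a"], "b": ["c", "bc"]})
print((clash["a"] + clash["b"]).is_unique)    # False: both rows are "abc"
print(make_uid(clash, ["a", "b"]).is_unique)  # True: "ab|c" vs "a|bc"
```

To run each column combination as a separate test, the loop could be replaced with pytest.mark.parametrize over the column lists. If the uid column itself is not needed, df.duplicated(subset=cols).any() checks the same property without building any strings.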