I have a dataframe with transactions and another dataframe with employees who are assigned to those transactions. Each transaction can have 0...N assignments. I would like one dataframe that contains the transaction ids and all of the employee assignments in separate columns. Please see the example below:
I have one dataframe as follows:
TransactionIds | Other_Columns.. |
---|---|
T1 | Cell 2...These don't matter |
T2 | Cell 4...These don't matter |
T3 | Cell 4...These don't matter |
I have another dataframe as follows:
TransactionIds | Assignments |
---|---|
T1 | Assignment1 |
T1 | Assignment2 |
T1 | Assignment3 |
T2 | Assignment3 |
T2 | Assignment4 |
T3 | Assignment6 |
T4 | NULL |
I would like a dataframe that looks like this:
TransactionIds | Assignment1 | Assignment2 | Assignment3 | AssignmentN |
---|---|---|---|---|
T1 | Assignment1 | Assignment2 | Assignment3 | NULL |
T2 | Assignment3 | Assignment4 | NULL | NULL |
T3 | Assignment6 | NULL | NULL | NULL |
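This reshape is a long-to-wide pivot keyed on each assignment's position within its transaction. The question does not name the dataframe library, so the following is a minimal sketch that assumes pandas. The sample data and the helper column `slot` are hypothetical. Numbering each transaction's assignments with `cumcount()` and pivoting on that number produces as many columns as the busiest transaction needs, so nothing is hard-coded.

```python
import pandas as pd

# Hypothetical sample data mirroring the two tables above (pandas assumed).
transactions = pd.DataFrame({"TransactionIds": ["T1", "T2", "T3"]})
assignments = pd.DataFrame({
    "TransactionIds": ["T1", "T1", "T1", "T2", "T2", "T3", "T4"],
    "Assignments": ["Assignment1", "Assignment2", "Assignment3",
                    "Assignment3", "Assignment4", "Assignment6", None],
})

# Drop rows without an assignment, then number each transaction's
# assignments 1..k in their original order ("slot" is a hypothetical helper).
long = assignments.dropna(subset=["Assignments"]).copy()
long["slot"] = long.groupby("TransactionIds").cumcount() + 1

# Pivot: one row per transaction, one column per slot. The number of
# columns is the largest k in the data, so it is determined dynamically.
wide = long.pivot(index="TransactionIds", columns="slot", values="Assignments")
wide.columns = [f"Assignment{i}" for i in wide.columns]

# Left-join onto the transactions so every transaction id is kept;
# empty slots become NaN, pandas' equivalent of NULL.
result = transactions.merge(wide.reset_index(), on="TransactionIds", how="left")
print(result)
```

The left join against the transactions table keeps transactions with no assignments and drops ids such as T4 that appear only in the assignments table.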
I tried a group by followed by the agg() function. However, it gives me a list, and I don't know how to convert that list into columns. Another problem with this approach is that I wouldn't know in advance how many columns to split the list into. I would like to determine the number of assignment columns dynamically and create them in the SELECT.
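If the group-by/agg route is kept, the list can be expanded into columns afterwards. Below is a minimal sketch, again assuming pandas (the post does not name the library) and using hypothetical sample data. Passing the lists to the `DataFrame` constructor pads shorter lists with `None`, so the column count equals the length of the longest list and does not have to be known beforehand.

```python
import pandas as pd

# Hypothetical long-format data in the shape of the assignments table.
assignments = pd.DataFrame({
    "TransactionIds": ["T1", "T1", "T1", "T2", "T2", "T3"],
    "Assignments": ["Assignment1", "Assignment2", "Assignment3",
                    "Assignment3", "Assignment4", "Assignment6"],
})

# Step 1: group by transaction and aggregate each group's assignments
# into a Python list (the "list" the agg() step produces).
lists = assignments.groupby("TransactionIds")["Assignments"].agg(list)

# Step 2: expand the lists into columns. Ragged lists are padded with None,
# so the number of columns is the length of the longest list.
expanded = pd.DataFrame(lists.tolist(), index=lists.index)
expanded.columns = [f"Assignment{i + 1}" for i in range(expanded.shape[1])]
expanded = expanded.reset_index()
print(expanded)
```

This covers only transactions that have at least one assignment; keeping transactions with none would still require a left join against the transactions table.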
I was able to resolve this by doing the following: