Construct sequences from a dataframe using dictionaries in Python


I would like to construct sequences of each user's purchasing history using dictionaries in Python. I would like these sequences to be ordered by date.

I have 3 columns in my dataframe:

users        items         date

1             1            date_1 
1             2            date_2
2             1            date_3
2             3            date_1
4             5            date_2
4             1            date_5
4             3            date_3

And the result should look like this:

{1: [[1,date_1],[2,date_2]], 2:[[3,date_1],[1,date_3]], 4:[[5,date_2],[3,date_3],[1,date_5]]}

My code is:

df_sub = df[['uid', 'nid', 'date']] 
dic3 = df_sub.set_index('uid').T.to_dict('list')

And my results are:

{36864: [258509L, '2014-12-03'], 548873: [502105L, '2015-09-08'], 42327: [492268L, '2015-01-29'], 548873: [370049L, '2015-02-18'], 36864: [258909L, '2016-01-13'] ... }

But I would like to group the entries by user:

{36864: [[258509L, '2014-12-03'],[258909L, '2016-01-13']], 548873: [[502105L, '2015-09-08'],[370049L, '2015-02-18']], 42327: [[492268L, '2015-01-29']] }

Some help, please!


There is 1 answer

Nickil Maveli (best answer)

First, set users as the index and group by that index level. Then pass a function that sorts each group by its date column and extracts the underlying array with .values.

Use .tolist to convert that array into a nested list, which gives each group in the required format. Finally, use .to_dict to get the final output as a dictionary.

fnc = lambda x: x.sort_values('date').values.tolist()
df.set_index('users').groupby(level=0).apply(fnc).to_dict()

produces:

{1: [[1, 'date_1'], [2, 'date_2']],
 2: [[3, 'date_1'], [1, 'date_3']],
 4: [[5, 'date_2'], [3, 'date_3'], [1, 'date_5']]}
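
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It is a variant of the approach above, not the exact code: it sorts the whole frame once and then builds the dictionary with a comprehension instead of apply. The sample frame is rebuilt from the table in the question, and the name sequences is just an illustrative choice. It assumes the date values sort correctly as they are; the placeholder strings here do, but real dates would probably need pd.to_datetime or ISO-formatted strings first.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "users": [1, 1, 2, 2, 4, 4, 4],
    "items": [1, 2, 1, 3, 5, 1, 3],
    "date": ["date_1", "date_2", "date_3", "date_1",
             "date_2", "date_5", "date_3"],
})

# Sort once by date; groupby keeps the row order inside each group,
# so every user's purchases come out in date order.
sequences = {
    int(user): group[["items", "date"]].values.tolist()
    for user, group in df.sort_values("date").groupby("users")
}

print(sequences)
```

Sorting before grouping means the sort runs a single time rather than once per user, which may matter on a large frame.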