I have a dataset that looks like this:
File_no A B Date Batch State
0 1 2 3 23-1-2019 2 3
1 2 7 6 23-1-2019 2 4
2 3 9 2 24-1-2019 1 2
3 5 6 3 24-1-2019 2 3
4 6 4 3 24-1-2019 1 4
5 8 2 3 25-1-2019 1 4
I want to group columns 'A' and 'B' by date and batch, and then shift the rows of these columns based on the sequence of file numbers. For instance, in the above dataframe File_no 4 is missing.
I am able to achieve the shift, but I am not able to do it for every group individually.
For example, files 6 and 8 are not consecutive, but they also belong to different dates, so the shift should not be carried across them.
# Label runs of consecutive file numbers: a new id starts wherever the gap is not 1
diff = data['File_no'].diff().ne(1).cumsum()
grouped = data.groupby(['Date', 'Batch'])
grouped.apply(lambda data: data.groupby(diff)[['A', 'B']].shift())
This performs a shift whenever there is a missing sequence, but it does not take the groups into account.
Expected output:
File_no A B Date Batch State
0 1 NaN NaN 23-1-2019 2 3
1 2 2 3 23-1-2019 2 4
2 3 9 2 24-1-2019 1 2
3 5 NaN NaN 24-1-2019 2 3
4 6 6 3 24-1-2019 1 4
5 8 2 3 25-1-2019 1 4
I think you can pass the column names together with the Series to a single groupby.
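A minimal sketch of that idea, assuming the sample data from the question is rebuilt by hand and that the helper Series is still called diff (the names shifted and result are only illustrative). The grouping keys are the two column names plus the Series of consecutive-run labels, so the shift never crosses a date, a batch, or a gap in File_no:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the asker's frame).
data = pd.DataFrame({
    "File_no": [1, 2, 3, 5, 6, 8],
    "A": [2, 7, 9, 6, 4, 2],
    "B": [3, 6, 2, 3, 3, 3],
    "Date": ["23-1-2019", "23-1-2019", "24-1-2019",
             "24-1-2019", "24-1-2019", "25-1-2019"],
    "Batch": [2, 2, 1, 2, 1, 1],
    "State": [3, 4, 2, 3, 4, 4],
})

# Run labels: a new label starts wherever File_no does not increase by exactly 1.
diff = data["File_no"].diff().ne(1).cumsum()

# Group by Date, Batch and the run label in one groupby, then shift A and B
# inside each group. The Series is aligned to the frame by index.
shifted = data.groupby(["Date", "Batch", diff])[["A", "B"]].shift()

# Write the shifted columns back without touching the other columns.
result = data.assign(A=shifted["A"], B=shifted["B"])
print(result)
```

With this grouping, the first row of every group becomes NaN, so a group that contains a single row ends up as NaN in both columns.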
:EDIT: