```python
import pandas as pd

df = pd.DataFrame(
    [
        ['China', 'L', '08/06/2022 20:00', '08/10/2022 20:00'],
        ['China', 'L', '8/13/2022 00:54', '8/14/2022 00:54'],
        ['China', 'M', '8/14/2022 00:54', '8/14/2022 12:54'],
        ['United Kingdom', 'L', '8/27/2022 06:36', '8/31/2022 21:08'],
        ['United Kingdom', 'L', '9/01/2022 21:08', '09/02/2022 21:38'],
        ['China', 'D', '09/04/2022 21:38', '09/06/2022 21:38']
    ],
    columns=['Country', 'Function', 'Arrival', 'Departure']
)
```
In this case, I want to collapse consecutive duplicate rows into a single row, keeping the first row's `Arrival` and replacing its `Departure` with the `Departure` of the last duplicate, under two conditions:
- Do not remove duplicates that are not consecutive.
- If the `Function` column changes, do not treat the row as a duplicate, even if it is consecutive.
The expected result is:
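Taken together, the two conditions amount to splitting the frame into runs of consecutive rows that share both `Country` and `Function`. A minimal sketch of labelling those runs, assuming the `df` from the question (rebuilt here so the block runs on its own): compare each row with the one above it using `shift`, flag any difference, and number the flags with `cumsum`. The names `keys`, `changed` and `run_id` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['China', 'L', '08/06/2022 20:00', '08/10/2022 20:00'],
        ['China', 'L', '8/13/2022 00:54', '8/14/2022 00:54'],
        ['China', 'M', '8/14/2022 00:54', '8/14/2022 12:54'],
        ['United Kingdom', 'L', '8/27/2022 06:36', '8/31/2022 21:08'],
        ['United Kingdom', 'L', '9/01/2022 21:08', '09/02/2022 21:38'],
        ['China', 'D', '09/04/2022 21:38', '09/06/2022 21:38'],
    ],
    columns=['Country', 'Function', 'Arrival', 'Departure'],
)

# Columns that must match for two adjacent rows to count as duplicates
keys = df[['Country', 'Function']]

# True where Country or Function differs from the previous row
# (the first row compares against NaN, so it is always True)
changed = keys.ne(keys.shift()).any(axis=1)

# Each True starts a new run; cumsum turns the flags into run labels
run_id = changed.cumsum()
print(run_id.tolist())  # [1, 1, 2, 3, 3, 4]
```

The final `China` row gets its own label (4) instead of joining run 1, which matches the "non-consecutive duplicates stay" condition, and the `L` to `M` change in row 2 starts a new run.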
```python
df = pd.DataFrame(
    [
        ['China', 'L', '08/06/2022 20:00', '8/14/2022 00:54'],
        ['China', 'M', '8/14/2022 00:54', '8/14/2022 12:54'],
        ['United Kingdom', 'L', '8/27/2022 06:36', '09/02/2022 21:38'],
        ['China', 'D', '09/04/2022 21:38', '09/06/2022 21:38']
    ],
    columns=['Country', 'Function', 'Arrival', 'Departure']
)
```
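One way to get from the input to this expected result, sketched under the assumption that "duplicate" means same `Country` and same `Function`, is to group by run labels and aggregate every column with `first` except `Departure`, which takes `last`. This is a swapped-in `groupby.agg` approach and not necessarily the technique the answer relies on.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['China', 'L', '08/06/2022 20:00', '08/10/2022 20:00'],
        ['China', 'L', '8/13/2022 00:54', '8/14/2022 00:54'],
        ['China', 'M', '8/14/2022 00:54', '8/14/2022 12:54'],
        ['United Kingdom', 'L', '8/27/2022 06:36', '8/31/2022 21:08'],
        ['United Kingdom', 'L', '9/01/2022 21:08', '09/02/2022 21:38'],
        ['China', 'D', '09/04/2022 21:38', '09/06/2022 21:38'],
    ],
    columns=['Country', 'Function', 'Arrival', 'Departure'],
)

# Label runs of consecutive rows with identical Country and Function
keys = df[['Country', 'Function']]
run_id = keys.ne(keys.shift()).any(axis=1).cumsum()

# Collapse each run: first Arrival, last Departure
out = (
    df.groupby(run_id, sort=False)
      .agg(Country=('Country', 'first'),
           Function=('Function', 'first'),
           Arrival=('Arrival', 'first'),
           Departure=('Departure', 'last'))
      .reset_index(drop=True)
)
print(out)
```

Since the timestamps stay as strings, their original formatting (`08/06/...` versus `8/13/...`) passes through unchanged. `last` picks the bottom row of each run, which equals the latest departure only if rows within a run are in chronological order.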
You can use `groupby.idxmax`:

Output:
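The answer's own code and output are not included in this text. As an assumption about how `groupby.idxmax` could fit in, here is a sketch that parses `Departure` as datetimes (the format string `'%m/%d/%Y %H:%M'` is assumed to cover both the padded and unpadded dates), uses `idxmax` to find the row with the latest departure in each run, and takes every other column from the run's first row. The names `dep`, `first_idx` and `last_dep_idx` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['China', 'L', '08/06/2022 20:00', '08/10/2022 20:00'],
        ['China', 'L', '8/13/2022 00:54', '8/14/2022 00:54'],
        ['China', 'M', '8/14/2022 00:54', '8/14/2022 12:54'],
        ['United Kingdom', 'L', '8/27/2022 06:36', '8/31/2022 21:08'],
        ['United Kingdom', 'L', '9/01/2022 21:08', '09/02/2022 21:38'],
        ['China', 'D', '09/04/2022 21:38', '09/06/2022 21:38'],
    ],
    columns=['Country', 'Function', 'Arrival', 'Departure'],
)

# Parse departures so "latest" is a real time comparison, not a string one
dep = pd.to_datetime(df['Departure'], format='%m/%d/%Y %H:%M')

# Label runs of consecutive rows with identical Country and Function
keys = df[['Country', 'Function']]
run_id = keys.ne(keys.shift()).any(axis=1).cumsum()

# Index label of the first row in each run (supplies Country, Function, Arrival)
first_idx = df.index.to_series().groupby(run_id).min()

# Index label of the row with the latest Departure in each run
last_dep_idx = dep.groupby(run_id).idxmax()

# Take first rows, then overwrite Departure positionally with the latest one
out = (
    df.loc[first_idx]
      .reset_index(drop=True)
      .assign(Departure=df.loc[last_dep_idx, 'Departure'].to_numpy())
)
print(out)
```

Unlike a plain `last`, this picks the latest departure even if rows inside a run are out of order; the original string is kept so the output format matches the input.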