I have a Pandas DataFrame with the following format:
In [0]: df
Out[0]:
   col1  col2       date
0     1     1 2015-01-01
1     1     2 2015-01-09
2     1     3 2015-01-10
3     2     1 2015-02-10
4     2     2 2015-02-10
5     2     3 2015-02-25
In [1]: df.dtypes
Out[1]:
col1             int64
col2             int64
date    datetime64[ns]
dtype: object
We want to find the value of col2 corresponding to the greatest difference in date (between consecutive elements in the sorted-by-dates groups), grouped by col1. Assume there are no groups of size 1.
Desired Output
In [2]: output
Out[2]:
col1  col2
1     1     # This is because the difference between 2015-01-09 and 2015-01-01 is the greatest
2     2     # This is because the difference between 2015-02-25 and 2015-02-10 is the greatest
The real df has many values for col1 that we need to group by to do calculations. Is this possible by applying a function to the following? Please note, the dates are already in ascending order.
gb = df.groupby('col1')
gb.apply(right_maximum_date_difference)
Here's something that's almost your dataframe (I avoided copying the dates):
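A minimal sketch of such a frame, assuming small integers stand in for the dates. The values are illustrative (they mirror the day-of-month ordering in the question) and are not the original data:

```python
import pandas as pd

# Integers stand in for the real datetimes (an assumption for illustration);
# only their ordering and the gaps between them matter for this problem.
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [1, 2, 3, 1, 2, 3],
    'date': [1, 9, 10, 10, 10, 25],
})
```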
With this, define:
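One way such a function could look, written against current pandas. The name `max_diff_date` is hypothetical, and the sketch assumes each group has at least two rows, as the question states. It returns the col2 value of the earlier row in the pair with the largest gap, which is what the desired output shows:

```python
import pandas as pd

def max_diff_date(g):
    """Return col2 from the row that precedes the largest gap in 'date'."""
    # A stable sort keeps rows with equal dates in their original order.
    g = g.sort_values('date', kind='mergesort')
    # gap[i] = date[i + 1] - date[i]; the last row has no successor, so its gap is missing.
    gap = g['date'].shift(-1) - g['date']
    # idxmax skips missing values and returns the index label of the first largest gap.
    return g.loc[gap.idxmax(), 'col2']
```

Because the gap is computed with `shift(-1)`, the same function should work whether `date` holds integers or datetimes.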
and you have:
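A self-contained sketch of the full run, this time using the actual dates from the question rather than integers. The helper name `max_diff_date` is hypothetical, and selecting `['col2', 'date']` before `apply` is a choice made here so that the function never sees the grouping column:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [1, 2, 3, 1, 2, 3],
    'date': pd.to_datetime([
        '2015-01-01', '2015-01-09', '2015-01-10',
        '2015-02-10', '2015-02-10', '2015-02-25',
    ]),
})

def max_diff_date(g):
    # Stable sort, then the gap from each row to the next row within the group.
    g = g.sort_values('date', kind='mergesort')
    gap = g['date'].shift(-1) - g['date']
    # Label of the row that precedes the largest gap; return its col2 value.
    return g.loc[gap.idxmax(), 'col2']

# One col2 value per col1 group, indexed by col1.
result = df.groupby('col1')[['col2', 'date']].apply(max_diff_date)

# Turn the Series into a two-column frame shaped like the desired output.
output = result.reset_index(name='col2')
print(output)
```

For the sample data this gives col2 = 1 for col1 = 1 and col2 = 2 for col1 = 2, matching the desired output in the question.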