Splitting a timestamp date in pandas


I have a question about pandas.

I have a DataFrame that looks like this:

timestamp     user     exercises
2018-01-01    John         7
2018-01-01    Mary         9
2018-02-01    John         3
2018-02-01    Mary         2
2018-03-01    John         1
2018-03-01    Mary         5
2019-01-01    John         3
2019-01-01    Mary         4
2019-02-01    John         2
2019-02-01    Mary         5
2020-01-01    John         6
2020-01-01    Mary         2
2020-02-01    John         1
2020-02-01    Mary         2

I need an output DataFrame that is a subset of this one, keeping only the rows for the year 2018, like this:

    timestamp     user     exercises
    2018-01-01    John         7
    2018-01-01    Mary         9
    2018-02-01    John         3
    2018-02-01    Mary         2
    2018-03-01    John         1
    2018-03-01    Mary         5

Any ideas on how I could get this output DataFrame from the given one?

Thank you very much in advance.

Any help will be appreciated.

5 Answers

0
Damanpreet kaur On
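 import pandas as pd

 # `data` is the DataFrame from the question.
 # Convert the date column to datetime format.
 data['timestamp'] = pd.to_datetime(data['timestamp'])

 # Create a mask for the required condition: on or after 1 Jan 2018
 # and strictly before 1 Jan 2019.
 mask = (data['timestamp'] >= '2018-01-01') & (data['timestamp'] < '2019-01-01')

 # Apply the mask to the data.
 data = data.loc[mask]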

Try something like this and let me know if this helps.
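
To illustrate the mask approach, here is a minimal sketch that bounds the year on both sides with a half-open interval, so rows before 2018 and rows late on 31 December are both handled. The sample rows (including the 2017 row and the times of day) are made up for illustration and are not from the question; only the column names follow the question.

```python
import pandas as pd

# Illustrative rows only (not the question's data).
df = pd.DataFrame({
    "timestamp": ["2017-12-31 09:00", "2018-01-01 00:00",
                  "2018-12-31 18:30", "2019-01-01 00:00"],
    "user": ["John", "Mary", "John", "Mary"],
    "exercises": [4, 9, 3, 4],
})
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Half-open interval [2018-01-01, 2019-01-01): start included, end excluded.
start = pd.Timestamp("2018-01-01")
end = pd.Timestamp("2019-01-01")
mask = (df["timestamp"] >= start) & (df["timestamp"] < end)

# Keep only the rows that fall inside 2018.
df_2018 = df.loc[mask]
print(df_2018)
```

An upper bound alone would also keep the 2017 row, and an upper bound of midnight on 31 December would drop the 18:30 row.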

0
Erfan On

Use Series.dt.year to select only the year 2018:

# If the column is not already datetime, convert it first:
# df['timestamp'] = pd.to_datetime(df['timestamp'])

df_new = df[df['timestamp'].dt.year == 2018]

print(df_new)
   timestamp  user  exercises
0 2018-01-01  John          7
1 2018-01-01  Mary          9
2 2018-02-01  John          3
3 2018-02-01  Mary          2
4 2018-03-01  John          1
5 2018-03-01  Mary          5
1
sentence On

Try:

import pandas as pd

df = pd.DataFrame({"timestamp": ['2018-01-01',
                                 '2018-01-01',
                                 '2019-01-01',
                                 '2020-01-01'],
                   "user": ['john', 'mary', 'john', 'mary'],
                   "exercises": [7, 9, 3, 2]})

# Parse the strings so that the .dt accessor is available
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Keep only the rows whose year is 2018
df[df['timestamp'].dt.year == 2018]

Input:

    timestamp   user    exercises
0   2018-01-01  john    7
1   2018-01-01  mary    9
2   2019-01-01  john    3
3   2020-01-01  mary    2

Output:

    timestamp   user    exercises
0   2018-01-01  john    7
1   2018-01-01  mary    9
0
âńōŋŷXmoůŜ On

If you are fond of lambdas, you can use one of the following.

If timestamp is a string:

df.loc[lambda df: df.timestamp.str[:4] == '2018']

If timestamp is a datetime:

df.loc[lambda df: (pd.to_datetime(df.timestamp)).dt.year == 2018]
0
blalterman On

Is your index a DatetimeIndex? If so, you can call data.loc["2018"]. Internally, pandas treats the string "2018" as the year 2018 and, because .loc slicing is inclusive on both edges, selects all rows in that year.
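
If the timestamps are still in a regular column, as in the question, a minimal sketch of this approach might look like the following. It assumes the column is named `timestamp` as in the question; the variable names `indexed`, `df_2018` and `result` are only illustrative, and only a few of the question's rows are reproduced.

```python
import pandas as pd

# A few rows taken from the question's table.
df = pd.DataFrame({
    "timestamp": ["2018-01-01", "2018-02-01", "2019-01-01", "2020-02-01"],
    "user": ["John", "Mary", "John", "Mary"],
    "exercises": [7, 2, 3, 2],
})

# Parse the strings and move the timestamps into a sorted DatetimeIndex.
df["timestamp"] = pd.to_datetime(df["timestamp"])
indexed = df.set_index("timestamp").sort_index()

# Partial string indexing: "2018" selects every row dated within 2018.
df_2018 = indexed.loc["2018"]

# Turn the index back into a column to recover the original layout.
result = df_2018.reset_index()
print(result)
```

The same string form also accepts finer granularity, for example `indexed.loc["2018-02"]` for a single month.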