Python Pandas DataFrame - How to sum values in 1 column based on partial match in another column (date type)?

Question

160 views Asked by TropicalMagic At 29 August 2020 at 14:55

I have encountered some issues while processing my dataset using Pandas DataFrame.

Here is my dataset:

My data types are displayed below:

My dataset is derived from:
MY_DATASET = pd.read_excel(EXCEL_FILE_PATH, index_col = None, na_values = ['NA'], usecols = "A, D")

1) I would like to sum all values in the "NUMBER OF PEOPLE" column for each month in the "DATE" column. For example, all values in the "NUMBER OF PEOPLE" column would be added together whenever the value in the "DATE" column falls in "2020-01", "2020-02", and so on. However, I am stuck because I am unsure how to use .groupby on a partial match.
2) After 1) is completed, I would also like to convert the values in the "DATE" column from YYYY-MM-DD to YYYY-MMM, for example 2020-Jan. However, I am unsure whether such a format exists.

Does anyone know how to resolve these issues?

Many thanks!

There are 3 answers

Iñigo González On 29 August 2020 at 16:09

You can get an abbreviated month name using strftime('%b'), but depending on your locale the month name may come out all in lowercase:

df['group_date'] = df['DATE'].apply(lambda x: x.strftime('%Y-%b'))

If you need the first letter of the month in uppercase, you could do something like this:

df['group_date'] = df['group_date'].apply(lambda x: f'{x[0:5]}{x[5].upper()}{x[6:]}')

# or in one step:

df['group_date'] = df['DATE'].apply(lambda x: x.strftime('%Y-%b')).apply(lambda x: f'{x[0:5]}{x[5].upper()}{x[6:]}')

Now you just need to .groupby and .sum():

result = df['NUMBER OF PEOPLE'].groupby(df.group_date).sum()
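
One thing to watch with a string key like this: groupby sorts the groups alphabetically by default, so "2020-Apr" would come before "2020-Jan". A possible workaround, sketched below on made-up sample data (only the column names come from the question), is to group on a monthly period first and format the labels afterwards. This is an alternative technique, not what the answer above does.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question
df = pd.DataFrame({
    "DATE": pd.to_datetime(["2020-01-05", "2020-04-10", "2020-04-20", "2020-02-01"]),
    "NUMBER OF PEOPLE": [1, 2, 3, 4],
})

# Group on a monthly Period so the result is sorted chronologically
by_period = df["NUMBER OF PEOPLE"].groupby(df["DATE"].dt.to_period("M")).sum()

# Only after summing, turn the period labels into 'YYYY-Mon' strings
# (the month abbreviation depends on the active locale)
monthly = by_period.copy()
monthly.index = by_period.index.strftime("%Y-%b")

print(monthly)
```

With this sample data the sums come out in the order January, February, April, rather than April first.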

TropicalMagic On 30 August 2020 at 03:58

I did some tinkering around and found that this worked for me as well:

Cheers all

BENY On 29 August 2020 at 14:57 BEST ANSWER

Check

s = df['NUMBER OF PEOPLE'].groupby(pd.to_datetime(df['DATE']).dt.strftime('%Y-%b')).sum()
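
For anyone who wants to try this without the original spreadsheet, here is a minimal sketch of the same idea on a small made-up DataFrame. The sample values are invented for illustration, and sort=False is an optional addition that keeps the months in order of first appearance instead of alphabetical order.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question
df = pd.DataFrame({
    "DATE": ["2020-01-05", "2020-01-20", "2020-02-03", "2020-02-28", "2020-03-15"],
    "NUMBER OF PEOPLE": [10, 5, 7, 3, 4],
})

# Parse DATE and build a 'YYYY-Mon' key for every row
# (the month abbreviation depends on the active locale)
month_key = pd.to_datetime(df["DATE"]).dt.strftime("%Y-%b")

# Sum the people per month key; sort=False keeps first-appearance order
monthly_total = df["NUMBER OF PEOPLE"].groupby(month_key, sort=False).sum()

print(monthly_total)
```

This covers both parts of the question at once: the rows are grouped by month, and the group labels are already in the YYYY-MMM style.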
