Python Pandas DataFrame - How to sum values in 1 column based on partial match in another column (date type)?


I have run into some issues while processing my dataset with a pandas DataFrame.

Here is my dataset:

[Screenshot of the dataset]

My data types are displayed below:

[Screenshot of the column data types]

My dataset is read in with:

```python
import pandas as pd

MY_DATASET = pd.read_excel(EXCEL_FILE_PATH, index_col=None, na_values=['NA'], usecols="A, D")
```
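Since the screenshots are not available, here is a minimal sketch of how the column types can be checked, assuming a hypothetical in-memory stand-in for `MY_DATASET` with the same two columns. The grouping steps below rely on the `.dt` accessor, which only works on a `datetime64` column; if `read_excel` already returns `DATE` as datetime (as the title suggests), `pd.to_datetime` leaves it unchanged.

```python
import pandas as pd

# Hypothetical stand-in for MY_DATASET (not the real data): same column names,
# with DATE stored as text, as can happen when Excel cells are not real dates.
MY_DATASET = pd.DataFrame({
    "DATE": ["2020-01-03", "2020-01-17", "2020-02-05", "2020-03-21"],
    "NUMBER OF PEOPLE": [4, 6, 3, 5],
})

# Inspect the column types before grouping.
print(MY_DATASET.dtypes)

# Make sure DATE is datetime64 so that .dt.strftime / .dt.to_period work.
MY_DATASET["DATE"] = pd.to_datetime(MY_DATASET["DATE"])
print(MY_DATASET.dtypes)
```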

  1. I would like to sum all values in the "NUMBER OF PEOPLE" column for each month in the "DATE" column. For example, all "NUMBER OF PEOPLE" values whose "DATE" falls in "2020-01" would be added together, then those in "2020-02", and so on.
    However, I am stuck because I am unsure how to use .groupby on a partial match.

  2. Once 1) is done, I would also like to convert the values in the "DATE" column from YYYY-MM-DD to YYYY-MMM, e.g. 2020-Jan.
    However, I am unsure whether such a format exists.

Does anyone know how to resolve these issues?

Many thanks!
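For reference, the "partial match" in step 1 amounts to grouping on the YYYY-MM part of each date rather than on the full date. A minimal sketch of both steps, assuming hypothetical sample data with the question's column names and a datetime `DATE` column: group on the numeric `"%Y-%m"` key first (which sorts chronologically), then relabel the result as `"%Y-%b"`.

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
df = pd.DataFrame({
    "DATE": pd.to_datetime(["2020-01-03", "2020-01-17", "2020-02-05",
                            "2020-02-28", "2020-03-21"]),
    "NUMBER OF PEOPLE": [4, 6, 3, 2, 5],
})

# Step 1: build the YYYY-MM key and group on it instead of on the full date.
month_key = df["DATE"].dt.strftime("%Y-%m")
monthly = df["NUMBER OF PEOPLE"].groupby(month_key).sum()
# monthly: 2020-01 -> 10, 2020-02 -> 5, 2020-03 -> 5

# Step 2: relabel the months as YYYY-Mon (e.g. "2020-Jan").
monthly.index = pd.to_datetime(monthly.index, format="%Y-%m").strftime("%Y-%b")
print(monthly)
```

Relabeling after grouping keeps the months in calendar order; `%b` gives English abbreviations in the default locale.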


There are 3 answers

BENY (best answer)

Check this:

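```python
import pandas as pd

# Parse DATE, format each date as e.g. "2020-Jan", then sum NUMBER OF PEOPLE per label
s = df['NUMBER OF PEOPLE'].groupby(pd.to_datetime(df['DATE']).dt.strftime('%Y-%b')).sum()
```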
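One caveat with grouping directly on the formatted labels: `groupby` sorts its keys, so string labels such as "2020-Feb" and "2020-Jan" come out in alphabetical rather than calendar order. A minimal sketch on hypothetical sample data, showing the effect and one alternative that groups on monthly periods (`Series.dt.to_period("M")`) and formats the labels only at the end:

```python
import pandas as pd

# Hypothetical sample data with the question's column names.
df = pd.DataFrame({
    "DATE": ["2020-01-03", "2020-02-05", "2020-03-21", "2020-01-17"],
    "NUMBER OF PEOPLE": [4, 3, 5, 6],
})
dates = pd.to_datetime(df["DATE"])

# Grouping on the "%Y-%b" strings sorts alphabetically: Feb, Jan, Mar.
by_label = df["NUMBER OF PEOPLE"].groupby(dates.dt.strftime("%Y-%b")).sum()

# Grouping on monthly periods sorts chronologically; format the labels last.
by_month = df["NUMBER OF PEOPLE"].groupby(dates.dt.to_period("M")).sum()
by_month.index = by_month.index.strftime("%Y-%b")
print(by_month)
```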
Iñigo González

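You can get an abbreviated month name using strftime('%b') (or the full name with '%B'). Depending on your locale, the month name may come out in lowercase; in the default C/English locale it is already capitalized (e.g. "Jan"):

```python
df['group_date'] = df['DATE'].apply(lambda x: x.strftime('%Y-%b'))
```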

If you need the first letter of the month in uppercase, you could do something like this:

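```python
# Upper-case the first letter of the month: x[0:5] is "YYYY-", x[5] is the month's first letter
df['group_date'] = df['group_date'].apply(lambda x: f'{x[0:5]}{x[5].upper()}{x[6:]}')

# or in one step:

df['group_date'] = df['DATE'].apply(lambda x: x.strftime('%Y-%b')).apply(lambda x: f'{x[0:5]}{x[5].upper()}{x[6:]}')
```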

Now you just need to .groupby and .sum():

```python
result = df['NUMBER OF PEOPLE'].groupby(df['group_date']).sum()
```
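A vectorized variant of the same steps, as a sketch on hypothetical sample data: `Series.dt.strftime` formats the whole column without a per-row `apply`, and `str.title()` is one way to capitalize the month if your locale returns lowercase names (a Spanish locale, for instance, may give "2020-ene"). In the default locale `str.title()` leaves "2020-Jan" unchanged.

```python
import pandas as pd

# Hypothetical sample data with the question's column names.
df = pd.DataFrame({
    "DATE": pd.to_datetime(["2020-01-03", "2020-02-05", "2020-01-17"]),
    "NUMBER OF PEOPLE": [4, 3, 6],
})

# Format the whole column at once instead of calling strftime row by row.
df["group_date"] = df["DATE"].dt.strftime("%Y-%b")

# Capitalize the month part in case the locale yields lowercase names.
df["group_date"] = df["group_date"].str.title()

result = df["NUMBER OF PEOPLE"].groupby(df["group_date"]).sum()
print(result)
```

As with any grouping on these string labels, the result is ordered alphabetically, not by calendar month.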
TropicalMagic

I did some tinkering around and found that this worked for me as well:

[Screenshots of the working code]

Cheers all