How to Sum Amounts Column Based On Conditions from Categorical Column in R

131 views Asked by At

I have a dataframe in r called house_expenses that looks like this (2 columns: DESCRIPTION and AMOUNT):

DESCRIPTION             AMOUNT
-----------            ---------
COUCH                    $801.713

TV                       $4999.996

TV_MOUNT                 $575.867

ENTERTAINMENT_SYSTEM     $1102.392

MATTRESS                 $1225.893

BEDFRAME                 $356.789

PILLOWS                  $528.989

I would like to create two additional columns to the dataframe that has the sums and is rounded to 2 decimal places:

  1. LIVING_ROOM_COSTS = sum(round(COUCH, TV, TV_MOUNT, ENTERTAINMENT_SYSTEM), =2)
  2. BEDROOM_COSTS = sum(round(MATTRESS, BEDFRAME, PILLOWS), =2)

I have tried doing

house_expenses  <- house_expenses %>%

                   group_by(DESCRIPTION) %>%

                   mutate(LIVING_ROOM_COSTS  = sum(round(DESCRIPTION == "COUCH" &
                                                         DESCRIPTION == "TV" &
                                                         DESCRIPTION == "TV_MOUNT" &
                                                         DESCRIPTION == "ENTERTAINMENT_SYSTEM" , digits = 2)),
                    mutate(BEDROOM_COSTS = sum(round(DESCRIPTION == "MATTRESS" &
                                                     DESCRIPTION == "BEDFRAME" &
                                                     DESCRIPTION == "PILLOWS", digits = 2)))

But unfortunately this hasn't worked. Had anyone come across this before and know how to approach this problem?

1

There are 1 answers

0
AudioBubble On

To get the solution you want you have to do some subsetting, Description %in% c("COUCH", "TV","TV_MOUNT","ENTERTAINMENT_SYSTEM") Gets you the TRUE or FALSE according to the row, then you subset AMOUNT AMOUNT[Description %in% c("COUCH", "TV","TV_MOUNT","ENTERTAINMENT_SYSTEM")]

Then you wrap the values in a sum and round it:

df$LIVING_ROOM_COSTS = with(df, round(sum(Amount[Description %in% c("COUCH", "TV","TV_MOUNT","ENTERTAINMENT_SYSTEM")]), 2))
df$BEDROOM_COSTS = with(df, round(sum(Amount[Description %in% c("MATRESS", "BEDFRAME","PILLOWS")]), 2))

This gives us the data.frame of:

           Description   Amount LIVING_ROOM_COSTS BEDROOM_COSTS
1                COUCH  801.713           7479.97        885.78
2                   TV 4999.996           7479.97        885.78
3             TV_MOUNT  575.867           7479.97        885.78
4 ENTERTAINMENT_SYSTEM 1102.392           7479.97        885.78
5             MATTRESS 1225.893           7479.97        885.78
6             BEDFRAME  356.789           7479.97        885.78
7              PILLOWS  528.989           7479.97        885.78

Using with allows us to refer to column names without using $

The reason there wasn't an answer sooner enough is because the formatting given required extra work and humans are generally lazy.

If you had formatted your data.frame like this:

           Description   Amount
1                COUCH  801.713
2                   TV 4999.996
3             TV_MOUNT  575.867
4 ENTERTAINMENT_SYSTEM 1102.392
5             MATTRESS 1225.893
6             BEDFRAME  356.789
7              PILLOWS  528.989

Or like this using the function dput:

structure(list(Description = c("COUCH", "TV", "TV_MOUNT", "ENTERTAINMENT_SYSTEM", 
"MATTRESS", "BEDFRAME", "PILLOWS"), Amount = c(801.713, 4999.996, 
575.867, 1102.392, 1225.893, 356.789, 528.989)), class = "data.frame", row.names = c(NA, 
-7L))

It would have been answered swiftly.