Error in R with Months as levels. Is this a bug or logic flaw?

Imagine a data frame (this is an illustrative sample):

s <- c("January", "February", "March", "January", "March", "April")
t <- c(5, 3, 2, 3, 3, 7)
df1 <- as.data.frame(s)
df1[ , 2] <- t

For graphing purposes, I want to total the values by month. If I group by month and then summarize:

 library(dplyr)
 df1$s <- factor(df1$s, levels = month.name)
 summary <- df1 %>% group_by(s) %>% summarize(Sales = sum(V2))

The outputs are correct but out of order:

April     7
February  3
January   8
March     5

However, if I do the following:

df1$s <- as.factor(df1$s)
levels(df1$s) <- c("January", "February", "March", "April")
Summary <- df1 %>% group_by(s) %>% summarize(Sales = sum(V2))

The output is:

January    7
February   3
March      8
April      5

The sums are wrong but the order is correct. Why would this be?

It's as if it orders the months alphabetically and then re-sorts the month column without moving the other values.

There is 1 answer

Answered by jazzurro

If you want to reorder the levels of a factor, you can use the forcats package. As you can see at the end of this post, your factor levels were in alphabetical order rather than month order. So I used fct_relevel() to put the levels in month order and then did the calculation.

library(dplyr)
library(forcats)

out <- df1 %>%
  mutate(s = fct_relevel(s, month.name[1:4])) %>%
  group_by(s) %>%
  summarise(Sales = sum(V2))

out

#         s Sales
#    <fctr> <dbl>
#1  January     8
#2 February     3
#3    March     5
#4    April     7

# Check level order

#levels(out$s)
#[1] "January"  "February" "March"    "April"

#levels(df1$s)
#[1] "April"    "February" "January"  "March"
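
As for why the second attempt in the question gives the wrong sums: as far as I can tell, assigning to levels() does not reorder anything. It renames the existing levels position by position, so the alphabetically first level ("April") gets relabelled "January" and carries its values with it. Here is a minimal base R sketch of the difference, using the data from the question. The names relabelled, reordered, sums_relabelled and sums_reordered are only for illustration, and stringsAsFactors = TRUE is an assumption I set explicitly so that s starts out as a factor on any R version.

```r
# Same data as in the question; s is forced to be a factor
df1 <- data.frame(
  s  = c("January", "February", "March", "January", "March", "April"),
  V2 = c(5, 3, 2, 3, 3, 7),
  stringsAsFactors = TRUE
)

# Default levels are sorted alphabetically
levels(df1$s)
# [1] "April"    "February" "January"  "March"

# levels<- renames level 1, 2, 3, 4 in place: the rows themselves change month
relabelled <- df1$s
levels(relabelled) <- c("January", "February", "March", "April")
as.character(relabelled)
# [1] "March"    "February" "April"    "March"    "April"    "January"

# factor(..., levels = ) keeps every row's value and only changes the level order
reordered <- factor(as.character(df1$s), levels = month.name[1:4])

# Sum V2 per level for both versions
sums_relabelled <- tapply(df1$V2, relabelled, sum)  # 7 3 8 5 (the wrong sums)
sums_reordered  <- tapply(df1$V2, reordered, sum)   # 8 3 5 7 (the right sums)
```

So the sums in the second attempt are the alphabetical-order sums with new names attached, which is why the order looks right and the values look shifted.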