I am relatively new to R, so my apologies if this question is too basic.
I have transactions that show quantity sold and revenue earned from different products. Because there are three products, there are 2^3 = 8
combinations for selling these products in a "basket." Each basket could be sold in any of the three given years (2016, 2017, 2018) and in any of the zones (East and West). [I have 3 years worth of transactions for the two zones: East and West.]
My objective is to analyze how much revenue is earned, how many quantities are sold, and how many transactions occurred for each combination of these products in a given year for a given zone.
I was able to do the above operation (using purrr::map
) by splitting the data based on zones. I have created a list of two data frames that hold data grouped by "year" for each combination described above. This works well. However, the code is a little clunky in my opinion. There are a lot of repetitive statements. I want to be able to create a list of 2X3 (i.e. 2 zones and 3 years)
Here's my code using zone-wise splitting.
First Try
UZone <- unique(Input_File$Zone)
FYear <- unique(Input_File$Fiscal.Year)
#Split based on zone
a<-purrr::map(UZone, ~ dplyr::filter(Input_File, Zone == .)) %>%
#Create combinations of products
purrr::map(~mutate_each(.,funs(Exists = . > 0), L.Rev:I.Qty )) %>%
#group by Fiscal Year
purrr::map(~group_by_(.,.dots = c("Fiscal.Year", grep("Exists", names(.), value = TRUE)))) %>%
#Summarize, delete unwanted columns and rename the "number of transactions" column
purrr::map(~summarise_each(., funs(sum(., na.rm = TRUE), count = n()), L.Rev:I.Qty)) %>%
purrr::map(~select(., Fiscal.Year:L.Rev_count)) %>%
purrr::map(~plyr::rename(.,c("L.Rev_count" = "No.Trans")))
#Now do Zone and Year-wise splitting : Try 1
EastList<-a[[1]]
EastList <- EastList %>% split(.$Fiscal.Year)
WestList<-a[[2]]
WestList <- WestList %>% split(.$Fiscal.Year)
write.xlsx(EastList , file = "East.xlsx",row.names = FALSE)
write.xlsx(WestList , file = "West.xlsx",row.names = FALSE)
As you can see, the above code is very clunky. With limited knowledge of R, I researched https://blog.rstudio.org/2016/01/06/purrr-0-2-0/ and read purrr::map2()
manual but I couldn't find too many examples. After reading the solution at How to add list of vector to list of data.frame objects as new slot by parallel?, I am assuming that I could use X = zone and Y= Fiscal Year to do what I have done above.
Here's what I tried: Second Try
#Now try Zone and Year-wise splitting : Try 2
purrr::map2(UZone,FYear, ~ dplyr::filter(Input_File, Zone == ., Fiscal.Year == .))
But this code doesn't work. I get an error message that :
Error: .x (2) and .y (3) are different lengths
Question 1: Can I use map2
to do what I am trying to do? If not, is there any other better way?
Question 2: Just in case, we are able to use map2
, how can I generate two Excel files using one command? As you can see above, I have two function calls above. I'd want to have only one.
Question 3: Instead of two statements below, is there any way to do sum and count in one statement? I am looking for more cleaner ways to do sum and count.
purrr::map(~summarise_each(., funs(sum(., na.rm = TRUE), count = n()), L.Rev:I.Qty)) %>%
purrr::map(~select(., Fiscal.Year:L.Rev_count)) %>%
Can someone please help me?
Here's my data:
dput(Input_File)
structure(list(Zone = c("East", "East", "East", "East", "East",
"East", "East", "West", "West", "West", "West", "West", "West",
"West"), Fiscal.Year = c(2016, 2016, 2016, 2016, 2016, 2016,
2017, 2016, 2016, 2016, 2017, 2017, 2018, 2018), Transaction.ID = c(132,
133, 134, 135, 136, 137, 171, 171, 172, 173, 175, 176, 177, 178
), L.Rev = c(3, 0, 0, 1, 0, 0, 2, 1, 1, 2, 2, 1, 2, 1), L.Qty = c(3,
0, 0, 1, 0, 0, 1, 1, 1, 2, 2, 1, 2, 1), A.Rev = c(0, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 1, 0, 0), A.Qty = c(0, 0, 0, 2, 2, 3, 0,
0, 0, 0, 0, 3, 0, 0), I.Rev = c(4, 4, 4, 0, 1, 0, 3, 0, 0, 0,
1, 0, 1, 1), I.Qty = c(2, 2, 2, 0, 1, 0, 3, 0, 0, 0, 1, 0, 1,
1)), .Names = c("Zone", "Fiscal.Year", "Transaction.ID", "L.Rev",
"L.Qty", "A.Rev", "A.Qty", "I.Rev", "I.Qty"), row.names = c(NA,
14L), class = "data.frame")
Output Format:
Here's the code to generate the output. I would love to see EastList.2016
and EastList.2017
as two sheets in one Excel file, and WestList.2016
, WestList.2017
and WestList.2018
as 3 sheets in one Excel file.
#generate the output:
EastList.2016 <- EastList[[1]]
EastList.2017 <- EastList[[2]]
WestList.2016 <- WestList[[1]]
WestList.2017 <- WestList[[2]]
WestList.2018 <- WestList[[3]]
Two lists broken down by year with sums and counts for each?
In dplyr: (df <- your dataframe)