I am trying to split my data into 3 parts based on 3 columns, and then want to spread the data for further processing. However, when I split using 2 columns, the code works. It doesn't work for 3 columns. This is built upon the discussion at How can I spread repeated measures of multiple variables into wide format?
Here's my data:
structure(list(Zone = c("East", "East", "East", "East", "East",
"East", "East", "West", "West", "West", "West", "West", "West",
"West"), Fiscal.Year = c(2016, 2016, 2016, 2016, 2016, 2016,
2017, 2016, 2016, 2016, 2017, 2017, 2018, 2018), Transaction.ID = c(132,
133, 134, 135, 136, 137, 171, 171, 172, 173, 175, 176, 177, 178
), L.Rev = c(3, 0, 0, 1, 0, 0, 2, 1, 1, 2, 2, 1, 2, 1), L.Qty = c(3,
0, 0, 1, 0, 0, 1, 1, 1, 2, 2, 1, 2, 1), A.Rev = c(0, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 1, 0, 0), A.Qty = c(0, 0, 0, 2, 2, 3, 0,
0, 0, 0, 0, 3, 0, 0), I.Rev = c(4, 4, 4, 0, 1, 0, 3, 0, 0, 0,
1, 0, 1, 1), I.Qty = c(2, 2, 2, 0, 1, 0, 3, 0, 0, 0, 1, 0, 1,
1)), .Names = c("Zone", "Fiscal.Year", "Transaction.ID", "L.Rev",
"L.Qty", "A.Rev", "A.Qty", "I.Rev", "I.Qty"), row.names = c(NA,
14L), class = "data.frame")
Here's the code that works:
Input_File %>%
gather(Rev_Qty,Value, L.Rev:I.Qty) %>%
separate(Rev_Qty, into=c("L.A","Rev.Qty")) %>%
split(.,list(.$Zone,.$Rev.Qty)) %>%
#Ideally, I want three-way split--i.e. Fiscal.Year, Zone and Rev.Qty
purrr::map(~unite(.,LAType.Rev.Qty, L.A, Rev.Qty, sep = ".")) %>%
purrr::map(~spread_(.,key_col = "LAType.Rev.Qty", value_col = "Value"))
This works nicely--i.e. I get a list of length 4 that I can use for further processing.
However, the following code doesn't work when I apply three-way split based on Rev.Qty
; Zone
and Fiscal.Year
.
Input_File %>%
gather(Rev_Qty,Value, L.Rev:I.Qty) %>%
separate(Rev_Qty, into=c("L.A","Rev.Qty")) %>%
#Now split the data based on zone, Rev vs. Qty and year--DOESN'T WORK
split(.,list(.$Zone,.$Rev.Qty,.$Fiscal.Year)) %>%
purrr::map(~unite(.,LAType.Rev.Qty, L.A, Rev.Qty, sep = ".")) %>%
purrr::map(~spread_(.,key_col = "LAType.Rev.Qty", value_col = "Value"))
I get the following error:
Error in enc2utf8(col_names(col_labels, sep = sep)) :
argumemt is not a character vector
Upon debugging, I found that the code executes well until the unite().
It breaks as soon as I call spread_()
.
Expected Output: If we run the code until unite()
, we will see that we will get a list of length 12. Expected output would be this list after the application of spread on LAType.Rev.Qty
and Value
columns. I hope this clarifies the expected output.
Can someone please help me what's going on? I am a beginner, and I've no clue what's happening.
We need
drop=TRUE
insplit
to remove the combinations that do not exist in the dataset