I have a data table of the form
ID REGION INCOME_BAND RESIDENCY_YEARS
1 SW Under 5,000 10-15
2 Wales Over 70,000 1-5
3 Center 15,000-19,999 6-9
4 SE 15,000-19,999 15-19
5 North 15,000-19,999 10-15
6 North 15,000-19,999 6-9
created by
exp = data.table(
ID = c(1,2,3,4,5,6),
REGION=c("SW", "Wales", "Center", "SE", "North", "North"),
INCOME_BAND = c("Under ?5,000", "Over ?70,000", "?15,000-?19,999", "?15,000-?19,999", "?15,000-?19,999","?15,000-?19,999"),
RESIDENCY_YEARS = c("10-15","1-5","6-9","15-19","10-15", "6-9"))
I would like to transform this to
I've managed to perform the majority of the work with dcast:
exp.dcast = dcast(exp,ID~REGION+INCOME_BAND+RESIDENCY_YEARS, fun=length,
value.var=c('REGION', 'INCOME_BAND', 'RESIDENCY_YEARS'))
However I need some help creating sensible column headings. Currently I have
["ID"
"REGION.1_Center_?15,000-?19,999_6-9"
"REGION.1_North_?15,000-?19,999_10-15"
"REGION.1_North_?15,000-?19,999_6-9"
"REGION.1_SE_?15,000-?19,999_15-19" "REGION.1_SW_Under ?5,000_10-15" "REGION.1_Wales_Over ?70,000_1-5"
"INCOME_BAND.1_Center_?15,000-?19,999_6-9"
"INCOME_BAND.1_North_?15,000-?19,999_10-15"
"INCOME_BAND.1_North_?15,000-?19,999_6-9"
"INCOME_BAND.1_SE_?15,000-?19,999_15-19"
"INCOME_BAND.1_SW_Under ?5,000_10-15"
"INCOME_BAND.1_Wales_Over ?70,000_1-5"
"RESIDENCY_YEARS.1_Center_?15,000-?19,999_6-9" "RESIDENCY_YEARS.1_North_?15,000-?19,999_10-15" "RESIDENCY_YEARS.1_North_?15,000-?19,999_6-9"
"RESIDENCY_YEARS.1_SE_?15,000-?19,999_15-19"
"RESIDENCY_YEARS.1_SW_Under ?5,000_10-15"
"RESIDENCY_YEARS.1_Wales_Over ?70,000_1-5"
And I would like the column headings to be
ID SW Wales Center SE North Under 5,000 Over 70,000 15,000-19,999 1-5 6-9 10-15 15-19
Could anybody advise?
This apparently simple question is not easy to answer. So, we will go forward step-by step.
First, the OP has tried to reshape multiple value columns simultaneously which creates an unwanted cross product of all available combinations.
In order to treat all values in the same way, we need to
melt()
all value columns first before reshaping:Now, the result has 13 columns instead of 19 and the columns are named by the respective value as requested.
Unfortunately, the columns appear in the wrong order because they alphabetically ordered. There are two approaches to change the order:
Change order of columns after reshaping
The
setcolorder()
function reorders the columns of adata.table
in place, e.g. without copying:Now, all
REGION
columns appear first, followed byINCOME_BAND
andRESIDENCY_YEARS
columns in the specified order.Set factor levels before reshaping
If
value
is turned into a factor with appropriately ordered factor levelsdcast()
will use the factor levels for ordering the columns:Set factor levels before reshaping - lazy version
If it is sufficient to have the columns grouped by
REGION
,INCOME_BAND
, andRESIDENCY_YEARS
then we can use a short cut to avoid specifying each value incol_order
. Thefct_inorder()
function from theforcats
package reorders factor levels by their first appearance in a vector:This works because the output of
melt()
is ordered byvariable
: