Match a column with a range of two columns of a loopup table

119 views Asked by At

I hope this hasn't been answered yet and that I search for the correct terms (since I am not sure how to express my problems in a few words).

So basically I have a dataframe with a column of so-called SIC-Codes. I also have a lookup table which classifies each SIC-Code into a certain category. Now, I want to match the SIC-Code with the category number. However, the lookup table only gives a range for the SIC-Codes, i.e. the rows contain:

  1. the classification number
  2. the lowest SIC-Code for this classification
  3. the highest SIC-Code for this classification.

If I have a SIC-Code like 1111, but the loopup table column 2 is 1000 and column 3 is 1500, a simple match function won't work. I have created an example for better understanding:

test <- as.data.frame(c(1012, 2010, 3545, 5550, 7068))
colnames(test) <- "SIC"
ind_num <- c(1, 3, 4, 5, 7, 10, 11, 12, 14, 15)
sic_low <-  c(0, 1010, 1012, 1050, 2000, 2005, 3500, 5550, 7050, 8000)
sic_high <- c(20, 1011, 1020, 1099, 2002, 2020, 3545, 5551, 7070, 8010)
LUPtable <- data.frame(ind_num, sic_low, sic_high)

test$new <- lapply(test$SIC, function(x) LUPtable$ind_num[match(x, LUPtable$sic_low)])

Thanks in advance!

2

There are 2 answers

1
CER On BEST ANSWER

Or similar

test$new <- lapply(test$SIC, function(x) LUPtable$ind_num[x>=LUPtable$sic_low & x<=LUPtable$sic_high])
0
MHammer On

You can iterate over each value to test with vapply() and grab the index of which low and high values it falls between.

LUPtable$ind_num[vapply(test$SIC, function(x) which(x >= LUPtable$sic_low & x <= LUPtable$sic_high), numeric(1))]