In a dataframe, one column contains a GS1 code scanned from barcodes. A GS1 code is a string that combines different types of information. Application Identifiers (AIs) indicate what type of information the next part of the string holds. Here is an example of a GS1 string: (01)8714729797579(17)210601(10)23919374. Each AI is given between brackets. In this case (01) means 'GTIN', (17) means 'Expiration Date' and (10) means 'LOT'. What I would like to do in R is split the single column into three columns, using the AI meanings as the new column names.
I tried using 'separate', but the brackets aren't removed. Why aren't the brackets removed?
library(tidyr)
df <- data.frame(id = c(1, 2, 3), CODECONTENT = c("(01)871(17)21(10)2391", "(01)579(17)26(10)9374", "(01)979(17)20(10)9193"))
df <- df %>% separate(CODECONTENT, c("GTIN", "Expiration_Date"), "(17)", extra = "merge") %>%
  separate(Expiration_Date, c("Expiration Date", "LOT"), "(10)", extra = "merge")
The above returns the following:
id | GTIN | Expiration Date | LOT |
---|---|---|---|
1 | (01)871( | )21( | )2391 |
2 | (01)579( | )26( | )9374 |
3 | (01)979( | )20( | )9193 |
I am not sure why the brackets are still there. Besides removing the brackets, is there a smarter way to also remove the first AI, (01), in the same code?
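Parentheses are special characters in a regular expression: they create a group, so the pattern "(17)" matches only the characters 17 and leaves the brackets behind. To split on the whole AI, you need to tell the regex to treat the parentheses literally. One option is to surround each of them in square brackets, for example "[(]17[)]".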
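As a minimal sketch of that idea, using the sample data from the question: the leading (01) is stripped with `sub()` before splitting, and each `separate()` call then splits on a literal AI. The intermediate column name `rest` and the object names `result` and `result_one_step` are only illustrative. The one-step variant at the end assumes every code contains exactly the AIs (01), (17) and (10), in that order, which may not hold for real scans.

```r
library(tidyr)

df <- data.frame(
  id = c(1, 2, 3),
  CODECONTENT = c("(01)871(17)21(10)2391",
                  "(01)579(17)26(10)9374",
                  "(01)979(17)20(10)9193")
)

# Two-step approach: drop the leading "(01)", then split on literal AIs.
# [(] and [)] are character classes that match a literal parenthesis.
stripped <- df
stripped$CODECONTENT <- sub("^[(]01[)]", "", stripped$CODECONTENT)

result <- stripped %>%
  separate(CODECONTENT, c("GTIN", "rest"), sep = "[(]17[)]", extra = "merge") %>%
  separate(rest, c("Expiration_Date", "LOT"), sep = "[(]10[)]", extra = "merge")

# One-step alternative (assumes the AIs always appear in the order 01, 17, 10):
# split on any bracketed run of digits; the NA entry discards the empty
# piece that precedes "(01)".
result_one_step <- df %>%
  separate(CODECONTENT, c(NA, "GTIN", "Expiration_Date", "LOT"),
           sep = "[(][0-9]+[)]")

print(result)
```

Escaping with backslashes, as in "\\(17\\)", should work equally well; the square-bracket form just avoids the doubled backslashes in R strings.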