Decoding GS1 string using R


In a data frame, one column contains GS1 codes scanned from barcodes. A GS1 code is a string that combines several types of information. Application Identifiers (AIs) indicate what type of information the next part of the string holds. Here is an example of a GS1 string: (01)8714729797579(17)210601(10)23919374. Each AI is shown between brackets. In this case, (01) means 'GTIN', (17) means 'Expiration Date' and (10) means 'LOT'. What I would like to do in R is split the single column into three columns, using the AI meanings as the new column names.

I tried using 'separate', but the brackets aren't removed. Why aren't the brackets removed?

library(tidyr)

df <- data.frame(id = c(1, 2, 3), CODECONTENT = c("(01)871(17)21(10)2391", "(01)579(17)26(10)9374", "(01)979(17)20(10)9193"))

df <- df %>% separate(CODECONTENT, c("GTIN", "Expiration_Date"), "(17)", extra = "merge") %>%
  separate(Expiration_Date, c("Expiration Date", "LOT"), "(10)", extra = "merge")

The above returns the following:

  id     GTIN Expiration Date   LOT
1  1 (01)871(            )21( )2391
2  2 (01)579(            )26( )9374
3  3 (01)979(            )20( )9193

I am not sure why the brackets are still there. Besides removing the brackets, is there a smarter way to also remove the first AI, (01), in the same code?

Answer by Nicolás Velasquez

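Because parentheses are special characters in regular expressions (they delimit a group), the pattern "(17)" matches only the characters 17 and leaves the brackets behind. You need to tell the regex to treat the parentheses literally. One option is to wrap each of them in square brackets.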

library(tidyr)

df %>%
  # "[(]" and "[)]" match literal parentheses
  separate(col = CODECONTENT,
           sep = "[(]17[)]",
           into = c("gtin", "expiration_date")) %>%
  separate(col = expiration_date,
           sep = "[(]10[)]",
           into = c("expiration_date", "lot"),
           extra = "merge")


  id    gtin expiration_date  lot
1  1 (01)871              21 2391
2  2 (01)579              26 9374
3  3 (01)979              20 9193
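
The result above still carries the leading (01) in the gtin column, which the question also asked about. One possible way to drop it in the same step is to capture all three fields with a single regular expression. The following is only a sketch in base R, not part of the original answer. It assumes every code contains the AIs (01), (17) and (10) exactly once and in that order; the names pattern, matches, parts and result are made up for illustration. Escaping with a double backslash is an alternative to the square brackets for matching a literal parenthesis.

```r
# Sample data copied from the question.
df <- data.frame(
  id = c(1, 2, 3),
  CODECONTENT = c("(01)871(17)21(10)2391",
                  "(01)579(17)26(10)9374",
                  "(01)979(17)20(10)9193"),
  stringsAsFactors = FALSE
)

# "\\(" and "\\)" match literal parentheses; each (.*) captures one field.
# Assumption: the AIs (01), (17), (10) always appear in exactly this order.
pattern <- "^\\(01\\)(.*)\\(17\\)(.*)\\(10\\)(.*)$"

# For each row, regexec() locates the full match and the three capture groups,
# and regmatches() returns them as a character vector of length 4.
matches <- regmatches(df$CODECONTENT, regexec(pattern, df$CODECONTENT))

# Stack the per-row vectors into a matrix:
# column 1 is the full match, columns 2 to 4 are the captured fields.
parts <- do.call(rbind, matches)

result <- data.frame(
  id = df$id,
  GTIN = parts[, 2],
  Expiration_Date = parts[, 3],
  LOT = parts[, 4],
  stringsAsFactors = FALSE
)

print(result)
```

For the sample data, the GTIN column then holds 871, 579 and 979 without any brackets or AI prefix. Note that a row that does not match the pattern yields an empty vector and would be silently dropped by rbind, so real scans with missing or reordered AIs would need an extra check before building the result.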