Efficient implementation of the GTIN-13 algorithm

Question

Efficient implementation of the GTIN-13 algorithm

174 views Asked by Merik At 30 July 2018 at 20:39

I am looking for an efficient way to implement the GTIN-13 check digit algorithm. I have looked at some relevant SO posts such as this and this but it seems like efficiency was not the subject of attention in either case.

Briefly, the algorithm takes a numeric string (such as 123765) and multiples every other digit (from right to left) by 1 or 3 to calculate a sum (so 5 * 1 + 6 * 3 + 7 * 1 + 3 * 3 + 2 * 1 + 1 * 3 = 44) and then subtracts this sum from the closest multiple of 10 that is equal to or greater to this sum (in this case 50 - 44 = 6) to derive the final check digit (here, 6). The input is expected to be 12 digits long, but if shorter, it can be simply padded with zeros from the left (so 123765 is really expected as 000000123765) but the result will be still the same.

A naive implementation of this would be as follows:

gtin13 <- function(n) {
  s <- as.character(n)
  check.sum <- 0
  for (i in 1:nchar(s)) {
    digit <- substr(s, nchar(s) - i + 1, nchar(s) - i + 1)
    check.sum <- check.sum + as.numeric(digit) * ifelse(i %% 2, 1, 3)
  }
  10 - check.sum %% 10
}

However, this is inefficient because of the for loop as well as the conversion to a string and back to a number. For instance:

df <- data.frame(
  num <- sample(1:1000000, 100000, T)
)
system.time(cd <- vapply(df$num, gtin13, 0))

Take about 6 seconds on an average desktop.

What is a more efficient to calculate this check.sum?

Original Q&A

There are 3 answers

Emil On 30 July 2018 at 21:00

We can do much better. If we operate on integers instead of characters, we see a great gain in efficiency:

gtim13Challenger <- function(n) {
    n <- as.integer(n)
    len <- as.integer(ceiling(log10(n)))
    digs <- n %/% as.integer(10^(0L:(len - 1L))) %% 10L
    if (len > 1L)
        digs[seq.int(2L,len,2L)] <- digs[seq.int(2L,len,2L)] * 3L
    10L - sum(digs) %% 10L
}

system.time(cd <- vapply(df$num, gtim13, 0))
user  system elapsed 
6.15    0.00    6.16 

system.time(cd2 <- vapply(df$num, gtim13Challenger, 0L))
user  system elapsed 
0.76    0.00    0.76 

all.equal(cd, cd2)
[1] TRUE

F. Privé On 30 July 2018 at 22:35

Using Rcpp:

#include <Rcpp.h>
using namespace Rcpp;

int gtim13_cpp(int x) {

  int r, sum = 0, coeff = 1;
  while (x != 0) {
    r = x % 10;
    sum += coeff * r;
    coeff = 4 - coeff;  // 3 <--> 1
    x /= 10;
  }

  return 10 - (sum % 10);
}

// [[Rcpp::export]]
IntegerVector gtim13_all_cpp(IntegerVector x) {

  int n = x.size();
  IntegerVector res(n);
  for (int i = 0; i < n; i++) {
    res[i] = gtim13_cpp(x[i]);
  }

  return res;
}


/*** R
gtim13_all_cpp(123765)

gtin13 <- function(n) {
  s <- as.character(n)
  check.sum <- 0
  for (i in 1:nchar(s)) {
    digit <- substr(s, nchar(s) - i + 1, nchar(s) - i + 1)
    check.sum <- check.sum + as.numeric(digit) * ifelse(i %% 2, 1, 3)
  }
  10 - check.sum %% 10
}
df <- data.frame(
  num <- sample(1:1000000, 100000, T)
)
system.time(cd <- vapply(df$num, gtin13, 0))
system.time(cd3 <- gtim13_all_cpp(df$num))
all.equal(cd3, cd)
*/

Results:

> system.time(cd <- vapply(df$num, gtin13, 0))
   user  system elapsed 
  4.105   0.001   4.105 

> system.time(cd3 <- gtim13_all_cpp(df$num))
   user  system elapsed 
  0.004   0.000   0.003 

> all.equal(cd3, cd)
[1] TRUE

**MrFlick** · Accepted Answer · 2018-07-30T21:03:34+00:00

This version doesn't need the vapply so it's faster because we don't loop over the number of possible digits in R. For example

gtim13_vec <- function(x) {
  d <- x %% 10
  for(i in 1:12) { # Input can be up to 12 digits
    d <- d +(x%/% 10^i %% 10) * c(1,3)[1+i%%2]
  }
  d
  10-(d%%10)
}

I used set.seed(7) for this experiment. I see

system.time(r1 <- vapply(df$num, gtim13, 0))
#    user  system elapsed 
#    3.21    0.00    3.36 
system.time(r2 <- gtim13_vec(df$num))
#    user  system elapsed 
#    0.03    0.00    0.03 
all(r1==r2)
# [1] TRUE

So there's a big speed improvement.

TechQA.

Efficient implementation of the GTIN-13 algorithm

There are 3 answers

Related Questions in R

Related Questions in PERFORMANCE

Related Questions in CHECKSUM

Related Questions in CHECK-DIGIT

Popular Questions

Popular Tags

Trending Questions