Efficient implementation of the GTIN-13 algorithm


I am looking for an efficient way to implement the GTIN-13 check digit algorithm. I have looked at some relevant SO posts, such as this and this, but efficiency does not seem to have been the focus in either case.

Briefly, the algorithm takes a numeric string (such as 123765) and multiplies its digits, from right to left, alternately by 1 and 3 to calculate a sum (so 5 * 1 + 6 * 3 + 7 * 1 + 3 * 3 + 2 * 1 + 1 * 3 = 44). It then subtracts this sum from the closest multiple of 10 that is equal to or greater than the sum (in this case 50 - 44 = 6) to derive the check digit (here, 6). The input is expected to be 12 digits long, but if it is shorter it can simply be padded with zeros on the left (so 123765 is really 000000123765); the result is the same.
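Written as a formula, with the digits d_1, d_2, ..., d_12 numbered from the right as in the description above (so d_1 = 5 for 123765), the rule is:

$$
s = \sum_{i=1}^{12} w_i \, d_i, \qquad
w_i = \begin{cases} 1, & i \text{ odd} \\ 3, & i \text{ even} \end{cases}, \qquad
c = \bigl(10 - (s \bmod 10)\bigr) \bmod 10
$$

For 123765 this gives s = 44 and c = (10 - 4) mod 10 = 6. The outer mod 10 covers a sum that is already a multiple of 10: for 55, s = 5 * 1 + 5 * 3 = 20, the closest multiple of 10 at or above 20 is 20 itself, and c = 0. An expression of the form `10 - s %% 10` returns 10 in that case.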

A naive implementation of this would be as follows:

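gtin13 <- function(n) {
  s <- as.character(n)
  check.sum <- 0
  # walk the digits from right to left: i = 1 is the rightmost digit
  for (i in 1:nchar(s)) {
    digit <- substr(s, nchar(s) - i + 1, nchar(s) - i + 1)
    # weight 1 for odd positions, 3 for even positions
    check.sum <- check.sum + as.numeric(digit) * ifelse(i %% 2, 1, 3)
  }
  # note: gives 10 rather than 0 when check.sum is a multiple of 10
  10 - check.sum %% 10
}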

However, this is inefficient because of the for loop as well as the conversion to a string and back to a number. For instance:

df <- data.frame(
  num = sample(1:1000000, 100000, TRUE)
)
system.time(cd <- vapply(df$num, gtin13, 0))

This takes about 6 seconds on an average desktop.

What is a more efficient way to calculate this check sum?


There are 3 answers

MrFlick (best answer)

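This version is vectorized, so it doesn't need vapply: instead of calling a function once per number, it loops in R only over the 12 possible digit positions, and each iteration processes the whole vector at once. For example: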

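gtim13_vec <- function(x) {
  d <- x %% 10                   # rightmost digit, weight 1
  for(i in 1:12) {               # input can be up to 12 digits
    # digit i places from the right; weight 3 when i is odd, 1 when even
    d <- d + (x %/% 10^i %% 10) * c(1, 3)[1 + i %% 2]
  }
  10 - (d %% 10)
}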

I used set.seed(7) for this experiment. I see:

system.time(r1 <- vapply(df$num, gtin13, 0))
#    user  system elapsed 
#    3.21    0.00    3.36 
system.time(r2 <- gtim13_vec(df$num))
#    user  system elapsed 
#    0.03    0.00    0.03 
all(r1==r2)
# [1] TRUE

So there's a big speed improvement.
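The same idea can also be written without an explicit loop, by building all the digits as an n × 12 matrix with outer() and summing the weighted columns with a matrix product. A minimal sketch (gtin13_matrix is a hypothetical name; it keeps the question's weighting and adds a final %% 10 so that a sum divisible by 10 gives 0 rather than 10):

```r
# Hypothetical loop-free variant: one row of digits per input number.
gtin13_matrix <- function(x) {
  x <- as.numeric(x)               # doubles hold 12-digit whole numbers exactly
  powers <- 10^(0:11)              # place values, rightmost digit first
  # digits[i, j] is the j-th digit from the right of x[i]
  digits <- outer(x, powers, function(a, b) (a %/% b) %% 10)
  weights <- rep(c(1, 3), 6)       # 1, 3, 1, 3, ... from the right
  s <- as.vector(digits %*% weights)
  (10 - s %% 10) %% 10             # outer %% 10 maps 10 to 0
}

gtin13_matrix(c(123765, 55))
# [1] 6 0
```

This does the same arithmetic as the loop above, but it materializes the whole digit matrix (12 doubles per input, so at least 9.6 MB for 100,000 numbers); how it compares in speed to gtim13_vec will depend on the input size.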

Emil

We can do much better than the string-based version by operating on integers instead of characters:

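gtim13Challenger <- function(n) {
    n <- as.integer(n)
    # number of digits; floor(...) + 1 is also correct for exact powers of ten
    # (1, 10, 100, ...), where ceiling(log10(n)) would be one short
    len <- as.integer(floor(log10(n))) + 1L
    # digits from right to left
    digs <- n %/% as.integer(10^(0L:(len - 1L))) %% 10L
    # weight every second digit (counting from the right) by 3
    if (len > 1L)
        digs[seq.int(2L,len,2L)] <- digs[seq.int(2L,len,2L)] * 3L
    10L - sum(digs) %% 10L
}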

system.time(cd <- vapply(df$num, gtin13, 0))
user  system elapsed 
6.15    0.00    6.16 

system.time(cd2 <- vapply(df$num, gtim13Challenger, 0L))
user  system elapsed 
0.76    0.00    0.76 

all.equal(cd, cd2)
[1] TRUE
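If the inputs arrive as strings (for example with leading zeros already in place), the per-character substr() loop from the question can still be avoided by converting the whole string to digit values at once with utf8ToInt(). A minimal sketch (gtin13_utf8 is a hypothetical name; it assumes each string contains only the digits 0-9, keeps the question's weighting, and adds a final %% 10 so that a sum divisible by 10 gives 0):

```r
# Hypothetical helper: check digit for one digit string, using the question's
# weighting (weight 1 on the rightmost digit, then 3, 1, 3, ...).
gtin13_utf8 <- function(s) {
  # utf8ToInt() returns code points; "0" is 48, so subtracting 48L gives
  # the digit values. rev() puts the rightmost digit first.
  digits <- rev(utf8ToInt(s) - 48L)
  weights <- rep_len(c(1L, 3L), length(digits))
  # The outer %% 10L maps a result of 10 to 0.
  (10L - sum(digits * weights) %% 10L) %% 10L
}

vapply(c("123765", "000000123765", "55"), gtin13_utf8, 0L, USE.NAMES = FALSE)
# [1] 6 6 0
```

It is still one function call per input, so it will not match the vectorized answers, but it removes the inner loop, substr() and ifelse() calls.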
F. Privé

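Using Rcpp (save everything below as a .cpp file and compile it with Rcpp::sourceCpp(), which also runs the embedded /*** R chunk at the end):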

#include <Rcpp.h>
using namespace Rcpp;

int gtim13_cpp(int x) {

  int r, sum = 0, coeff = 1;
  while (x != 0) {
    r = x % 10;
    sum += coeff * r;
    coeff = 4 - coeff;  // 3 <--> 1
    x /= 10;
  }

  return 10 - (sum % 10);
}

// [[Rcpp::export]]
IntegerVector gtim13_all_cpp(IntegerVector x) {

  int n = x.size();
  IntegerVector res(n);
  for (int i = 0; i < n; i++) {
    res[i] = gtim13_cpp(x[i]);
  }

  return res;
}


/*** R
gtim13_all_cpp(123765)

gtin13 <- function(n) {
  s <- as.character(n)
  check.sum <- 0
  for (i in 1:nchar(s)) {
    digit <- substr(s, nchar(s) - i + 1, nchar(s) - i + 1)
    check.sum <- check.sum + as.numeric(digit) * ifelse(i %% 2, 1, 3)
  }
  10 - check.sum %% 10
}
df <- data.frame(
  num = sample(1:1000000, 100000, TRUE)
)
system.time(cd <- vapply(df$num, gtin13, 0))
system.time(cd3 <- gtim13_all_cpp(df$num))
all.equal(cd3, cd)
*/

Results:

> system.time(cd <- vapply(df$num, gtin13, 0))
   user  system elapsed 
  4.105   0.001   4.105 

> system.time(cd3 <- gtim13_all_cpp(df$num))
   user  system elapsed 
  0.004   0.000   0.003 

> all.equal(cd3, cd)
[1] TRUE
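If this is meant for real 12-digit GTIN data, two details are worth checking. First, R integers are 32-bit (.Machine$integer.max is 2147483647), so the as.integer() in Emil's answer and the int/IntegerVector in the Rcpp answer cannot hold a 12-digit number, whereas doubles represent whole numbers exactly up to 2^53. Second, the question puts weight 1 on the rightmost digit, but GS1's GTIN-13 rule weights the 12 data digits 1, 3, 1, 3, ... from the left, so the rightmost data digit gets weight 3: for 400638133393 the GS1 check digit is 1, while the question's weighting gives 7. A minimal sketch with GS1's weighting on doubles, assuming non-negative whole-number inputs of at most 12 digits (gtin13_gs1 is a hypothetical name):

```r
# Hypothetical vectorized check digit using GS1's GTIN-13 weighting:
# the rightmost of the 12 data digits gets weight 3, then 1, 3, 1, ...
# Works on doubles, which hold 12-digit whole numbers exactly.
gtin13_gs1 <- function(x) {
  x <- as.numeric(x)
  s <- numeric(length(x))
  for (i in 0:11) {                   # i = 0 is the rightmost data digit
    digit <- (x %/% 10^i) %% 10
    w <- if (i %% 2 == 0) 3 else 1    # 3 on the rightmost digit, then 1, 3, ...
    s <- s + digit * w
  }
  (10 - s %% 10) %% 10                # a sum divisible by 10 gives 0
}

gtin13_gs1(c(400638133393, 978030640615))
# [1] 1 7
```

Because the weights are assigned from the right, padding a shorter input with zeros on the left does not change the result, just as in the question.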