Extract GTIN, LoT, SN and EXP from GS1 DataMatrix barcode


I am building software for pharmacies to validate drugs in NMVS. The program should work like this: I scan the drug code with a handheld scanner, click "Verify", and the program connects to NMVS. Most of the work is done, but to verify the drug correctly I need to extract the GTIN (PC), batch number (LoT), serial number (SN) and expiry date (EXP) from the scanned code.

Here are the scan results for the test drugs:

01059099913808231003ZP082117230831210XXFAE5AWA6RF8
0105909990054152101123926172207012162RB6FBN09
010590999109968821100322567773831721093010100013978
01059099907954202190EPCNT32ZH5581004032217250331
010590999032841321YCK3EB53CNZXD1725083110C48700
0105909990071029211165895472021010MU465417241031

I know that it's the GS1 DataMatrix format: the GTIN is prefixed with 01 (the following 14 digits are the GTIN), the LoT with 10 (the following 1-20 alphanumeric characters are the LoT), the SN with 21 (the following 1-20 alphanumeric characters are the SN), and the expiry date with 17 (the following 6 digits are the EXP).

For the given examples, I should have e.g.:

[
    {
        "gtin": "05909991380823",
        "lot": "03ZP08",
        "sn": "0XXFAE5AWA6RF8",
        "exp": "230831"
    },
    {
        "gtin": "05909990054152",
        "lot": "1123926",
        "sn": "62RB6FBN09",
        "exp": "220701"
    },
    {
        "gtin": "05909991099688",
        "lot": "100013978",
        "sn": "10032256777383",
        "exp": "210930"
    },
    {
        "gtin": "05909990795420",
        "lot": "040322",
        "sn": "90EPCNT32ZH558",
        "exp": "250331"
    },
    {
        "gtin": "05909990328413",
        "lot": "C48700",
        "sn": "YCK3EB53CNZXD",
        "exp": "250831"
    },
    {
        "gtin": "05909990071029",
        "lot": "10MU4654",
        "sn": "116589547202",
        "exp": "241031"
    }
]

The problem is that these sections can be in any order and of varying lengths. Only GTIN and EXP have a fixed length.

I created a regex to extract these sections: ^(?=.*01(\d{14}))(?=.*10([a-zA-Z0-9]{1,20}))(?=.*17(\d{6}))(?=.*21([a-zA-Z0-9]{1,20})).*$ but unfortunately it doesn't work properly. The client is written in JavaScript (AngularJS, not TS; yes, it's a legacy project, and I'm trying to persuade the company to update it), and the server is written in Java.
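To illustrate one way the pattern goes wrong, here is a minimal sketch that runs it against the first test scan. The function name matchWithLookaheads is made up for this example. Because each lookahead starts with a greedy .*, it locks onto the last occurrence of its prefix, so the "10" inside "...210XXFAE5AWA6RF8" is taken as the LoT prefix.

```javascript
// Hypothetical helper: applies the lookahead pattern from the question
// (with the stray space in "(? =" removed) and returns the four captures.
const matchWithLookaheads = (scan) => {
    const pattern = /^(?=.*01(\d{14}))(?=.*10([a-zA-Z0-9]{1,20}))(?=.*17(\d{6}))(?=.*21([a-zA-Z0-9]{1,20})).*$/;
    const match = scan.match(pattern);
    if (!match) return null;

    // Capture groups follow the order of the lookaheads: 01, 10, 17, 21.
    const [, gtin, lot, exp, sn] = match;
    return {gtin, lot, exp, sn};
};

// First test scan from the question.
const firstScan = '01059099913808231003ZP082117230831210XXFAE5AWA6RF8';
console.log(matchWithLookaheads(firstScan));
```

For this scan the GTIN, EXP and SN captures come out as expected, but the LoT capture is XXFAE5AWA6RF8, which is really the tail of the serial number.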

I'm looking for any solution to this problem, whether it's a regex, a library (JavaScript or Java) or an external API. Personally, I'm running out of ideas...

Also, I'll add that the handheld scanner I'm using is the Zebra DS2208.

I would appreciate any help on this topic.

EDIT:

I tried reading the barcode scanner output character by character, but I don't see a pattern. This is what I got:

Output with special characters

There are 2 answers

1
powermilk On BEST ANSWER

I did it! I noticed that GTIN and EXP are always extracted correctly, so I tried something like this:

    // Extracts the fixed-length fields (GTIN, EXP) first, then LoT and SN from what is left.
    const extractDataMatrix = (code) => {
        const response = {gtin: '', lot: '', sn: '', exp: ''};
        let responseCode = code;

        const prefixes = [
            {prefix: '01', key: 'gtin', length: 14},
            {prefix: '17', key: 'exp', length: 6}
        ];

        prefixes.forEach(({prefix, key, length}) => {
            const position = responseCode.indexOf(prefix);

            if (position !== -1) {
                const start = position + prefix.length;
                const end = start + length;

                response[key] = responseCode.substring(start, end);
                responseCode = responseCode.slice(0, position) + responseCode.slice(end);
            }
        });

        const lotAndSn = extractLotAndSn(responseCode);
        response.lot = lotAndSn.lot;
        response.sn = lotAndSn.sn;

        return response;
    };

    // Splits the remainder into LoT (prefix 10) and SN (prefix 21), in either order.
    const extractLotAndSn = (responseCode) => {
        const pattern = /^(10.+?)(?=10|21)(21.+?)$|^(21.+?)(?=10|21)(10.+?)$/;
        const matches = responseCode.match(pattern);
        if (!matches) return {lot: '', sn: ''};

        const [lot1, sn1, sn2, lot2] = matches.slice(1);
        const lot = (lot1 || lot2 || '').substring(2);
        const sn = (sn1 || sn2 || '').substring(2);
        return checkLotAndSn(lot, sn, responseCode);
    };

    // Moves a leading "10" or "21" that was mistaken for a prefix to the end of the other field.
    const checkLotAndSn = (lot, sn, responseCode) => {
        if (responseCode.includes("1010") && !responseCode.includes("10100")) {
            const isLotStart = lot.startsWith("10");
            if (isLotStart) {
                lot = lot.slice(2);
                sn += "10";
            }
        } else if (responseCode.includes("2121") && responseCode.includes("21210")) {
            const isSnStart = sn.startsWith("21");
            if (isSnStart) {
                sn = sn.slice(2);
                lot += "21";
            }
        }

        return {lot, sn};
    };

I think it can be optimized anyway, but for now I don't care ;).

What is going on?

  1. In extractDataMatrix I look for the prefixes of the fixed-length fields (GTIN and EXP).
  2. Each field found this way is removed from responseCode.
  3. I pass the remaining responseCode to the extractLotAndSn() function.
  4. In this function I use a regex to get the lot and sn.
  5. Sometimes I got a code where the prefix is part of the previous sequence (e.g. 1010 and 2121), so the checkLotAndSn() function removes the prefix from the start of one sequence and appends it to the previous one.
  6. I don't know why everything is split correctly when 10100 and 21210 are part of responseCode, so I excluded those cases from the swap.
0
Arsany Benyamine On

According to the GS1 specs (https://www.gs1.org/sites/default/files/docs/barcodes/WR14-221_GSCN_Software%20Version_1Jun2015.pdf), variable-length fields should be followed by FNC1.

Barcode scanners sometimes transmit it as GS (Group Separator, ASCII decimal 29).

However, you will not be able to see the GS character in most text editors, because it is a non-printable control character with no visible glyph.

You can check whether the character is present by scanning the code into a Linux text editor like vim, or by capturing the onKeyPress event in the browser.
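Once the scanner output is available as a string, a small helper can make control characters visible. This is only a sketch: the name showControlChars is made up, and whether GS reaches the page at all depends on how the scanner and browser are configured.

```javascript
// Hypothetical helper: replaces every control character (char code below 32)
// with a visible "<code>" marker, so GS shows up as "<29>".
const showControlChars = (text) =>
    Array.from(text)
        .map((ch) => (ch.charCodeAt(0) < 32 ? '<' + ch.charCodeAt(0) + '>' : ch))
        .join('');

// Made-up input: a lot "AB12" terminated by GS, followed by an expiry date.
console.log(showControlChars('10AB12' + String.fromCharCode(29) + '17230831'));
```

If the logged string contains `<29>` after the variable-length fields, the separators are arriving intact and can be used for parsing.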

GS or FNC1 is mandatory for variable-length fields; without it the end of such a field cannot be determined reliably.
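When the separator is present, parsing becomes a simple left-to-right walk. The following is a minimal sketch, assuming the scanner sends FNC1 as GS (char code 29) and that only the four AIs from the question occur. The names AI_TABLE and parseGs1 are made up, and this is not a full GS1 parser.

```javascript
const GS = String.fromCharCode(29);

// Only the AIs from the question; a real table would contain many more.
const AI_TABLE = {
    '01': {key: 'gtin', fixedLength: 14},
    '17': {key: 'exp', fixedLength: 6},
    '10': {key: 'lot', maxLength: 20},
    '21': {key: 'sn', maxLength: 20}
};

const parseGs1 = (raw) => {
    const result = {gtin: '', lot: '', sn: '', exp: ''};
    let pos = 0;

    while (pos < raw.length) {
        // Skip a separator left over from the previous variable-length field.
        if (raw[pos] === GS) {
            pos += 1;
            continue;
        }

        const ai = raw.substring(pos, pos + 2);
        const spec = AI_TABLE[ai];
        if (!spec) {
            throw new Error('Unsupported AI "' + ai + '" at position ' + pos);
        }
        pos += 2;

        let end;
        if (spec.fixedLength) {
            // Fixed-length field: take exactly that many characters.
            end = pos + spec.fixedLength;
        } else {
            // Variable-length field: read up to GS, end of input, or the maximum length.
            const gsIndex = raw.indexOf(GS, pos);
            const stop = gsIndex === -1 ? raw.length : gsIndex;
            end = Math.min(stop, pos + spec.maxLength);
        }

        result[spec.key] = raw.substring(pos, end);
        pos = end;
    }

    return result;
};

// Made-up input: the lot and serial both contain digits that look like prefixes.
const sample = '0105909991380823' + '10LOT21A' + GS + '17230831' + '21SN10B17';
console.log(parseGs1(sample));
```

Because each variable-length field ends at GS or at the end of the input, a "10", "17" or "21" inside a lot or serial number is no longer mistaken for a prefix.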