How can I package an electron app with OCR features?


I built an Electron app and now need to add some OCR features. Most people seem to use tesseract.js, but it is much slower than the native tesseract-ocr (by a factor of 10-20), so tesseract.js is not an option for me. Alternatively, I could use one of the many tesseract node modules out there. In theory.

Problem: those node modules require the compiled tesseract-ocr binaries (which are usually installed with apt or brew). I don't want my users to go through that extra setup hassle; I would rather put everything into the Electron installer.

Any ideas how I could get this done? I'm using electron-builder for packaging my app.

4

There are 4 answers

0
stoefln On BEST ANSWER

In the end I managed to get things working, with the help of a great dev mate. Here is node-native-ocr, and it works flawlessly with electron: https://www.npmjs.com/package/node-native-ocr

3
Kiran Maniya On

You are looking to package a native module with an Electron app for production. The general idea is to compile the native library for Electron using electron-rebuild. You can refer to the native module building sections in the Electron docs or in the electron-builder docs. When using electron-builder, you can set buildDependenciesFromSource: true to compile the native dependencies from source for the specific platform build.

Here is a basic configuration you can take a look at:

"build": {
    "appId": "com.trinityinfosystem.electron.exchange.stream",
    "productName": "Accurate",
    "copyright": "Copyright © 2018 Trinity InfoSystem",
    "buildDependenciesFromSource": true,
    "mac": {
      "category": "public.app-category.utilities",
      "icon": "assets/icon.icns",
      "target": [
        "dmg"
      ]
    },
    "win": {
      "publisherName": "Trinity InfoSystem",
      "target": [
        "nsis"
      ]
    },
    "linux": {
      "target": [
        "AppImage",
        "tar.gz"
      ]
    },
    "dmg": {
      "background": "assets/background.png",
      "icon": "assets/icon.icns",
      "title": "Accurate Installer"
    },
    "nsis": {
      "oneClick": false,
      "perMachine": false,
      "allowToChangeInstallationDirectory": true
    }
  }
1
Nemzytch On

How did you solve this in the end? We faced a similar problem recently (we have to run 10 OCR passes per second on small areas), so we went for the "capture2text" CLI, which doesn't require any installation by the end user. However, the size of our app jumped from 40 MB to 120 MB, which is a lot. So I'd be interested to know how you solved it in the end.
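For the bundled-CLI approach described above, the app has to locate the shipped executable at runtime. A minimal sketch, assuming the binary is copied into a hypothetical `bin` folder inside the packaged app's resources directory (for example via electron-builder's `extraResources` option); the function name and folder layout are illustrative, not taken from any of the answers:

```typescript
import * as path from "node:path";

// Hypothetical layout: the packaged app contains "<resources>/bin/<name>",
// with an ".exe" suffix on Windows. Adjust to whatever your build copies.
function bundledBinaryPath(
  resourcesPath: string,
  name: string,
  platform: string = process.platform
): string {
  const file = platform === "win32" ? `${name}.exe` : name;
  return path.join(resourcesPath, "bin", file);
}
```

Inside a packaged Electron app, `process.resourcesPath` would be the value to pass as `resourcesPath`. The resulting path can then be handed to `child_process.execFile` instead of relying on a binary installed on the user's machine.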

3
Hans Koch On

The node modules you linked are not native node modules (except one); they are just CLI wrappers that spawn the tesseract process, which is why they require the tools to be installed by the user.

To solve this, you need a native node module; node-tesseract, for example, would be the one to use. You can rebuild it for Electron via node-gyp, either by using electron-builder or with this command (replace the marked parts with your target settings):
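To illustrate what such a CLI wrapper does internally, here is a rough sketch (not the code of any particular module). It assumes a `tesseract` executable is on the user's PATH; the function names are hypothetical:

```typescript
import { execFile } from "node:child_process";

// Arguments for the tesseract CLI: `tesseract <image> stdout -l <lang>`
// prints the recognized text to standard output.
function buildTesseractArgs(imagePath: string, lang: string = "eng"): string[] {
  return [imagePath, "stdout", "-l", lang];
}

// Spawns the external binary. If tesseract is not installed, execFile
// reports an error (ENOENT) and the promise rejects.
function recognize(
  imagePath: string,
  lang: string = "eng",
  binary: string = "tesseract"
): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile(binary, buildTesseractArgs(imagePath, lang), (error, stdout) => {
      if (error) {
        reject(error);
      } else {
        resolve(stdout.trim());
      }
    });
  });
}
```

Because all the work happens in the spawned process, nothing in the module itself gets compiled or packaged, so the binary has to come from somewhere else.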

node-gyp rebuild --target=YOUR_ELECTRON_VERSION --arch=YOUR_ARCH -RELEASE_OR_DEBUG --dist-url=https://electronjs.org/headers --directory=./node_modules/node-tesseract

Since I didn't go the extra mile of testing node-tesseract with newer Node versions, a few things may not work right away. If it does not work, you may want to invest some time in wrapping the C++ library libtesseract for Node via N-API. You only need to wrap the functions you need, not everything.

Once that is done, you can follow the steps described by @Kiran Maniya