pdfjs-dist font loading issue in nodejs


I am using the pdfjs-dist npm package to parse a PDF, render it to a canvas, and return the result as an image buffer. Unfortunately, only about half of the content is rendered, and I get the following warning:

"Warning: fetchStandardFontData: failed to fetch file "LiberationSans-Regular.ttf" with "UnknownErrorException: The standard font "baseUrl" parameter must be specified, ensure that the "standardFontDataUrl" API parameter is provided.".

`const pdfjsLib = require("pdfjs-dist/build/pdf.js");
const pdfjsWorker = require('pdfjs-dist/build/pdf.worker.entry.js');
pdfjsLib.GlobalWorkerOptions.workerSrc = pdfjsWorker;
pdfjsLib.disableFontFace = true;

const loadingTask = pdfjsLib.getDocument({ data: pdfData });
const pdf = await loadingTask.promise;
const page = await pdf.getPage(pageNumber);`

This is my code configuration in Node.js.


There are 2 answers

zhyd1997 On

Bing AI response

It seems that you are getting a warning message from pdfjs-dist, a library for parsing and rendering PDF documents in Node.js. The warning indicates that the library could not fetch a font file named "LiberationSans-Regular.ttf", which is a standard font used by PDF documents. This could cause some text to be missing or rendered incorrectly in the output.

One likely reason for this warning is that the standardFontDataUrl option was not specified when calling the getDocument method. This option tells the library where to find the standard font files it needs when a PDF does not embed them. Its value is a string that points to the folder containing the font files and must include the trailing slash. In Node.js (at least with the 3.x releases) the files are read from the file system, so a local path is needed; the pdfjs-dist package ships the fonts in its standard_fonts folder. For example:

// Assuming pdfData is a Uint8Array holding the PDF bytes

const pdfjsLib = require("pdfjs-dist/build/pdf.js");
const pdfjsWorker = require("pdfjs-dist/build/pdf.worker.entry.js");
pdfjsLib.GlobalWorkerOptions.workerSrc = pdfjsWorker;

// Option 1 (Node.js): point to the font folder shipped with pdfjs-dist.
// The path is relative to the working directory; note the trailing slash.

const loadingTask = pdfjsLib.getDocument({
  data: pdfData,
  standardFontDataUrl: "node_modules/pdfjs-dist/standard_fonts/",
  disableFontFace: true // passed per document instead of being set on pdfjsLib
});

// Option 2 (browser): provide a URL to a folder that contains the font files

const browserLoadingTask = pdfjsLib.getDocument({
  data: pdfData,
  standardFontDataUrl: "https://example.com/fonts/"
});

You can find more information on the standardFontDataUrl option at the following link: StandardFontDataFactory - pdfjs-dist - GitHub
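For Node.js specifically, a small helper can make the font folder independent of the working directory. The following is only a sketch: buildDocumentParams is a hypothetical name, and the folder node_modules/pdfjs-dist/standard_fonts is assumed to exist in your project. The returned object is meant to be passed to pdfjsLib.getDocument().

```javascript
const path = require("path");

// Hypothetical helper: builds the parameter object for pdfjsLib.getDocument().
function buildDocumentParams(pdfData, fontsDir) {
  // Resolve to an absolute path so the result does not depend on the
  // directory the script is started from.
  let fontUrl = path.resolve(fontsDir);
  // The option expects a trailing separator because the font file name
  // is appended directly to this string.
  if (!fontUrl.endsWith(path.sep)) {
    fontUrl += path.sep;
  }
  return { data: pdfData, standardFontDataUrl: fontUrl };
}

// Assumed location of the fonts shipped with pdfjs-dist; adjust as needed.
const params = buildDocumentParams(
  new Uint8Array(0), // placeholder for the real PDF bytes
  "node_modules/pdfjs-dist/standard_fonts"
);
console.log(params.standardFontDataUrl);
```

With the real PDF bytes in place of the placeholder, the call would look like pdfjsLib.getDocument(params).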

I hope this helps you with your problem. Have a nice day!

Lakindu On

Try installing the Liberation font family on the machine / container your code runs on.

To reproduce your problem I used a Docker container running Ubuntu 22.04.

$ docker run -it -v $PWD:/app --name ubuntu ubuntu:22.04 /bin/bash

In the container's bash session, I ran the following commands to install the Liberation font family.

$ apt update
$ apt install fontconfig

# Check if LiberationSans font is available
$ fc-list LiberationSans
# The command above lists nothing, so the font is not installed on the system.
# That is the problem.

# Then downloaded and installed the Liberation font family
$ apt install wget
$ wget http://archive.ubuntu.com/ubuntu/pool/main/f/fonts-liberation/fonts-liberation_2.1.5-3_all.deb
$ dpkg -i fonts-liberation_2.1.5-3_all.deb
$ rm fonts-liberation_2.1.5-3_all.deb
$ fc-cache -f -v
# Above command scans the font directories and
# creates font cache including newly installed Liberation fonts

# Now list Liberation fonts
$ fc-list LiberationSans
# Listed following...
# /usr/share/fonts/truetype/liberation/LiberationSans-Regular.ttf: Liberation Sans:style=Regular
# /usr/share/fonts/truetype/liberation/LiberationSans-Italic.ttf: Liberation Sans:style=Italic
# /usr/share/fonts/truetype/liberation/LiberationSans-BoldItalic.ttf: Liberation Sans:style=Bold Italic
# /usr/share/fonts/truetype/liberation/LiberationSans-Bold.ttf: Liberation Sans:style=Bold
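
The same check could be run from Node.js before rendering. The sketch below parses fc-list output in the format shown in the listing above; parseFcList and hasFontFile are hypothetical helper names, and the sample text stands in for the real command output.

```javascript
const path = require("path");

// Hypothetical helper: turns fc-list output into { file, family, style } records.
function parseFcList(output) {
  const entries = [];
  for (const rawLine of output.split("\n")) {
    const line = rawLine.trim();
    const separator = line.indexOf(": ");
    if (separator === -1) continue; // skip blank or unexpected lines
    const [family, style = ""] = line.slice(separator + 2).split(":style=");
    entries.push({ file: line.slice(0, separator), family, style });
  }
  return entries;
}

// Hypothetical helper: true if any listed font file has the given file name.
function hasFontFile(entries, fileName) {
  return entries.some((entry) => path.basename(entry.file) === fileName);
}

// Sample output copied from the listing above. In a real script it could come
// from child_process.execSync("fc-list LiberationSans", { encoding: "utf8" }).
const sampleOutput = [
  "/usr/share/fonts/truetype/liberation/LiberationSans-Regular.ttf: Liberation Sans:style=Regular",
  "/usr/share/fonts/truetype/liberation/LiberationSans-Bold.ttf: Liberation Sans:style=Bold",
].join("\n");

const fonts = parseFcList(sampleOutput);
console.log(hasFontFile(fonts, "LiberationSans-Regular.ttf")); // true
console.log(hasFontFile(parseFcList(""), "LiberationSans-Regular.ttf")); // false
```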

Now, let's try rendering a PDF page on a canvas.

First, install Node.js (v20).

$ wget https://deb.nodesource.com/setup_20.x -O nodesource_setup.sh
$ bash nodesource_setup.sh
$ apt install -y nodejs
$ rm nodesource_setup.sh

# I use pnpm as the package manager.
$ npm install -g pnpm

To build the canvas npm package successfully, the following dependencies are needed on the system.

$ apt install -y build-essential libcairo2-dev libpango1.0-dev

Source code...

package.json

{
  "name": "app",
  "version": "1.0.0",
  "main": "index.js",
  "dependencies": {
    "canvas": "^2.11.2",
    "pdfjs-dist": "^3.11.174"
  }
}

index.js

const pdfjs = require("pdfjs-dist/build/pdf")
const {readFileSync} = require('fs')
const {writeFile} = require('fs/promises')
const {createCanvas} = require('canvas')

async function main() {
    const pdfFilePath = ...

    const pdfjsWorker = require('pdfjs-dist/build/pdf.worker.entry')
    pdfjs.GlobalWorkerOptions.workerSrc = pdfjsWorker;
    pdfjs.disableFontFace = true;

    const pdf = await pdfjs.getDocument(new Uint8Array(readFileSync(pdfFilePath))).promise
    // Get first page
    const pdfPage = await pdf.getPage(1)

    // Render page to a canvas
    const viewport = pdfPage.getViewport({ scale: 1 });
    const canvas = createCanvas(viewport.width, viewport.height);
    const canvasContext = canvas.getContext('2d');
    await pdfPage.render({canvasContext, viewport}).promise

    // Get PNG image buffer (keep the result so it can be returned or written to a file)
    const pngBuffer = canvas.toBuffer('image/png')

    pdfPage.cleanup()
    await pdf.cleanup()
    await pdf.destroy()
}

main()

Run...

$ node ./index.js

This should work.