Search with accented characters in pdf.js

I am using the ng2-pdf-viewer library to display some PDFs. I was asked to add a search bar for these PDFs, and I did so using this command, available in PdfFindController from pdf.js:

this.pdfFindController.executeCommand('find', {
    caseSensitive: false,
    findPrevious: false,
    highlightAll: true,
    phraseSearch: phraseSearch,
    query: stringToSearch
});

However, most of my PDFs are in French, so they contain accented characters such as è, û, etc. What I need to know is whether the find controller's parameters include an option that makes the find function return all matches regardless of accents. If not, what workaround would you advise?

I also found this issue on the pdf.js GitHub page, https://github.com/mozilla/pdf.js/issues/8101, but it doesn't give a straight answer.

Thank you for your help!

There is 1 answer

César Castro Aroche (best answer)

You have to modify the library so that it accepts these characters. This can be tricky and may depend on the pdf.js version; in my case I modified version 2.4.456. Here is the source code for my modified pdf-find-controller.js: https://drive.google.com/file/d/1pbDG7gmeBpPp8soC1MNOyXVRYxf5AomD/view?usp=sharing. This is the only file you need to change.

Then rebuild the library with these commands:

npm install -g gulp-cli
npm install
gulp generic

You should get a resulting pdf-viewer.js file like this one: https://drive.google.com/file/d/1tWOW_P6-O8ATiQc9cOVt2LAToRB-niHc/view?usp=sharing

This fix was designed specifically for French, but it can be adapted to any language. My advice is to diff the original files against the modified ones; you will then see the logic you need to add.
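The linked files are not reproduced here, so the following is not the actual patch. It is only a minimal sketch of the general idea behind accent-insensitive matching: strip diacritics from both the page text and the query before comparing them. The helper names stripDiacritics and findAccentInsensitive are hypothetical and are not part of pdf.js.

```typescript
// Hypothetical helpers for illustration only; they are not pdf.js APIs.

// Remove diacritics: NFD decomposes "è" into "e" + U+0300, and the regex
// then drops the combining marks in the range U+0300 to U+036F.
function stripDiacritics(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Return the start index of every match of `query` in `pageText`,
// ignoring both case and accents. Indices refer to the stripped text.
function findAccentInsensitive(pageText: string, query: string): number[] {
  const haystack = stripDiacritics(pageText).toLowerCase();
  const needle = stripDiacritics(query).toLowerCase();
  const matches: number[] = [];
  if (needle.length === 0) {
    return matches; // an empty query matches nothing
  }
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    matches.push(index);
    index = haystack.indexOf(needle, index + 1);
  }
  return matches;
}
```

One thing to watch when adapting this inside a find controller: the match offsets are computed on the stripped text. They line up with the original text only when every accented letter is stored as a single precomposed character; if the PDF text contains decomposed sequences, the offsets have to be mapped back before highlighting. Characters that do not decompose, such as the ligature œ, are left unchanged by this approach.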

Also, to make npm use the new version of the library without it being a pain, read about npm-force-resolutions.