Encoded PDF File Parsing


I am trying to extract the text from a PDF file. The file can be downloaded from an official government site, but I have spent hours trying to decode it. The Adobe PDF Extract API (https://developer.adobe.com/document-services/docs/overview/pdf-extract-api/quickstarts/python/) came closest, but I am not sure whether it can be configured to parse the file properly. It extracts only a small portion of the text, and with the wrong characters.

This is the file: https://isir.justice.cz/isir/doc/dokument.PDF?id=49996138

Does anybody have an idea how to extract the text?

I am using Python, but I guess any language will do if it works.
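To make "wrong characters" measurable when comparing extractors, a rough check like the one below might help. It is only a sketch: the function names, the allowed character set, and the 0.85 threshold are all made up for illustration, and it assumes the document is in Czech. It scores whatever string an extractor returns, so it does not depend on any particular PDF library.

```python
# Hypothetical helper: score extracted text for plausibility.
# Assumption: the document is Czech, so Czech letters count as "expected".
CZECH_EXTRA = set("áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ")
ALLOWED_PUNCT = set(".,;:!?()[]/-%\"'§")


def plausible_ratio(text: str) -> float:
    """Share of non-whitespace characters that look like ordinary Czech text."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    ok = 0
    for c in chars:
        if c in CZECH_EXTRA or c in ALLOWED_PUNCT:
            ok += 1
        elif c.isascii() and c.isalnum():
            ok += 1
    return ok / len(chars)


def looks_garbled(text: str, threshold: float = 0.85) -> bool:
    """Flag text whose plausible share is below the threshold (the value is a guess)."""
    return plausible_ratio(text) < threshold


if __name__ == "__main__":
    # Sample strings typed by hand, not taken from the PDF.
    print(looks_garbled("Usnesení Krajského soudu v Brně"))
    print(looks_garbled("ÿþ¤¶ÐØ×÷"))
```

If every extractor's output gets flagged, one possible next step is to render the pages to images and run OCR instead of reading the text layer. Note that this check cannot catch output where the wrong characters happen to be ordinary letters.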
