Getting OSD output from Tesseract (script value: Latin, Cyrillic, ...) with tika-server


I am a beginner using the Tika 2.9.1 server, and I need the Tesseract OSD (orientation and script detection) output in my metadata, in particular the detected script (Latin, Cyrillic, etc.). My questions are: Does this version of the Tika server support it? If so, how do I configure the server to return it? Thanks for your work (and sorry, English is not my native language).

I found this pull request, but I don't see how to integrate it into my Dockerfile to build an image that returns Tesseract's OSD output in the metadata after a request to the Tika server: https://github.com/apache/tika/pull/246/commits/8eb7f93324b20a641b488a4b2d64731db39e717c#diff-8e0377396ab503c58862153ead9a186b611d715d8c2e2025874ae07a4e27c565


There is 1 answer

Tarik On

OK, problem solved: I used a custom Tika config file to set the Tesseract page segmentation mode to 0 (PSM 0), and the /rmeta output now contains the OSD script.
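For anyone landing here later, a minimal sketch of what that setup might look like. It is not the exact file used above. It assumes the usual XML form of a Tika config (`tika-config.xml`) with the `pageSegMode` parameter of `TesseractOCRParser`, a server listening on `http://localhost:9998`, and an image that has Tesseract plus its OSD data (`osd.traineddata`) installed. The helper names `write_config`, `fetch_rmeta` and `find_osd_script` are made up for illustration, and because the exact metadata key that carries the script is not confirmed here, the lookup simply scans every field.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Assumed config: run Tesseract with page segmentation mode 0 (OSD only).
TIKA_CONFIG = """<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
    <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
      <params>
        <param name="pageSegMode" type="string">0</param>
      </params>
    </parser>
  </parsers>
</properties>
"""


def write_config(path="tika-config.xml"):
    """Check that the config is well-formed XML, then write it to disk."""
    ET.fromstring(TIKA_CONFIG.encode("utf-8"))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(TIKA_CONFIG)
    return path


def fetch_rmeta(file_path, url="http://localhost:9998/rmeta"):
    """PUT a file to the (assumed) local tika-server and return the JSON text."""
    with open(file_path, "rb") as handle:
        request = urllib.request.Request(
            url,
            data=handle.read(),
            method="PUT",
            headers={"Accept": "application/json"},
        )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def find_osd_script(rmeta_json):
    """Return (key, value) pairs that look like the OSD script.

    The real key name is an assumption, so this matches any metadata key
    containing "script", plus any "Script: ..." line inside a text field.
    """
    hits = []
    for document in json.loads(rmeta_json):
        for key, value in document.items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                text = str(item)
                if "script" in key.lower():
                    hits.append((key, text))
                    continue
                for line in text.splitlines():
                    if line.strip().lower().startswith("script:"):
                        hits.append((key, line.split(":", 1)[1].strip()))
    return hits
```

The written file can then be passed to the server at startup, for example with `--config /tika-config.xml` after mounting or copying it into the container, and an image sent with `find_osd_script(fetch_rmeta("scan.png"))`, where `scan.png` is a placeholder file name.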