I was using Camelot and tabula to parse a PDF file that contains Cyrillic characters. But in the output CSV file the text came out garbled, with no sign of Russian at all.
How can I parse a PDF table in a non-English language?
Camelot worked for me:
import camelot

file = 'file-name.pdf'
# pages="1-end" makes Camelot parse every page of the PDF
tables = camelot.read_pdf(file, pages="1-end", encoding='utf-8')
Output: 00550529-1295-06 -ТКР5.СО1 0520529-12955--0066--ТТККРР55--ГГЧЧ23 00552299--11229955--0066--ТТККРР55--ГГЧЧ45
So Camelot handles Cyrillic well.
The output is fairly raw and needs cleaning (some characters come out doubled, e.g. ТТККРР55), but the Cyrillic characters themselves are not broken, which I consider a good result.
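For the doubled characters such as ТТККРР55 in the output above, a minimal cleanup sketch could look like this. It assumes (not confirmed for this file) that the doubling comes from the PDF drawing each glyph twice, which some PDFs do to fake bold text. The helper names undouble and clean_tokens are hypothetical. A whitespace token is collapsed only when it consists entirely of identical character pairs, so partially doubled text is left alone. Genuine values that happen to look doubled (for example 0066) would also be collapsed, so check the results against the PDF.

```python
def undouble(text: str) -> str:
    """Collapse a string in which every character appears twice in a row.

    Heuristic (assumption): doubled output like "ТТККРР55" is assumed to come
    from each glyph being drawn twice. Only strings made ENTIRELY of identical
    pairs are collapsed; anything else is returned unchanged.
    """
    # Even length and the even-indexed chars equal the odd-indexed chars,
    # i.e. the string is c1 c1 c2 c2 c3 c3 ...
    if len(text) % 2 == 0 and text[0::2] == text[1::2]:
        return text[0::2]
    return text


def clean_tokens(cell: str) -> str:
    """Apply undouble() to each whitespace-separated token of a table cell."""
    return " ".join(undouble(token) for token in cell.split())


if __name__ == "__main__":
    print(clean_tokens("ТТККРР55 ГГЧЧ"))  # ТКР5 ГЧ
    print(undouble("ГГЧЧ23"))            # unchanged: the trailing "23" is not doubled
```

With Camelot, one way to apply it would be per DataFrame, e.g. tables[0].df.applymap(clean_tokens) (or DataFrame.map in newer pandas), before exporting to CSV.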