Does anybody know a way to vectorize the text in a PDF document? That is, I want each letter to be a shape/outline, without any textual content. I'm using a Linux system, and open source or a non-Windows solution would be preferred.
The context: I'm trying to edit some old PDFs, for which I no longer have the fonts. I'd like to do it in Inkscape, but that will replace all the fonts with generic ones, and the result is barely readable. I've also been converting back and forth using pdf2ps and ps2pdf, but the font info stays there. So when I load it into Inkscape, it still looks awful.
Any ideas? Thanks.
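To achieve this, you will have to:
1. split the PDF into individual pages;
2. convert the PDF pages to SVG;
3. edit the pages you want;
4. reassemble the pages into a single PDF.
This answer will omit step 3, since that's not programmable.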
Splitting the PDF
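For a programmatic split, a minimal sketch along the following lines should work. It is not stapler's actual code: it assumes a recent PyPDF2 (2.x or later, where the classes are named PdfReader and PdfWriter), and the names page_filename and split_pdf are made up for this example.

```python
from pathlib import Path


def page_filename(source, number):
    """Return the output name for one page, e.g. file.pdf, 3 -> file_3.pdf."""
    return f"{Path(source).stem}_{number}.pdf"


def split_pdf(source, out_dir="."):
    """Write every page of `source` to its own PDF inside `out_dir`."""
    # Imported here so the naming helper stays usable without PyPDF2.
    # Assumption: PyPDF2 >= 2.0 (PdfReader / PdfWriter API).
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(source)
    written = []
    for number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        target = Path(out_dir) / page_filename(source, number)
        with open(target, "wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```

Calling split_pdf("file.pdf") would then leave file_1.pdf, file_2.pdf and so on in the current directory.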
If you don't want a programmatic way to split documents, the modern way would be to use stapler. Splitting the file with it from your favorite shell would generate {file_1.pdf,...,file_N.pdf}, where 1...N are the PDF pages. Stapler itself uses PyPDF2, and the code for splitting a PDF file is not that complex: it reads the input and saves each page as its own file in the current directory (stapler's commands.py file shows how).
Converting the individual pages to SVG
Now to convert the PDFs to editable files, I'd probably use pdf2svg.
If we take a look at the pdf2svg.c file, we can see that the code is, in principle, not that complex (assuming the input filename is in the filename variable and the output file name is in the outputname variable). The same steps can be written in Python with the pycairo and pypoppler libraries.
At this point you should have an SVG in which all text has been converted to paths, and you will be able to edit it with Inkscape without rendering issues.
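The conversion step described above can be sketched roughly as follows. One assumption to flag: pypoppler only exists for Python 2, so this sketch swaps in the PyGObject binding (gi.repository.Poppler) together with pycairo. The helper names file_uri and page_to_svg are invented for the example.

```python
from pathlib import Path


def file_uri(path):
    """Poppler expects a file:// URI rather than a plain path."""
    return Path(path).resolve().as_uri()


def page_to_svg(filename, outputname, page_index=0):
    """Render one PDF page into an SVG file using cairo."""
    # Lazy imports; assumption: PyGObject with the Poppler typelib, plus pycairo.
    import cairo
    import gi
    gi.require_version("Poppler", "0.18")
    from gi.repository import Poppler

    document = Poppler.Document.new_from_file(file_uri(filename), None)
    page = document.get_page(page_index)
    width, height = page.get_size()  # page size in points

    surface = cairo.SVGSurface(outputname, width, height)
    context = cairo.Context(surface)
    # The same call pdf2svg makes; per this answer it is what yields outlines.
    page.render_for_printing(context)
    context.show_page()
    surface.finish()
```

For a single-page file, page_to_svg("file_1.pdf", "file_1.svg") would be the whole conversion.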
Combining steps 1 and 2
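One way to count the pages and convert each of them in a single pass is sketched below. As an assumption, it uses the PyGObject Poppler binding instead of pypoppler (which is Python 2 only), and the names svg_name and pdf_to_svgs are hypothetical.

```python
from pathlib import Path


def svg_name(filename, number, out_dir="."):
    """Output path for one page, e.g. doc.pdf, 2 -> doc_2.svg."""
    return str(Path(out_dir) / f"{Path(filename).stem}_{number}.svg")


def pdf_to_svgs(filename, out_dir="."):
    """Convert every page of a PDF into its own SVG; return the SVG paths."""
    # Lazy imports; assumption: PyGObject with the Poppler typelib, plus pycairo.
    import cairo
    import gi
    gi.require_version("Poppler", "0.18")
    from gi.repository import Poppler

    uri = Path(filename).resolve().as_uri()
    document = Poppler.Document.new_from_file(uri, None)

    outputs = []
    # Poppler reports the page count, so no splitting is needed beforehand.
    for index in range(document.get_n_pages()):
        page = document.get_page(index)
        width, height = page.get_size()
        outputname = svg_name(filename, index + 1, out_dir)
        surface = cairo.SVGSurface(outputname, width, height)
        context = cairo.Context(surface)
        page.render_for_printing(context)
        context.show_page()
        surface.finish()
        outputs.append(outputname)
    return outputs
```

Each page gets its own surface here because every SVG file holds exactly one page.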
You can call pdf2svg in a for loop to combine the two steps, but you would need to know the number of pages beforehand. Poppler can report the page count itself, so counting the pages and converting them can be done in a single step, and that requires only pycairo and pypoppler.
Assembling the SVGs into a single PDF
To reassemble, you can use the pair Inkscape / stapler to convert the files manually, but it is not hard to write code that does this: rsvg and cairo are enough to convert the SVGs and merge everything into a single PDF.
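A rough sketch of that idea follows. It assumes the PyGObject binding for librsvg (gi.repository.Rsvg) rather than the old Python 2 rsvg module, and it sizes each PDF page from the pixel dimensions librsvg reports, which is good enough for a sketch but not exact. The names natural_key and svgs_to_pdf are made up here.

```python
import re


def natural_key(name):
    """Sort key that puts file_2.svg before file_10.svg."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


def svgs_to_pdf(svg_files, outputname):
    """Render each SVG onto its own page of a single PDF."""
    # Lazy imports; assumption: PyGObject with the Rsvg typelib, plus pycairo.
    import cairo
    import gi
    gi.require_version("Rsvg", "2.0")
    from gi.repository import Rsvg

    surface = cairo.PDFSurface(outputname, 1, 1)  # resized for every page
    context = cairo.Context(surface)
    for path in sorted(svg_files, key=natural_key):
        handle = Rsvg.Handle.new_from_file(path)
        size = handle.get_dimensions()
        # set_size must come before anything is drawn on the current page.
        surface.set_size(size.width, size.height)
        handle.render_cairo(context)
        context.show_page()
    surface.finish()
```

The natural sort matters because a plain alphabetical sort would put page 10 before page 2.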
PS: It should be possible to use the command pdftocairo, but it doesn't seem to call render_for_printing(), which makes the output SVG maintain the font information.