The file srcfile.pdf has a variable number of Roman-numbered pages (i, ii, iii, etc.) followed by Arabic-numbered pages (1, 2, 3, ..., n).
How can I extract only the Arabic-numbered pages (e.g. #1 to #10)?
The following command extracts the first ten physical pages, i.e. pages i, ii, iii, 1, 2, etc.:
qpdf --empty --pages srcfile.pdf 1-10 -- targetfile.pdf
Is it possible to extract only pages 1, 2, 3, etc.?
qpdf has an option --json to generate a JSON representation of the file. With this option there is a workaround using a JSON parser such as jq.
With the following bash script, "relative" pages within a pagelabel can be converted to absolute pages:
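A minimal sketch of such a conversion, not necessarily identical to the original relpage.sh. It assumes that each entry of the pagelabels array in qpdf's JSON output has an index field holding the 0-based position of the first page of that label range, and that numbering inside the range starts at 1 (no /St offset). The function names abs_page, label_start and relpage are made up for illustration.

```shell
#!/bin/sh
# Sketch: convert "LABEL:PAGE" (page relative to a pagelabel range)
# into an absolute page number usable with qpdf --pages.

# abs_page START_INDEX RELATIVE_PAGE
# START_INDEX: 0-based position of the first page of the label range.
# RELATIVE_PAGE: 1-based page number inside that range
# (assumption: the range starts counting at 1).
abs_page() {
  echo $(( $1 + $2 ))
}

# label_start FILE LABEL_NO
# Prints the "index" field of the LABEL_NO-th (0-based) pagelabels entry.
# jq -e makes the call fail if that entry does not exist.
label_start() {
  qpdf --json --json-key=pagelabels "$1" |
    jq -e --argjson n "$2" '.pagelabels[$n].index'
}

# relpage FILE LABEL:PAGE
relpage() (
  label=${2%%:*}   # part before the colon
  page=${2##*:}    # part after the colon
  start=$(label_start "$1" "$label") || exit 1
  abs_page "$start" "$page"
)

# When called with two arguments, behave like: ./relpage.sh inputfile.pdf 1:17
if [ "$#" -eq 2 ]; then
  relpage "$1" "$2"
fi
```

For a file with three Roman-numbered pages in front, the second label range would start at index 3, so relative page 17 maps to absolute page 3 + 17 = 20.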
Usage:
./relpage.sh inputfile.pdf 1:17
Note that pagelabels are 0-based. To extract pages 17 to 39 in pagelabel 1, use the following command:
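A hedged sketch of how such an extraction could look. The helper build_range is hypothetical; it takes the 0-based start index of the label range (as reported in the pagelabels JSON) plus the first and last relative page, and prints an absolute range for qpdf --pages. It assumes that numbering inside the range starts at 1.

```shell
#!/bin/sh
# build_range START_INDEX FIRST LAST
# Prints "A-B", the absolute page range for relative pages FIRST..LAST
# of a label range whose first page sits at 0-based position START_INDEX.
build_range() {
  printf '%s-%s\n' "$(( $1 + $2 ))" "$(( $1 + $3 ))"
}

# Example (assumption: pagelabel 1 starts at index 3, i.e. after i, ii, iii):
#   qpdf --empty --pages inputfile.pdf "$(build_range 3 17 39)" -- targetfile.pdf
#
# Equivalent form, assuming relpage.sh prints a single absolute page number:
#   qpdf --empty --pages inputfile.pdf \
#     "$(./relpage.sh inputfile.pdf 1:17)-$(./relpage.sh inputfile.pdf 1:39)" \
#     -- targetfile.pdf
```

Quoting the command substitution keeps the range a single argument even if a helper prints unexpected whitespace.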
To get the pagelabels info just use
qpdf --json --json-key=pagelabels inputfile.pdf
or the following