Ghostscript: How to auto-crop STDIN to "bounding box" and write to PDF?

There have already been quite a few questions and answers about cropping documents with Ghostscript. However, the answers do not match my exact needs and are still confusing to me. I expected that there would be a single option, e.g. "-AutoCropToBBox" or something like this.

To clarify: by bounding box I mean the smallest rectangle that completely contains all (non-white?) printed objects.

Furthermore, I want (or rather have) to use a printer port redirection (RedMon) to generate a cropped PDF by printing to a PostScript printer from basically any application. So, under Win7/64bit, I set the redirected port properties as follows.

The output is redirected to C:\Windows\system32\cmd.exe

The arguments for the program are:

/c gswin64c.exe -sDEVICE=pdfwrite -o "%1".pdf -

"%1" contains the user input for filename. With this, I get a full-page PDF. Fine!

But how to add the cropping options?

Additional question: If I have a multipage document will such an (auto-)cropping be individual for each page? Or would there be an option to keep it all the same e.g. like the first page or like the largest bounding box of all pages?

Another related issue: the window for prompting for the filename is always popping up behind the application I am printing from. Any ideas to always bring it to the front?

Another question: There is the Perl script "ps2eps" and the program bbox.exe (see http://ctan.org/pkg/ps2eps). It's said there that Ghostscript (or ps2epsi) is occasionally(?) calculating wrong bounding boxes. Is this (still) true?

Thanks for your help.

There are 2 answers

KenS On

Well, your first problem is that PostScript programs are normally written to expect to be rendered to a specific media size, and are usually not tightly bounded to it. White space is important for readability.

So ordinarily the PostScript program you generate will request a specific media size, and the interpreter will do its best to match that. If it can't match it then it will use a strategy to try and get as close as possible, and scale the entire content to fit that media.

You can't expect the printer to perform any of those things if it doesn't know the required size until it's finished, and you can't be certain of the bounding box until you have rendered all the marking content. It is true that some files, generally EPS files, have a %%BoundingBox comment, but that's a comment: it has no effect in PostScript. It's there for the benefit of applications which don't want to interpret the PostScript.

So that's why the simple switch you want isn't there: it would break the interpreter's normal functioning for rendering.

So, the first thing you need to do is determine the bounding box of the content. You can do that, as Stefan says, by using the bbox device. And on that note, as far as I know the bbox device produces accurate output. If it does not, then we would appreciate a bug report proving it so we can fix it. If people don't report bugs, how are we supposed to know about them? It's disappointing to see someone spreading FUD instead of helping out with a bug report.
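As a rough illustration of that first step, here is a minimal Python sketch (Python is my choice for illustration, not something the answer uses). It builds a bbox-device command line and pulls the integer %%BoundingBox values out of what Ghostscript prints. The helper names `bbox_command` and `parse_bboxes` are hypothetical, and the executable name `gswin64c.exe` is simply taken from the question.

```python
import re

# The bbox device prints one integer %%BoundingBox line (and one
# %%HiResBoundingBox line) per page; this pattern matches only the former.
_BBOX_RE = re.compile(
    r"%%BoundingBox:\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)"
)


def bbox_command(ps_path, gs_exe="gswin64c.exe"):
    """Argument list that runs Ghostscript's bbox device over ps_path."""
    return [gs_exe, "-q", "-dBATCH", "-dNOPAUSE", "-sDEVICE=bbox", ps_path]


def parse_bboxes(gs_output):
    """Return one (llx, lly, urx, ury) tuple of ints per reported page."""
    return [
        tuple(int(v) for v in match.groups())
        for match in _BBOX_RE.finditer(gs_output)
    ]


if __name__ == "__main__":
    # Sample text in the format the bbox device prints for one page.
    sample = (
        "%%BoundingBox: 36 36 577 756\n"
        "%%HiResBoundingBox: 36.000000 36.000000 576.001000 755.998000\n"
    )
    print(parse_bboxes(sample))
```

The bbox device writes these lines to stderr, so with `subprocess.run(bbox_command("in.ps"), capture_output=True, text=True)` the text to parse would be the `.stderr` attribute of the result.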

ps2epsi isn't Ghostscript, it's a crappy cheap and cheerful script, and I wouldn't use it. However, if the original PostScript leaves stuff on the stack then it will end up as a corrupted (or invalid) EPS file, and the original PostScript should be fixed before trying to use it, as it will break any PostScript program that tries to use it (e.g. if you include the EPS in a document and then print it).

So if you are using Ghostscript, and you want to take a PostScript program and get an EPS out of it, use the eps2write device. It won't have a preview, but frankly, who cares.

Now, if I remember correctly, the bbox device (and eps2write) record all marking operations. You can't simply record all the non-white marking operations: what if the white overwrites an existing mark on the page? What if the media is not white? Note that if you render to a PNG with Ghostscript, the untouched portion of the output is transparent, whereas white marks are not.

So the bbox is the extent of all the marking operations, regardless of the colour. The only other way to proceed would be to render the content and count the non-white pixels. But that only works at a specific resolution, change the resolution and the precise bounding box may change as well.
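To make the resolution point concrete, here is a small self-contained Python sketch that computes a bounding box by scanning a toy raster for non-white pixels and converting the result to points. The function name `pixel_bbox_in_points` and the text-grid representation ('.' for white, anything else for a mark) are made up for illustration; a real page would of course come from a rendered image.

```python
def pixel_bbox_in_points(rows, dpi):
    """Bounding box of non-white pixels, in points (1/72 inch).

    rows: list of equal-length strings, top row first; '.' is a white
          pixel, any other character is a marked pixel.
    Returns (llx, lly, urx, ury) with the origin at the bottom left,
    or None if the raster has no marked pixel.
    """
    height = len(rows)
    marked = [
        (x, y)
        for y, row in enumerate(rows)
        for x, ch in enumerate(row)
        if ch != "."
    ]
    if not marked:
        return None
    xs = [x for x, _ in marked]
    ys = [y for _, y in marked]
    scale = 72.0 / dpi  # points per pixel
    # Row 0 is the top of the image, but PostScript y grows upward.
    return (
        min(xs) * scale,
        (height - 1 - max(ys)) * scale,
        (max(xs) + 1) * scale,
        (height - min(ys)) * scale,
    )


if __name__ == "__main__":
    # The same 2 x 2 point mark on a 4 x 4 point page at two resolutions.
    at_72_dpi = ["....", ".##.", ".##.", "...."]
    at_36_dpi = ["##", "##"]  # every coarse pixel is touched by the mark
    print(pixel_bbox_in_points(at_72_dpi, 72))
    print(pixel_bbox_in_points(at_36_dpi, 36))
```

At the coarser resolution every pixel is partly covered by the mark, so the pixel-counting box grows to the whole page even though the content has not changed.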

Once you have the bounding box you can tell Ghostscript to use media that size. Note that you will almost certainly also have to translate the origin, as it's unlikely that the content will start tightly at the bottom left corner. You will need -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS to set the media size, and you will need to use -c and -f to send PostScript to alter the origin appropriately. In simple cases an '-x -y translate' will suffice, but if the program executes initgraphics you will instead have to set a BeginPage procedure to alter the initial CTM.
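A sketch of how such a command line could be assembled for the simple case, written in Python purely for illustration. The helper `crop_args` is hypothetical. The -dFIXEDMEDIA switch is my own addition and not part of the description above: it is there on the assumption that the file requests its own page size, which would otherwise replace the size given on the command line. As noted, a file that resets the graphics state will still undo the translate.

```python
def crop_args(bbox, ps_path, pdf_path, gs_exe="gswin64c.exe"):
    """Ghostscript arguments that set the media to the size of bbox.

    bbox is (llx, lly, urx, ury) in points, e.g. as reported by the
    bbox device. The origin is shifted with a plain 'translate', which
    only covers the simple case described in the text.
    """
    llx, lly, urx, ury = bbox
    width, height = urx - llx, ury - lly
    if width <= 0 or height <= 0:
        raise ValueError("empty bounding box: %r" % (bbox,))
    return [
        gs_exe, "-q", "-o", pdf_path, "-sDEVICE=pdfwrite",
        f"-dDEVICEWIDTHPOINTS={width}",
        f"-dDEVICEHEIGHTPOINTS={height}",
        "-dFIXEDMEDIA",  # assumption: keep the file from changing the size
        "-c", f"{-llx} {-lly} translate",  # move content to the new origin
        "-f", ps_path,
    ]


if __name__ == "__main__":
    print(crop_args((36, 36, 577, 756), "in.ps", "out.pdf"))
```

Passing the list to `subprocess.run` avoids having to quote the `-c` argument for cmd.exe.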

If you set the media size with -dDEVICEWIDTHPOINTS etc. then all pages will be the same size. If you don't want that then you need to write a BeginPage procedure to resize each page individually (you will also need to hook setpagedevice and remove the /PageSize entries from the dictionary).
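For the "same size for all pages" variant, one box that fits every page is the union of the per-page boxes. A minimal Python sketch, with a hypothetical helper `union_bbox`; skipping zero-area boxes is my assumption about how blank pages should be treated, so that a "0 0 0 0" entry does not drag the result to the origin.

```python
def union_bbox(boxes):
    """Smallest box containing every non-empty (llx, lly, urx, ury) box."""
    # Assumption: boxes with no area (e.g. blank pages) are ignored.
    real = [b for b in boxes if b[2] > b[0] and b[3] > b[1]]
    if not real:
        raise ValueError("no non-empty bounding box given")
    return (
        min(b[0] for b in real),
        min(b[1] for b in real),
        max(b[2] for b in real),
        max(b[3] for b in real),
    )


if __name__ == "__main__":
    pages = [(36, 36, 577, 756), (0, 0, 0, 0), (20, 100, 300, 800)]
    print(union_bbox(pages))
```

The width and height of the union are then the values to hand to -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS.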

I've no idea why Windows is putting the dialog box behind the active window; it seems to have started doing that with Windows 7 (or possibly Vista). I don't see any way to alter that, because I'm not sure what is generating the dialog.

Personally I would suggest that you try the 2-step approach of running the original through Ghostscript's eps2write device and then take the EPS and create a PDF file using the pdfwrite device and the -dEPSCrop switch. Double converting is bad, but other solutions are worse. Note that EPS files cannot be multi-page, so you will have to create 'n' EPS files from an n-page PostScript program, and then supply a command line listing each EPS file as input to the pdfwrite device.
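The two command lines for this approach could be put together roughly as follows. This is a Python sketch under stated assumptions: the helper `two_step_commands` and the file stem "page" are hypothetical, the page count has to be known in advance, and it relies on Ghostscript's %d placeholder in the output file name to write one numbered file per page, starting at 1.

```python
def two_step_commands(ps_path, page_count, pdf_path,
                      stem="page", gs_exe="gswin64c.exe"):
    """Return (to_eps, to_pdf) argument lists for the two-step approach."""
    # Step 1: one EPS file per page; %d is replaced by the page number.
    to_eps = [gs_exe, "-q", "-o", f"{stem}-%d.eps",
              "-sDEVICE=eps2write", ps_path]
    # Step 2: list every EPS file as input; -dEPSCrop makes each page
    # take the size of that file's bounding box.
    eps_files = [f"{stem}-{n}.eps" for n in range(1, page_count + 1)]
    to_pdf = [gs_exe, "-q", "-o", pdf_path,
              "-sDEVICE=pdfwrite", "-dEPSCrop"] + eps_files
    return to_eps, to_pdf


if __name__ == "__main__":
    for command in two_step_commands("in.ps", 3, "out.pdf"):
        print(" ".join(command))
```

When the same commands are typed into a batch file instead, the percent sign in %d has to be doubled; running the argument lists through `subprocess.run` sidesteps that.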

Take an example file and try this out from the command line before you try scripting it.

theozh On

As I understood from @KenS's explanations:

  1. the way eps2write works, it may not or will not or actually cannot result in the minimum possible bounding box
  2. it needs to be a 2-step process via -sDEVICE=bbox

So I have now ended up with the following process to "print" a PDF with a correct minimum possible bounding box:

Redirect the printer port to cmd.exe: C:\Windows\system32\cmd.exe

Arguments for the program:

 /c gswin64c.exe -q -o "%1".ps -sDEVICE=ps2write - && gswin64c.exe -q -dBATCH -dNOPAUSE -sDEVICE=bbox -dLastPage=1 "%1".ps 2>&1 >nul | perl.exe C:\myFiles\CropPS2PDF.pl "%1"

Unfortunately, it requires a little Perl script (let's call it: CropPS2PDF.pl):

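#!/usr/bin/perl -w
use strict;
my $FileName = $ARGV[0];   # base file name passed in by RedMon as "%1"
$/ = undef;                # slurp mode: read all of STDIN at once
my $Crop = <STDIN>;        # output of the bbox device

# get the bbox coordinates; stop if the bbox device reported none
$Crop =~ /%%BoundingBox: (\d+) (\d+) (\d+) (\d+)/s
    or die "No %%BoundingBox found in the bbox output\n";
my ($llx, $lly, $urx, $ury) = ($1, $2, $3, $4);
print "\n$FileName: $llx, $lly, $urx, $ury \n";   # print just to check

# file names are quoted so that paths containing spaces survive cmd.exe
my $Command = qq{gswin64c.exe -q -o "$FileName.pdf" -sDEVICE=pdfwrite -c "[/CropBox [$llx $lly $urx $ury]" -c " /PAGE pdfmark" -f "$FileName.ps"};

print $Command;    # print just to check
system($Command);    # execute command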

It seems to work... :-) Improvements are welcome.
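One possible refinement, sketched in Python only to show the string being built (the helper `cropbox_pdfmark` is hypothetical): as far as I understand the pdfmark reference, /PAGE applies the CropBox to a single page, while /PAGES applies it to every page of the document. Which one is wanted depends on whether all pages should share the box measured on page 1.

```python
def cropbox_pdfmark(bbox, all_pages=True):
    """PostScript fragment that sets a CropBox through pdfmark.

    bbox is (llx, lly, urx, ury) in points. With all_pages=True the
    /PAGES target is used, otherwise /PAGE as in the script above.
    """
    llx, lly, urx, ury = bbox
    target = "/PAGES" if all_pages else "/PAGE"
    return f"[/CropBox [{llx} {lly} {urx} {ury}] {target} pdfmark"


if __name__ == "__main__":
    print(cropbox_pdfmark((36, 36, 577, 756)))
    print(cropbox_pdfmark((36, 36, 577, 756), all_pages=False))
```

The returned string is what would follow -c on the pdfwrite command line.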

My questions are still:

  1. Can this be done somehow without Perl? Just Win7, cmd.exe and Ghostscript?
  2. Is there maybe a way without writing the PS-File to disk which I do not need? Of course, I could also delete it afterwards with the Perl-script.