I need a little advice/push in the right direction.
I have written some small scripts that take an incoming HTML email, convert it to PostScript and then send it to a designated printer via CUPS. The printer is chosen based on the recipient of the email.
I am using the following to achieve this:
- Exim
- Procmail
- HTML2PS
- Two custom scripts (posted below)
The flow
- An email is received by Exim and passed to Procmail
- .procmailrc saves the message to a file and calls the custom script "process_mail", passing the file name and the subject as parameters
- "process_mail" feeds the file's content through a function that calls "get_html_from_message" (I am not doing anything with the subject yet)
- "get_html_from_message" discards everything but the HTML
- HTML is then converted to PostScript
- PostScript file is sent to designated printer.
Problems
- At the HTML2PS stage an error is generated and an NDR is sent back to the sender stating that there was an error opening the images: "Error opening cid:logo.jpg"
- PostScript file is successfully printed but obviously does not contain the images from the email.
My question is: How do I get those images out of the email so that they will print out successfully in the PostScript file?
I am more than happy to convert to PDF if PostScript is not suitable, but even converting to PDF leaves me without the images because I cannot get at them.
.procmailrc
SHELL=/bin/bash
# Extract the subject and normalise
SUBJECT=`formail -x"Subject: "\
| /usr/bin/tr '[:space:][:cntrl:][:punct:]' '_' | expand | sed -e 's/^[_]*//' -e 's/[_]*$//'`
YMD=`date +%Y%m%d`
MAKE_SURE_DIRS_EXIST=`
mkdir -p received_mail/backup
if [ ! -z ${SUBJECT} ]
then
mkdir -p received_mail/${YMD}/${SUBJECT}
else
mkdir -p received_mail/${YMD}/no_subject
fi
`
# Backup all received mail into the backup directory appending to a file named by date
:0c
received_mail/backup/${YMD}.m
# If no subject, just store the mail
:0c
* SUBJECT ?? ^^^^
received_mail/${YMD}/no_subject/.
# Else there is a subject, generate a unique filename, place the received email
# in that file and then execute process_mail passing the filename and subject as parameters
:0Eb
| f=`uuidgen`; export f; cat > received_mail/${YMD}/${SUBJECT}/${f}; $HOME/bin/process_mail received_mail/${YMD}/${SUBJECT}/${f} "${SUBJECT}"
# and don't deliver to standard mail, don't want to clutter up the inbox.
:0
/dev/null
process_mail
#!/bin/bash
# Test Printer
printer=$(whoami)
file=$1
subject=$2
function process_rrs {
    typeset file
    file=$1
    # Quote paths so file names containing spaces survive word splitting
    "$HOME/bin/get_html_from_message" < "$file" \
        | html2ps \
        | lp -d "${printer}" -o media=a4 2>&1
}
case "$subject" in
    *)
        process_rrs "$file"
        ;;
esac
get_html_from_message
awk '
BEGIN {
    typeout=0
}
{
    if($0 ~ /<html/)
        typeout=1
    if($0 ~ /^------=/)
        typeout=0
    if(typeout)
        print $0
}'
The problem is probably an incomplete understanding of how HTML is represented in email. There will typically be a MIME multipart with one HTML part and multiple images. The HTML uses the cid: addressing scheme in image links to refer to these sibling parts. But if you extract just the HTML, it no longer exists in a context where it has any siblings. (Even if you extract all parts to files, cid: does not normally map to a local file. Maybe you could post-process the HTML to fix that; but I'm thinking maybe your approach should be rethought. Have you considered using a mail client with native HTML support for rendering these messages?)
A simple xmlstarlet script or similar to strip the cid: prefix from the src attribute of any img link should not be hard, but there are probably additional things you will discover you need to do if you attempt this path.
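If you do want to try the post-processing route, here is a minimal sketch in Python rather than xmlstarlet, using only the standard library's email package. It walks the MIME parts, saves each image that carries a Content-ID into a directory, and rewrites the matching cid: references in the HTML to the saved file names. The function name extract_html_and_images and the out_dir parameter are made up for illustration, and the sketch assumes the images are sibling parts with a Content-ID header, which is the usual layout but not guaranteed.

```python
import email
import os
import re
from email import policy


def extract_html_and_images(raw_bytes, out_dir):
    """Return the first HTML part with cid: links rewritten to files in out_dir.

    Hypothetical helper: returns None when the message has no text/html part.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    os.makedirs(out_dir, exist_ok=True)
    html = None
    cid_to_file = {}

    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        cid = part.get("Content-ID")
        if part.get_content_type() == "text/html" and html is None:
            html = part.get_content()
        elif cid and part.get_content_maintype() == "image":
            # Content-ID looks like "<logo.jpg>"; the HTML refers to it as cid:logo.jpg
            cid = str(cid).strip().strip("<>")
            # basename() guards against path components in the supplied name
            name = os.path.basename(part.get_filename() or cid)
            with open(os.path.join(out_dir, name), "wb") as fh:
                fh.write(part.get_payload(decode=True))
            cid_to_file[cid] = name

    if html is None:
        return None

    def replace_cid(match):
        # Leave references we could not resolve untouched
        return cid_to_file.get(match.group(1), match.group(0))

    return re.sub(r"cid:([^\"'\s>]+)", replace_cid, html)
```

A small wrapper could read the message with sys.stdin.buffer.read(), call this function, and write the result to standard output, taking the place of get_html_from_message in the pipeline. Whether html2ps then finds the images depends on it resolving relative paths against the directory it runs in, which I have not verified, so treat that as an assumption to test.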