PHP: How to determine email attachment's MIME type?


I'm using Zend's Zend_Mail_Storage_Pop3 to connect to a mail server, open an email, and iterate through its attachments. If an attachment is a PDF, I need to download it. For each message part, I call getHeaders and use a regex to determine the MIME type of the attachment. In most cases, I get something like this:

["content-type"]=> string(64) "application/octet-stream; name=abc.pdf"
["content-transfer-encoding"]=> string(6) "base64"

But in some cases, I get something like this:

multipart/mixed; boundary=--boundary_2_1dca5b3b-499e-4109-b074-d8b5f914404a

How do I determine the mime type of such attachments?

Andrew (best answer)

This case is a little complicated. A content-type of multipart/mixed means that the message consists of several parts. One or more of these might be an attachment, possibly alongside an HTML or plain-text body.

When the content-type is multipart/mixed, a boundary is also given. You can use this regex to determine if you are dealing with a multipart email:

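$contentType = $this->GetHeader('content-type');
// Boundaries may contain characters such as "-" and "=", so capture everything
// up to a quote, semicolon, or whitespace instead of only \w characters.
$regex = '%multipart/.*?boundary\s*=\s*"?([^";\s]+)"?%is';
$matches = array();

if (preg_match($regex, $contentType, $matches)) {
    $this->isMultiPart = true;
    $this->boundary = $matches[1];
} else {
    $this->isMultiPart = false;
}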

(Note that this sample is part of a larger class dealing with email messages.)
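
If you want to try the boundary extraction outside of a class, here is a minimal standalone sketch. The function name extractBoundary is hypothetical (it is not part of Zend), and the pattern is only one reasonable way to do this: it accepts either a quoted boundary, which may contain spaces, or an unquoted token.

```php
<?php
/**
 * Return the boundary from a Content-Type header value, or null when the
 * header does not describe a multipart message.
 * Hypothetical helper for illustration; not part of Zend.
 */
function extractBoundary($contentType)
{
    // Group 1 captures a quoted boundary, group 2 an unquoted one.
    $regex = '%^\s*multipart/.*?\bboundary\s*=\s*(?:"([^"]+)"|([^\s;"]+))%is';
    $m = array();
    if (!preg_match($regex, $contentType, $m)) {
        return null;
    }
    // preg_match omits trailing unmatched groups, so check group 2 first.
    return (isset($m[2]) && $m[2] !== '') ? $m[2] : $m[1];
}
```

With the two headers from the question, extractBoundary returns the full --boundary_2_... value for the multipart/mixed header and null for the application/octet-stream one.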

If your message is a multipart email, the next step is to separate all of the parts. You can do this like so:

$parts = explode("--$this->boundary", $this->fullBody);
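
A plain explode also returns the text before the first delimiter and the remainder after the closing one. As a rough sketch of a slightly more defensive split (splitMultipartBody is a hypothetical name, and this assumes the boundary string does not also occur inside a part's content):

```php
<?php
/**
 * Split a raw multipart body into its raw parts (headers plus content).
 * Hypothetical helper for illustration; not part of Zend.
 */
function splitMultipartBody($body, $boundary)
{
    $parts = array();
    $chunks = explode('--' . $boundary, $body);
    // The first chunk is the preamble before the first delimiter; skip it.
    array_shift($chunks);
    foreach ($chunks as $chunk) {
        // The closing delimiter is "--boundary--", so the chunk after it
        // starts with "--". Everything from there on is the epilogue.
        if (strncmp($chunk, '--', 2) === 0) {
            break;
        }
        // Drop the line break that ends the delimiter line and the one
        // that precedes the next delimiter.
        $parts[] = preg_replace('/\A[ \t]*\r?\n|\r?\n\z/', '', $chunk);
    }
    return $parts;
}
```

Each returned element still contains the part's own headers, a blank line, and then the (possibly encoded) content.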

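Per the MIME standard (RFC 2046), each delimiter line is the boundary value prefixed with --, and the final delimiter also ends with --. In your example the boundary value itself begins with --, so the delimiter lines start with four hyphens. The first element of $parts is the preamble before the first delimiter, and the last one starts with -- (left over from the closing delimiter); discard both. Then the only thing left to do is to parse each of the individual parts.
You probably already have code to do that. Each part will have the same headers that you mentioned in your question: content-type and content-transfer-encoding.
There might be other part headers as well, such as content-disposition. Per RFC 2046, only headers whose names start with content- are meaningful inside a part. A blank line separates the headers from the part's content, so strip everything up to it before decoding.
If the part is base64 encoded, make sure you decode it (check the content-transfer-encoding header to determine this).
The MIME type of the individual attachment is stored in the part's content-type header, just like in the case of a single-part message. When that type is generic, as with application/octet-stream in your example, the name parameter (abc.pdf) is the best remaining hint that the file is a PDF. A part can itself be multipart, in which case you repeat the same procedure with its own boundary.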
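
To make the per-part steps concrete, here is a minimal sketch that separates a raw part's headers from its content, decodes the content, and reports the MIME type. The function name parseMimePart, the returned array keys, and the rule "application/octet-stream plus a .pdf name counts as a PDF" are my assumptions for illustration; adapt them to your own class.

```php
<?php
/**
 * Parse one raw MIME part into headers, decoded content, and MIME type.
 * Hypothetical helper for illustration; not part of Zend.
 */
function parseMimePart($rawPart)
{
    // Headers end at the first blank line.
    $pieces = preg_split('/\r?\n\r?\n/', $rawPart, 2);
    $content = isset($pieces[1]) ? $pieces[1] : '';

    // Unfold continuation lines (a line break followed by space or tab).
    $rawHeaders = preg_replace('/\r?\n[ \t]+/', ' ', $pieces[0]);

    $headers = array();
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        if (strpos($line, ':') !== false) {
            list($name, $value) = explode(':', $line, 2);
            $headers[strtolower(trim($name))] = trim($value);
        }
    }

    // Decode according to content-transfer-encoding (7bit if absent).
    $encoding = isset($headers['content-transfer-encoding'])
        ? strtolower($headers['content-transfer-encoding']) : '7bit';
    if ($encoding === 'base64') {
        $content = base64_decode($content);
    } elseif ($encoding === 'quoted-printable') {
        $content = quoted_printable_decode($content);
    }

    // The MIME type is everything before the first ";".
    $contentType = isset($headers['content-type'])
        ? $headers['content-type'] : 'text/plain';
    $typePieces = explode(';', $contentType, 2);
    $mimeType = strtolower(trim($typePieces[0]));

    // Optional name parameter, e.g. name=abc.pdf or name="abc.pdf".
    $name = null;
    $m = array();
    if (preg_match('/\bname\s*=\s*"?([^";]+)"?/i', $contentType, $m)) {
        $name = trim($m[1]);
    }

    // Assumption: a generic type with a ".pdf" name is treated as a PDF.
    $isPdf = $mimeType === 'application/pdf'
        || ($mimeType === 'application/octet-stream' && $name !== null
            && strtolower(pathinfo($name, PATHINFO_EXTENSION)) === 'pdf');

    return array(
        'headers'  => $headers,
        'mimeType' => $mimeType,
        'name'     => $name,
        'isPdf'    => $isPdf,
        'content'  => $content,
    );
}
```

Checking the file name is only a heuristic. If you need certainty, you could also inspect the decoded content, since PDF files begin with %PDF.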

One note: this assumes that you are dealing with the raw source of the message. To get it, you can use the getRawHeader and getRawContent methods of Zend_Mail_Storage_Pop3.