What are special characters in E-Mail-Headers and when to use quotes?


I'm trying to send and read e-mail using PHP. So far I found out that I have to encode special characters using the function mb_encode_mimeheader(), but I don't have to encode spaces.

I also found out that brackets in the address field don't work (see: Is there an error in PHP's imap_fetch_overview() function when reading headers with brackets?). For example, PHP is unable to read the header From: Admin [] <[email protected]>, but it can read the header From: "Admin []" <[email protected]>.

So, obviously, brackets have a special meaning in the mail header (at least for PHP). Which characters are special in a mail header, what do they mean, and where do they need to be encoded or quoted?

For example, PHP has no problems with brackets in the subject, although the Subject is also part of the header.

It seems that quotes can help me with the problem (https://www.rfc-editor.org/rfc/rfc5322#section-3.2.4 - I'm still not 100% sure whether this is a problem with PHP or an incorrect mail header). But how do I use quotes, and what do quotes escape?

In https://www.rfc-editor.org/rfc/rfc5322#section-3.2.4 it says:

Strings of characters that include characters other than those allowed in atoms can be represented in a quoted string format, where the characters are surrounded by quote (DQUOTE, ASCII value 34) characters.

So, should I now "escape/quote" each character on its own

From: Admin "[""]" <[email protected]>

or is it fine to quote everything together?

From: "Admin []" <[email protected]>

But what happens if other control sequences are enclosed within the quotes? For example, I have the special characters ÄÖÜ in my string, which are encoded to =?UTF-8?B?w4PChMODwpbDg8Kc?=. So, will "quoted AND encoded" strings still be fine according to the RFC?

From: "Admin [=?UTF-8?B?w4PChMODwpbDg8Kc?=]" <[email protected]>

There is 1 answer

Answered by tripleee

If you can use RFC 2047 encoding, you might as well encode the entire display name as RFC 2047 and forget about quoting.

Apparently you already found RFC 5322, which is the authoritative source on what needs to be quoted and why. Basically, any character which has a special meaning in the address syntax needs to be quoted when it appears outside the email address itself, such as in the display name. The traditional quoting mechanism was backslash and/or double quotes, but with MIME, you can easily encode everything transparently with the available MIME encodings.

The link you gave explains that characters which are not permitted in "atoms" require quoting. The list of characters which are permitted in atoms is in the previous section.

atext = ALPHA / DIGIT /    ; Printable US-ASCII
        "!" / "#" /        ;  characters not including
        "$" / "%" /        ;  specials.  Used for atoms.
        "&" / "'" /
        "*" / "+" /
        "-" / "/" /
        "=" / "?" /
        "^" / "_" /
        "`" / "{" /
        "|" / "}" /
        "~"

If you cross-check against the ASCII table, you get

32   (space)                    not OK
33 !                            OK
34 "                            not OK
35 # through $%&' 39            OK
40 ( through ) 41               not OK
42 * through + 43               OK
44 ,                            not OK
45 -                            OK
46 .                            not OK
47 / through 0123456789 57      OK
58 : through ;< 60              not OK
61 =                            OK
62 >                            not OK
63 ?                            OK
64 @                            not OK
65 A through BCD...XYZ 90       OK
91 [ through \] 93              not OK
94 ^ through _` 96              OK
97 a through bcd...xyz{|}~ 126  OK
127 DEL                         not OK
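
The cross-check against the ASCII table can also be done mechanically. The following is a minimal sketch, written in Python rather than PHP purely for illustration; the names ATEXT and non_atom_printables are made up for this example. It builds the atext set from the grammar quoted above and lists every character from 32 through 127 that is not allowed in an atom.

```python
import string

# atext per the RFC 5322 grammar quoted above: letters, digits,
# and 19 punctuation characters (apostrophe and backtick included).
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def non_atom_printables():
    """Return the characters in ASCII 32..127 that are not atom characters."""
    return [chr(code) for code in range(32, 128) if chr(code) not in ATEXT]


# Print each character that needs quoting or encoding, with its ASCII code.
for ch in non_atom_printables():
    print(ord(ch), repr(ch))
```

The characters it reports are the space, DEL, and the "specials" that the grammar comment refers to: ( ) < > [ ] : ; @ \ , . and the double quote.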

In some contexts, a "dot-atom" is permitted without quoting: this is the above set plus dot (full stop, period, ASCII 46), where the dot may only appear between atom characters, not at the start or end.

Some clients obviously err on the cautious side: some will simply put everything in double quotes, as if your real name wasn't really your real name. That sucks.
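
As a rough illustration of "quote only when needed", here is a sketch in Python (used only as an illustration language; the thread itself is about PHP). The helper name quote_phrase and the two regular expressions are assumptions made for this example, not part of any library. The helper handles ASCII input only and quotes the whole display name as one quoted-string as soon as any word contains a non-atom character.

```python
import re

# atext and dot-atom-text following RFC 5322 section 3.2.3 (ASCII only).
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
ATOM_RE = re.compile(rf"{ATEXT}+")
DOT_ATOM_RE = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def quote_phrase(name: str) -> str:
    """Hypothetical helper: make an ASCII display name safe for a From: header.

    Names whose space-separated words are all atoms are returned unchanged;
    anything else is wrapped in double quotes, with backslash and double
    quote escaped by a backslash.
    """
    if not name.isascii() or any(ord(c) < 32 or ord(c) == 127 for c in name):
        # Non-ASCII text needs RFC 2047 encoding instead; control
        # characters (including CR/LF) are rejected outright.
        raise ValueError("ASCII printable input only")
    if name and all(ATOM_RE.fullmatch(word) for word in name.split(" ")):
        return name
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print(quote_phrase("John Smith"))  # left as-is
print(quote_phrase("Admin []"))    # quoted as a whole
```

Quoting the name as a whole, rather than character by character, keeps the spaces inside the quoted-string and avoids having to reason about adjacent quoted fragments.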

My understanding is that an RFC 2047 sequence is permitted where an atom is permitted, but that means that it must be separated from any neighbouring atom by whitespace. Anyway, I would cop out and recommend against even trying to mix quoting and RFC 2047 wrapping in the same header, rather than trying to figure out how they interact (and perhaps then finding that your interpretation is not the only game in town, either because others made a mistake when figuring it out, or because there are multiple valid interpretations of the spec).
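
For what it's worth, RFC 2047 section 5 states that an encoded-word must not appear inside a quoted-string, so a form like "Admin [=?UTF-8?B?...?=]" would be taken literally by a conforming reader rather than decoded. A sketch of the "encode the whole display name" approach follows, in Python for illustration only; the address admin@example.com and the function names encode_from and decode_display_name are placeholders invented for this example. It relies on the standard library's email.utils.formataddr, which quotes ASCII names containing specials and RFC 2047 encodes non-ASCII names as a whole.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr


def encode_from(name: str, address: str) -> str:
    """Build a From: value; non-ASCII names become one RFC 2047 encoded-word."""
    return formataddr((name, address), charset="utf-8")


def decode_display_name(header_value: str):
    """Split a From: value and decode any encoded-words in the display name."""
    raw_name, address = parseaddr(header_value)
    return str(make_header(decode_header(raw_name))), address


# Placeholder address, not taken from the question.
value = encode_from("Admin [ÄÖÜ]", "admin@example.com")
print(value)
print(decode_display_name(value))
```

Because the brackets end up inside the encoded-word together with the umlauts, no quoting is needed at all, which sidesteps the question of how quoting and encoding interact.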