It's unclear to me whether raw (non-base64) binary is supported by the standard in MIME multipart/mixed, in particular when decoding with Python:
import email
import email.policy

msg = email.message_from_binary_file(fp, policy=email.policy.HTTP)
for part in msg.walk():
    if part.get_content_maintype() == 'multipart':
        continue  # skip container parts; only leaf parts carry data
    filename = part.get_filename()
    payload = part.get_content()
payload gets text-processed, or something like it: byte 13 (CR, 0x0D) turns into byte 10 (LF, 0x0A). Obviously this corrupts the data. Is there a way to put the parser into a pure binary mode? Should I be base64-encoding the data instead? The MIME itself looks like this:
Content-Type: multipart/mixed;boundary=123456789000000000000987654321
Transfer-Encoding: chunked

--123456789000000000000987654321
Content-Type: image/jpeg
Content-transfer-encoding: binary
Content-Disposition: attachment; filename="2024-02-08T000418.jpg"
Content-Length: 23302

<binary data>
--123456789000000000000987654321
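On the base64 question, here is a minimal sketch of what I would expect to work, not a confirmed fix. It builds a small message shaped like the one above, but with the part base64-encoded, and parses it with the same call. The sample bytes and the filename sample.jpg are made up for illustration. Since line breaks carry no meaning inside base64 text, newline handling in the parser should not be able to damage the decoded bytes.

```python
import base64
import email
import email.policy
import io

BOUNDARY = b"123456789000000000000987654321"

# Hypothetical stand-in for JPEG data; it deliberately contains a bare CR,
# a CRLF pair and a bare LF, the bytes that get damaged in binary mode.
data = bytes([0xFF, 0xD8, 0x0D, 0x41, 0x0D, 0x0A, 0x42, 0x0A, 0x43])

# Assemble the message by hand with CRLF line endings.
raw = b"\r\n".join([
    b"Content-Type: multipart/mixed; boundary=" + BOUNDARY,
    b"",
    b"--" + BOUNDARY,
    b"Content-Type: image/jpeg",
    b"Content-Transfer-Encoding: base64",
    b'Content-Disposition: attachment; filename="sample.jpg"',
    b"",
    base64.b64encode(data),
    b"--" + BOUNDARY + b"--",
    b"",
])

# Same parsing call as in the question, fed from an in-memory binary stream.
msg = email.message_from_binary_file(io.BytesIO(raw), policy=email.policy.HTTP)
decoded = [
    part.get_content()
    for part in msg.walk()
    if part.get_content_maintype() != "multipart"
]
```

The cost is the usual base64 overhead of roughly one third more bytes on the wire, which is the reason I would prefer binary if it can be made to work.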
UPDATE 08FEB24
- RFC 2045 tells us "Content-Transfer-Encoding [...] "binary" all mean that the identity (i.e. NO) encoding transformation has been performed"
- RFC 2045 tells us "there are no circumstances in which the "binary" Content-Transfer-Encoding is actually valid in Internet mail."
- RFC 2045 tells us:
mechanism := "7bit" / "8bit" / "binary" /
"quoted-printable" / "base64" /
- Python call email.message_from_bytes leaves CRs alone
- Yes, fp is opened in binary mode
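The message_from_bytes observation can be sketched as follows. This is a minimal reproduction under assumptions, not a claim about how the module is meant to be used: the sample bytes, the filename sample.jpg and the helper name binary_parts are all made up for illustration.

```python
import email
import email.policy

BOUNDARY = b"123456789000000000000987654321"

# Hypothetical stand-in for JPEG data containing a bare CR, a CRLF pair
# and a bare LF.
data = bytes([0xFF, 0xD8, 0x0D, 0x41, 0x0D, 0x0A, 0x42, 0x0A, 0x43])

# Hand-built message with a "binary" part, shaped like the one in the question.
raw = b"\r\n".join([
    b"Content-Type: multipart/mixed; boundary=" + BOUNDARY,
    b"",
    b"--" + BOUNDARY,
    b"Content-Type: image/jpeg",
    b"Content-Transfer-Encoding: binary",
    b'Content-Disposition: attachment; filename="sample.jpg"',
    b"",
    data,
    b"--" + BOUNDARY + b"--",
    b"",
])


def binary_parts(raw_bytes):
    """Return (filename, payload) for every non-multipart part (hypothetical helper)."""
    msg = email.message_from_bytes(raw_bytes, policy=email.policy.HTTP)
    return [
        (part.get_filename(), part.get_content())
        for part in msg.walk()
        if part.get_content_maintype() != "multipart"
    ]


parts = binary_parts(raw)
```

With a real file, the equivalent would be email.message_from_bytes(fp.read(), policy=email.policy.HTTP) in place of email.message_from_binary_file(fp, ...), at the price of reading the whole body into memory first.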
From this I conclude:
- Content-transfer-encoding: base64 is standards-supported
- Python very much took item #2 to heart, since this is an email processing mechanism
- Presuming that that 28-year-old RFC still holds true, one might not accuse Python's email module of having a bug. However, I say it does: email.message_from_binary_file specifically fails on a binary file.
Is my logic sound?