I have a problem with my app, which reads e-mails from an external server using the mailman gem (which in turn uses the mail gem).
ruby 1.9.2p0
mail (2.3.0)
mailman (0.4.0)
actionmailer (= 3.1.3)
database.yml
production:
  adapter: mysql2
  encoding: utf8
Here is a simple method that receives mail. I build @message_body from the text_part of a multipart e-mail (for example, one with attachments) or from the whole decoded body.
def self.receive_mail(message)
  # some code here
  @message_body = message.multipart? ? message.text_part.body.to_s : message.body.decoded
  # some code here, to save message in database
end
My problem is that if the message has no attachments but contains diacritics, like ą ś ł ń ż ź ó ..., the body is cut off just before the first diacritic. So if the body is "test żłóbek test", I get only "test " in @message_body.
My question is how to save such a message in an elegant way, so that the text part is stored in the database with all its diacritics.
EDIT: to make it clearer, I get e-mails that look like this one (this is just part of an e-mail sent from Gmail):
--20cf307ac4372d830104c11c8cc6
Date: Mon, 28 May 2012 20:06:16 +0200
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-2
Content-Transfer-Encoding: base64
Content-ID: <[email protected]_domain>

dGVzdCC/s7zm8bbzsSB0ZXN0Cg==
So we have this 'body': dGVzdCC/s7zm8bbzsSB0ZXN0Cg==
After decoding we get: 'test \xbf\xb3\xbc\xe6\xf1\xb6\xf3\xb1 test\n'
And the problem is that everything from '\xbf' onward is not saved in the database.
UPDATE
Another example; I think this is where the problem is:
irb(main):008:0* require 'base64'
=> true
irb(main):009:0> a = "test źćłżąńś"
=> "test źćłżąńś"
irb(main):010:0> b = Base64.encode64(a)
=> "dGVzdCDFusSHxYLFvMSFxYTFmw==\n"
irb(main):011:0> Base64.decode64(b)
=> "test \xC5\xBA\xC4\x87\xC5\x82\xC5\xBC\xC4\x85\xC5\x84\xC5\x9B"
See, after decode64 my diacritics are LOST. What can I do to get them back?
That doesn't work because the data isn't UTF-8: your mail headers clearly state that the message body is ISO 8859-2.
mysql2 assumes everything is UTF-8, but the bytes can't be converted to UTF-8 (because Ruby doesn't know the original encoding), so MySQL throws away your non-ASCII characters.
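You can see the mismatch without touching MySQL. A minimal sketch, using the Base64 body from the question: the decoded string comes back tagged as binary, and its bytes are not a valid UTF-8 sequence.

```ruby
require 'base64'

# Body from the question; its part header declares charset=ISO-8859-2.
raw = Base64.decode64("dGVzdCC/s7zm8bbzsSB0ZXN0Cg==")

# Base64 knows nothing about character sets, so the result is plain binary.
is_binary = (raw.encoding == Encoding::BINARY)

# Relabel a copy of the same bytes as UTF-8 and check whether they are
# well-formed UTF-8. The byte \xBF cannot start a UTF-8 character.
looks_like_utf8 = raw.dup.force_encoding('UTF-8').valid_encoding?

puts "binary: #{is_binary}, valid UTF-8: #{looks_like_utf8}"
```

Here `is_binary` is true and `looks_like_utf8` is false, which matches the point at which the stored text gets cut off.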
For that one string you could try telling Ruby the real encoding and converting to UTF-8 yourself.
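A minimal sketch for that one string, assuming nothing beyond Ruby 1.9's own `force_encoding` and `encode` (this is not a mail gem API, and the charset is hard-coded from the header shown in the question): tag the decoded bytes as ISO-8859-2, then transcode them to UTF-8 before saving.

```ruby
require 'base64'

# Body from the question; its part header declares charset=ISO-8859-2.
raw = Base64.decode64("dGVzdCC/s7zm8bbzsSB0ZXN0Cg==")

# force_encoding only relabels the bytes; encode actually converts them.
text = raw.force_encoding('ISO-8859-2').encode('UTF-8')

puts text
```

For the message in the question this should give "test żłźćńśóą test", which mysql2 can then store as ordinary UTF-8.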
But really you want to be working out what encoding to use from the Content-Type header. I'm surprised the mail gem isn't doing that for you.
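As a rough sketch of that idea in plain Ruby: pull the `charset` parameter out of the part's Content-Type value and use it to transcode. The helper name `to_utf8` and the regex are hypothetical, and falling back to UTF-8 when no charset is given is an assumption. The mail gem appears to expose the charset itself (as `charset` on a message or part); if that is available in your version, prefer it over parsing the header by hand.

```ruby
require 'base64'

# Hypothetical helper: transcode raw body bytes to UTF-8 using the charset
# named in a Content-Type value such as "text/plain; charset=ISO-8859-2".
def to_utf8(raw_bytes, content_type)
  # Assumption: treat the bytes as UTF-8 when the header names no charset.
  charset = content_type[/charset="?([^\s";]+)"?/i, 1] || 'UTF-8'
  # force_encoding raises ArgumentError if Ruby does not know the charset.
  raw_bytes.dup.force_encoding(charset).encode('UTF-8')
end

body = Base64.decode64("dGVzdCC/s7zm8bbzsSB0ZXN0Cg==")
text = to_utf8(body, 'text/plain; charset=ISO-8859-2')

puts text
```

In `receive_mail` the same conversion would be applied to the decoded body before assigning `@message_body`, so that only valid UTF-8 reaches the database.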