Ruby 2.6.3.
I have been trying to parse a StringIO object into a CSV instance with the bom|utf-8 encoding, so that the (undesired) BOM character is stripped and the content is encoded as UTF-8:
require 'csv'
CSV_READ_OPTIONS = { headers: true, encoding: 'bom|utf-8' }.freeze
content = StringIO.new("\xEF\xBB\xBFid\n123")
first_row = CSV.parse(content, CSV_READ_OPTIONS).first
first_row.headers.first.include?("\xEF\xBB\xBF") # This returns true
Apparently the bom|utf-8 encoding does not work for StringIO objects, but I found that it does work for files, for instance:
require 'csv'
CSV_READ_OPTIONS = { headers: true, encoding: 'bom|utf-8' }.freeze
# File content is: "\xEF\xBB\xBFid\n12"
first_row = CSV.read('bom_content.csv', CSV_READ_OPTIONS).first
first_row.headers.first.include?("\xEF\xBB\xBF") # This returns false
Considering that I need to work with StringIO directly, why does CSV ignore the bom|utf-8 encoding? Is there any way to remove the BOM character from the StringIO instance?
Thank you!
Ruby 2.7 added the set_encoding_by_bom method to IO. This method consumes the byte order mark and sets the encoding.
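Applied to the example from the question, that could look roughly like the sketch below. It assumes Ruby 2.7 or newer (so it does not help on 2.6.3 as-is) and that StringIO exposes the same method as IO. The csv_options, detected and has_bom names are only for illustration, and the bom|utf-8 encoding option is dropped because the BOM is consumed before CSV sees the data.

```ruby
require 'csv'
require 'stringio'

# Illustrative options hash; no encoding: 'bom|utf-8' needed here.
csv_options = { headers: true }

content = StringIO.new("\xEF\xBB\xBFid\n123")

# Ruby 2.7+: skips a leading BOM and sets the encoding from it.
# Returns the detected encoding, or nil when there is no BOM.
detected = content.set_encoding_by_bom

# CSV reads from the current position, i.e. after the BOM.
first_row = CSV.parse(content, **csv_options).first
has_bom = first_row.headers.first.include?("\xEF\xBB\xBF")
```

With the BOM consumed up front, the first header should come back as plain id and has_bom should be false.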