Why is Ruby failing to convert CP-1252 to UTF-8?

I have CSV files saved from Excel, which are encoded as CP-1252/Windows-1252. I tried the following, but the text still comes out corrupted. Why?

require 'csv'

csv_text = File.read(arg[:file], encoding: 'cp1252').encode('utf-8')
# csv_text = File.read(arg[:file], encoding: 'cp1252')
csv = CSV.parse(csv_text, headers: true)
csv.each do |row|
  # create model
  p model
end
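A minimal sketch of a related approach, assuming the file fits in memory: Ruby's encoding: option also accepts an "external:internal" pair, so File.read can transcode from Windows-1252 to UTF-8 while reading, without a separate .encode call. The sample file and its name column are hypothetical, created only for the demo.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical sample file: Windows-1252 bytes, where 0x92 is a curly apostrophe.
path = File.join(Dir.tmpdir, 'cp1252_sample.csv')
File.binwrite(path, "name\r\nO\x92Brien\r\n")

# 'cp1252:utf-8' means: the file is Windows-1252, return UTF-8 strings.
csv_text = File.read(path, encoding: 'cp1252:utf-8')
rows = CSV.parse(csv_text, headers: true)

name = rows.first['name']
puts name.encoding  # UTF-8
p name.codepoints.map { |c| format('U+%04X', c) }
```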

The result:

>rake import:csv["../file.csv"] | grep Brien
... name: "Oâ?TBrien ...

However, it works in the console:

> "O\x92Brien".force_encoding("cp1252").encode("utf-8")
=> "O'Brien"
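Looking at the bytes shows why the corrupted output has several junk characters where one apostrophe should be. In Windows-1252, byte 0x92 is U+2019 (RIGHT SINGLE QUOTATION MARK), which UTF-8 encodes as the three bytes E2 80 99. A tool that misreads those UTF-8 bytes as a single-byte code page shows three characters instead of one. The sketch below reinterprets them as Windows-1252 to produce the classic form; the exact junk a console displays (such as the "â?T" in the question's output) likely depends on which code page it assumes.

```ruby
# Windows-1252 bytes for "O’Brien" (0x92 is the curly apostrophe).
cp1252 = String.new("O\x92Brien", encoding: Encoding::Windows_1252)
utf8 = cp1252.encode(Encoding::UTF_8)

puts utf8.codepoints[1].to_s(16)                               # 2019
puts utf8.bytes[1, 3].map { |b| format('%02X', b) }.join(' ')  # E2 80 99

# Misreading the UTF-8 bytes as Windows-1252 again: one character becomes three.
mojibake = utf8.dup.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8)
puts mojibake                                                  # Oâ€™Brien
```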

I can open the CSV file in Notepad++, choose Encoding > Character Sets > Western European > Windows-1252, see the correct characters, and then choose Encoding > Convert to UTF-8. However, there are many files, and I want Ruby to handle this.
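The Notepad++ steps can be scripted for many files. A minimal sketch, assuming every input really is Windows-1252 and that writing a converted copy next to each original is acceptable; the helper names and the .utf8.csv suffix are hypothetical choices for illustration.

```ruby
require 'tmpdir'

# Hypothetical helper: write a UTF-8 copy of one Windows-1252 file.
def convert_to_utf8(src, dest)
  # Transcode while reading, then write the (already UTF-8) text back out.
  text = File.read(src, encoding: 'cp1252:utf-8')
  File.write(dest, text, encoding: 'utf-8')
end

# Convert every .csv in dir, skipping copies produced by an earlier run.
def convert_dir(dir)
  sources = Dir.glob(File.join(dir, '*.csv')).reject { |f| f.end_with?('.utf8.csv') }
  sources.map do |src|
    dest = src.sub(/\.csv\z/, '.utf8.csv')
    convert_to_utf8(src, dest)
    dest
  end
end

converted_names = nil
converted_bytes = nil
Dir.mktmpdir do |dir|
  File.binwrite(File.join(dir, 'file.csv'), "name\nO\x92Brien\n")
  converted = convert_dir(dir)
  converted_names = converted.map { |f| File.basename(f) }
  converted_bytes = File.binread(converted.first)
end

p converted_names  # ["file.utf8.csv"]
```

Writing copies instead of overwriting keeps the originals intact if some file turns out not to be Windows-1252 after all.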

Similar question: "How to change the encoding during CSV parsing in Rails". It doesn't explain why this is failing, though.
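Setting the encoding at parse time, as in that linked question, can be sketched with CSV.foreach, which passes an encoding: option through to the file it opens. A minimal sketch with a hypothetical sample file and name column:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical sample file in Windows-1252.
path = File.join(Dir.tmpdir, 'foreach_demo.csv')
File.binwrite(path, "name\nO\x92Brien\n")

names = []
# Transcode from Windows-1252 to UTF-8 while CSV reads the file, row by row.
CSV.foreach(path, headers: true, encoding: 'cp1252:utf-8') do |row|
  names << row['name']
end

p names
```

Reading row by row also avoids holding the whole file in memory, which may matter for large exports.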

Ruby 2.4. Reference: https://ruby-doc.org/core-2.4.3/IO.html#method-c-read

Answer by Chloe:

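Wow, it was caused by the lousy grep that ships with DevKit. The import was correct all along; piping the output through that grep is what garbled it. Without the pipe: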

>rake import:csv["../file.csv"]
... name: "O'Brien ...

>where grep
C:\DevKit2\bin\grep.exe
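When a terminal or piped tool might be mangling output, checking the string inside Ruby avoids trusting what is displayed. A small sketch, assuming name stands in for a field read from the CSV (built directly here for the demo): the encoding, valid_encoding?, and code point checks do not depend on how any console renders text.

```ruby
# Stand-in for a field read from the CSV (built directly here for the demo).
name = String.new("O\x92Brien", encoding: Encoding::Windows_1252).encode(Encoding::UTF_8)

# These checks run inside Ruby, so no console or pipe can distort them.
puts name.encoding            # UTF-8
puts name.valid_encoding?     # true
puts name.include?("\u2019")  # true

# dump escapes non-ASCII characters, giving pure-ASCII output that is safe to grep.
puts name.dump                # "O\u2019Brien"
```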

I also did not need the .encode('utf-8').
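The thread doesn't say why .encode('utf-8') was unnecessary. One plausible explanation, assuming this ran inside a Rails app (the model code and the linked Rails question suggest so): Rails sets Encoding.default_internal from config.encoding, which defaults to UTF-8, and when a default internal encoding is set, File.read(..., encoding: 'cp1252') already transcodes to it while reading. The sketch sets Encoding.default_internal by hand to show the effect, then restores it.

```ruby
require 'tmpdir'

path = File.join(Dir.tmpdir, 'cp1252_demo.txt')
File.binwrite(path, "O\x92Brien")

previous = Encoding.default_internal

# Without a default internal encoding, the string keeps the file's encoding.
Encoding.default_internal = nil
plain = File.read(path, encoding: 'cp1252')

# With one set (as Rails does), the same call transcodes during the read.
Encoding.default_internal = Encoding::UTF_8
transcoded = File.read(path, encoding: 'cp1252')

Encoding.default_internal = previous

puts plain.encoding       # Windows-1252
puts transcoded.encoding  # UTF-8
```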

Let that be a lesson, kids. Never take anything for granted. Trust no one!