I have a problem with Ruby (1.9.3) and PowerShell.
I need to write an interactive console app that processes sentences in Polish. With some help I can now retrieve ARGV elements containing Polish diacritics, but standard input doesn't work the way I want it to.
Code illustration:
# encoding: UTF-8
target = ARGV[0].dup.force_encoding('CP1250').encode('UTF-8')
puts "string constant = dupą"
puts "dupą".bytes.to_a.to_s
puts "dupą".encoding
puts "target = " +target
puts target.bytes.to_a.to_s
puts target.encoding
puts target.eql? "dupą"
STDIN.set_encoding("CP1250", "UTF-8")
# the line above changes nothing, it can be removed and the result is still the same
# I obviously wanted to mimic the ARGV solution
target2 = STDIN.gets.chomp # drop the trailing newline so the eql? check below can match
puts "target2 = " +target2
puts target2.bytes.to_a.to_s
puts target2.encoding
puts target2.eql? "dupą"
The output:
string constant = dupą
[100, 117, 112, 196, 133]
UTF-8
target = dupą
[100, 117, 112, 196, 133]
UTF-8
true
dupą //this is fed to STDIN.gets
target2 = dup
[100, 117, 112]
UTF-8
false
Apparently Ruby never receives the fourth character from STDIN.gets. Even with a longer string, such as dupąlalala, only the first three bytes reach the program.
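For reference, the test word has a different byte representation in each encoding involved here: Ruby's internal UTF-8, the ANSI code page 1250 that the ARGV line decodes from, and the OEM console code page 852 reported in the encoding dump further down. A small sketch that prints all three follows. byte_forms is a hypothetical helper, and it assumes a Ruby build whose transcoders include IBM852.

```ruby
# encoding: UTF-8

# Hypothetical helper: return the raw bytes of +word+ in each of the three
# encodings that matter on a Polish Windows console.
def byte_forms(word)
  {
    'UTF-8'        => word.encode('UTF-8').bytes.to_a,        # Ruby's internal encoding
    'Windows-1250' => word.encode('Windows-1250').bytes.to_a, # ANSI code page (ARGV)
    'IBM852'       => word.encode('IBM852').bytes.to_a        # OEM console code page
  }
end

byte_forms('dupą').each { |enc, bytes| puts "#{enc}: #{bytes.inspect}" }
```

This prints UTF-8: [100, 117, 112, 196, 133], Windows-1250: [100, 117, 112, 185] and IBM852: [100, 117, 112, 165]. In either single-byte code page "ą" is one byte, and in UTF-8 it is two, so the three bytes seen in the program stop exactly before it.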
- I've tried enumerating the bytes and looping with getc, but the missing bytes never seem to reach Ruby. Where are they lost?
- I've tried chcp 65001; it doesn't seem to change anything.
- I've set $OutputEncoding to [Console]::OutputEncoding; it now looks like this:

  IsSingleByte      : True
  BodyName          : ibm852
  EncodingName      : Środkowoeuropejski (DOS)
  HeaderName        : ibm852
  WebName           : ibm852
  WindowsCodePage   : 1250
  IsBrowserDisplay  : True
  IsBrowserSave     : True
  IsMailNewsDisplay : False
  IsMailNewsSave    : False
  EncoderFallback   : System.Text.InternalEncoderBestFitFallback
  DecoderFallback   : System.Text.InternalDecoderBestFitFallback
  IsReadOnly        : True
  CodePage          : 852

I'm using the Consolas font.
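The encoding dump lists two different code pages: WindowsCodePage 1250 (the ANSI page, which the ARGV line decodes from) and CodePage 852 (the OEM page the console uses). If the console delivers its input in 852, decoding it as CP1250, as the STDIN.set_encoding line tries to, would turn "ą" into "Ą" even once the bytes arrive. A minimal sketch of the ARGV-style approach applied to a line of input, assuming the input really is CP852 and that your Ruby can transcode IBM852; read_line_as_utf8 and fake_console are hypothetical names, and a StringIO stands in for STDIN so the snippet runs without a console.

```ruby
# encoding: UTF-8
require 'stringio'

# Hypothetical helper: read one line from +io+, treat its raw bytes as
# +source_encoding+, transcode to UTF-8 and drop the trailing newline.
# Same idea as the ARGV line: force_encoding first, then encode.
def read_line_as_utf8(io, source_encoding)
  line = io.gets
  return nil if line.nil? # end of input
  line.force_encoding(source_encoding).encode('UTF-8').chomp
end

# Stand-in for the console: "dupą\n" as the bytes an OEM 852 console would
# send (assumed; 0xA5 is "ą" in code page 852).
def fake_console
  StringIO.new([100, 117, 112, 0xA5, 10].pack('C*'))
end

puts read_line_as_utf8(fake_console, 'IBM852')       # prints dupą (console code page)
puts read_line_as_utf8(fake_console, 'Windows-1250') # prints dupĄ (wrong: ANSI code page)
```

In the real program the call would be read_line_as_utf8(STDIN, 'IBM852'). Note that this only fixes the decoding of bytes that actually reach Ruby; it cannot recover bytes that are lost before that, as in the truncation shown above.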
How do I read Polish diacritics properly from standard input in PowerShell?
.NET Framework 4.x expects and emits a byte order mark (BOM) on stdin when the console is set to CHCP 65001 (UTF-8).
This appears to be fixed in .NET Core, but in 4.x you need to change Console.StandardInputEncoding to communicate properly with child processes that don't make the same assumption.
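If such a BOM does end up in front of the first line Ruby reads, one defensive option on the Ruby side is to strip it before comparing strings. A minimal sketch, assuming the child process receives UTF-8 input; strip_utf8_bom is a hypothetical helper, not part of Ruby or .NET. It works on raw bytes so it also behaves on input that is not yet valid UTF-8.

```ruby
# encoding: UTF-8

# The three bytes of a UTF-8 byte order mark (U+FEFF), as a binary string.
UTF8_BOM = [0xEF, 0xBB, 0xBF].pack('C*')

# Hypothetical helper: remove a leading UTF-8 BOM (if present) from +line+
# and return the remainder tagged as UTF-8.
def strip_utf8_bom(line)
  raw = line.dup.force_encoding(Encoding::BINARY)
  raw = raw.byteslice(UTF8_BOM.bytesize..-1) if raw.start_with?(UTF8_BOM)
  raw.force_encoding(Encoding::UTF_8)
end

# Usage on the first line of input (assuming the console sends UTF-8):
#   first = STDIN.gets
#   first = strip_utf8_bom(first).chomp if first
```

Only the first line needs this treatment, since a BOM is emitted at most once at the start of the stream.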