This question is related to another one that went the Perl route but ran into many difficulties because of Windows bugs (see "Perl or Powershell how to convert from UCS-2 little endian to utf-8 or do inline oneliner search replace regex on UCS-2 file").

I would like the PowerShell equivalent of a simple Perl regex substitution on a little-endian UCS-2 file (for this purpose, UCS-2LE is the same as UTF-16 little endian), i.e.:

perl -pi.bak -e 's/search/replace/g;' MyUCS-2LEfile.txt

You will probably need to tell PowerShell (gci etc.) that the input file is UCS-2LE and that the output file should be written in the same UCS-2LE format, with Windows CR LF line endings.

1 Answer

lit

This writes the file back out after applying the regex replacement. The output file does not begin with a BOM. It should work for small files. Because the output file is reopened for every line, large files will probably need a different approach to be fast.

$fin = 'C:/src/t/revbom-in.txt'
$fout = 'C:/src/t/revbom-out.txt'
if (Test-Path -Path $fout) { Remove-Item -Path $fout }

# Create a sample input file.
# UnicodeEncoding(bigEndian, byteOrderMark): $False, $False = UTF-16 little endian, no BOM
$UCS2LENoBomEncoding = New-Object System.Text.UnicodeEncoding $False, $False
[System.IO.File]::WriteAllLines($fin, "now is the time`r`nwhen was the time", $UCS2LENoBomEncoding)

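# Read the file in line by line, replace the string, and append each line to the output file.
# -replace takes a regex and is case-insensitive by default (use -creplace for case-sensitive).
[System.IO.File]::ReadLines($fin, $UCS2LENoBomEncoding) |
    ForEach-Object {
        # AppendAllLines expects a collection of strings, hence the [string[]] cast
        [System.IO.File]::AppendAllLines($fout, [string[]]($_ -replace 'the','a'), $UCS2LENoBomEncoding)
    }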
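
For larger files, one possible change is to keep a single reader and a single writer open for the whole run instead of reopening the output file for each line. The following is only a sketch under that assumption. The function name Convert-Ucs2LeFile and its parameter names are hypothetical, not part of any module.

```powershell
# Hypothetical helper: stream a UTF-16LE (UCS-2LE) file through a regex replace.
# Pass full paths: .NET resolves relative paths against the process working
# directory, which is not necessarily the current PowerShell location.
function Convert-Ucs2LeFile {
    param(
        [Parameter(Mandatory)] [string] $InPath,
        [Parameter(Mandatory)] [string] $OutPath,
        [Parameter(Mandatory)] [string] $Pattern,
        [Parameter(Mandatory)] [AllowEmptyString()] [string] $Replacement
    )

    # UnicodeEncoding(bigEndian, byteOrderMark): little endian, no BOM
    $enc = New-Object System.Text.UnicodeEncoding $false, $false

    $reader = $null
    $writer = $null
    try {
        $reader = New-Object System.IO.StreamReader $InPath, $enc
        $writer = New-Object System.IO.StreamWriter $OutPath, $false, $enc  # $false = overwrite
        $writer.NewLine = "`r`n"   # force Windows CR LF line endings

        # ReadLine returns $null at end of file; empty lines come back as ''
        while ($null -ne ($line = $reader.ReadLine())) {
            $writer.WriteLine(($line -replace $Pattern, $Replacement))
        }
    }
    finally {
        if ($writer) { $writer.Dispose() }
        if ($reader) { $reader.Dispose() }
    }
}
```

With the variables from the script above, it could be called as Convert-Ucs2LeFile -InPath $fin -OutPath $fout -Pattern 'the' -Replacement 'a'. As with the AppendAllLines version, every output line ends with CR LF, including the last one, even if the input's last line had no terminator.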

HT: @refactorsaurusrex at https://gist.github.com/refactorsaurusrex/9aa6b72f3519dbc71f7d0497df00eeb1 for the [string[]] cast

NB: mklement0 at https://gist.github.com/mklement0/acb868a9f15d9a34b6e88fc874b3851d

NB: If the source file is HTML, please see https://stackoverflow.com/a/1732454/447901
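
To get closer to the perl -pi.bak behaviour from the question (edit in place, keep a .bak copy), something like the following might work for files that fit comfortably in memory. It is a sketch, not a tested drop-in: the function name Edit-Ucs2LeFileInPlace, its parameters, and the -WriteBom switch are made up for illustration.

```powershell
# Hypothetical helper: in-place regex replace on a UTF-16LE (UCS-2LE) file,
# keeping a copy of the original next to it as <name>.bak.
function Edit-Ucs2LeFileInPlace {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Pattern,
        [Parameter(Mandatory)] [AllowEmptyString()] [string] $Replacement,
        [switch] $WriteBom   # assumption: caller decides whether the result gets a BOM
    )

    # Resolve to a full file system path before handing it to .NET
    $full = (Resolve-Path -LiteralPath $Path).ProviderPath

    # UnicodeEncoding(bigEndian, byteOrderMark)
    $bom = $WriteBom.IsPresent
    $enc = New-Object System.Text.UnicodeEncoding $false, $bom

    # Backup first, like perl's -i.bak
    Copy-Item -LiteralPath $full -Destination "$full.bak" -Force

    # Read everything, replace, write back; existing line endings are left untouched
    $text = [System.IO.File]::ReadAllText($full, $enc)
    [System.IO.File]::WriteAllText($full, ($text -replace $Pattern, $Replacement), $enc)
}
```

Example call: Edit-Ucs2LeFileInPlace -Path 'C:/src/t/MyUCS-2LEfile.txt' -Pattern 'search' -Replacement 'replace'. One difference from perl -p is that the replace runs over the whole text rather than line by line, so anchors such as ^ and $ only match at line boundaries if the pattern starts with (?m).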