How to manipulate a multibyte string in Python?


I have a log file that contains multibyte data. I want to write a script that does some data manipulation on it.

with open(fo, encoding="cp1252") as file:
    for line in file:
        print(line)
        # Look for the marker text in each decoded line
        if "WINDOWS" in line:
            print("found")

print(line) gives the following output: [image: output of print(line)]

There is one extra byte after every character. The check does not work because the string "WINDOWS" in my script is not multibyte. I am unable to find a solution for this. Can someone help me here?

There is 1 answer

Best answer, by tripleee:

cp1252 is not a multibyte encoding. If the file in fact contains UTF-16, but most of its text is in the very lowest range of Unicode, decoding it as cp1252 will yield roughly the correct characters, except that there will be null characters (zero bytes) between them. Without an unambiguous sample of the bytes in the file, we can only speculate; but try opening the file with encoding='utf-16le'. (If this fails, please edit your question to include a hex dump or repr() of the binary bytes in the file; see also Problematic questions about decoding errors.)
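To illustrate the speculation above, here is a minimal sketch. It assumes the log really is UTF-16LE without a byte order mark, which the question does not confirm; the sample text and the temporary file are made up for the demonstration. It shows why the `in` test fails after decoding as cp1252, why it succeeds after decoding as utf-16le, and how to print the raw bytes for a question edit.

```python
import os
import tempfile

# Hypothetical sample line; the real log contents are unknown.
sample = "Microsoft WINDOWS log entry\n"

# Write the sample as UTF-16LE (no BOM) to a temporary file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write(sample.encode("utf-16le"))
    path = tmp.name

try:
    # Wrong guess: each 2-byte code unit is read as two 1-byte characters,
    # so a NUL character appears after every ASCII letter.
    with open(path, encoding="cp1252") as f:
        wrong = f.read()

    # Matching encoding: the text comes back intact.
    with open(path, encoding="utf-16le") as f:
        right = f.read()

    # Raw bytes, suitable for pasting into a question as repr() output.
    with open(path, "rb") as f:
        raw = f.read(16)
finally:
    os.remove(path)

print(repr(wrong[:8]))       # letters interleaved with '\x00'
print("WINDOWS" in wrong)    # the plain literal does not match
print("WINDOWS" in right)    # matches once decoded correctly
print(raw)                   # every second byte is zero
```

If the real file starts with the bytes `FF FE`, it carries a byte order mark, and encoding='utf-16' (which consumes the mark) may be the better choice than 'utf-16le'.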