I have a log file with multibyte data in it. I want to write a script that does some data manipulation on it.
    with open(fo, encoding="cp1252") as file:
        for line in file:
            print(line)
            if "WINDOWS" in line:
                print("found")
print(line) produces output in which there is one extra byte after every character.
The check never matches because the string WINDOWS in my script is not multibyte. I am unable to find a solution for this. Can someone help me here?
cp1252 is not a multibyte encoding. If the file in fact contains UTF-16, but most of it is in the very lowest range of Unicode, using cp1252 will yield roughly the correct characters, except that there will be zero (null) bytes between them. Without an unambiguous sample of the bytes in the file, we can only speculate; but try opening the file with encoding='utf-16le'. (If this fails, please edit your question to include a hex dump or repr() of the binary bytes in the file; see also Problematic questions about decoding errors.)
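
To illustrate the speculation above, here is a minimal sketch. It assumes the log really is UTF-16 little-endian without a byte order mark, which has not been confirmed; the sample text and the file name sample.log are made up for the demonstration and stand in for the real file.

```python
import os
import tempfile

# Hypothetical sample: one line of text stored as UTF-16 little-endian, no BOM.
sample = "OS: WINDOWS 10\n".encode("utf-16-le")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.log")  # stand-in for the real log file
    with open(path, "wb") as f:
        f.write(sample)

    # Look at the raw bytes first; this repr() is what to post in the question.
    with open(path, "rb") as f:
        head = f.read(16)
    print(repr(head))

    # Decoded as cp1252, every zero byte becomes a U+0000 character,
    # so the plain string "WINDOWS" is never found.
    with open(path, encoding="cp1252") as f:
        wrong = f.read()
    print("WINDOWS" in wrong)  # False

    # Decoded as UTF-16LE, the text comes out as intended.
    with open(path, encoding="utf-16-le") as f:
        for line in f:
            if "WINDOWS" in line:
                print("found:", line.rstrip("\n"))
```

If the first two bytes of the real file turn out to be FF FE, that is a byte order mark, and encoding="utf-16" is probably the better choice, since that codec consumes the mark and picks the byte order from it.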