I have git reading the file name "ùàèòùèòùùè.txt" as a simple string of bytes, so when I ask git for a list of committed files, I'm given the following string:
r"\303\271\303\240\303\250\303\262\303\271\303\250\303\262\303\271\303\271\303\250.txt"
How can I use Python 2 to convert it back to "ùàèòùèòùùè.txt"?
If the git format contains literal \ddd sequences (so up to 4 characters per filename byte), you can use the string_escape (Python 2) or unicode_escape (Python 3) codecs to have Python interpret the escape sequences.

You'll get UTF-8 data, which a terminal set to interpret UTF-8 displays directly.
You'd want to decode that as UTF-8 to get text.
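The two steps (interpret the escapes, then decode the resulting bytes as UTF-8) can be sketched roughly as follows. This is only an illustration: the names git_data, utf8_bytes and filename are mine, the br"" literal stands in for whatever git actually printed, and the version check is there just so the same snippet runs on either interpreter.

```python
import sys

# Escaped name as git prints it; the backslashes are literal characters.
git_data = br"\303\271\303\240\303\250\303\262\303\271\303\250\303\262\303\271\303\271\303\250.txt"

if sys.version_info[0] == 2:
    # Python 2: string_escape turns each \ddd sequence into one raw byte.
    utf8_bytes = git_data.decode("string_escape")
else:
    # Python 3 has no string_escape; unicode_escape returns text whose code
    # points equal the byte values, so Latin-1 maps them back to bytes.
    utf8_bytes = git_data.decode("unicode_escape").encode("latin-1")

# The raw bytes are UTF-8, so one more decode gives the real file name.
filename = utf8_bytes.decode("utf-8")
```

On Python 2, filename ends up as a unicode object; on Python 3, it is a str.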
In Python 3, the unicode_escape codec gives you (Unicode) text, so an extra encode to Latin-1 is required to make it bytes again.

Note that git_data is a bytes object before decoding.