I've been working with a pandas DataFrame in which one column is bytes-encoded. I decoded it once with .decode('utf-8'), and that worked for most of the data, but some strings turned out to have been encoded more than once. For example: b'b'b\'[{"charcName":"\\\\u0420\\\\u0438\\\\u0441\\\\u0443\\\\u043d\\\\u043e\\\\u043a","charcValues":["\\\\u043c\\\\u0438\\\\u043b\\\\u0438\\\\u0442\\\\u0430\\\\u0440\\\\u0438 \\\\u043a\\\\u0430\\\\u043c\\\\u0443\\\\u0444\\\\u043b\\\\u044f\\\\u0436"]}]\'''
I tried decoding it repeatedly (re-encoding in between to avoid the error 'str' object has no attribute 'decode'), but that doesn't seem to work. How can I decode such strings completely? In what order should utf-8 and unicode_escape decoding be applied?
The original string isn't a valid Python literal, so I stripped the outermost, malformed layer of bytes decoration by hand and focused on decoding the remainder. Because that step was manual, this won't work as-is on the other entries. Whoever produces this data upstream needs to fix it at the source.
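A minimal sketch of that approach, assuming the malformed outer wrapper has already been removed by hand so that what remains is a valid bytes literal. The helper name peel_bytes_repr is made up for illustration. It uses ast.literal_eval to undo each str(bytes) layer after a utf-8 decode, and leaves the \uXXXX escapes to json.loads, so under these assumptions no separate unicode_escape step is needed:

```python
import ast
import json


def peel_bytes_repr(value):
    """Undo nested str(bytes) layers until plain text remains.

    Hypothetical helper: it assumes every layer is a well-formed
    bytes literal such as b'...' and stops at the first one that is not.
    """
    while True:
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        stripped = value.strip()
        if stripped[:2] not in ("b'", 'b"'):
            return value
        try:
            # Safely evaluate the text "b'...'" back into a bytes object.
            value = ast.literal_eval(stripped)
        except (ValueError, SyntaxError):
            # Malformed layer (like the outermost one in the question).
            return value


# The example from the question with the invalid outer b'...' removed by hand.
raw = (
    b'b\'[{"charcName":"\\\\u0420\\\\u0438\\\\u0441\\\\u0443\\\\u043d\\\\u043e\\\\u043a",'
    b'"charcValues":["\\\\u043c\\\\u0438\\\\u043b\\\\u0438\\\\u0442\\\\u0430\\\\u0440\\\\u0438 '
    b'\\\\u043a\\\\u0430\\\\u043c\\\\u0443\\\\u0444\\\\u043b\\\\u044f\\\\u0436"]}]\''
)

text = peel_bytes_repr(raw)  # JSON text that still contains \uXXXX escapes
data = json.loads(text)      # json.loads resolves the \uXXXX escapes
print(data)
```

On a DataFrame this could be applied per cell with something like df[col].map(peel_bytes_repr), where col is a placeholder for the affected column; entries whose outer layer is malformed would come back unchanged and still need manual cleanup.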
Output: