#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import chardet
s = '123'.encode('utf-8')
print(s)
print(chardet.detect(s))
ss = '编程'.encode('utf-8')
print(chardet.detect(ss))
And the results:
b'123'
{'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
{'encoding': 'utf-8', 'confidence': 0.7525, 'language': ''}
Why can't it detect s as UTF-8?
And why is it reported as ASCII?
Also, is this line useless? # -*- coding: utf-8 -*-
Python newcomer, thanks!
Let's just talk about these lines--all the meat is there:
You are correct that Python 3 uses Unicode by default. When you say
'123'.encode()
you are converting a Unicode string to a sequence of bytes, which will then print with the ugly b
prefix to remind you that it is not the default string type.
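A minimal sketch of that conversion, using only the standard library (the variable names here are illustrative and not from the original post). It also shows why the bytes for '123' are indistinguishable from ASCII: for characters in the ASCII range, UTF-8 produces exactly the same bytes, so a detector looking only at the bytes has nothing UTF-8-specific to find.

```python
# Encoding turns a str (Unicode text) into bytes.
text = '123'
utf8_bytes = text.encode('utf-8')
ascii_bytes = text.encode('ascii')

print(type(text))       # str
print(type(utf8_bytes)) # bytes, shown with the b prefix when printed
print(utf8_bytes)

# For pure-ASCII text the two encodings give identical bytes.
same_bytes = utf8_bytes == ascii_bytes
print(same_bytes)

# Every byte is below 0x80, so nothing marks this data as UTF-8 specifically.
all_ascii = all(b < 0x80 for b in utf8_bytes)
print(all_ascii)

# Non-ASCII text is different: UTF-8 uses multi-byte sequences with high bytes.
chinese_bytes = '编程'.encode('utf-8')
has_high_bytes = any(b >= 0x80 for b in chinese_bytes)
print(len(chinese_bytes), has_high_bytes)

# Decoding goes the other way, from bytes back to str.
round_trip = utf8_bytes.decode('utf-8')
print(round_trip == text)
```

Decoding the '123' bytes as either ASCII or UTF-8 gives back the same string, so reporting them as ASCII is not wrong, only the narrower of two correct descriptions.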