python chardet can not detect utf-8 correctly

Question

1.8k views Asked by alwayslz At 09 September 2017 at 14:36

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import chardet
s = '123'.encode('utf-8')
print(s)
print(chardet.detect(s))

ss = '编程'.encode('utf-8')
print(chardet.detect(ss))

The results are:

b'123'
{'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
{'encoding': 'utf-8', 'confidence': 0.7525, 'language': ''}

Why can't it detect s as UTF-8?

And why does it report ASCII instead?

Is this line useless? # -*- coding: utf-8 -*- I'm new to Python, thanks!

There is 1 answer

**John Zwinck** · Accepted Answer · 2017-09-09T14:42:26+00:00

Let's just talk about these lines--all the meat is there:

s = '123'.encode('utf-8')
print(s)

You are correct that Python 3 uses Unicode by default. When you call '123'.encode(), you convert a Unicode string (str) to a sequence of bytes, which then prints with the ugly b prefix to remind you that it is no longer the default string type.
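
A minimal sketch of why the detector reports ASCII, using only the standard library: UTF-8 encodes every code point below 128 as the same single byte that ASCII uses, so the bytes for '123' are identical under both encodings. A detector such as chardet sees only the bytes, so naming the narrower encoding for that input is consistent with what it was given. The variable names here (`same_as_ascii`, `all_ascii`, `round_trip`, `has_high_bytes`) are illustrative and do not come from the question.

```python
# Pure-ASCII text produces the same bytes under both encodings,
# so nothing in the byte sequence distinguishes UTF-8 from ASCII.
s = '123'.encode('utf-8')
same_as_ascii = (s == '123'.encode('ascii'))
print(s, same_as_ascii)

# Every byte is below 0x80, i.e. inside the 7-bit ASCII range.
all_ascii = all(b < 0x80 for b in s)
print(all_ascii)

# The bytes still decode correctly as UTF-8, because ASCII is a subset of UTF-8.
round_trip = s.decode('utf-8')
print(round_trip)

# Non-ASCII text is different: each of these characters becomes a
# multi-byte sequence whose bytes are all >= 0x80, which is the only
# evidence a detector has that the data might be UTF-8.
ss = '编程'.encode('utf-8')
has_high_bytes = any(b >= 0x80 for b in ss)
print(ss, len(ss), has_high_bytes)
```

Since the two encodings agree on these bytes, decoding `s` as either ASCII or UTF-8 gives the same string, so the 'ascii' result is safe to act on. As for the `# -*- coding: utf-8 -*-` line: it only tells Python how to decode the source file itself, and Python 3 already assumes UTF-8 for source files, so it is redundant in this script, though harmless.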
