Why was the Python Unicode internal format implemented as described in PEP 100?

Question

626 views · Asked by mkelley33 on 14 September 2024 at 00:23

http://www.python.org/dev/peps/pep-0100/

PEP 100 states that Python's internal Unicode format holds UTF-16 encodings but addresses the values as if they were UCS-2 (or UCS-4 when Python is configured with --enable-unicode=ucs4).
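For reference, the storage width of a given interpreter can be inspected through `sys.maxunicode`. The snippet below is a minimal sketch; the helper name `unicode_build_width` is made up for illustration. Builds that use 16-bit units report 0xFFFF, while UCS-4 builds report 0x10FFFF (as does every Python from 3.3 onward, where the compile-time choice no longer exists).

```python
import sys


def unicode_build_width():
    """Classify the running interpreter by its largest supported code point.

    Hypothetical helper for illustration only; it is not part of PEP 100.
    """
    if sys.maxunicode == 0xFFFF:
        return "narrow"  # 16-bit storage units, addressed as UCS-2
    return "wide"        # the full code point range is directly addressable


print(hex(sys.maxunicode), unicode_build_width())
```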

Why wasn't UTF-16 (a variable-length format) chosen rather than UCS-2 (a fixed-length format)?

Although the two encodings are largely the same, UTF-16 was already four years old when PEP 100 was published (March 2000). Was the Python Unicode format meant to address backwards-compatibility issues?

I'm really just curious why Python's internal format uses this (seemingly) hybrid approach to storing encoded data.

A better way to ask my question might be: does anyone have a citation or a link, with a quote from an official document, that specifically states why PEP 100 chose to treat UTF-16 as UCS-2 instead of using UTF-16 as such?

There is 1 answer

**John Machin** · Answer 1 · 2011-11-05 21:17:01

Read a little further in the PEP: "UCS-2 and UTF-16 are the same for all currently defined Unicode character points". That was true in 2000, when the PEP was written. The initial implementation covered only the BMP (Basic Multilingual Plane, the first 64K code points), where a UTF-16 code unit and a UCS-2 value are the same 16-bit number.
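To make the distinction concrete, here is a minimal sketch (assuming Python 3; the helper name `utf16_code_units` is made up for illustration) that lists the 16-bit units UTF-16 produces for one character inside the BMP and one outside it. Inside the BMP the single unit equals the code point, which is exactly what UCS-2 would store. Outside the BMP, UTF-16 needs a surrogate pair, which UCS-2 cannot express.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units of text's UTF-16 (little-endian) encoding."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


bmp_char = "\u20ac"             # EURO SIGN, inside the BMP
supplementary = "\U0001F600"    # outside the BMP, assigned long after 2000

bmp_units = utf16_code_units(bmp_char)
supplementary_units = utf16_code_units(supplementary)

# One unit, numerically equal to the code point: UTF-16 and UCS-2 agree.
print([hex(u) for u in bmp_units])            # ['0x20ac']

# Two units (a surrogate pair): this is where UTF-16 departs from UCS-2.
print([hex(u) for u in supplementary_units])  # ['0xd83d', '0xde00']
```

Code that indexes such storage as if it were UCS-2 sees two separate values for the second character, which is the practical consequence of the hybrid approach described in the question.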
