What is a code point and code space?


I was reading the Wikipedia article on code points, but I'm not sure I understand it correctly.

"For example, the character encoding scheme ASCII comprises 128 code points in the range 0x00 to 0x7F."

So is 0x00 a code point?

Also, I could not find anything on code space.

PS. If it's a duplicate please post a link in the comments and I'll remove the question.


There are 2 answers

4
uraimo On BEST ANSWER

A code point is a numerical code that refers to a single element/character in a specific coded character set. The sentence you quote means that ASCII has 128 possible symbols (only some of which are printable characters), and that each one has an associated numerical code by which it can be identified/addressed: its code point.
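As a rough illustration of the ASCII case, here is a minimal Python sketch. The variable names are made up for this example, and "printable" follows Python's own `str.isprintable()` definition (which counts the space character), so treat the count as one reasonable reading rather than the only one.

```python
# The ASCII code points: the integers 0x00 through 0x7F inclusive.
ascii_code_points = range(0x00, 0x80)

# Only some of them identify printable characters; the rest are controls
# such as NUL (0x00) or DEL (0x7F).
printable = [cp for cp in ascii_code_points if chr(cp).isprintable()]

# Map between a character and its code point in both directions.
code_point_of_a = ord("A")        # character -> code point
char_at_0x41 = chr(0x41)          # code point -> character

print(len(ascii_code_points), len(printable), hex(code_point_of_a), char_at_0x41)
```

Every value in that range, including 0, is a code point; whether it maps to something you can see on screen is a separate matter.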

For an alternative wording, check out this post by Joel and this summary by Oracle, which also introduces the concept of a code unit :)

To give you a real-world example of what code points are, consider the Unicode character snowman ☃: its code point (in Unicode notation, U+<code point in hex>) is U+2603.
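The snowman example can be checked in a few lines of Python. This is only a sketch; the variable names are illustrative, and the two encodings are picked to show how one code point can take a different number of code units depending on the encoding.

```python
snowman = "☃"

# The code point is just an integer; U+XXXX is the conventional way to write it.
code_point = ord(snowman)
label = f"U+{code_point:04X}"

# Going the other way: code point -> character.
same_snowman = chr(0x2603)

# Code units depend on the encoding, not on the code point itself.
utf8_bytes = snowman.encode("utf-8")        # 8-bit code units
utf16_bytes = snowman.encode("utf-16-be")   # 16-bit code units, big-endian, no BOM
utf16_unit_count = len(utf16_bytes) // 2

print(label, len(utf8_bytes), utf16_unit_count)
```

Here the single code point U+2603 takes three UTF-8 code units but only one UTF-16 code unit.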

0
tripleee On

The concepts are slightly more abstract than the traditional, pre-Unicode concepts.

Traditionally, a "code space" was more or less synonymous with "character range". A 7-bit encoding would have a code space from 0 through 127, an 8-bit encoding 0 through 255, a 16-bit encoding 0 through 65535. Unicode has a code space from 0 through 0x10FFFF, though parts of the code space are unpopulated.
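To make those ranges concrete, here is a small Python sketch. The helper `code_space` is a hypothetical name invented for this example, and the count of unassigned code points depends on the version of the Unicode database bundled with your Python, so no exact figure is claimed.

```python
import sys
import unicodedata

def code_space(bits):
    """Hypothetical helper: the code space of a fixed-width n-bit encoding."""
    return range(0, 2 ** bits)

seven_bit = code_space(7)      # 0 through 127
eight_bit = code_space(8)      # 0 through 255
sixteen_bit = code_space(16)   # 0 through 65535

# The Unicode code space runs from 0 through 0x10FFFF.
unicode_code_space = range(0, 0x10FFFF + 1)

# Count the code points reported as unassigned (general category "Cn").
# This walks about 1.1 million values, so it takes a moment.
unassigned = sum(
    1 for cp in unicode_code_space if unicodedata.category(chr(cp)) == "Cn"
)

print(sys.maxunicode == unicode_code_space[-1], len(unicode_code_space), unassigned)
```

The unassigned count comes out well above zero, which is what "parts of the code space are unpopulated" means in practice.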

Traditionally, a "code point" was more or less synonymous with "character code". Unicode abstracts away from the single "character code" mapping to emphasize that there is a more-complex relationship between a set of glyphs and a set of character codes, and that some code points (such as joining modifiers) do not encode individual glyphs as such. Superficially, U+0020 is still the same character as ASCII SPACE 0x20, but Unicode has a much richer set of well-defined attributes and relationships.
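A short Python sketch of both points, using the standard `unicodedata` module. Picking ZERO WIDTH JOINER and COMBINING ACUTE ACCENT as examples of code points that are not glyphs on their own is my own choice of illustration, not something the Unicode standard singles out.

```python
import unicodedata

# U+0020 and ASCII SPACE 0x20 are the same number and the same character...
space = "\u0020"
ascii_byte = space.encode("ascii")

# ...but Unicode attaches named properties to the code point.
space_name = unicodedata.name(space)
space_category = unicodedata.category(space)    # "Zs": space separator

# Code points that do not stand for a glyph by themselves.
zwj = "\u200d"
zwj_name = unicodedata.name(zwj)
zwj_category = unicodedata.category(zwj)        # "Cf": format character

acute = "\u0301"
acute_name = unicodedata.name(acute)
acute_class = unicodedata.combining(acute)      # non-zero: attaches to a base character

# Two code points (e + combining acute) can compose into a single one (é).
composed = unicodedata.normalize("NFC", "e" + acute)

print(space_name, zwj_name, acute_name, len("e" + acute), len(composed))
```

The last line shows why "one code point equals one visible character" does not hold in general: the same é can be one code point or two.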

Unicode had to coin new terms for these concepts so as not to overload the traditional terms with extended meanings. A "code space" is a unique, well-defined concept, which is not exactly the same thing as an (implicitly contiguous, possibly fully populated) character range. A "code point" is a unique, well-defined concept, which is not exactly the same thing as a "character code" (which isn't even entirely well-defined in the first place; it has multiple ambiguous interpretations).