Encoding 'utf-16' is not consistent when converting a Lisp string to/from a C string

Question

Asked by xiepan on 15 June 2015 at 09:35

I found that when 'utf-16' is used as the encoding to convert a Lisp string to a C string with cffi, the encoding actually used is 'utf-16le'. But when the C string is converted back to a Lisp string, the encoding actually used is 'utf-16be'. Since I'm not yet familiar with 'babel' (which provides the encoding facility for 'cffi'), I'm not sure whether that's a bug.

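(defun convtest (str to-c from-c)
  "Convert STR to a foreign string using encoding TO-C, then decode it
back to a Lisp string using encoding FROM-C."
  (multiple-value-bind (ptr size)
      (cffi:foreign-string-alloc str :encoding to-c)
    (declare (ignore size))
    ;; PROG1 returns the decoded string after the foreign memory is freed.
    (prog1
        (cffi:foreign-string-to-lisp ptr :encoding from-c)
      (cffi:foreign-string-free ptr))))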

(convtest "hello" :utf-16   :utf-16)     ;=> garbage string
(convtest "hello" :utf-16   :utf-16le)   ;=> "hello"
(convtest "hello" :utf-16   :utf-16be)   ;=> garbage string
(convtest "hello" :utf-16le :utf-16be)   ;=> garbage string
(convtest "hello" :utf-16le :utf-16le)   ;=> "hello"

The `convtest' function converts a Lisp string to a C string and then back to a Lisp string, using `to-c' and `from-c' as the respective encodings. Every garbage string in the output is the same. The test shows that when 'utf-16' is used as both `to-c' and `from-c', the round trip fails.
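One plausible reason the garbage strings are identical is that each failing case writes little-endian code units and reads them back as big-endian ones. The following is a minimal sketch in plain Common Lisp that mimics this by hand. The helpers `utf-16-octets` and `utf-16-code-units` are hypothetical names made up for illustration and are not part of CFFI or Babel; the sketch also assumes no byte-order mark is written and only handles characters in the Basic Multilingual Plane.

```lisp
;; Illustrative helpers only; none of these names come from CFFI or Babel.
(defun utf-16-octets (string byte-order)
  "Encode STRING (BMP characters only) as a list of UTF-16 octets, without a BOM.
BYTE-ORDER is :le or :be."
  (let ((octets '()))
    (loop for ch across string
          for code = (char-code ch)
          for hi = (ldb (byte 8 8) code)   ; high byte of the code unit
          for lo = (ldb (byte 8 0) code)   ; low byte of the code unit
          do (ecase byte-order
               (:le (push lo octets) (push hi octets))
               (:be (push hi octets) (push lo octets))))
    (nreverse octets)))

(defun utf-16-code-units (octets byte-order)
  "Read the list OCTETS as 16-bit code units in BYTE-ORDER (:le or :be)."
  (loop for (a b) on octets by #'cddr
        collect (ecase byte-order
                  (:le (+ a (ash b 8)))
                  (:be (+ (ash a 8) b)))))

(utf-16-octets "hello" :le)
;; => (104 0 101 0 108 0 108 0 111 0)

(utf-16-code-units (utf-16-octets "hello" :le) :le)
;; => (104 101 108 108 111), the code points of "hello"

(utf-16-code-units (utf-16-octets "hello" :le) :be)
;; => (26624 25856 27648 27648 28416), i.e. U+6800 U+6500 U+6C00 U+6C00 U+6F00
```

Reading the little-endian bytes as big-endian swaps the two bytes of every code unit, so "hello" always maps to the same five unrelated characters.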

There is 1 answer

**Rainer Joswig** · Answer 1 · 2015-06-15T19:07:02+00:00

Here, encoding to C with 'utf-16' (to-c) defaults to little-endian (le), while decoding from C with 'utf-16' (from-c) defaults to big-endian (be).

The platform itself (x86) is little-endian. UTF-16, on the other hand, takes its byte order from a byte-order mark (BOM) if one is present and otherwise prefers big-endian.
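The usual rule for plain "UTF-16" can be sketched as follows: look at the first two octets for a byte-order mark and fall back to big-endian when none is found. The function `guess-utf-16-byte-order` is a hypothetical helper written for illustration; whether Babel or CFFI applies exactly this rule is an assumption and should be checked against their source.

```lisp
;; Hypothetical helper for illustration; this is not Babel's or CFFI's code.
(defun guess-utf-16-byte-order (octets)
  "Return two values: the byte order (:be or :le) to use for the list OCTETS,
and true if that order came from a byte-order mark."
  (let ((b0 (first octets))
        (b1 (second octets)))
    (cond ((and (eql b0 #xFE) (eql b1 #xFF)) (values :be t))  ; BOM U+FEFF stored big-endian
          ((and (eql b0 #xFF) (eql b1 #xFE)) (values :le t))  ; BOM stored little-endian
          (t (values :be nil)))))                              ; no BOM: fall back to big-endian

(guess-utf-16-byte-order '(#xFF #xFE 104 0))  ; => :LE, T
(guess-utf-16-byte-order '(104 0 101 0))      ; => :BE, NIL
```

The second call is the situation in the question: little-endian data with no BOM is read as big-endian under this rule.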

This probably depends on the platform you are running on, since platforms seem to have different defaults.

It is best to look into the source code to see why those encodings are chosen. You may also ask on the CFFI mailing list about the encoding choices and how they depend on the platform, if at all.
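Before reading the source, it may help to look at the octets Babel itself produces for each UTF-16 variant. This is a rough sketch that assumes Quicklisp is available for loading Babel; `show-utf-16-octets` is a hypothetical helper name, and the output is deliberately not shown because it may differ between platforms and Babel versions.

```lisp
;; Assumes Quicklisp is installed; Babel is the library CFFI uses for encodings.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "babel"))

(defun show-utf-16-octets (string)
  "Print the octets Babel produces for STRING under each UTF-16 variant."
  (dolist (encoding '(:utf-16 :utf-16le :utf-16be))
    (format t "~&~10a ~s~%" encoding
            (babel:string-to-octets string :encoding encoding))))

(show-utf-16-octets "hello")
```

Comparing the `:utf-16` row with the other two rows shows which byte order the encoder picks on your platform. A leading `255 254` or `254 255` pair would indicate that a byte-order mark was written.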
