A bunch of Unicode-related functionality was removed from the Python 3.12 C API. Unfortunately for me, there's a very old piece of code (~2010) in our library that uses it, and I need to migrate it since we're looking to upgrade to 3.12 eventually. One thing I'm specifically struggling with is the removal of the u# format unit. The following piece of code parses a string argument passed to foo (including non-ASCII strings), storing a pointer to its characters in input and its length in length:
    static PyObject *
    foo(PyObject *self, PyObject *args) {
        Py_UNICODE *input;
        Py_ssize_t length;
        if (!PyArg_ParseTuple(args, "u#", &input, &length)) {
            return NULL;
        }
        ...
    }
However, according to the docs, the u# has been removed:
Changed in version 3.12:
u, u#, Z, and Z# are removed because they used a legacy Py_UNICODE* representation.
and the current code simply throws something like bad-format-character when this is compiled and used in pure python.
Py_UNICODE is just wchar_t so that's easily fixed. But with the removal of u# I am not sure how to get PyArg_ParseTuple to accept unicode input arguments. Using s# instead of u# does not work since it won't handle anything widechar. How do I migrate this call in Python 3.12?
s# handles Unicode fine, but it gives you UTF-8 rather than wchar_t. If you specifically need a wchar representation, you can get one from a string object with PyUnicode_AsWideCharString.
Unlike the old Py_UNICODE API, this allocates a new buffer, which you have to free with PyMem_Free when you're done with it.