Python with non-latin-1 PYTHONHOME path


I embed Python in my application. When the application's path contains a non-Latin-1 character, Py_Initialize calls exit(1) internally (details under Background below).

So I checked whether I could reproduce this with the standard interpreter executable.

Python 2.7.x on Windows does not seem to work when the PYTHONHOME path contains a character outside the Latin-1 charset: the site module cannot be found and imported. Since umlauts work, what is the actual limitation here? Is only Latin-1 supported? And why does it work on OS X?

C:\Users\ъ\Python27\python.exe    // fails to start (KOI8-R)
         ^
C:\Users\ġ\Python27\python.exe    // fails to start (latin-3)
         ^
C:\Users\ä\Python27\python.exe    // works fine (latin-1)
         ^

Any ideas?

Background:

I haven't stepped through the code yet, but Python 2.6 and Python 2.7 also behave differently when site is not available: Python 2.6 just prints a message, while Python 2.7 refuses to start.

static void
initsite(void)
{
    PyObject *m;
    m = PyImport_ImportModule("site");
    if (m == NULL) {
        ...

        // Python 2.7 and later
        exit(1);

        // Python 2.6 and prior
        PyFile_WriteString("'import site' failed; traceback:\n", f);
    }
    ...
}

Python 2.7: https://github.com/enthought/Python-2.7.3/blob/master/Python/pythonrun.c#L725

Python 2.6: https://github.com/python-git/python/blob/master/Python/pythonrun.c#L705


There are 2 answers

3
Serge Ballesta On

I think the problem is that Python 2 internally processes everything as byte strings in the platform's system encoding, which in Western Europe is CP1252, a variant of Latin-1. So it is no surprise that it cannot correctly process a PYTHONHOME path containing other characters.
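To make the encoding argument concrete, here is a minimal sketch that checks which of the three characters from the question can be represented in CP1252. It assumes the system ANSI code page really is CP1252 (this depends on the Windows locale), and the helper name fits is made up for this illustration.

```python
# -*- coding: utf-8 -*-
# Sketch: which of the question's characters exist in a given 8-bit code page?
# Assumption: the system ANSI code page is cp1252 (typical for Western Europe).

def fits(ch, codec):
    """Return True if the character can be encoded with the given codec."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

# The three user-name characters from the question, written as escapes.
SAMPLES = [
    (u"\u044a", "Cyrillic hard sign"),
    (u"\u0121", "g with dot above"),
    (u"\u00e4", "a with diaeresis"),
]

for ch, label in SAMPLES:
    print("U+%04X %-20s cp1252: %s" % (ord(ch), label, fits(ch, "cp1252")))
```

Only the last character is encodable under that assumption, which matches the pattern of failures in the question.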

But, when I was younger, I was used to the good old 8.3 format of MS/DOS files...

I can still see and use these names on a Windows 7 box with DIR /X in a console (CMD.EXE) window. This format only uses uppercase ASCII letters, digits, and the tilde (~), so it can serve as a workaround: declare the 8.3 path in the PYTHONHOME environment variable, and start Python with that 8.3 path.
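If you would rather look up the 8.3 name programmatically than read it from DIR /X, the Win32 function GetShortPathNameW can be called through ctypes. This is only a sketch: the helper name short_path is made up, it has to run under an interpreter that already starts (for example from a launcher script), and it returns the path unchanged on non-Windows systems or when the call fails. Short names can also be disabled on a volume, in which case Windows hands back the long path.

```python
import ctypes
import sys

def short_path(path):
    """Return the 8.3 form of an existing path on Windows.

    On other platforms, or if the Win32 call fails, return path unchanged.
    """
    if sys.platform != "win32":
        return path
    get_short = ctypes.windll.kernel32.GetShortPathNameW
    get_short.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint]
    get_short.restype = ctypes.c_uint
    # First call with an empty buffer asks for the required size
    # (in characters, including the terminating null).
    needed = get_short(path, None, 0)
    if needed == 0:
        return path
    buf = ctypes.create_unicode_buffer(needed)
    if get_short(path, buf, needed) == 0:
        return path
    return buf.value
```

The result could then be assigned to PYTHONHOME before the embedded or child interpreter is started.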

BTW, it is advisable for PYTHONHOME to use a path that contains neither special characters nor spaces. Such a path may work, but it could cause problems with other modules.

2
hkBst On

The PyImport_ImportModule function in version 2.7 has this definition:

PyObject *
PyImport_ImportModule(const char *name)
{
    PyObject *pname;
    PyObject *result;

    pname = PyString_FromString(name);
    if (pname == NULL)
        return NULL;
    result = PyImport_Import(pname);
    Py_DECREF(pname);
    return result;
}

The same function in version 3.5 is identical except that it uses

pname = PyUnicode_FromString(name);

instead of

pname = PyString_FromString(name);

You can look at the code for PyString_FromString and for PyUnicode_FromString; it seems clear that Python 2 does not use Unicode here and Python 3 does, but I have not been able to find how or where exactly this leads to the behavior you describe.

The PyImport_Import(module_name) function (version 2.7) only uses module_name like so:

r = PyObject_CallFunction(import, "OOOOi", module_name, globals,
                          globals, silly_list, 0, NULL);

passing on the responsibility...
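One plausible mechanism, offered as an assumption rather than something traced through the interpreter source: if the path passes through a byte string in the ANSI code page at any point, characters outside that code page are replaced and the resulting path no longer names a real directory. The sketch below imitates that lossy conversion with Python's own codecs; cp1252 and the helper name ansi_round_trip are assumptions for illustration.

```python
# Sketch: simulate a path passing through an 8-bit code page and back.
# Assumption: the code page is cp1252 and unrepresentable characters
# are replaced with "?", as Python's "replace" error handler does.

def ansi_round_trip(path, codec="cp1252"):
    """Encode a Unicode path to bytes lossily, then decode it again."""
    return path.encode(codec, "replace").decode(codec)

good = u"C:\\Users\\\u00e4\\Python27"   # the Latin-1 case from the question
bad = u"C:\\Users\\\u044a\\Python27"    # the Cyrillic case from the question

print(ansi_round_trip(good) == good)     # the path survives unchanged
print(ansi_round_trip(bad) == bad)       # the path is altered
```

A path altered like this would explain why the site module cannot be located, since the library directory derived from it would not exist.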