Why does this dynamic library loading code work with gcc?

994 views Asked by At

Background:

I've found myself with the unenviable task of porting a C++ GNU/Linux application over to Windows. One of the things this application does is search for shared libraries on specific paths and then loads classes out of them dynamically using the posix dlopen() and dlsym() calls. We have a very good reason for doing loading this way that I will not go into here.

The Problem:

To dynamically discover symbols generated by a C++ compiler with dlsym() or GetProcAddress() they must be unmangled by using an extern "C" linkage block. For example:

#include <list>
#include <string>

using std::list;
using std::string;

extern "C" {

    list<string> get_list()
    {
        list<string> myList;
        myList.push_back("list object");
        return myList;
    }

}

This code is perfectly valid C++ and compiles and runs on numerous compilers on both Linux and Windows. It, however, does not compile with MSVC because "the return type is not valid C". The workaround we've come up with is to change the function to return a pointer to the list instead of the list object:

#include <list>
#include <string>

using std::list;
using std::string;

extern "C" {

    list<string>* get_list()
    {
        list<string>* myList = new list<string>();
        myList->push_back("ptr to list");
        return myList;
    }

}

I've been trying to find an optimal solution for the GNU/Linux loader that will either work with both the new functions and the old legacy function prototype or at least detect when the deprecated function is encountered and issue a warning. It would be unseemly for our users if the code just segfaulted when they tried to use an old library. My original idea was to set a SIGSEGV signal handler during the call to get_list (I know this is icky - I'm open to better ideas). So just to confirm that loading an old library would segfault where I thought it would I ran a library using the old function prototype (returning a list object) through the new loading code (that expects a pointer to a list) and to my surprise it just worked. The question I have is why?

The below loading code works with both function prototypes listed above. I've confirmed that it works on Fedora 12, RedHat 5.5, and RedHawk 5.1 using gcc versions 4.1.2 and 4.4.4. Compile the libraries using g++ with -shared and -fPIC and the executable needs to be linked against dl (-ldl).

#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <list>
#include <string>

using std::list;
using std::string;

int main(int argc, char **argv)
{
    void *handle;
    list<string>* (*getList)(void);
    char *error;

    handle = dlopen("library path", RTLD_LAZY);
    if (!handle)
    {
        fprintf(stderr, "%s\n", dlerror());
        exit(EXIT_FAILURE);
    }

    dlerror();

    *(void **) (&getList) = dlsym(handle, "get_list");

    if ((error = dlerror()) != NULL)
    {
        printf("%s\n", error);
        exit(EXIT_FAILURE);
    }

    list<string>* libList = (*getList)();

    for(list<string>::iterator iter = libList->begin();
          iter != libList->end(); iter++)
    {
        printf("\t%s\n", iter->c_str());
    }

    dlclose(handle);

    exit(EXIT_SUCCESS);
}
1

There are 1 answers

0
Chris Dodd On BEST ANSWER

As aschepler says, its because you got lucky.

As it turns out, the ABI used for gcc (and most other compilers) for both x86 and x64 returns 'large' structs (too big to fit in a register) by passing an extra 'hidden' pointer arg to the function, which uses that pointer as space to store the return value, and then returns the pointer itself. So it turns out that a function of the form

struct foo func(...)

is roughly equivlant to

struct foo *func(..., struct foo *)

where the caller is expected to allocate space for a 'foo' (probably on the stack) and pass in a pointer to it.

So it just happens that if you have a function that is expecting to be called this way (expecting to return a struct) and instead call it via a function pointer that returns a pointer, it MAY appear to work -- if the garbage bits it gets for the extra arg (random register contents left there by the caller) happen to point to somewhere writable, the called function will happily write its return value there and then return that pointer, so the called code will get back something that looks a like a valid pointer to the struct it is expecting. So the code may superficially appear to work, but its actually probably clobbering a random bit of memory that may be important later.