TensorFlow C API compiled as a .so library throws __gnu_cxx::recursive_init_error


I am working on Ubuntu 20.04 x86_64 with GCC 9.4.0 and the TensorFlow C API (Linux, CPU only).

I have a simple C++ TensorFlow model-prediction program that uses the cppflow wrapper:

//this is predict.cpp

#include <iostream>
#include "cppflow/cppflow.h"

int predict() {
    auto input = cppflow::decode_jpeg(cppflow::read_file(std::string("../my_cat.jpg")));
    input = cppflow::cast(input, TF_UINT8, TF_FLOAT);
    input = cppflow::expand_dims(input, 0);
    cppflow::model model("../model");
    auto output = model(input);
    std::cout << "It's a tiger cat: " << cppflow::arg_max(output, 1) << std::endl;
    return 0;
}
int main(){
    predict();
    return 0;
}

Compiled with g++ as an executable, it works fine:

g++ -o predict.out predict.cpp -ltensorflow
./predict.out

However, the problem appears when I compile this code as a .so library:

//this is predict.cpp
#include <cstdio>   // printf
#include <iostream>
#include "cppflow/cppflow.h"

extern "C" int predict() {
    auto input = cppflow::decode_jpeg(cppflow::read_file(std::string("./my_cat.jpg")));
    input = cppflow::cast(input, TF_UINT8, TF_FLOAT);
    input = cppflow::expand_dims(input, 0);
    cppflow::model model("./model");
    printf("model created\n");
    auto output = model(input);
    printf("model predicted\n");
    std::cout << "It's a tiger cat: " << cppflow::arg_max(output, 1) << std::endl;
    return 0;
}

I build it with g++ -fPIC -shared -o predict.so predict.cpp -ltensorflow, then load and execute it from another .cpp file, for example:

#include <dlfcn.h>
#include <stdio.h>

int (*predict)();
int main()
{
    void* handle=dlopen("./predict.so",RTLD_LAZY);
    predict=(int(*)(void))dlsym(handle, "predict");
    dlclose(handle);

    predict();

    return 0;
}
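
For comparison, here is a minimal sketch of a loader that checks the results of dlopen and dlsym and only calls dlclose after the loaded function has returned. Note that the loader above calls dlclose(handle) before predict(), which is worth ruling out as a factor, since a library may be unloaded once its last handle is closed. The helper name call_int_function is hypothetical and not part of cppflow or TensorFlow, and whether this ordering accounts for the error reported here is not established.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper (illustration only): loads `lib_path`, looks up an
// `int f(void)` function named `symbol`, calls it, stores its return value
// in `*result`, and unloads the library only after the call has returned.
// Returns false if the library or the symbol cannot be loaded.
bool call_int_function(const char* lib_path, const char* symbol, int* result) {
    void* handle = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
        return false;
    }

    dlerror();  // clear any stale error state before dlsym
    using fn_t = int (*)();
    fn_t fn = reinterpret_cast<fn_t>(dlsym(handle, symbol));
    const char* err = dlerror();
    if (err != nullptr || fn == nullptr) {
        std::fprintf(stderr, "dlsym failed: %s\n",
                     err ? err : "symbol resolved to a null address");
        dlclose(handle);
        return false;
    }

    *result = fn();   // the library is still loaded while the function runs
    dlclose(handle);  // release the handle only after the call has finished
    return true;
}
```

It could be used as `int rc = 0; bool ok = call_int_function("./predict.so", "predict", &rc);`. On older glibc versions such as the one shipped with Ubuntu 20.04, the loader also needs to be linked with -ldl.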

I get this error:

model created
terminate called after throwing an instance of '__gnu_cxx::recursive_init_error'
terminate called recursively
terminate called recursively

Since cppflow is just a wrapper around the TensorFlow C API, I looked into the line auto output = model(input); which causes this error. It turned out that executing the following TF C API call triggers it:

  TF_SessionRun(this->session.get(), /*run_options*/ NULL,
                inp_ops.data(), inp_val.data(), static_cast<int>(inputs.size()),
                out_ops.data(), out_val.get(), static_cast<int>(outputs.size()),
                /*targets*/ NULL, /*ntargets*/ 0, /*run_metadata*/ NULL,
                this->status.get());

Put another way: when a .cpp file containing TF_SessionRun() is compiled with g++ into an executable, it works. When it is compiled as a .so library and the exported function is called from another .cpp file, it fails. More interestingly, when I run the function from the .so library multiple times, the error message is not always the same. Sometimes it prints

model created
terminate called after throwing an instance of '__gnu_cxx::recursive_init_error'
terminate called recursively
terminate called recursively

and sometimes

model created
terminate called recursively
terminate called after throwing an instance of '__gnu_cxx::recursive_init_error'

It looks like a random invalid memory access. I looked up the error, and the documentation says:

'If control re-enters the declaration (recursively) while the object is being initialized, the behavior is undefined.'

Still, I can't figure out why the executable works while the .so fails, when both are built with the same compiler. Is the way I export the .so library causing this problem? Any clues would be appreciated.
