Second parameter of main (the command line arguments) in a format other than char** argv or char* argv[]


To solve my problem here, I want to know whether and how I can declare the second parameter of main (the command line arguments) with a type other than char** argv or char* argv[]. The reason is that pybind11 doesn't accept either of those as function parameters. Here are the methods I have tried:

Method 1:

#include <stdio.h>

int main(int argc, int* argv_){
    for (int i = 0; i < argc; ++i){
        printf("%s\n", (char *)(argv_[i]));
    }
}

The rationale behind this method is that a pointer is intrinsically an integer, so casting each element back to a char pointer should recover the strings. Thanks in advance for your kind support.

Method 2:

#include <stdio.h>
#include <string>

int main(int argc, std::string* argv_){
    for (int i = 0; i < argc; ++i){
        printf("%s\n", argv_[i].c_str());
    }
}

Method 3:

#include <stdio.h>
#include <string>
#include <vector>

int main(int argc, std::vector<std::string> argv_){
    for (int i = 0; i < argc; ++i){
        const char* argv__ = argv_[i].c_str();
        printf("%s\n", argv_[i].c_str());
    }
}

Issue:

Unfortunately, all of the above methods lead to the infamous segmentation fault.

I would appreciate help understanding what the problem is (i.e., where the invalid memory access comes from) and how to solve it.

Workaround/hack:

In the comments I'm being told that using any form other than main(), main(int argc, char** argv), or main(int argc, char* argv[]) will unavoidably lead to a segmentation fault. However, the code below works:

#include <stdio.h>

int main(int argc, long* argv_){
    for (int i = 0; i < argc; ++i){
        printf("%s\n", (char *)(argv_[i]));
    }
}

This works with g++ 7.4.0 on Ubuntu minimal and with the Visual Studio 2019 compiler on Windows 10. However, it does not compile with clang. As others have pointed out, this is not a solution and is very bad practice: it is undefined behavior whose result depends on the compiler, the operating system, and the current state of memory. It should never be used in actual code. The main function in any C/C++ code must have one of the forms main(), main(int argc, char** argv), or main(int argc, char* argv[]).


There are 2 answers

Answer by DevSolar

Let's try to tackle the plethora of issues that have cropped up during the lengthy discussion, one by one.


Question 1: Why do I get a segfault when using some non-standard parameters (like string vector or int pointer) to main?

The parameter types of int, char ** are defined that way by both the C and the C++ standard. Non-standard extensions aside, you cannot use other types.

From ISO/IEC 9899 (The C Language), 5.1.2.2.1 Program startup:

The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters:

int main(void) { /* ... */ }

or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared):

int main(int argc, char *argv[]) { /* ... */ }

or equivalent; or in some other implementation-defined manner.

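That last clause allows for the extensions I mentioned. One such extension I know of is a third parameter, char *envp[], which Unix systems accept and which the GNU C Library manual documents (the same data is also reachable through the global variable environ):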

https://www.gnu.org/software/libc/manual/html_node/Program-Arguments.html#Program-Arguments
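As a rough illustration of why the non-standard signatures in the question misbehave, the sketch below compares element sizes with sizeof(char*). The runtime hands main an array of char*, so declaring any other element type either steps through that array at the wrong stride or reinterprets pointer bytes as a different object. The helper names are made up for this example, and matching sizes do not make the code valid; std::uintptr_t (or std::intptr_t) is the integer type intended for holding a converted pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: true if indexing an array of T steps through memory
// with the same stride as indexing an array of char*.
template <typename T>
constexpr bool same_stride_as_char_ptr() {
    return sizeof(T) == sizeof(char*);
}

// Round-trips a pointer through std::uintptr_t, the integer type that is
// intended to hold a converted pointer value.
const char* round_trip(const char* p) {
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    const char* back = reinterpret_cast<const char*>(bits);
    assert(back == p);
    return back;
}

// Prints the sizes that decide whether a mismatched signature happens to work
// on a given platform.
void print_sizes() {
    std::printf("char*: %zu, int: %zu, long: %zu, std::string: %zu\n",
                sizeof(char*), sizeof(int), sizeof(long), sizeof(std::string));
}
```

On a typical 64-bit Linux build, int is narrower than a pointer while long has the same width, which would explain why the int* version crashes and the long* version prints the arguments there. That is an accident of the platform, not something the language promises.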


Question 2: How do I hack around this?

You don't.

Using different types than those defined by the standard, or by compiler extensions, is Undefined Behavior, which can -- but does not need to -- lead to segfaults. Do not invoke undefined behavior. Do not "hack around" the standard. It is not a "workaround", let alone a "solution", it is broken code that can blow up in your face any time.


Question 3: How do I pybind a third-party function that takes a char ** as parameter?

You don't, as this is not a datatype supported by pybind11.


Question 4: How do I interface such a function through pybind, then?

You write a wrapper function that, on the front end, takes parameters supported by pybind11 (e.g. std::vector< std::string >), marshals them appropriately, and then calls the third-party backend function with the marshalled arguments. If required, the wrapper does the same in reverse for the return value.

For an idiomatic example on how to do that, see the answer by @TedLyngmo.


Question 5: Can I pybind to a third-party main?

This is ill-advised, as main is a special function, and the called code may make assumptions (like atexit callbacks) that your calling code does not, and cannot, comply with. It is certainly not a function the third party ever expected to be called as a library function.

Answer by Ted Lyngmo

It doesn't look like it needs to be main after all, so you could do it like this:

#include <iostream>
#include <string>
#include <vector>

int cppmain(std::string program, std::vector<std::string> args) {
    std::cout << program << " got arguments:\n";
    for(auto& arg : args) {
        std::cout << " " << arg << "\n";
    }
    return 0;
}

int main(int argc, char* argv[]) {
    // create a string from the program name and a vector of strings from the arguments
    return cppmain(argv[0], {argv + 1, argv + argc});
}
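The braced expression {argv + 1, argv + argc} in the call above relies on the iterator-range constructor of std::vector. A minimal sketch of the same conversion as a named function (the name args_after_program_name is made up for this example) might look like this:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: copies argv[1] .. argv[argc - 1] into a vector of
// strings, leaving out the program name in argv[0].
std::vector<std::string> args_after_program_name(int argc, char* argv[]) {
    assert(argc >= 1 && argv != nullptr);
    // Iterator-range constructor: each char* in [argv + 1, argv + argc)
    // is converted to a std::string.
    return std::vector<std::string>(argv + 1, argv + argc);
}
```

Because the strings are copied, the resulting vector stays valid independently of the original argv array.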

In case you need to call a closed source main-like function (that you cannot change), create a wrapper function that you can bind with pybind11 and let that function call the closed source function.

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int closed_source_function(int argc, char* argv[]) {
    for(int i = 0; i < argc; ++i) {
        std::cout << argv[i] << '\n';
    }
    return 0;
}

int pybind_to_this(std::vector<std::string> args) {
    // create a char*[]
    std::vector<char*> argv(args.size() + 1);

    // make the pointers point to the C strings in the std::strings in the
    // std::vector
    for(size_t i = 0; i < args.size(); ++i) {
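        // &args[i][0] yields a mutable char* since C++11; the non-const
        // std::string::data() overload is only available from C++17
        argv[i] = &args[i][0];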
    }

    // add a terminating nullptr (main wants that, so perhaps the closed source
    // function wants it too)
    argv[args.size()] = nullptr;

    // call the closed source function
    return closed_source_function(static_cast<int>(args.size()), argv.data());
}
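By convention, argv[0] passed to main holds the program name, and a main-like function may skip it. If the closed source function follows that convention (an assumption, since its behavior is not described here), the wrapper can prepend a placeholder name. In this sketch, make_argv, call_with_program_name, and the stand-in count_args_after_program_name are all hypothetical names:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds a null-terminated char* array pointing into
// `storage`. `storage` must outlive the result and must not be modified
// while the pointers are in use.
std::vector<char*> make_argv(std::vector<std::string>& storage) {
    std::vector<char*> argv;
    argv.reserve(storage.size() + 1);
    for (std::string& s : storage) {
        argv.push_back(&s[0]);  // mutable pointer to the string's buffer
    }
    argv.push_back(nullptr);    // terminating null pointer, as main gets
    return argv;
}

// Hypothetical stand-in for a third-party function with a main-like
// signature: reports how many arguments follow argv[0].
int count_args_after_program_name(int argc, char* argv[]) {
    return (argc > 0 && argv[argc] == nullptr) ? argc - 1 : -1;
}

// Wrapper with parameter types that are easy to bind; it prepends the
// program name before calling the main-like function.
int call_with_program_name(const std::string& program,
                           const std::vector<std::string>& args) {
    std::vector<std::string> storage;
    storage.reserve(args.size() + 1);
    storage.push_back(program);
    storage.insert(storage.end(), args.begin(), args.end());

    std::vector<char*> argv = make_argv(storage);
    assert(argv.size() == storage.size() + 1);
    return count_args_after_program_name(static_cast<int>(storage.size()),
                                         argv.data());
}
```

Copying the arguments into a local vector means the callee can modify the strings without touching the caller's data.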