How to create a "PyObject"-like structure in C++ for a dynamically typed programming language?


I am developing a programming language in C++. I have created a parser which builds a full abstract syntax tree for my language. The next step in my plan was to transpile the AST back into C++ code, so it is a compiled programming language that you can call directly from C++. The goal is a simple scripting language with C++ interoperability.

The problem:

The difficulty arose when I wanted to make the scripting language dynamically typed. I read about the void* type, which can point to anything, and that led me to create an AnyTypePointer object. I then realized this was the same as std::any in the end, as I had to check every type upon retrieval no matter what: there is no way to check the contained type without actually casting the contained pointer to a type it could be, failing, and then trying again. My other attempt used a tuple and std::visit, which allowed me to iterate over any type, but because a tuple is a template rather than a class, variables could not change type once assigned. My std::variant attempts also failed, as the type of the value may not change once assigned. I am staring at my screen with no clue how to progress; maybe it is a dead end.

What other options can I try? I tried a custom raw void* class, std::any + a type_index map, std::variant, and std::tuple + std::visit. Nothing seems to do the job.

I have read about PyObject, written in C, which performs this exact task. But I can't seem to translate it to C++, as my understanding of C is limited.

How can I create a PyObject-like class or system in C++ for my dynamically typed language? A point in the right direction would help. I cannot find any info relating to C++, only C.

This is a sample script in my programming language:

    example2.candi
// These classes are unrelated in hierarchy.
// But we can still make a function that works with all of them.

#class Horse {
    #func make_sound() { #return "Neigh!"};
}

#class Cow {
    #func make_sound() { #return "Moo!"};
}

#class Wolf {
    #func make_sound() { #return "Oooo!"};
}

#class Cricket {
    #func make_sound() { #return "Chirp!"};
}

#func make_sound(animal) {
    #return animal.make_sound();
}

#var farm_animals = {Horse(), Cow()};
#var all_animals = farm_animals + {Wolf(),Cricket()}; // You can combine the generic list.


#func make_sounds(animal_list{}) {
    #var sounds &string;
    #for(animal : animal_list) {
        sounds += make_sound(animal);
    }
    #return sounds;
}

#return make_sounds(all_animals);

Here is the sample transpiled code. It works well, but it is not scalable:

// Typedefs for ease of reading on Stack Overflow; transpilation would use the actual types/method calls.
#include <any>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

using any_list = std::vector<std::any>;

// Returns a new list holding the elements of a followed by the elements of b.
any_list concat_vector(const any_list& a, const any_list& b) {
    any_list result;
    result.reserve(a.size() + b.size());
    result.insert(result.end(), a.begin(), a.end());
    result.insert(result.end(), b.begin(), b.end());
    return result;
}

any_list operator+(const any_list& a, const any_list& b) {
    return concat_vector(a, b);
}

struct script_example2 {

auto run() {
    class Horse {
    public:
        auto make_sound() {
            return "Neigh!";
        }
    };

    class Cow {
    public:
        auto make_sound() {
            return "Moo!";
        }
    };

    class Wolf {
    public:
        auto make_sound() {
            return "Oooo!";
        }
    };

    class Cricket {
    public:
        auto make_sound() {
            return "Chirp!";
        }
    };

    auto make_sound = [](auto& animal) {
        return animal.make_sound();
    };

    std::any farm_animals = any_list{ Horse(), Cow() }; // WE KNOW farm_animals is of type std::vector<std::any> holding Horse, Cow types
    std::any all_animals = std::any_cast<any_list>(farm_animals) + any_list{Wolf(),Cricket()}; // WE KNOW all_animals contains Wolf and Crickets too now.

    auto make_sounds = [](auto& animal_list) {
        // How can we know it's a list? We can't. We only know it's a std::any, or we test for all possible types.
        std::string sounds;
        for (auto& animal : std::any_cast<any_list>(animal_list)) {
            // Hmm... apparently we can't find out what type is contained in the animal list. How can the contained type be stored along with the list?
            // We can test for every possible type, but that is not very efficient.
            // Using a visitor pattern still requires a check for every possible type.
            // How does Python do it?!
            try {
                sounds += std::any_cast<Horse>(animal).make_sound();
            }
            catch (const std::bad_any_cast& e) {
                try {
                    sounds += std::any_cast<Cow>(animal).make_sound();
                }
                catch (const std::bad_any_cast& e) {
                    try {
                        sounds += std::any_cast<Wolf>(animal).make_sound();
                    }
                    catch (const std::bad_any_cast& e) {
                        try {
                            sounds += std::any_cast<Cricket>(animal).make_sound();
                        }
                        catch (const std::bad_any_cast& e) {
                            throw std::runtime_error("Type cannot make_sound");
                        }
                    }
                }
            }
        }
        return sounds;
    };


    return make_sounds(all_animals);
}
};

Using the script from C++:

candi::script_example2 script;
auto result = script.run();
std::cout << "Script result: " << result << std::endl;

There is 1 answer

Big Teeny On

I will answer my own question, as I have somewhat figured it out. I have also added some code snippets of the class structures from my personal project; sl_ is the same as std::, and rtenv is the runtime environment class. The real difficulty is finding the name of this concept so you can research it.

A PyObject-like class is a combination of pointers to literal types, or to other PyObjects that themselves contain pointers to literal types. In the end, a struct is just a combination of literal members and methods:

class runtime_object;  // defined below
class function_t;      // not shown here

struct runtime_value {
    // The enumerators are listed in the same order as the variant alternatives below.
    enum eType {
        NUMBER = 0,
        REAL = 1,
        STRING = 2,
        BIT = 3,
        BYTE = 4,
        NONE = 5,
        UNSIGNED = 6,
        OBJECT = 7,
        FUNCTION = 8
    } type;

    sl_variant<int, double, sl_string, bool, unsigned char, none_t, unsigned,
               sl_shared_ptr<runtime_object>, sl_shared_ptr<function_t>> value;
};

class runtime_object {
    sl_string name_;
    rtenv& scope_;

    sl_map<sl_string, runtime_value> members_;
    sl_map<sl_string, runtime_method> methods_;  // runtime_method: see the next snippet
};
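
As a self-contained illustration of the structure above, here is a minimal sketch using plain std:: types. The names Value, Object, Method, call_method, make_animal, and make_sounds are hypothetical and are not taken from the project code above. Methods are modeled as native callables only to keep the example short. It assumes C++17.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

struct Object;  // forward declaration so Value can hold a pointer to it

// One runtime value: the variant's active alternative is the dynamic type.
using Value = std::variant<std::monostate, int, double, std::string,
                           std::shared_ptr<Object>>;

// Hypothetical stand-in for a method: a native callable taking the receiver.
using Method = std::function<Value(Object&)>;

struct Object {
    std::string class_name;
    std::map<std::string, Value> members;
    std::map<std::string, Method> methods;
};

// Dynamic dispatch by name: no list of candidate C++ types is needed.
Value call_method(const Value& receiver, const std::string& name) {
    auto obj = std::get_if<std::shared_ptr<Object>>(&receiver);
    if (!obj || !*obj) throw std::runtime_error("value is not an object");
    auto it = (*obj)->methods.find(name);
    if (it == (*obj)->methods.end())
        throw std::runtime_error((*obj)->class_name + " has no method " + name);
    return it->second(**obj);
}

// Builds an "instance" of a script class that has a single make_sound method.
Value make_animal(const std::string& class_name, const std::string& sound) {
    assert(!class_name.empty() && "a script class needs a name");
    auto obj = std::make_shared<Object>();
    obj->class_name = class_name;
    obj->methods["make_sound"] = [sound](Object&) -> Value { return sound; };
    return obj;
}

// Counterpart of the script's make_sounds: one name lookup per element.
std::string make_sounds(const std::vector<Value>& animals) {
    std::string sounds;
    for (const Value& animal : animals)
        sounds += std::get<std::string>(call_method(animal, "make_sound"));
    return sounds;
}
```

With this layout, a list built from make_animal("Horse", "Neigh!") and make_animal("Cow", "Moo!") passed to make_sounds yields "Neigh!Moo!", with no chain of casts. A Value can also change its dynamic type on assignment (Value v = 42; v = std::string("text");), because only the set of alternatives is fixed at compile time.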

The next mystery is how to store a method in a PyObject: how is a method modeled?

You simply store the unevaluated AST of the method definition. Then, when you call the method, you evaluate the AST against the current environment's variables (locals/globals).

class runtime_method {
    sl_string name_;
    rtenv& scope_;
    sl_vector<sl_string> args_;
    astnode body_;
};
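
To make the "store the AST, evaluate on call" idea concrete, here is a minimal sketch with a deliberately tiny AST that supports only string literals, variable reads, and concatenation. Node, Env, eval, Method, and the lit/var/concat helpers are hypothetical names for illustration. The real astnode and rtenv classes are not shown in this answer and will differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature AST: string literals, variable reads, and concatenation.
struct Node {
    enum Kind { Literal, Variable, Concat } kind;
    std::string text;                         // literal value or variable name
    std::vector<std::shared_ptr<Node>> kids;  // operands of Concat
};
using NodePtr = std::shared_ptr<Node>;

NodePtr lit(std::string s) { return std::make_shared<Node>(Node{Node::Literal, std::move(s), {}}); }
NodePtr var(std::string n) { return std::make_shared<Node>(Node{Node::Variable, std::move(n), {}}); }
NodePtr concat(std::vector<NodePtr> kids) { return std::make_shared<Node>(Node{Node::Concat, "", std::move(kids)}); }

using Env = std::map<std::string, std::string>;  // variable name -> value

// Walks the tree and resolves variables against the environment passed in.
std::string eval(const Node& n, const Env& env) {
    switch (n.kind) {
    case Node::Literal:
        return n.text;
    case Node::Variable: {
        auto it = env.find(n.text);
        if (it == env.end()) throw std::runtime_error("undefined variable " + n.text);
        return it->second;
    }
    case Node::Concat: {
        std::string out;
        for (const NodePtr& kid : n.kids) out += eval(*kid, env);
        return out;
    }
    }
    throw std::logic_error("unknown node kind");
}

// A method is a name, its parameter names, and the unevaluated body.
struct Method {
    std::string name;
    std::vector<std::string> params;
    NodePtr body;

    // Binds arguments to parameter names in a local scope copied from the
    // globals, then evaluates the stored body in that scope.
    std::string call(const Env& globals, const std::vector<std::string>& args) const {
        if (args.size() != params.size())
            throw std::runtime_error("wrong argument count for " + name);
        assert(body && "method has no body");
        Env locals = globals;
        for (std::size_t i = 0; i < params.size(); ++i) locals[params[i]] = args[i];
        return eval(*body, locals);
    }
};
```

Because the body is evaluated only at call time, the same Method gives different results when the globals change between calls, which is the behavior described above.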

Lastly, I'd love to be proven wrong, but I believe it is impossible to achieve seamless interoperability with C++ in a dynamically typed, interpreted language. The languages that do have seamless interop are also statically typed (D, Go).

Some topics to research for details:

  • Type Erasure
  • Foreign Function Interfaces (FFIs)
  • DLL Symbol Lookup (C interop)
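
For the first topic, a minimal sketch of type erasure may help show what to search for. It wraps any copyable type that has a make_sound() member behind one handle type, so unrelated classes can share a list without a chain of casts. AnySounder and its nested Concept/Model types are hypothetical names, and the Horse and Cricket structs only mirror the question's example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Type-erased handle: stores any copyable type that has make_sound().
class AnySounder {
    struct Concept {                 // the interface the wrapper relies on
        virtual ~Concept() = default;
        virtual std::string make_sound() const = 0;
    };
    template <class T>
    struct Model final : Concept {   // one adapter is generated per wrapped type
        explicit Model(T v) : value(std::move(v)) {}
        std::string make_sound() const override { return value.make_sound(); }
        T value;
    };
    std::shared_ptr<const Concept> impl_;

public:
    // Implicit on purpose, so a Horse or a Cricket converts to AnySounder.
    template <class T>
    AnySounder(T value) : impl_(std::make_shared<Model<T>>(std::move(value))) {}

    std::string make_sound() const {
        assert(impl_ && "use of a moved-from AnySounder");
        return impl_->make_sound();
    }
};

// Two classes with no common base class.
struct Horse   { std::string make_sound() const { return "Neigh!"; } };
struct Cricket { std::string make_sound() const { return "Chirp!"; } };

std::string make_sounds(const std::vector<AnySounder>& animals) {
    std::string sounds;
    for (const AnySounder& a : animals) sounds += a.make_sound();
    return sounds;
}
```

This only works when the required operations are known at compile time, so it suits transpiled code more than a fully dynamic runtime, where lookup by name in a map is the usual fallback.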

This isn't meant to be the end-all answer; I am still learning. I hope it will provide others with some tips!