How to parallelize libosmium area parsing?


I'm looking for a way to make a libosmium-based project process areas from an OpenStreetMap PBF file using multiple threads. Parsing nodes, ways, or relations appears to be multi-threaded, but assembling areas does not.

For example, while running the following code that counts relations, top shows CPU utilization over 400%:

#include <iostream>

#include <osmium/handler.hpp>
#include <osmium/io/pbf_input.hpp>
#include <osmium/visitor.hpp>

using namespace osmium;

struct MyHandler : public handler::Handler {
    int counter = 0;
    void relation(const Relation &relation) {
        counter++;
    }
};

int main(int argc, char *argv[]) {
    io::File infile(argv[1]);
    MyHandler handler;
    io::Reader reader(infile);
    apply(reader, handler);
    reader.close();
    std::cout << handler.counter << "\n";
}

When I modify it to count areas instead, CPU stays around 100%:

#include <iostream>

#include <osmium/area/assembler.hpp>
#include <osmium/area/multipolygon_collector.hpp>
#include <osmium/handler.hpp>
#include <osmium/handler/node_locations_for_ways.hpp>
#include <osmium/index/map/sparse_mem_array.hpp>
#include <osmium/io/pbf_input.hpp>
#include <osmium/visitor.hpp>

using namespace osmium;

using index_type = index::map::SparseMemArray<unsigned_object_id_type, Location>;
using cache_type = handler::NodeLocationsForWays<index_type>;

struct MyHandler : public handler::Handler {
    int counter = 0;
    void area(const Area &area) {
        counter++;
    }
};

int main(int argc, char *argv[]) {
    // pass 1
    io::File infile(argv[1]);
    area::Assembler::config_type assembler_config;
    area::MultipolygonCollector<area::Assembler>  collector(assembler_config);
    io::Reader reader1(infile, osm_entity_bits::relation);
    collector.read_relations(reader1);
    reader1.close();
    // pass 2
    index_type index;
    cache_type cache{index};
    cache.ignore_errors();
    MyHandler handler;
    io::Reader reader2(infile);
    apply(reader2, cache, collector.handler([&handler](
        memory::Buffer &&buffer) { apply(buffer, handler); }));
    reader2.close();
    std::cout << handler.counter << "\n";
}

The difference is that this code runs in two passes. The first pass reads the relations and collects the references needed to build areas. The second pass reads the whole file again and assembles the areas from the node location cache and the collected relation data. This second pass does not seem to use more than one thread.

How can I make this use several threads? Is there an existing approach already built into libosmium, or should I start implementing my own thread pools and queues?
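For reference, the hand-rolled option I have in mind would look roughly like the sketch below: one producer pushes independent work items into a queue, and several worker threads pull from it. This is only a generic C++ pattern under the assumption that the area work can be split into independent units. `WorkItem`, `process`, `WorkQueue`, and `count_in_parallel` are hypothetical names of my own and are not part of libosmium, and I have not checked whether the collector can be driven this way.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for one unit of area work (for example, one
// relation plus its member ways). This is NOT a libosmium type.
struct WorkItem {
    int id;
};

// Hypothetical per-item work. A real version would assemble one area;
// here every item simply counts as one.
inline int process(const WorkItem& /*item*/) {
    return 1;
}

// Minimal blocking queue shared by one producer and several workers.
class WorkQueue {
public:
    void push(WorkItem item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(item);
        }
        ready_.notify_one();
    }

    // Signal that no more items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an item is available. Returns false once the queue
    // has been closed and fully drained.
    bool pop(WorkItem& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) {
            return false;
        }
        out = items_.front();
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<WorkItem> items_;
    bool closed_ = false;
};

// Feed num_items items to num_workers threads and return the total count.
inline long count_in_parallel(int num_items, unsigned num_workers) {
    WorkQueue queue;
    std::atomic<long> counter{0};

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < num_workers; ++i) {
        workers.emplace_back([&queue, &counter] {
            WorkItem item{};
            while (queue.pop(item)) {
                counter += process(item);
            }
        });
    }

    for (int id = 0; id < num_items; ++id) {
        queue.push(WorkItem{id});
    }
    queue.close();

    for (auto& worker : workers) {
        worker.join();
    }
    return counter.load();
}
```

Calling `count_in_parallel(1000, 4)` would spread 1000 placeholder items over four threads. The shared counter is a `std::atomic` here because the plain `int counter` in `MyHandler` above would not be safe to increment from several threads.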
