I'm working on some code to select and export geodata based on a bounding box. The data I want to select comes from two separate layers in a huge File GDB (16 GB) covering the entire Netherlands. I use a bounding box to avoid reading the entire dataset before making a selection.
This method works great on a GeoPackage (gpkg), but with a File Geodatabase the processing time is far longer (0.2 s vs. 300 s for a 200 × 200 m selection). The File GDB I'm using has a spatial index set for the layers I'm reading. I'm using geopandas to read and select. Below is an example for the layer 'Adres':
import geopandas as gpd

def ImportGeodata(FilePath, BoundingBox):
    # Only read the features that intersect the bounding box
    importBag = gpd.read_file(FilePath, layer='Adres', bbox=BoundingBox)
    # Keep a copy of the identifier column to merge on later
    importBag['mergeid'] = importBag['identificatie']
    return importBag
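For illustration, the function is called with a plain (minx, miny, maxx, maxy) tuple in the dataset's projected CRS; the path and coordinates below are just placeholders, not the real values:

# Hypothetical file path and a made-up 200 x 200 m box (minx, miny, maxx, maxy)
bbox = (155000, 463000, 155200, 463200)
adres = ImportGeodata('basisregistratie.gdb', bbox)
print(len(adres))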
Am I overlooking something? Or is this a limitation when importing from a huge File GDB? I can't find an obvious mistake here. For now the workaround is another script that imports the layers I need and dumps them into a gpkg. The problem is that this runs for 3 to 4 hours (the resulting gpkg is almost 6 GB). I'd rather not keep doing that, since it would have to be repeated roughly once a month to process a new version of this dataset.
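For reference, that conversion boils down to something like the following sketch (hypothetical paths and layer names, not the actual script):

import geopandas as gpd

# Hypothetical paths and layer names; the real script loops over the layers I need
GDB_PATH = 'basisregistratie.gdb'
GPKG_PATH = 'basisregistratie.gpkg'

for layer in ['Adres', 'Pand']:
    # Each layer is read completely into memory and then written out to the
    # GeoPackage, which is why this step takes hours on a 16 GB File GDB
    gdf = gpd.read_file(GDB_PATH, layer=layer)
    gdf.to_file(GPKG_PATH, layer=layer, driver='GPKG')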
Curious what you guys come up with.