After waiting longer than I probably should have, it's time to ask some experts. I am working with the GADM data set (https://gadm.org/data.html). I downloaded the world polygons as a .gpkg file and am looking at level 1 of the polygons. My end goal is to create a choropleth of the world at the state (or state-equivalent) level. The following code works up until the choropleth part.
import geopandas as gpd
import pandas as pd
import shapely
import matplotlib.pyplot as plt
import json
import plotly.graph_objects as go
import numpy as np
# set options so columns aren't hidden
pd.set_option('display.max_columns', None)
# set options so rows aren't hidden
pd.set_option('display.max_rows', None)
# It is possible to reference multiple layers in the GADM gpkg file:
# 0 is country, 1 is state, and 2 is county.
# Finer levels exist but are not relevant to the US.
# https://gadm.org/download_country.html
# pulling state-level (ADM_1) locations
gdf_1 = gpd.read_file("gadm_410-levels.gpkg", layer="ADM_1")
# creating place holder color value
gdf_1['randNumCol'] = np.random.randint(0, 1500000, gdf_1.shape[0])
# aggregate randNumCol per state
states_fin_agg = (
    gdf_1.groupby(['GID_0', 'COUNTRY', 'GID_1', 'NAME_1', 'VARNAME_1', 'NL_NAME_1',
                   'TYPE_1', 'ENGTYPE_1', 'CC_1', 'HASC_1', 'ISO_1', 'geometry'])['randNumCol']
    .sum()
    .reset_index()
)
# replace missing randNumCol values with 0
states_fin_agg['randNumCol'] = states_fin_agg['randNumCol'].fillna(0)
# keeping only necessary columns
states_fin_agg = states_fin_agg[['COUNTRY','GID_1','NAME_1','randNumCol','geometry']]
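# rebuild a GeoDataFrame (groupby + reset_index returns a plain DataFrame), keeping the source CRS
states_fin_agg = gpd.GeoDataFrame(states_fin_agg, geometry='geometry', crs=gdf_1.crs)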
# creating JSON
geojson = json.loads(states_fin_agg.to_json())
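# choroplethmapbox
# to_json() stores GID_1 under each feature's "properties", while plotly matches
# locations against the feature "id" by default, so point featureidkey at GID_1
data = go.Choroplethmapbox(geojson=geojson
,locations=states_fin_agg['GID_1']
,featureidkey='properties.GID_1'
,z=states_fin_agg['randNumCol']
,colorscale='Viridis'
,marker_opacity=1
,marker_line_width=0.1)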
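# graphing choropleth
fig = go.Figure(data)
# use a base style that does not need a Mapbox access token
fig.update_layout(mapbox_style='carto-positron', mapbox_zoom=1)
fig.show()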
The graph crashes due to memory allocation. My guess is that this is a symptom of my problem, not the actual problem. The geometry column in the DataFrame holds MultiPolygons, so that might be the cause. That said, when I convert to Polygons with .explode(), I end up with multiple rows for the same state and even more rows of data.
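One way to test that guess is to count how many coordinate pairs each feature carries in the exported GeoJSON. MultiPolygon is a valid GeoJSON geometry type, so nothing needs to be exploded for this check. Below is a minimal sketch using only the standard library. The helper names `count_vertices` and `vertices_per_feature` are made up for illustration, and the hand-written `sample` stands in for the real `geojson` dict.

```python
def count_vertices(coords):
    """Recursively count coordinate pairs in a GeoJSON "coordinates" array."""
    # A position is a list whose first element is a number, e.g. [lon, lat].
    if coords and isinstance(coords[0], (int, float)):
        return 1
    return sum(count_vertices(part) for part in coords)


def vertices_per_feature(geojson, id_property):
    """Map each feature's id_property value to its vertex count."""
    return {
        feature["properties"][id_property]: count_vertices(
            feature["geometry"]["coordinates"]
        )
        for feature in geojson["features"]
    }


# Hypothetical stand-in for json.loads(states_fin_agg.to_json()).
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "0",
            "properties": {"GID_1": "AAA.1_1"},
            "geometry": {
                "type": "MultiPolygon",
                "coordinates": [
                    [[[0, 0], [1, 0], [1, 1], [0, 0]]],
                    [[[2, 2], [3, 2], [3, 3], [2, 2]]],
                ],
            },
        },
        {
            "type": "Feature",
            "id": "1",
            "properties": {"GID_1": "AAA.2_1"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]]],
            },
        },
    ],
}

counts = vertices_per_feature(sample, "GID_1")
print(counts)  # {'AAA.1_1': 8, 'AAA.2_1': 5}
```

Running `vertices_per_feature(geojson, "GID_1")` on the real data and summing the values should show whether the sheer number of vertices, rather than the MultiPolygon type, is what exhausts memory.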
Is there a better way to do this? The end goal is to have a world map in Choroplethmapbox() where the color variable is 'randNumCol'. Any help would be greatly appreciated.
The best option I found was to use QGIS. Plotly is too slow for this.
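If staying in Python is preferred, one thing that may shrink the payload is simplifying the geometries before exporting them to GeoJSON. Below is a rough sketch, assuming geopandas and shapely are installed. The synthetic circle, the `simplify_for_web` helper, and the 0.01 tolerance (in degrees, since the frame is in EPSG:4326) are illustrative assumptions, not values tuned for GADM.

```python
import geopandas as gpd
from shapely.geometry import Point

# Synthetic stand-in for one GADM region: a circle approximated by many vertices.
dense = Point(0, 0).buffer(1.0, 256)
gdf = gpd.GeoDataFrame(
    {"GID_1": ["XXX.1_1"], "randNumCol": [42]},
    geometry=[dense],
    crs="EPSG:4326",
)


def simplify_for_web(frame, tolerance=0.01):
    """Return a copy of frame whose geometries are simplified.

    tolerance is expressed in the units of the frame's CRS (degrees here).
    """
    out = frame.copy()
    out["geometry"] = out.geometry.simplify(tolerance, preserve_topology=True)
    return out


slim = simplify_for_web(gdf, tolerance=0.01)

before = len(gdf.geometry.iloc[0].exterior.coords)
after = len(slim.geometry.iloc[0].exterior.coords)
print(f"vertices before: {before}, after: {after}")
```

Applied to the question's data, this would be called on `states_fin_agg` just before `to_json()`. A larger tolerance drops more vertices at the cost of coarser borders, and because each region is simplified independently, small gaps or overlaps between neighbouring regions can appear.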