I am using GeoPandas and Pandas. I have a DataFrame, df, with roughly 300,000 rows and 4 columns plus the index:
         id        lat        lon  geometry
    0  2009  40.711174  -73.99682         0
    1   536  40.741444  -73.97536         0
    2   228  40.754601  -73.97187         0
However, there are only a handful of unique ids (~200).
I want to generate a shapely.geometry.point.Point object for each (lat, lon) combination, similar to what is shown here: http://nbviewer.ipython.org/gist/kjordahl/7129098 (see cell #5), where the code loops through all rows of the dataframe. For such a big dataset, though, I wanted to limit the loop to the much smaller number of unique ids.
Therefore, for a given id value, idvalue (e.g., 2009 from the first row), I create the GeoSeries and assign it directly to ALL rows that have id == idvalue.
My code looks like:
    for count, iunique in enumerate(df.id.unique()):
        sc_start = GeoSeries([Point(np.array(df[df.id==iunique].lon)[0], np.array(df[df.id==iunique].lat)[0])])
        df.loc[iunique, ['geometry']] = sc_start
However, this doesn't work: the geometry field does not change. I think this is because the index of sc_start doesn't match the index of df.
How can I solve this? Should I just stick to looping through the whole df?
I would take the following approach:
First, find the unique ids and create a GeoSeries of Points for them:
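A minimal sketch of this first step, assuming every row with the same id shares the same (lat, lon), so the first occurrence is representative. The small `df` below is a stand-in for the real 300,000-row frame, and `unique_ids` is just an illustrative name:

```python
import pandas as pd
from geopandas import GeoSeries
from shapely.geometry import Point

# Small stand-in for the large frame from the question (id 2009 appears twice).
df = pd.DataFrame({
    "id": [2009, 536, 228, 2009],
    "lat": [40.711174, 40.741444, 40.754601, 40.711174],
    "lon": [-73.99682, -73.97536, -73.97187, -73.99682],
})

# Keep one row per id (the first occurrence) and reset to a clean 0..n-1 index
# so the GeoSeries built below lines up with it.
unique_ids = df.drop_duplicates("id")[["id", "lat", "lon"]].reset_index(drop=True)

# Point takes (x, y), which means (lon, lat).
unique_ids["geometry"] = GeoSeries(
    [Point(lon, lat) for lon, lat in zip(unique_ids["lon"], unique_ids["lat"])]
)
```

This builds only as many Point objects as there are unique ids (~200 in the question) instead of one per row.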
Then merge these geometries with the original dataframe on matching ids:
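A sketch of the merge, again on a small stand-in frame and assuming one (lat, lon) per id. The placeholder geometry column of zeros is dropped first, since otherwise the merge would produce two suffixed geometry columns. The names `unique_ids`, `merged`, and `gdf` are illustrative:

```python
import pandas as pd
from geopandas import GeoDataFrame
from shapely.geometry import Point

# Stand-in for the frame from the question, including the placeholder geometry column.
df = pd.DataFrame({
    "id": [2009, 536, 228, 2009],
    "lat": [40.711174, 40.741444, 40.754601, 40.711174],
    "lon": [-73.99682, -73.97536, -73.97187, -73.99682],
    "geometry": 0,
})

# One row per id, with a Point built from its (lon, lat).
unique_ids = df.drop_duplicates("id")[["id", "lat", "lon"]].reset_index(drop=True)
unique_ids["geometry"] = [
    Point(lon, lat) for lon, lat in zip(unique_ids["lon"], unique_ids["lat"])
]

# Drop the placeholder column, then left-merge so every original row
# picks up the Point belonging to its id.
merged = df.drop(columns="geometry").merge(
    unique_ids[["id", "geometry"]], how="left", on="id"
)

# Optionally wrap the result as a GeoDataFrame.
gdf = GeoDataFrame(merged, geometry="geometry")
```

Because the match is done on the id column rather than on the index, this avoids the index-alignment problem that the `df.loc[...]` assignment ran into.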