Pandas and GeoPandas indexing and slicing


I am using GeoPandas and Pandas. I have a DataFrame, df, with roughly 300,000 rows and 4 columns plus the index column.

           id        lat        lon  geometry
    0    2009  40.711174  -73.99682         0
    1     536  40.741444  -73.97536         0
    2     228  40.754601  -73.97187         0

However, there are only a handful of unique ids (~200).

I want to generate a shapely.geometry.point.Point object for each (lat, lon) combination, similar to what is shown here: http://nbviewer.ipython.org/gist/kjordahl/7129098 (see cell #5), which loops through all rows of the DataFrame. For such a big dataset, though, I want to limit the loop to the much smaller number of unique ids.

So, for a given id value, idvalue (e.g., 2009 from the first row), I want to create the GeoSeries and assign it directly to ALL rows that have id == idvalue.

My code looks like:

    for count, iunique in enumerate(df.id.unique()):
        sc_start = GeoSeries([Point(np.array(df[df.id==iunique].lon)[0],np.array(df[df.id==iunique].lat)[0])])
        df.loc[iunique,['geometry']] = sc_start

However, this doesn't work: the geometry field does not change. I think this is because the index of sc_start doesn't match the index of df.
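A related point that may help when debugging this: `.loc` with a scalar looks rows up by index label, not by the value in the id column. The sketch below uses pandas only; the small DataFrame is a stand-in built from the sample rows above, with one repeated id added as an assumption so that one id covers several rows.

```python
import pandas as pd

# Stand-in for the question's df: the three sample rows plus one
# assumed extra row that repeats id 2009.
df = pd.DataFrame({
    "id": [2009, 536, 228, 2009],
    "lat": [40.711174, 40.741444, 40.754601, 40.711174],
    "lon": [-73.99682, -73.97536, -73.97187, -73.99682],
})

# A scalar passed to .loc is an index label, not an id value:
row_by_label = df.loc[2]                 # the row with index label 2 (id 228)

# A boolean mask selects every row whose id matches:
rows_by_id = df.loc[df["id"] == 2009]    # the rows with index labels 0 and 3

# In this small frame no index label 2009 exists, so the lookup fails.
try:
    df.loc[2009]
    label_found = True
except KeyError:
    label_found = False
```

In the full 300,000-row df an index label such as 2009 probably does exist, in which case `df.loc[2009, ...]` would address that single row rather than the rows with id == 2009.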

How can I solve this? Should I just stick to looping through the whole df?

Best answer, by joris

I would take the following approach:

  1. First find the unique ids and create a GeoSeries of Points for them:

    unique_ids = df.groupby('id', as_index=False).first()
    unique_ids['geometry'] = GeoSeries([Point(x, y) for x, y in zip(unique_ids['lon'], unique_ids['lat'])])
    
  2. Then merge these geometries with the original dataframe on matching ids:

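    # df already has a placeholder 'geometry' column, so drop it first; merge returns a new frame
    df = df.drop(columns='geometry').merge(unique_ids[['id', 'geometry']], how='left', on='id')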
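The two steps above can be put together as a small end-to-end sketch. The sample DataFrame is an assumption built from the question's three rows plus one repeated id. Dropping the placeholder geometry column before the merge and wrapping the result in a GeoDataFrame are also additions for illustration, not something the original post spells out.

```python
import pandas as pd
from geopandas import GeoDataFrame, GeoSeries
from shapely.geometry import Point

# Stand-in for the question's df, including the placeholder geometry column.
df = pd.DataFrame({
    "id": [2009, 536, 228, 2009],
    "lat": [40.711174, 40.741444, 40.754601, 40.711174],
    "lon": [-73.99682, -73.97536, -73.97187, -73.99682],
    "geometry": 0,
})

# Step 1: one row per unique id, then one Point per unique id.
unique_ids = df.groupby("id", as_index=False).first()
unique_ids["geometry"] = GeoSeries(
    [Point(x, y) for x, y in zip(unique_ids["lon"], unique_ids["lat"])]
)

# Step 2: drop the placeholder column so the merge does not produce
# geometry_x / geometry_y, then attach the Points to every matching row.
merged = df.drop(columns="geometry").merge(
    unique_ids[["id", "geometry"]], how="left", on="id"
)

# Optional: turn the plain DataFrame into a GeoDataFrame.
gdf = GeoDataFrame(merged, geometry="geometry")
```

Only one Point is constructed per unique id (about 200 in the question), and the merge copies it to all rows that share that id.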