Displace geographic points which lie too closely together (Python / Shapely)

254 views Asked by At

I've got a dataset of refineries in Texas (GeoJSON here - https://pastebin.com/R0D9fif9 ):

Name,Latitude,Longitude
Marathon Petroleum,29.374722,-94.933611
Marathon Petroleum,29.368733,-94.903253
Valero,29.367617,-94.909515
LyondellBasell,29.71584,-95.234814
Valero,29.722213,-95.255198
Exxon,29.743865,-95.009208
Shell,29.720425,-95.12495
Petrobras,29.722466,-95.208807

I would like to create a printed map out of these points. But they lie too closely together at a given resolution.

Since every refinery should get mentioned in the legend, I can't cluster. So I would like to

  • Get the centroid - that was easy

    import json
    import csv
    from shapely.geometry import shape, Point, MultiPoint
    
    with open('refineries.csv', 'rU') as infile:
        reader = csv.DictReader(infile)
        data = {}
        for row in reader:
            for header, value in row.items():
                try:
                    data[header].append(value)
                except KeyError:
                    data[header] = [value]
    
     listo = list(zip(data['Longitude'], data['Latitude']))
     points1 = MultiPoint(points=listo)
    
     points = MultiPoint([(-94.933611, 29.374722), (-94.903253, 29.368733), (-94.909515, 29.367617), (-95.234814, 29.71584), (-95.255198, 29.722213), (-95.009208, 29.743865), (-95.12495, 29.720425), (-95.208807, 29.722466)])
    
     print(points.centroid)
    
  • Shift all points away from the centroid until a minimum distance between all is reached

May you please help me here? Thanks in advance!

1

There are 1 answers

2
ewcz On

It depends how exactly do you want to shift the points away from the centroid. One way would be to calculate for each point its great-circle distance and azimuth with respect to the centroid and rescale all distances in order to ensure that the distance between the two closest points is larger than a specified threshold. In the example below, pyproj is used for the calculation of the azimuths and distances.

import json
import csv
import sys
from shapely.geometry import shape, Point, MultiPoint
from pyproj import Geod

with open('refineries.csv', 'rU') as infile:
    reader = csv.DictReader(infile)
    data = {}
    for row in reader:
        for header, value in row.items():
            if not header in data:
                data[header] = []
            data[header].append(value)

listo = list(zip(map(float, data['Longitude']), map(float, data['Latitude'])))

def scale_coords(coords, required_dist = 1000.):
    g = Geod(ellps = 'WGS84')

    num_of_points = len(coords)

    #calculate centroid
    C = MultiPoint(coords).centroid

    #determine the minimum distance among points
    dist_min, dist_max = float('inf'), float('-inf')
    for i in range(num_of_points):
        lon_i, lat_i = coords[i]
        for j in range(i+1, num_of_points):
            lon_j, lat_j = coords[j]
            _,_,dist = g.inv(lon_i, lat_i, lon_j, lat_j)
            dist_min = min(dist_min, dist)
            dist_max = max(dist_max, dist)

    if dist_min > required_dist:
        return coords

    coords_scaled = [None]*num_of_points
    scaling = required_dist / dist_min
    for i, (lon_i, lat_i) in enumerate(coords):
        az,_,dist = g.inv(C.x, C.y, lon_i, lat_i)
        lon_f,lat_f,_ = g.fwd(C.x, C.y, az, dist*scaling)
        coords_scaled[i] = (lon_f, lat_f)

    return coords_scaled

Alternatively, this might be combined with the approach within which you also relax the azimuths. This would in principle result in smaller scaling factor for the "radial" distances. However, it would also slightly distort the "visual distribution" of the points. Also, the method presented above might be "improved" by ignoring any outlier points in the rescaling, i.e., points which are already sufficiently far from the centroid and which have no nearby neighbors.