I have a CSV which contains the name and lat/lng location of each Underground station in London. It looks like this:
Station Lat Lng
Abbey Road 51.53195199 0.003737786
Abbey Wood 51.49078408 0.120286371
Acton 51.51688696 -0.267675543
Acton Central 51.50875781 -0.263415792
Acton Town 51.50307148 -0.280288296
I wish to transform this CSV into an origin-destination matrix of all the possible combinations of these stations. There are 270 stations, thus there are 270 × 270 = 72,900 possible combinations, counting each station paired with itself.
Ultimately I wish to turn this matrix into a CSV with the following format:
O_Station O_lat O_lng D_Station D_lat D_lng
Abbey Road 51.53195199 0.003737786 Abbey Wood 51.49078408 0.120286371
Abbey Road 51.53195199 0.003737786 Acton 51.51688696 -0.267675543
Abbey Road 51.53195199 0.003737786 Acton Central 51.50875781 -0.263415792
Abbey Wood 51.49078408 0.120286371 Abbey Road 51.53195199 0.003737786
Abbey Wood 51.49078408 0.120286371 Acton 51.51688696 -0.267675543
Abbey Wood 51.49078408 0.120286371 Acton Central 51.50875781 -0.263415792
Acton 51.51688696 -0.267675543 Abbey Road 51.53195199 0.003737786
Acton 51.51688696 -0.267675543 Abbey Wood 51.49078408 0.120286371
Acton 51.51688696 -0.267675543 Acton Central 51.50875781 -0.263415792
The first step would be to use a loop to pair each station with every other station. I would then need to remove the combinations where the origin and destination are the same station.
I've tried using the NumPy function column_stack. However, this gives a strange result.
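A minimal sketch of that pairing step, using only the standard library. It assumes the stations have already been loaded into a list of (name, lat, lng) tuples; the `stations` and `od_rows` names are illustrative, and the three rows are copied from the sample above. `itertools.permutations` yields ordered pairs and never pairs an element with itself, so the same-station rows do not need to be removed afterwards.

```python
import itertools

# Sample rows copied from the question; a real run would load every station.
stations = [
    ("Abbey Road", 51.53195199, 0.003737786),
    ("Abbey Wood", 51.49078408, 0.120286371),
    ("Acton", 51.51688696, -0.267675543),
]

# permutations(..., 2) yields every ordered (origin, destination) pair
# and skips pairs made of the same list element.
od_rows = [
    origin + destination  # tuple concatenation: O_Station, O_lat, O_lng, D_Station, D_lat, D_lng
    for origin, destination in itertools.permutations(stations, 2)
]

for row in od_rows:
    print(row)
```

With n stations this gives n × (n − 1) rows, so 270 stations would produce 270 × 269 = 72,630 rows.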
import csv
import numpy
from pprint import pprint
numpy.set_printoptions(threshold='nan')

with open('./London stations.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    Stations = ['{O_Station}'.format(**row) for row in reader]

print(Stations)
O_D = numpy.column_stack(([Stations],[Stations]))
pprint(O_D)
Output:
Stations =
['Abbey Road', 'Abbey Wood', 'Acton', 'Acton Central', 'Acton Town']
O_D =
array([['Abbey Road', 'Abbey Wood', 'Acton', 'Acton Central', 'Acton Town',
        'Abbey Road', 'Abbey Wood', 'Acton', 'Acton Central', 'Acton Town']],
      dtype='|S13')
Ideally, I am looking for a more suitable function, but I am finding it hard to locate one in the NumPy manual.
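For context on the result above: wrapping the list as `[Stations]` makes each input a 2-D array with one row, and `column_stack` joins 2-D inputs side by side, which gives a single row of 2 × N names rather than N × N pairs. One possible NumPy-only alternative, shown here as a sketch rather than the one right function, is to build the two columns with `numpy.repeat` and `numpy.tile`. The variable names and the three sample stations are illustrative.

```python
import numpy as np

stations = np.array(["Abbey Road", "Abbey Wood", "Acton"])
n = len(stations)

origins = np.repeat(stations, n)     # each station repeated n times in a row
destinations = np.tile(stations, n)  # the whole list repeated n times

# 1-D inputs become columns, so this has shape (n * n, 2).
pairs = np.column_stack((origins, destinations))

# Keep only the rows where origin and destination differ.
pairs = pairs[pairs[:, 0] != pairs[:, 1]]
print(pairs)
```

This only pairs the names; the coordinates would need to be carried along separately, for example by pairing integer row indices instead of names.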
This is an incomplete answer, but I would skip NumPy and head right into pandas. This is tough since the file isn't really comma-delimited; otherwise we could just call pandas.read_csv(). So we end up with a DataFrame, df, whose first rows can be checked with df.head().
Getting the permutations might mean we need to not have the stations as the index. I'm not sure for the moment. Hopefully this helps a bit!
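One way the pandas route could be finished, offered as a sketch under a few assumptions: the file is whitespace-separated as shown in the question, the last two fields on each line are always the latitude and longitude, and pandas 1.2 or later is installed (needed for `how="cross"`). The `raw` string stands in for the file contents, and the column names are chosen to match the target format. Station is kept as an ordinary column rather than the index.

```python
import pandas as pd

# Stand-in for the file contents, in the layout shown in the question.
raw = """Station Lat Lng
Abbey Road 51.53195199 0.003737786
Abbey Wood 51.49078408 0.120286371
Acton 51.51688696 -0.267675543
"""

# Station names contain spaces, so split each line from the right:
# the last two fields are lat and lng, everything before them is the name.
lines = raw.splitlines()
records = [line.rsplit(None, 2) for line in lines[1:] if line.strip()]
df = pd.DataFrame(records, columns=["Station", "lat", "lng"])
df[["lat", "lng"]] = df[["lat", "lng"]].astype(float)

# Cross join the table with itself, then drop the same-station rows.
od = df.add_prefix("O_").merge(df.add_prefix("D_"), how="cross")
od = od[od["O_Station"] != od["D_Station"]].reset_index(drop=True)

# With no path argument, to_csv returns the CSV text instead of writing a file.
print(od.to_csv(index=False))
```

Passing a file path to `to_csv` instead would write the result to disk.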