How to cluster by Zip Code - all locations within 100 miles

627 views Asked by At

I have a database without about 700 records, each with their own zip code. I want to show the rows (by unique ID) that fall within a 100 mile cluster. I also have a zip code distance DB from the National Bureau of Economic Research (https://data.nber.org/distance/2022/100miles/). This file is quite large but looks like the table below.

zip1    miles_to_zip2    zip2
1534    50               1001
1534    44               1002
1534    48               1003

This data has 13m rows, but I really only need the distance for the zip codes in my database (about 700 different ones).

I can use Python, R, or Power BI to do this, depending on which is the simplest. How do I group all my unique IDs by a cluster group that identifies locations within 100 miles of each other? For reference, the data I want to analyze looks similar to the table below.

ID     ZipCode     Sales
1      15110       10,000
2      15115       15,000
3      98000       2,000
4      98001       10,000
5      70570       10,000

My goal is essentially to return a column called cluster that would assign ID 1 and 2 to cluster1 and ID 3 and 4 to cluster 2, and 5 to cluster 3 (based on the 100 mile criteria).

0

There are 0 answers