I have a shapefile containing linestrings
that describes connection between cities in Brazil. I would like to convert these connections into a neighbourhood object with the cities' code set as the row name, making it compatible with my data frame:
> head(regic_link)
Simple feature collection with 6 features and 6 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -45.7931 ymin: -21.1472 xmax: -41.3903 ymax: -15.9032
proj4string: +proj=longlat +ellps=GRS80 +no_defs
id origin_code nome_ori dest_code nome_dest dist_km geometry
1 14016 3123304 Dores do Turvo 3165701 Senador Firmino 11.732462 LINESTRING (-43.1884 -20.97...
2 14117 3124708 Estrela do Indaiá 3166600 Serra da Saudade 9.080594 LINESTRING (-45.7878 -19.52...
3 15205 3138658 Lontra 3135357 Japonvar 11.104687 LINESTRING (-44.3029 -15.90...
4 17147 3163300 São José do Divino 3144904 Nova Módica 13.066889 LINESTRING (-41.3903 -18.48...
5 17151 3163409 São José do Goiabal 3121803 Dionísio 12.078370 LINESTRING (-42.7077 -19.92...
6 12463 3102100 Alto Rio Doce 3121506 Desterro do Melo 17.982702 LINESTRING (-43.412 -21.025...
So the row name would be set as origin_code
and the neighbour would be set to dest_code
in the list and vice versa (eventually this would be changed to an index I create but this makes things easier to check). Essentially, I need the linestring
equivalent for the following code used for polygons:
nb.orig <- poly2nb(as_Spatial(shp), row.names = shp$index)
names(nb.orig) <- attr(nb.orig, "region.id")
nb2INLA("output/nb_orig.graph", nb.orig)
(shp
is a shapefile consisting of polygons, and the index
variable is present in both the shapefile and the data frame).
So far, I have used the sfnetworks
and igraph
packages to create a neighbourhood object but have not been able to attach index values to the row names:
regic_net <- as_sfnetwork(regic_link, directed = F)
net_adj <- as_adjacency_matrix(regic_net, names = T)
nb_net <- mat2listw(net_adj)$neighbours
The function as_sfnetwork
assigns a number to each of the nodes in the network by default (in the edge data, these are variables 'from' and 'to' and cannot be changed using mutate):
> regic_net
# A sfnetwork with 827 nodes and 3786 edges
#
# CRS: NA
#
# An undirected simple graph with 1 component with spatially explicit edges
#
# Edge Data: 3,786 x 7 (active)
# Geometry type: LINESTRING
# Dimension: XY
# Bounding box: xmin: -50.6938 ymin: -22.855 xmax: -39.9496 ymax: -14.2696
from to origin_code nome_ori dest_code nome_dest geometry
<int> <int> <dbl> <chr> <dbl> <chr> <LINESTRING [°]>
1 1 2 3123304 Dores do Turvo 3165701 Senador Firmino (-43.1884 -20.975, -43.18106 -20.97017, -43.17373 -20.96534, -43.16639 -20.9605, …
2 3 4 3163409 São José do Goiab… 3121803 Dionísio (-42.7077 -19.9255, -42.71344 -19.91851, -42.71917 -19.91152, -42.72491 -19.90453…
3 5 6 3167707 Sobrália 3162609 São João do Orien… (-42.0972 -19.2363, -42.1016 -19.2437, -42.106 -19.2511, -42.11039 -19.2585, -42.…
4 5 7 3167707 Sobrália 3168408 Tarumirim (-42.0972 -19.2363, -42.08899 -19.24038, -42.08079 -19.24447, -42.07258 -19.24855…
5 8 9 3115474 Catuti 3141009 Mato Verde (-42.9607 -15.3612, -42.95243 -15.36428, -42.94415 -15.36735, -42.93588 -15.37043…
6 10 11 3130606 Inconfidentes 3108305 Borda da Mata (-46.328 -22.3174, -46.31896 -22.31505, -46.30993 -22.3127, -46.30089 -22.31034, …
# … with 3,780 more rows
#
# Node Data: 827 x 1
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: -50.6938 ymin: -22.855 xmax: -39.9496 ymax: -14.2696
geometry
<POINT [°]>
1 (-43.1884 -20.975)
2 (-43.1004 -20.917)
3 (-42.7077 -19.9255)
# … with 824 more rows
These values are then used as row names in the next step within the nb
object created using as_adjacency_matrix
and mat2listw
. Does anyone know how to attach my own index or even extract the index that has been created so I can make it consistent between this and the data frame? I have tried to create a conversion table taking the from
/origin_code
and to
/dest_code
but these do not match, the from
value is the lowest of the two indices and so is not always the origin in my data.
Also to make it more difficult, there are 57 more nodes than expected and I cannot find why this has happened. I have checked the duplicates, they have the same coordinates and code/name in the data but are being treated separately with a different index number.
In summary, I'd like to convert a linestring object into a neighbourhood object that can be used in an INLA model where the cities connected by lines are considered neighbours. I need to attach an index to these cities so the neighbourhood object sets the index as the row name and it can be combined with data in a model. Hope that makes sense, please let me know if not!
I will present a solution using a simplified example with only 5
LINESTRING(s)
. If the following code corresponds to the desired solution, it should be easy to translate it to real-world cases.First, define some functions and load relevant packages:
Load data and select only the relevant columns + only five rows
Filter out duplicated connections
Build the sfnetwork object
Extract the nodes (i.e. the unique boundary points for those lines)
Each node should correspond to a boundary point of the LINESTRINGS defined in the object
regic_link
. As you noticed, we cannot simply use the from/to columns since they don’t preserve the order of origin and destination. Hence, we have to manually match the nodes with the starting points and the end points (assuming that every time they have the same coordinates, then they also have the same ID). Moreover, each node may correspond to several (identical) boundary points, but we just need to take the first match (i.e. the[1]
notation below):Check the output:
This means that the first node corresponds to ID
1100015
, the seventh node corresponds to ID1100031
and so on. On the other hand, the other nodes do not correspond to any starting point. Repeat for the endpoints:Check the output, which should be opposite wrt to
start
and with the same interpretations:Combine the two objects.
dplyr::coalescence
is used to take the non-NA ID(s)Add the new ids to the nodes table:
Estimate the adjacency matrix among the cities
If we check the first row, we notice that the city is matched with the right columns:
On the other hand
since that corresponds to one of the
cod_dest
.Finally, we can convert the adj matrix to
inla.graph
object (but I think that’s not strictly required by INLA functions):Created on 2021-08-25 by the reprex package (v2.0.0)
Hope that helps. You can check the complete example removing
LIMIT 5
from the query argument inst_read
. There might be some simpler approaches, but I cannot think of any right now.