Creating a inla.graph object with correct index labels from sf linestring object

240 views Asked by At

I have a shapefile containing linestrings that describes connection between cities in Brazil. I would like to convert these connections into a neighbourhood object with the cities' code set as the row name, making it compatible with my data frame:

> head(regic_link)
Simple feature collection with 6 features and 6 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -45.7931 ymin: -21.1472 xmax: -41.3903 ymax: -15.9032
proj4string:   +proj=longlat +ellps=GRS80 +no_defs 
     id origin_code            nome_ori dest_code        nome_dest   dist_km                       geometry
1 14016     3123304      Dores do Turvo   3165701  Senador Firmino 11.732462 LINESTRING (-43.1884 -20.97...
2 14117     3124708   Estrela do Indaiá   3166600 Serra da Saudade  9.080594 LINESTRING (-45.7878 -19.52...
3 15205     3138658              Lontra   3135357         Japonvar 11.104687 LINESTRING (-44.3029 -15.90...
4 17147     3163300  São José do Divino   3144904      Nova Módica 13.066889 LINESTRING (-41.3903 -18.48...
5 17151     3163409 São José do Goiabal   3121803         Dionísio 12.078370 LINESTRING (-42.7077 -19.92...
6 12463     3102100       Alto Rio Doce   3121506 Desterro do Melo 17.982702 LINESTRING (-43.412 -21.025...

So the row name would be set as origin_code and the neighbour would be set to dest_code in the list and vice versa (eventually this would be changed to an index I create but this makes things easier to check). Essentially, I need the linestring equivalent for the following code used for polygons:

nb.orig <- poly2nb(as_Spatial(shp), row.names = shp$index)
names(nb.orig) <- attr(nb.orig, "region.id")

nb2INLA("output/nb_orig.graph",  nb.orig)

(shp is a shapefile consisting of polygons, and the index variable is present in both the shapefile and the data frame).

So far, I have used the sfnetworks and igraph packages to create a neighbourhood object but have not been able to attach index values to the row names:

regic_net <- as_sfnetwork(regic_link, directed = F) 

net_adj <- as_adjacency_matrix(regic_net, names = T)
nb_net <- mat2listw(net_adj)$neighbours

The function as_sfnetwork assigns a number to each of the nodes in the network by default (in the edge data, these are variables 'from' and 'to' and cannot be changed using mutate):

> regic_net
# A sfnetwork with 827 nodes and 3786 edges
#
# CRS:  NA 
#
# An undirected simple graph with 1 component with spatially explicit edges
#
# Edge Data:     3,786 x 7 (active)
# Geometry type: LINESTRING
# Dimension:     XY
# Bounding box:  xmin: -50.6938 ymin: -22.855 xmax: -39.9496 ymax: -14.2696
   from    to origin_code nome_ori           dest_code nome_dest                                                                                    geometry
  <int> <int>       <dbl> <chr>                  <dbl> <chr>                                                                                <LINESTRING [°]>
1     1     2     3123304 Dores do Turvo       3165701 Senador Firmino    (-43.1884 -20.975, -43.18106 -20.97017, -43.17373 -20.96534, -43.16639 -20.9605, …
2     3     4     3163409 São José do Goiab…   3121803 Dionísio           (-42.7077 -19.9255, -42.71344 -19.91851, -42.71917 -19.91152, -42.72491 -19.90453…
3     5     6     3167707 Sobrália             3162609 São João do Orien… (-42.0972 -19.2363, -42.1016 -19.2437, -42.106 -19.2511, -42.11039 -19.2585, -42.…
4     5     7     3167707 Sobrália             3168408 Tarumirim          (-42.0972 -19.2363, -42.08899 -19.24038, -42.08079 -19.24447, -42.07258 -19.24855…
5     8     9     3115474 Catuti               3141009 Mato Verde         (-42.9607 -15.3612, -42.95243 -15.36428, -42.94415 -15.36735, -42.93588 -15.37043…
6    10    11     3130606 Inconfidentes        3108305 Borda da Mata      (-46.328 -22.3174, -46.31896 -22.31505, -46.30993 -22.3127, -46.30089 -22.31034, …
# … with 3,780 more rows
#
# Node Data:     827 x 1
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: -50.6938 ymin: -22.855 xmax: -39.9496 ymax: -14.2696
             geometry
          <POINT [°]>
1  (-43.1884 -20.975)
2  (-43.1004 -20.917)
3 (-42.7077 -19.9255)
# … with 824 more rows

These values are then used as row names in the next step within the nb object created using as_adjacency_matrix and mat2listw. Does anyone know how to attach my own index or even extract the index that has been created so I can make it consistent between this and the data frame? I have tried to create a conversion table taking the from/origin_code and to/dest_code but these do not match, the from value is the lowest of the two indices and so is not always the origin in my data.

Also to make it more difficult, there are 57 more nodes than expected and I cannot find why this has happened. I have checked the duplicates, they have the same coordinates and code/name in the data but are being treated separately with a different index number.

In summary, I'd like to convert a linestring object into a neighbourhood object that can be used in an INLA model where the cities connected by lines are considered neighbours. I need to attach an index to these cities so the neighbourhood object sets the index as the row name and it can be combined with data in a model. Hope that makes sense, please let me know if not!

1

There are 1 answers

2
agila On

I will present a solution using a simplified example with only 5 LINESTRING(s). If the following code corresponds to the desired solution, it should be easy to translate it to real-world cases.

First, define some functions and load relevant packages:

'%!in%' <- function(x,y) !('%in%'(x,y))
Sys.setenv(`_R_S3_METHOD_REGISTRATION_NOTE_OVERWRITES_` = "false")
suppressPackageStartupMessages({
  library(sf)
  library(lwgeom)
  library(tidygraph)
  library(sfnetworks)
})

Load data and select only the relevant columns + only five rows

regic_link <- st_read(
  dsn = "city_link/REGIC2018_Ligacoes_entre_Cidades.shp",
  query = "SELECT cod_ori, cod_dest FROM REGIC2018_Ligacoes_entre_Cidades LIMIT 5"
)
#> Reading query `SELECT cod_ori, cod_dest FROM REGIC2018_Ligacoes_entre_Cidades LIMIT 5' from data source `C:\Users\Utente\Desktop\city_link\REGIC2018_Ligacoes_entre_Cidades.shp' 
#>   using driver `ESRI Shapefile'
#> Simple feature collection with 5 features and 2 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -63.0338 ymin: -13.4945 xmax: -60.1348 ymax: -9.912
#> Geodetic CRS:  SIRGAS 2000

Filter out duplicated connections

regic_link <- regic_link %>%
  filter(paste0(cod_ori, cod_dest) %!in% paste0(cod_dest, cod_ori))

Build the sfnetwork object

regic_sfn <- as_sfnetwork(regic_link, directed = FALSE)

Extract the nodes (i.e. the unique boundary points for those lines)

regic_nodes <- st_as_sf(regic_sfn, "nodes")

Each node should correspond to a boundary point of the LINESTRINGS defined in the object regic_link. As you noticed, we cannot simply use the from/to columns since they don’t preserve the order of origin and destination. Hence, we have to manually match the nodes with the starting points and the end points (assuming that every time they have the same coordinates, then they also have the same ID). Moreover, each node may correspond to several (identical) boundary points, but we just need to take the first match (i.e. the [1] notation below):

start <- lapply(
  X = st_equals(regic_nodes, st_startpoint(regic_link)),
  FUN = function(x) regic_link$cod_ori[x[1]]
)

Check the output:

start
#> [[1]]
#> [1] "1100015"
#> 
#> [[2]]
#> [1] NA
#> 
#> [[3]]
#> [1] NA
#> 
#> [[4]]
#> [1] NA
#> 
#> [[5]]
#> [1] "1100023"
#> 
#> [[6]]
#> [1] NA
#> 
#> [[7]]
#> [1] "1100031"
#> 
#> [[8]]
#> [1] NA

This means that the first node corresponds to ID 1100015, the seventh node corresponds to ID 1100031 and so on. On the other hand, the other nodes do not correspond to any starting point. Repeat for the endpoints:

end <- lapply(
  X = st_equals(regic_nodes, st_endpoint(regic_link)),
  FUN = function(i) regic_link$cod_dest[i[1]]
)

Check the output, which should be opposite wrt to start and with the same interpretations:

end
#> [[1]]
#> [1] NA
#> 
#> [[2]]
#> [1] "1100049"
#> 
#> [[3]]
#> [1] "1100189"
#> 
#> [[4]]
#> [1] "1100296"
#> 
#> [[5]]
#> [1] NA
#> 
#> [[6]]
#> [1] "1100114"
#> 
#> [[7]]
#> [1] NA
#> 
#> [[8]]
#> [1] "1100304"

Combine the two objects. dplyr::coalescence is used to take the non-NA ID(s)

idxs <- mapply(dplyr::coalesce, start, end)

Add the new ids to the nodes table:

regic_sfn <- regic_sfn %N>%
  mutate(name = idxs)

Estimate the adjacency matrix among the cities

regic_adj <- igraph::as_adjacency_matrix(regic_sfn, names = TRUE)
regic_adj
#> 8 x 8 sparse Matrix of class "dgCMatrix"
#>         1100015 1100049 1100189 1100296 1100023 1100114 1100031 1100304
#> 1100015       .       1       1       1       .       .       .       .
#> 1100049       1       .       .       .       .       .       .       .
#> 1100189       1       .       .       .       .       .       .       .
#> 1100296       1       .       .       .       .       .       .       .
#> 1100023       .       .       .       .       .       1       .       .
#> 1100114       .       .       .       .       1       .       .       .
#> 1100031       .       .       .       .       .       .       .       1
#> 1100304       .       .       .       .       .       .       1       .

If we check the first row, we notice that the city is matched with the right columns:

regic_link %>% filter(cod_ori == 1100023)
#> Simple feature collection with 1 feature and 2 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -63.0338 ymin: -10.4323 xmax: -62.4721 ymax: -9.912
#> Geodetic CRS:  SIRGAS 2000
#>   cod_ori cod_dest                       geometry
#> 1 1100023  1100114 LINESTRING (-63.0338 -9.912...

On the other hand

regic_link %>% filter(cod_ori == 1100049)
#> Simple feature collection with 0 features and 2 fields
#> Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
#> Geodetic CRS:  SIRGAS 2000
#> [1] cod_ori  cod_dest geometry
#> <0 rows> (or 0-length row.names)

since that corresponds to one of the cod_dest.

regic_link %>% filter(cod_dest == 1100049)
#> Simple feature collection with 1 feature and 2 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -62.0041 ymin: -11.9342 xmax: -61.4512 ymax: -11.4356
#> Geodetic CRS:  SIRGAS 2000
#>   cod_ori cod_dest                       geometry
#> 1 1100015  1100049 LINESTRING (-62.0041 -11.93...

Finally, we can convert the adj matrix to inla.graph object (but I think that’s not strictly required by INLA functions):

INLA::inla.read.graph(regic_adj)
#> $n
#> [1] 8
#> 
#> $nnbs
#> [1] 3 1 1 1 1 1 1 1
#> 
#> $nbs
#> $nbs[[1]]
#> [1] 2 3 4
#> 
#> $nbs[[2]]
#> [1] 1
#> 
#> $nbs[[3]]
#> [1] 1
#> 
#> $nbs[[4]]
#> [1] 1
#> 
#> $nbs[[5]]
#> [1] 6
#> 
#> $nbs[[6]]
#> [1] 5
#> 
#> $nbs[[7]]
#> [1] 8
#> 
#> $nbs[[8]]
#> [1] 7
#> 
#> 
#> $graph.file
#> [1] NA
#> 
#> $cc
#> $cc$id
#> [1] 1 1 1 1 2 2 3 3
#> 
#> $cc$n
#> [1] 3
#> 
#> $cc$nodes
#> $cc$nodes[[1]]
#> [1] 1 2 3 4
#> 
#> $cc$nodes[[2]]
#> [1] 5 6
#> 
#> $cc$nodes[[3]]
#> [1] 7 8
#> 
#> 
#> 
#> attr(,"class")
#> [1] "inla.graph"

Created on 2021-08-25 by the reprex package (v2.0.0)

Hope that helps. You can check the complete example removing LIMIT 5 from the query argument in st_read. There might be some simpler approaches, but I cannot think of any right now.