I'm trying to read the history of OpenStreetMap elements, which is a nested XML object with repeated tags in R, and I have not been able to progress up from a certain point. This is part of an effort to detect vandalism in certain kinds of elements.
A sample object that I could read is this one, which looks like this.
<osm version="0.6" generator="CGImap 0.8.8 (1674799 spike-06.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
<relation id="368462" visible="true" version="1" changeset="3430908" timestamp="2009-12-22T22:50:41Z" user="Blanes Lluis" uid="172008">
<member type="way" ref="45378201" role="outer"/>
<member type="way" ref="45338918" role="outer"/>
<member type="way" ref="45362285" role="outer"/>
<member type="way" ref="45362286" role="outer"/>
<member type="way" ref="45323394" role="outer"/>
<member type="way" ref="44988246" role="outer"/>
<member type="way" ref="45341594" role="outer"/>
<member type="way" ref="45369117" role="outer"/>
<member type="way" ref="45324958" role=""/>
<member type="way" ref="45324960" role=""/>
<tag k="admin_level" v="7"/>
<tag k="boundary" v="administrative"/>
<tag k="name" v="La Selva"/>
<tag k="type" v="boundary"/>
<tag k="wikipedia" v="es:La Selva"/>
</relation>
<relation id="368462" visible="true" version="2" changeset="3430948" timestamp="2009-12-22T22:59:53Z" user="Blanes Lluis" uid="172008">
<member type="way" ref="45378201" role="outer"/>
<member type="way" ref="45338918" role="outer"/>
<member type="way" ref="45362285" role="outer"/>
<member type="way" ref="45362286" role="outer"/>
<member type="way" ref="45323394" role="outer"/>
<member type="way" ref="44988246" role="outer"/>
<member type="way" ref="45341594" role="outer"/>
<member type="way" ref="45369117" role="outer"/>
<member type="way" ref="45324958" role="outer"/>
<member type="way" ref="45324960" role="outer"/>
<member type="way" ref="45324953" role="outer"/>
<member type="way" ref="45341583" role="outer"/>
<member type="way" ref="45323298" role="outer"/>
<member type="way" ref="45355688" role="outer"/>
<tag k="admin_level" v="7"/>
<tag k="boundary" v="administrative"/>
<tag k="name" v="La Selva"/>
<tag k="type" v="boundary"/>
<tag k="wikipedia" v="es:La Selva"/>
</relation>
I'm trying to get a data.frame that looks like this or, at the very least, two lists of data.frames containing each version of the object, one for all members and one for all tags. It should look like this.
> members
relation_id version changeset user member_type member_ref member_role
1 112233 1 75757 Joe way 12345 outer
2 112233 1 75757 Joe way 6789 outer
3 112233 1 75757 Joe node 123456
4 112233 2 86868 Jane way 12345 outer
5 112233 2 86868 Jane way 6789 outer
6 112233 2 86868 Jane node 123456
> tags
relation_id version changeset user key value
1 112233 1 75757 Joe name La Selva
2 112233 1 75757 Joe boundary administrative
3 112233 1 75757 Joe admin_level 7
4 112233 2 86868 Jane name Vandalised name
5 112233 2 86868 Jane boundary look at me I'm a vandal
6 112233 2 86868 Jane admin_level Vandalism rules!
The problem is that when I try to convert the xml to a list, all tags and members information get into the attributes, as such.
$osm$relation$tag
list()
attr(,"k")
[1] "population"
attr(,"v")
[1] "172280"
$osm$relation$tag
list()
attr(,"k")
[1] "population:date"
attr(,"v")
[1] "2011"
If I try to convert it into a data.frame or a tibble using methods such as this one or this one, I end up with empty data.frames since the lists themselves are empty.
The most promising approach is this one
library(httr)
library(xml2)
selva <- GET("https://www.openstreetmap.org/api/0.6/relation/368462/history")
selva2 <- read_xml(selva)
rels <- xml2::xml_find_all(selva2, "//relation")
reltags <- lapply(rels, \(x) xml2::xml_find_all(x, "//relation"))
Which returns something like this.
> head(rels)
{xml_nodeset (6)}
[1] <relation id="368462" visible="true" version="1" changeset="3430908" timestamp="2009-12-22T22:50:41Z" user="Blanes Lluis ...
[2] <relation id="368462" visible="true" version="2" changeset="3430948" timestamp="2009-12-22T22:59:53Z" user="Blanes Lluis ...
[3] <relation id="368462" visible="true" version="3" changeset="3430957" timestamp="2009-12-22T23:01:26Z" user="Blanes Lluis ...
[4] <relation id="368462" visible="true" version="4" changeset="3430967" timestamp="2009-12-22T23:04:15Z" user="Blanes Lluis ...
[5] <relation id="368462" visible="true" version="5" changeset="3431008" timestamp="2009-12-22T23:14:07Z" user="Blanes Lluis ...
[6] <relation id="368462" visible="true" version="6" changeset="3431022" timestamp="2009-12-22T23:19:20Z" user="Blanes Lluis
>head(reltags)
[[1]]
{xml_nodeset (1263)}
[1] <tag k="admin_level" v="7"/>
[2] <tag k="boundary" v="administrative"/>
[3] <tag k="name" v="La Selva"/>
[4] <tag k="type" v="boundary"/>
[5] <tag k="wikipedia" v="es:La Selva"/>
[6] <tag k="admin_level" v="7"/>
[7] <tag k="boundary" v="administrative"/>
[8] <tag k="name" v="La Selva"/>
[9] <tag k="type" v="boundary"/>
[10] <tag k="wikipedia" v="es:La Selva"/>
However, trying to get it into a data.frame returns the following
reltags[[1]] |> as_list() |> as_tibble(.name_repair = "unique")
A tibble: 0 × 1,263
# ℹ 1,263 variables: ...1 <list>, ...2 <list>, ...3 <list>, ...4 <list>, ...5 <list>, ...6 <list>, ...7 <list>, ...8 <list>,
# ...9 <list>, ...10 <list>, ...11 <list>, ...12 <list>, ...13 <list>, ...14 <list>, ...15 <list>, ...16 <list>,
# ...17 <list>, ...18 <list>, ...19 <list>, ...20 <list>, ...21 <list>, ...22 <list>, ...23 <list>, ...24 <list>,
# ...25 <list>, ...26 <list>, ...27 <list>, ...28 <list>, ...29 <list>, ...30 <list>, ...31 <list>, ...32 <list>,
# ...33 <list>, ...34 <list>, ...35 <list>, ...36 <list>, ...37 <list>, ...38 <list>, ...39 <list>, ...40 <list>,
# ...41 <list>, ...42 <list>, ...43 <list>, ...44 <list>, ...45 <list>, ...46 <list>, ...47 <list>, ...48 <list>,
# ...49 <list>, ...50 <list>, ...51 <list>, ...52 <list>, ...53 <list>, ...54 <list>, ...55 <list>, ...56 <list>,
Which is, of course, not what I'm looking for. At this point, I haven't got an idea about where to go next.
EDIT, this solution gets a lot closer to the result, but it keeps losing the member and tag information.
library(httr2)
library(tidyverse)
selva <- httr2::request("https://www.openstreetmap.org/api/0.6/relation/368462/history") |>
req_perform()
selvaxml <- resp_body_xml(selva)
resp_l <- selvaxml |>
xml2::as_list() |>
unlist(recursive = FALSE) |>
lapply(X = _, attributes) |>
tibble::as_tibble(.name_repair = "unique") |>
janitor::clean_names()
resp_unnested <- resp_l |>
t() |>
as.data.frame() |>
setNames(c("elements", "osm_id", "visible", "version", "changeset", "timestamp", "user", "user_id")) |>
unnest(everything())
# A tibble: 10,241 × 8
elements osm_id visible version changeset timestamp user user_id
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 member 368462 true 1 3430908 2009-12-22T22:50:41Z Blanes Lluis 172008
2 member 368462 true 1 3430908 2009-12-22T22:50:41Z Blanes Lluis 172008
3 member 368462 true 1 3430908 2009-12-22T22:50:41Z Blanes Lluis 172008
4 member 368462 true 1 3430908 2009-12-22T22:50:41Z Blanes Lluis 172008
5 member 368462 true 1 3430908 2009-12-22T22:50:41Z Blanes Lluis 172008
6 member 368462 true 1 3430908 2009-12-22T22:50:41Z Blanes Lluis 172008
7 member 368462 true 1 3430908 2009-12-22T22:50:41Z Blanes Lluis 172008
8 member 368462 true 1 3430908 2009-12-22T22:50:41Z Blanes Lluis 172008
9 member 368462 true 1 3430908 2009-12-22T22:50:41Z Blanes Lluis 172008
10 member 368462 true 1 3430908 2009-12-22T22:50:41Z Blanes Lluis 172008
# ℹ 10,231 more rows