How to load all fields/ExtendedData (not just 'name' and 'description') from KML layer into R


I've been working on loading KML files into R to make web maps with Leaflet/Shiny. The import is pretty simple (using the sample file KML_Samples.kml):

library(rgdal)

sampleKml <- readOGR("D:/KML_Samples.kml", layer = ogrListLayers("D:/KML_Samples.kml")[1])

In this example, ogrListLayers() lists all of the layers in the KML file, and I select only the first one. Easy peasy.

The problem is that using this method to read KML layers only pulls in two fields: "Name" and "Description," as seen below:

> sampleKml <- readOGR("D:/KML_Samples.kml", layer = ogrListLayers("D:/KML_Samples.kml")[1])
OGR data source with driver: KML 
Source: "D:/KML_Samples.kml", layer: "Placemarks"
with 3 features
It has 2 fields
> sampleKml@data
                Name                                                                                  Description
1   Simple placemark Attached to the ground. Intelligently places itself at the height of the underlying terrain.
2 Floating placemark                                                  Floats a defined distance above the ground.
3 Extruded placemark                                              Tethered to the ground by a customizable "tail" 

So R reads the KML layer as a SpatialPointsDataFrame with 3 features (3 different points) and two fields (the columns). However, when I load the layer into QGIS and open its attribute table, there are many more fields besides Name and Description.

From what I can tell, 'name' and 'description' are standard elements of a KML Placemark, and any additional data are stored as ExtendedData. I want to import this extended data along with the placemark data.
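For reference, this is roughly what such a Placemark looks like inside the file. The sketch below is illustrative only: the Placemark and its two custom fields ("category" and "rating") are made up, and it simply lists the ExtendedData field names with a base-R regular expression.

```r
# Made-up Placemark: <name> and <description> are standard Placemark
# children, while custom attributes live in <ExtendedData>/<Data>.
kml_text <- '
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Simple placemark</name>
    <description>Attached to the ground.</description>
    <ExtendedData>
      <Data name="category"><value>park</value></Data>
      <Data name="rating"><value>5</value></Data>
    </ExtendedData>
    <Point><coordinates>-122.08,37.42,0</coordinates></Point>
  </Placemark>
</kml>'

# Find every opening <Data name="..."> tag, then keep only the name
data_tags  <- regmatches(kml_text, gregexpr('<Data name="[^"]*"', kml_text))[[1]]
data_names <- sub('<Data name="([^"]*)"', "\\1", data_tags)
data_names
```

These ExtendedData fields are the ones that readOGR() leaves out in the output above.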

Is there a way to pull ALL of these KML layer fields/attributes into R? Preferably with readOGR(), but I'm open to all suggestions.

Answer by Sebastian

TL;DR

The underlying problem is that the LibKML library is missing on Windows. My solution is to extract the data directly from the KML file with a function.

Problem

I ran into the same problem, and after some googling it appears that this has something to do with LibKML and Windows. Running the code below on my Ubuntu machine gave a different result: the ExtendedData was retrieved when loading the saved KML file.

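library(rgdal)   # also attaches sp (Polygon, SpatialPolygons, CRS, ...)
library(dplyr)   # for %>%

# Build a unit-square polygon with a single attribute column, "test"
poly_df <- data.frame(x = c(1, 1, 0, 0), y = c(1, 0, 0, 1))
poly <- poly_df %>%
  Polygon %>%
  list %>%
  Polygons(ID = "1") %>%
  list %>%
  SpatialPolygons(proj4string = CRS("+init=epsg:4326")) %>%
  SpatialPolygonsDataFrame(data = data.frame(test = "this is a test"))

# Write it to KML and read it back in
writeOGR(poly, "test.kml", driver = "KML", layer = "poly")
poly2 <- readOGR("test.kml")
poly2@data   # on my Ubuntu machine this includes the "test" column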

If you manage to build LibKML [1], you should be able to load KML files including their ExtendedData [2].

On Windows, LibKML needs to be built with Visual Studio 2005 [1], a version that is no longer supported [3]. In [3], user2889419 supplies a download link for the 2005 version.
I downloaded and installed that version, but building LibKML eventually failed with a lot of errors and warnings (certain files do not exist). This is where I stopped, because I am way out of my comfort zone, but I wanted to share the results of my chase.
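Before attempting such a build, it may be worth checking whether the GDAL build that R uses already ships the LIBKML driver. A minimal sketch, assuming sf or rgdal is installed; has_ogr_driver() is a hypothetical helper name, while sf::st_drivers() and rgdal::ogrDrivers() both return a data frame of available drivers with a name column.

```r
# Hypothetical helper: TRUE/FALSE depending on whether the GDAL build used
# by R includes the given vector driver, NA if neither sf nor rgdal is
# installed.
has_ogr_driver <- function(driver = "LIBKML") {
  if (requireNamespace("sf", quietly = TRUE)) {
    drivers <- sf::st_drivers()        # data frame with a 'name' column
    driver %in% drivers$name
  } else if (requireNamespace("rgdal", quietly = TRUE)) {
    drivers <- rgdal::ogrDrivers()     # data frame with a 'name' column
    driver %in% drivers$name
  } else {
    NA
  }
}

has_ogr_driver("LIBKML")   # the driver that reads ExtendedData
has_ogr_driver("KML")      # the plain KML driver
```

A FALSE result for "LIBKML" would be consistent with the Windows behaviour described above.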

Solution in R

My solution is to load the spatial object with rgdal's readOGR() and, separately, read the KML file directly to extract the ExtendedData. I assume that readOGR() reads the features from the top of the file, in the same order as my extraction routine, so the rows line up. Both are then merged, and the output is the Spatial*DataFrame returned by readOGR() (a SpatialPolygonsDataFrame for polygon layers).
I had some trouble extracting the nodes from the KML files at first because I was not aware of the concept of namespaces [4]. (I edited the following function because I ran into trouble with KML files from other sources.)
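As a quick illustration of the namespace issue (separate from readKML itself): KML files declare a default namespace, and xml2 exposes that unnamed namespace under the prefix d1. A minimal sketch with a made-up two-Placemark document, assuming the xml2 package is installed.

```r
library(xml2)

# Made-up KML with a default namespace (xmlns="..."), as real KML files have
kml_text <- '<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark><name>A</name></Placemark>
    <Placemark><name>B</name></Placemark>
  </Document>
</kml>'
doc <- read_xml(kml_text)

# xml2 assigns the prefix "d1" to the unnamed default namespace
xml_ns(doc)

# Without a prefix, XPath matches only elements in no namespace: none here
unprefixed <- xml_find_all(doc, ".//Placemark")
length(unprefixed)

# With the "d1" prefix, both Placemarks are found
placemarks <- xml_find_all(doc, ".//d1:Placemark")
pm_names   <- xml_text(xml_find_first(placemarks, ".//d1:name"))
pm_names
```

This is why readKML uses prefixed XPath expressions such as .//d1:ExtendedData. Removing the default namespace with xml_ns_strip() and using unprefixed paths would be an alternative.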

library(rgdal)
library(xml2)
library(magrittr)  # for %>%

readKML <- function(file, keep_name_description = FALSE, layer, ...) {
  # Set keep_name_description = TRUE to keep the "Name" and "Description"
  #   columns in the resulting Spatial*DataFrame. Only has an effect when
  #   the KML file contains ExtendedData.

  sp_obj <- readOGR(file, layer, ...)
  kml_doc <- read_xml(file)

  # KML uses a default namespace, which xml2 exposes under the prefix "d1"
  if (!missing(layer)) {
    different_layers <- xml_find_all(kml_doc, ".//d1:Folder")
    layer_names <- different_layers %>%
      xml_find_first(".//d1:name") %>%
      xml_contents() %>%
      xml_text()

    selected_layer <- layer_names == layer
    if (!any(selected_layer)) stop("Layer does not exist.")
    layer_xml <- different_layers[selected_layer]
  } else {
    layer_xml <- kml_doc
  }

  # Extract the variable names: descend from the first <ExtendedData> until
  #   the nodes carry a "name" attribute (<Data name=...> or <SimpleData name=...>)
  variable_names1 <- xml_find_first(layer_xml, ".//d1:ExtendedData") %>%
    xml_children()

  while (any(is.na(xml_attr(variable_names1, "name"))) &&
         length(xml_children(variable_names1)) > 0) {
    variable_names1 <- xml_children(variable_names1)
  }

  variable_names <- unique(xml_attr(variable_names1, "name"))

  # Return sp_obj unchanged if no ExtendedData is present
  #   (xml_attr() on an empty node set returns character(0), not NULL)
  if (length(variable_names) == 0) return(sp_obj)

  # Extract the values: descend to the innermost element nodes
  data1 <- xml_find_all(layer_xml, ".//d1:ExtendedData") %>%
    xml_children()

  while (length(xml_children(data1)) > 0) {
    data1 <- xml_children(data1)
  }

  # Assumes every Placemark has the same fields in the same order
  data <- data1 %>%
    xml_text() %>%
    matrix(ncol = length(variable_names), byrow = TRUE) %>%
    as.data.frame()

  colnames(data) <- variable_names

  if (nrow(data) != nrow(sp_obj@data)) {
    stop("Number of ExtendedData rows does not match the number of features.")
  }

  if (keep_name_description) {
    sp_obj@data <- cbind(sp_obj@data, data)
  } else {
    sp_obj@data <- data
  }
  sp_obj
}
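The matrix(..., byrow = TRUE) step in readKML assumes that every Placemark carries the same ExtendedData fields in the same order; if some Placemarks omit a field, the values shift into the wrong columns. A variant that matches values by name could look like this sketch. extended_data_table() is a hypothetical helper: it assumes the xml2 package, only handles the `<Data name="..."><value>` encoding (not `<SchemaData>/<SimpleData>`), and its result would still need to be attached to the readOGR() object as in readKML.

```r
library(xml2)

# Hypothetical helper: one row per Placemark, one column per <Data name="...">.
# Values are matched by name, so a Placemark that lacks a field gets NA
# instead of shifting the other values.
extended_data_table <- function(doc) {
  ns <- xml_ns(doc)  # the default KML namespace is exposed as "d1"
  placemarks <- xml_find_all(doc, ".//d1:Placemark", ns = ns)
  rows <- lapply(placemarks, function(pm) {
    data_nodes <- xml_find_all(pm, "./d1:ExtendedData/d1:Data", ns = ns)
    values <- xml_text(xml_find_first(data_nodes, "./d1:value", ns = ns))
    names(values) <- xml_attr(data_nodes, "name")
    values
  })
  all_names <- unique(unlist(lapply(rows, names)))
  if (length(all_names) == 0) return(NULL)  # no <Data> elements found
  # Indexing by name fills missing fields with NA
  out <- do.call(rbind, lapply(rows, function(r) r[all_names]))
  out <- as.data.frame(out, stringsAsFactors = FALSE)
  colnames(out) <- all_names
  out
}

# Usage with a small made-up document; Placemark B has no "category" field
doc <- read_xml('<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
  <Placemark><name>A</name><ExtendedData>
    <Data name="category"><value>park</value></Data>
    <Data name="rating"><value>5</value></Data>
  </ExtendedData></Placemark>
  <Placemark><name>B</name><ExtendedData>
    <Data name="rating"><value>3</value></Data>
  </ExtendedData></Placemark>
</Document></kml>')
tbl <- extended_data_table(doc)
tbl
```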

Old: extracting via readLines


library(tidyverse)
library(rgdal)

readKML<-function(file,keep_name_description=FALSE,...) {
  # Set keep_name_description = TRUE to keep "Name" and "Description" columns 
  #   in the resulting SpatialPolygonsDataFrame. Only works when there is 
  #   ExtendedData in the kml file.

  if (!grepl("\\.kml$",file)) stop("File is not a KML file.")
  if (!file.exists(file)) stop("File does not exist.")
  map<-readOGR(file,...)

  f1<-readLines(file)

  # get positions of ExtendedData in document
  exdata_position<-grep("ExtendedData",f1) %>% 
    matrix(ncol=2,byrow = TRUE) %>% 
    apply(1,function(x) {
      pos<-x[1]:x[2]
      pos[2:(length(pos)-1)]
    }) %>% 
    t %>% 
    as.data.frame

  # if there is no ExtendedData return SpatialPolygonsDataFrame
  if (ncol(exdata_position)==0) return(map)

  # Get Name of different columns
  extract1<-f1[exdata_position[1,] %>% 
                 unlist]  
  names_of_data<-extract1 %>% 
    strsplit("name=\"") %>%
    lapply(function(x) strsplit(x[[2]],split="\"") ) %>%
    unlist(recursive = FALSE) %>%
    lapply(function(x) return(x[1])) %>% 
    unlist

  # Extract Extended Data
  dat<-lapply(seq(nrow(exdata_position)),function(x) {
    extract2<-f1[exdata_position[x,] %>% 
                   unlist]  
    extract2 %>% 
      strsplit(">") %>%
      lapply(function(x) strsplit(x[[2]],split="<") ) %>% unlist(recursive = FALSE) %>%
      lapply(function(x) return(x[1])) %>% 
      unlist %>% 
      matrix(nrow=1) %>% 
      as.data.frame
  }) %>% 
    do.call(rbind,.)

  # Rename columns
  colnames(dat)<-names_of_data

  # Check if Name and Description should be dropped
  if (keep_name_description) {
    map@data<-cbind(map@data,dat)
  } else {
    map@data<-dat
  }
  map
}
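The readLines() version relies on each `<ExtendedData>` tag and each `<Data>` entry sitting on its own line; in minified KML (many tags on one line) the grep()/matrix(ncol = 2) step breaks. A minimal base-R sketch that collapses the lines into one string first. extended_data_base() is a hypothetical helper and, like any regex-based parser, only suitable for simple, well-formed files using the `<Data name="..."><value>` encoding.

```r
# Hypothetical base-R helper (no xml2): extracts
# <Data name="..."><value>...</value> pairs per Placemark from the lines
# returned by readLines(), without assuming one tag per line.
extended_data_base <- function(kml_lines) {
  # Collapse the file so the line layout of the tags no longer matters
  txt <- paste(kml_lines, collapse = " ")
  placemarks <- regmatches(
    txt, gregexpr("<Placemark\\b.*?</Placemark>", txt, perl = TRUE))[[1]]
  rows <- lapply(placemarks, function(p) {
    pairs <- regmatches(
      p, gregexpr('<Data name="[^"]*">\\s*<value>.*?</value>', p, perl = TRUE))[[1]]
    values <- sub(".*<value>(.*?)</value>.*", "\\1", pairs, perl = TRUE)
    names(values) <- sub('^<Data name="([^"]*)".*$', "\\1", pairs, perl = TRUE)
    values
  })
  all_names <- unique(unlist(lapply(rows, names)))
  if (length(all_names) == 0) return(NULL)  # no <Data> elements found
  # Indexing by name fills missing fields with NA
  out <- do.call(rbind, lapply(rows, function(r) r[all_names]))
  out <- as.data.frame(out, stringsAsFactors = FALSE)
  colnames(out) <- all_names
  out
}

# Usage with made-up "minified" KML lines, where several tags share one line
kml_lines <- c(
  '<kml xmlns="http://www.opengis.net/kml/2.2"><Document><Placemark><name>A</name><ExtendedData><Data name="category"><value>park</value></Data><Data name="rating"><value>5</value></Data></ExtendedData></Placemark>',
  '<Placemark><name>B</name><ExtendedData><Data name="rating"><value>3</value></Data></ExtendedData></Placemark></Document></kml>'
)
tbl <- extended_data_base(kml_lines)
tbl
```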

[1] Building and installing libkml (libkml wiki): https://github.com/google/libkml/wiki/Building-and-installing-libkml
[2] r-spatial/sf issue #499: https://github.com/r-spatial/sf/issues/499
[3] "Where to download Visual Studio Express 2005?"
[4] "Parsing XML in R: Incorrect namespaces"