Calculate a centre point of multiple lat, long points in a data-frame

1.6k views Asked by At

I have a dataset that looks like this:

site   lat      long 
bras2  41.21   -115.11
tex4   45.3    -112.31
bras2  41.15   -115.15 
bras2  41.12   -115.19

For samples with the same site name, I want to calculate their centre point and then add it as a column to the dataset. Some site names are duplicated twice, other three times, other four times.

Like this:

site   lat      long    centre_lat  centre_long 
bras2  41.21   -115.11  value here     value here
tex4   45.3    -112.31  45.3           -112.31 
bras2  41.15   -115.15  value here     value here
bras2  41.12   -115.19  value here     value here

How can I do this?

3

There are 3 answers

2
mrhellmann On BEST ANSWER

If you're using spatial data, you should look into using the sf package. It handles geometries and functions for operating on them well.

Code below shows using both sf::st_centroid and geosphere::centroid. I prefer sf's way of doing things.

df <- read.table(header=TRUE, text= "site   lat      long 
bras2  41.21   -115.11
tex4   45.3    -112.31
bras2  41.15   -115.15 
bras2  41.12   -115.19")


library(dplyr)
library(geosphere)
library(sf)

# Using sf's st_centroid
df_sf <- st_as_sf(df, coords = c('long', 'lat'))

centroids_sf <- df_sf %>%
  group_by(site) %>% 
  summarize(geometry = st_union(geometry)) %>% 
  st_centroid

  
# Using geosphere::centroid
centroids_geoshpere <- df_sf %>%
  group_by(site) %>%
  filter(n() >2)  %>% ## geosphere needs polygons therefore 3+ points
  st_union() %>%
  st_cast('POLYGON') %>%
  as('Spatial') %>% # geoshpere expects SpatialPolygons objects
  centroid() 
  

centroids_geoshpere
#>         [,1]     [,2]
#> [1,] -115.15 41.16001
centroids_sf
#> Simple feature collection with 2 features and 1 field
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -115.15 ymin: 41.16 xmax: -112.31 ymax: 45.3
#> CRS:            NA
#> # A tibble: 2 x 2
#>   site         geometry
#> * <chr>         <POINT>
#> 1 bras2 (-115.15 41.16)
#> 2 tex4   (-112.31 45.3)

Looks like thery're close enough to the same point. I don't think geosphere::centroid can give a centroid for a single point, but may be wrong. sf::st_centroid has no problem with 1,2, or more points. Created on 2020-12-20 by the reprex package (v0.3.0)

0
Dave2e On

The geosphere package has a function centroid to solve problems such as this one.
It as long as there more than one point in shape it is straight forward. Most of the code below involves handling the single point case in the example above.

df <- read.table(header=TRUE, text= "site   lat      long 
bras2  41.21   -115.11
tex4   45.3    -112.31
bras2  41.15   -115.15 
bras2  41.12   -115.19")


library(dplyr)
library(geosphere)

df %>% group_by(side) %>% centroid(.[ ,c(3,2)])

sites <- split(df, df$site)
results <-lapply(sites, function(x) {
   if(nrow(x)>1 ) {
     value <- as.data.frame(centroid(x[, c(3,2)]))
   }
   else {
      value <- x[1, c(3,2)]
      names(value) <- c("lon", "lat")
   }
   value$site <- x$site[1]
   value
})

answer<-bind_rows(results)

      lon      lat  site
1 -115.15 41.16001 bras2
2 -112.31 45.30000  tex4
3
jay.sf On

You could calculate the means grouped by site names using ave after stripping off the site numbers using gsub.

within(dat, {
  g <- gsub("\\d", "", site)
  mid.lat <- ave(lat, g)
  mid.long <- ave(long, g)
  rm(g)
})
#    site   lat    long mid.long mid.lat
# 1 bras2 41.21 -115.11 -115.150  41.160
# 2  tex4 45.30 -112.31 -112.310  45.300
# 3 bras2 41.15 -115.15 -115.150  41.160
# 4 bras2 41.12 -115.19 -115.150  41.160
# 5  foo1 42.10 -123.10 -123.225  42.225
# 6  foo2 42.20 -123.20 -123.225  42.225
# 7 foo11 42.30 -123.30 -123.225  42.225
# 8 foo12 42.30 -123.30 -123.225  42.225

Or, if you depend on the NA:

within(dat, {
  g <- gsub("\\d", "", site)
  n <- ave(site, g, FUN=length)
  mid.lat <- NA
  mid.long <- NA
  mid.lat[n > 1] <- ave(lat[n > 1], g[n > 1])
  mid.long[n > 1] <- ave(long[n > 1], g[n > 1])
  rm(g, n)
  })
#    site   lat    long mid.long mid.lat
# 1 bras2 41.21 -115.11 -115.150  41.160
# 2  tex4 45.30 -112.31       NA      NA
# 3 bras2 41.15 -115.15 -115.150  41.160
# 4 bras2 41.12 -115.19 -115.150  41.160
# 5  foo1 42.10 -123.10 -123.225  42.225
# 6  foo2 42.20 -123.20 -123.225  42.225
# 7 foo11 42.30 -123.30 -123.225  42.225
# 8 foo12 42.30 -123.30 -123.225  42.225

Data:

dat <- structure(list(site = c("bras2", "tex4", "bras2", "bras2", "foo1", 
"foo2", "foo11", "foo12"), lat = c(41.21, 45.3, 41.15, 41.12, 
42.1, 42.2, 42.3, 42.3), long = c(-115.11, -112.31, -115.15, 
-115.19, -123.1, -123.2, -123.3, -123.3)), class = "data.frame", row.names = c(NA, 
-8L))