Why is the area provided by tidycensus different from the area calculated by sf::st_area()?


I am using the tidycensus R package to pull in census data and geometries. I want to calculate population densities and have the results match what I see on censusreporter.org. I am noticing a difference between the geography variables returned by tidycensus and the area I calculate myself using sf::st_area() from the sf package.

library(tidyverse)
library(tidycensus)
census_api_key("my_api_key")
library(sf)
options(tigris_use_cache = TRUE)

pop_texas <-
  get_acs(geography = 'state',
          variables = "B01003_001", # Total Population
          year = 2020,
          survey = 'acs5',
          keep_geo_vars = TRUE,
          geometry = TRUE) %>%
  filter(GEOID == '48') # Filter to Texas

Because I set keep_geo_vars = TRUE, the result includes an ALAND column, which I believe is the correct area of the returned geography in square meters (m^2).

> pop_texas$ALAND %>% format(big.mark=",")
[1] "676,680,588,914"

# Conversion to square miles
> (pop_texas$ALAND / 1000000 / 2.5899881) %>% format(big.mark=",")
[1] "261,267.8"
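
The conversion above can also be written as a small reusable helper, which is handy when the end goal is a population density. This is only a sketch: `m2_to_sq_mi()` and `pop_density_sq_mi()` are hypothetical names, not functions from tidycensus or sf, and the factor comes from the definition 1 mile = 1609.344 m.

```r
# Exact conversion factor: 1 mile = 1609.344 m, so 1 sq mi = 1609.344^2 m^2.
M2_PER_SQ_MI <- 1609.344^2

# Hypothetical helper (not part of tidycensus or sf): square meters to square miles.
m2_to_sq_mi <- function(area_m2) {
  as.numeric(area_m2) / M2_PER_SQ_MI
}

# Hypothetical helper: people per square mile from a population count
# and an area given in square meters.
pop_density_sq_mi <- function(population, area_m2) {
  population / m2_to_sq_mi(area_m2)
}

texas_aland_m2 <- 676680588914  # ALAND value quoted above
texas_sq_mi <- m2_to_sq_mi(texas_aland_m2)
print(format(round(texas_sq_mi, 1), big.mark = ","))
```

With the objects from the question, a density column could then be computed as `pop_density_sq_mi(pop_texas$estimate, pop_texas$ALAND)`.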

When I convert the ALAND amount to square miles I get the same number as shown on censusreporter.org.

I have also tried to calculate the area using the sf::st_area() function, but I get a different result:

> sf::st_area(pop_texas) %>% format(big.mark=",", scientific=FALSE)
[1] "688,276,954,146 [m^2]"

# Conversion to square miles
> (sf::st_area(pop_texas) / 1000000 / 2.5899881) %>%
+   as.numeric() %>%
+   format(big.mark=",", scientific=FALSE)
[1] "265,745.2"

Please let me know if there is something I am missing to reconcile these numbers. I would expect to get the same results either directly through tidycensus or calculating the area using sf::st_area().

Right now I am off by a lot:

> (pop_texas$ALAND - as.numeric(st_area(pop_texas)) ) %>%
+   format(big.mark=",")
[1] "-11,596,365,232"

There are 2 answers

kwalkertcu (BEST ANSWER)

If you want the "official" area of a shape like Texas, you should always use ALAND or another published area value. st_area() calculates the area from the polygon's geometry, which will always be a simplified and imperfect representation of Texas (or any other area). For smaller shapes (like Census tracts) the two values will probably be pretty close; for larger shapes like states (especially those with complex coastal geography, like Texas) you are going to be further off.
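
To see how much simplification alone can move a computed area, here is a minimal sketch with a synthetic shape rather than the actual Texas boundary. The sawtooth "coastline", the tolerance of 20, and the helper `n_vertices()` are all made up for illustration; the coordinates are in arbitrary planar units with no CRS assigned.

```r
library(sf)

# Synthetic "coastline": a 100 x 100 square whose top edge is a sawtooth
# that alternates between y = 100 (even x) and y = 110 (odd x).
top_x <- 100:0
top_y <- ifelse(top_x %% 2 == 0, 100, 110)
ring <- rbind(
  c(0, 0), c(100, 0),
  unname(cbind(top_x, top_y)),
  c(0, 0)  # close the ring
)
detailed <- st_sfc(st_polygon(list(ring)))

# Simplify the outline with a tolerance larger than the teeth.
simplified <- st_simplify(detailed, preserveTopology = TRUE, dTolerance = 20)

# Hypothetical helper: number of coordinate rows in a geometry.
n_vertices <- function(g) nrow(st_coordinates(g))

area_detailed <- as.numeric(st_area(detailed))
area_simplified <- as.numeric(st_area(simplified))

print(c(vertices_detailed = n_vertices(detailed),
        vertices_simplified = n_vertices(simplified)))
print(c(area_detailed = area_detailed, area_simplified = area_simplified))
```

The simplified outline has far fewer vertices and a different area, even though both polygons describe "the same" shape. The same effect applies to any generalized boundary file, which is why a published figure such as ALAND is the safer choice for densities.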

dieghernan

These differences are usually due to the CRS (the projection used for your sf objects). Some projections distort area; others distort shape. To learn more, see http://wiki.gis.com/wiki/index.php/Distortion#:~:text=There%20are%20four%20main%20types,%2C%20direction%2C%20shape%20and%20area.
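
As a rough illustration of how much the CRS can matter, the sketch below computes the area of the same small box three ways. The box coordinates are arbitrary (a 1-degree cell in central Texas), and the choice of EPSG:3857 (Web Mercator, which does not preserve area) and EPSG:5070 (NAD83 / Conus Albers, an equal-area projection) is mine, not something taken from the question. It also assumes a recent sf where st_area() on longitude/latitude data is computed on the sphere via s2.

```r
library(sf)

# A 1-degree box in longitude/latitude (EPSG:4326).
box <- st_polygon(list(rbind(
  c(-100, 30), c(-99, 30), c(-99, 31), c(-100, 31), c(-100, 30)
)))
box_lonlat <- st_sfc(box, crs = 4326)

# Area from the unprojected coordinates (geodesic calculation).
area_geodesic <- as.numeric(st_area(box_lonlat))

# Planar area after projecting to Web Mercator (distorts area).
area_mercator <- as.numeric(st_area(st_transform(box_lonlat, 3857)))

# Planar area after projecting to an equal-area CRS.
area_albers <- as.numeric(st_area(st_transform(box_lonlat, 5070)))

# Ratios relative to the geodesic area.
print(c(mercator = area_mercator, albers = area_albers) / area_geodesic)
```

Checking `st_crs(pop_texas)` shows which case applies to the object in the question; if it is already in geographic coordinates, the projection is less likely to explain the gap.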