Adding `@` attributes to an R haven SPSS data frame


I'm trying to recreate the output of loading an SPSS .sav file with haven, where each column has attributes attached, like so:

[Screenshot: SPSS table information in R]

> class(PISA_teacher)
[1] "tbl_df"     "tbl"        "data.frame"

My code is as follows:

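library(dplyr)  # provides the %>% pipe used below
df <- list(c(1,2,3,4), c("c","d","e","a")) %>% as.data.frame()
names(df) <- c("CNTRYID","Grade")
# attach a plain "label" attribute to the CNTRYID column
attr(df$CNTRYID, "label") <- "Country Identifier"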

But that gives me a different kind of attribute, one shown without an `@`.

How can I get something similar to the SPSS sav output?
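For context, a sketch of where the `@` comes from, assuming a recent haven (which builds `haven_labelled` on the vctrs package): the `@` is not an S4 slot. It is how `str()` lists the attributes of vctrs-based vectors, the same kind of output the RStudio environment pane shows, whereas an ordinary vector's attributes are listed as `attr(*, ...)`. The values and labels below are illustrative only.

```r
library(haven)

# Ordinary numeric vector with a plain "label" attribute (the approach in the question)
plain <- c(1, 2, 3, 4)
attr(plain, "label") <- "Country Identifier"

# The same values as a haven_labelled vector, the class read_sav() uses
# for columns that carry value labels
lab <- labelled(c(1, 2, 3, 4),
                labels = c(Albania = 1, Austria = 2),
                label = "Country Identifier")

str(plain)  # attribute listed as  - attr(*, "label")= ...
str(lab)    # attributes listed with "@", via the vctrs str() method
```

So the goal is not to add `@` itself, but to make the column a `haven_labelled` vector.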

The dput of the PISA_teacher dataframe is as follows:

structure(list(CNTRYID = structure(c(36, 36, 36), label = "Country Identifier", format.spss = "F3.0", labels = c(Albania = 8, Algeria = 12, Argentina = 32, Australia = 36, Austria = 40, Belgium = 56, Brazil = 76, Bulgaria = 100, Canada = 124, Chile = 152, `Chinese Taipei` = 158, Colombia = 170, `Costa Rica` = 188, Croatia = 191, Cyprus = 196, `Czech Republic` = 203, Denmark = 208, `Dominican Republic` = 214, Estonia = 233, Finland = 246, France = 250, Georgia = 268, Germany = 276, Greece = 300, `Hong Kong` = 344, Hungary = 348, Iceland = 352, Indonesia = 360, Ireland = 372, Israel = 376, Italy = 380, Japan = 392, Kazakhstan = 398, Jordan = 400, Korea = 410, Kosovo = 411, Lebanon = 422, Latvia = 428, Lithuania = 440, Luxembourg = 442, Macao = 446, Malaysia = 458, Malta = 470, Mexico = 484, Moldova = 498, Montenegro = 499, Netherlands = 528, `New Zealand` = 554, Norway = 578, Peru = 604, Poland = 616, Portugal = 620, `Puerto Rico (USA)` = 630, Qatar = 634, Romania = 642, `Russian Federation` = 643, Singapore = 702, `Slovak Republic` = 703, Vietnam = 704, Slovenia = 705, Spain = 724, `Spain (Regions)` = 725, Sweden = 752, Switzerland = 756, Thailand = 764, `Trinidad and Tobago` = 780, `United Arab Emirates` = 784, Tunisia = 788, Turkey = 792, FYROM = 807, `United Kingdom` = 826, `United Kingdom - excl. Scotland` = 827,
Scotland = 828, `United States` = 840, Uruguay = 858, `B-S-J-G (China)` = 970, `Spain (Regions)` = 971, `USA (Massachusetts)` = 972, `USA (North Carolina)` = 973, `Argentina (Ciudad Autónoma de Buenos)` = 974), class = c("haven_labelled", "vctrs_vctr", "double")), CNT = structure(c("AUS", "AUS", "AUS"), label = "Country code 3-character", format.spss = "A3", display_width = 3L, labels = c(Moldova = "MDA", Thailand = "THA", Brazil = "BRA", France = "FRA", `United States` = "USA", Italy = "ITA", Latvia = "LVA", Algeria = "DZA", Albania = "ALB", Macao = "MAC", Greece = "GRC", Scotland = "QSC", `Massachusettes (USA)` = "QUC", FYROM = "MKD", Netherlands = "NLD", `Puerto Rico (USA)` = "QUD", Switzerland = "CHE", Montenegro = "MNE", `United Arab Emirates` = "ARE", `North Carolina (USA)` = "QUE", Sweden = "SWE", `Czech Republic` = "CZE", `Hong Kong` = "HKG", Argentina = "ARG", `B-S-J-G (China)` = "QCH", `Costa Rica` = "CRI", Denmark = "DNK", `United Kingdom - excl. Scotland` = "QUK", `Slovak Republic` = "SVK", Belgium = "BEL", `Belgium (Flemish)` = "BFL", Chile = "CHL", Colombia = "COL", Poland = "POL", Ireland = "IRL", Iceland = "ISL", `New Zealand` = "NZL", Vietnam = "VNM", `Dominican Republic` = "DOM", Canada = "CAN", Lebanon = "LBN", Indonesia = "IDN", China = "CHN", Finland = "FIN", Japan = "JPN", Hungary = "HUN", Tunisia = "TUN", Slovenia = "SVN", Georgia = "GEO", `Trinidad and Tobago` = "TTO", `Chinese Taipei` = "TAP", Singapore = "SGP", Spain = "ESP", `Argentina (Ciudad Autónoma de Buenos)` = "QAR", `United Kingdom` = "GBR", Peru = "PER", `Belgium (French)` = "BFR", Bulgaria = "BGR", Jordan = "JOR", Korea = "KOR", Norway = "NOR", Israel = "ISR", Turkey = "TUR", `Spain (Regions)` = "QES", Australia = "AUS", `Russian Federation` = "RUS", Malaysia = "MYS", Qatar = "QAT", Malta = "MLT", Portugal = "PRT", Estonia = "EST", Austria = "AUT", Germany = "DEU", Romania = "ROU", Lithuania = "LTU", Croatia = "HRV", Kosovo = "KSV", Mexico = "MEX", Luxembourg = 
"LUX", Cyprus = "QCY", Uruguay = "URY", Kazakhstan = "KAZ", `International Master` = "ZZZ"), class = c("haven_labelled", "vctrs_vctr", "character"))), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))

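A trimmed-down version of this structure can be rebuilt by hand. The sketch below keeps only three of the country labels (an assumption for brevity), creates each column with `haven::labelled()`, and then sets `format.spss` as an ordinary attribute, since `labelled()` has no argument for it. Wrapping the columns in `tibble::tibble()` gives the same `tbl_df` class as `PISA_teacher`; the `display_width` attribute on `CNT` is left out.

```r
library(haven)
library(tibble)

# Only a few of the labels from the dput above, for brevity
cntryid <- labelled(c(36, 36, 36),
                    labels = c(Albania = 8, Australia = 36, Austria = 40),
                    label = "Country Identifier")
# labelled() has no format argument, so attach the SPSS format directly
attr(cntryid, "format.spss") <- "F3.0"

cnt <- labelled(c("AUS", "AUS", "AUS"),
                labels = c(Albania = "ALB", Australia = "AUS", Austria = "AUT"),
                label = "Country code 3-character")
attr(cnt, "format.spss") <- "A3"

# tibble() keeps the labelled columns as they are
pisa_like <- tibble(CNTRYID = cntryid, CNT = cnt)
str(pisa_like)
```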
Best answer by Ben Bolker:

Use `haven::labelled()`. This doesn't give you the `@format.spss` component, but it gives you everything else:

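library(dplyr)
library(haven)
df <- data.frame(CNTRYID = c(1, 2, 3, 4), Grade = c("c", "d", "e", "a"))
# Wrap CNTRYID in a haven_labelled vector with value labels and a variable label.
# The anonymous function avoids passing extra arguments through across()'s `...`,
# which is deprecated as of dplyr 1.1.0.
df <- df |>
  mutate(across(CNTRYID, \(x) labelled(x,
    labels = c("Albania" = 1, "Austria" = 2, "Zambia" = 3, "Zimbabwe" = 4),
    label = "Country Name")))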
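If the goal is an object that matches a real `read_sav()` import, including `format.spss`, one option (a sketch, not part of the original answer) is to write the labelled data frame to a temporary `.sav` file with `haven::write_sav()` and read it back with `haven::read_sav()`, which attaches an SPSS format to each column and returns a tibble. The exact format strings depend on haven's defaults, so they are not shown here.

```r
library(dplyr)
library(haven)

df <- data.frame(CNTRYID = c(1, 2, 3, 4), Grade = c("c", "d", "e", "a"))
df <- df |>
  mutate(across(CNTRYID, \(x) labelled(x,
    labels = c("Albania" = 1, "Austria" = 2, "Zambia" = 3, "Zimbabwe" = 4),
    label = "Country Name")))

# Round-trip through a temporary .sav file so haven fills in the SPSS metadata
path <- tempfile(fileext = ".sav")
write_sav(df, path)
df_sav <- read_sav(path)

str(df_sav)
```

The round trip costs a disk write, but it guarantees the attributes are exactly what haven itself produces.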

Note, though, that the haven documentation for `labelled()` says:

I expect you'll coerce to a standard R class (e.g. a `factor()`) soon after importing.
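Following that advice, a labelled column can be turned into a regular factor with `haven::as_factor()`, which by default uses the value labels as the factor levels. A minimal sketch with the labels from the answer:

```r
library(haven)

x <- labelled(c(1, 2, 3, 4),
              labels = c(Albania = 1, Austria = 2, Zambia = 3, Zimbabwe = 4),
              label = "Country Name")

# Replace each value with its label and return an ordinary factor
f <- as_factor(x)
levels(f)
```

`as_factor()` also accepts a whole data frame, in which case it converts only the labelled columns.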