Issues with a function for plotting scatterplots that show correlations between multiple survey questions

I have a dataset containing answers to a survey (q1:q4), alongside characteristics of respondents (Project, Level).

data <- data.frame(Project = paste0("P", sample(1:3, 10, replace = TRUE)),
                   Level = sample(1:3, 10, replace = TRUE),
                   q1 = sample(1:10, 10, replace = TRUE),
                   q2 = sample(1:10, 10, replace = TRUE),
                   q3 = sample(1:10, 10, replace = TRUE),
                   q4 = sample(1:10, 10, replace = TRUE))

I would like to create nice-looking scatterplots with ggscatterstats() (from the ggstatsplot package) showing the correlation between q1 and each of the other three questions, grouping respondents by level and by project.
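Grouping by both variables splits the 10 respondents into up to 3 × 3 = 9 Project/Level groups, so some groups are likely to be very small or empty. A quick base-R check of the group sizes is sketched below; it regenerates only the two grouping columns the same way as above, with an arbitrary seed (an assumption, used only to make the example reproducible).

```r
# Arbitrary seed, assumed only so that this example is reproducible
set.seed(42)
resp <- data.frame(Project = paste0("P", sample(1:3, 10, replace = TRUE)),
                   Level   = sample(1:3, 10, replace = TRUE))

# Respondents per Project x Level cell; all possible values are listed
# explicitly so that groups nobody fell into still appear as zeros
counts <- table(Project = factor(resp$Project, levels = paste0("P", 1:3)),
                Level   = factor(resp$Level, levels = 1:3))
print(counts)

# Number of groups with no respondents (an empty data frame to plot)
n_empty <- sum(counts == 0)
n_empty
```

Any cell that is 0 here corresponds to a subset with no rows, which is worth ruling out before passing that subset to a plotting function.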

I have developed this function:

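library(dplyr)        # %>%, select(), mutate_if()
library(ggstatsplot)  # ggscatterstats()
library(ggplot2)      # ggsave()

var_look2 <- function(data) {
  # names of the survey questions, q1 to q4 (q1 itself included)
  var_names <- data %>% select(q1:q4) %>% colnames()
  levels <- c(1:3)
  # project labels from the original data, e.g. "P1", "P2"
  projects <- unique(data$Project)

  # character -> factor -> numeric: df_cor$Project now holds integer codes,
  # not the labels "P1", "P2", ...
  df_cor <- data %>% mutate_if(is.character, as.factor)
  df_cor <- df_cor %>% mutate_if(is.factor, as.numeric)

  for (var in var_names) {
    for (level in levels) {
      # keeps rows with Level == 1 (the literal value, not the loop variable `level`)
      data_subset <- subset(df_cor, Level == 1)

      for (project in projects) {
        # replaces the level subset above; compares the numeric Project codes
        # with the character labels in `projects`
        data_subset <- subset(df_cor, Project == project)

        n <- nrow(data_subset)

        p <- ggscatterstats(
          data = data_subset,
          type = "non-parametric",
          x = {{var}},
          y = q1,
          bf.message = FALSE,
          title = paste(project, "scatterplot level", level, "N =", n),
          marginal = TRUE
        )

        # one JPEG file per plot
        ggsave(filename = paste0(project, " ", var, " ", level, " ", " .jpeg"), plot = p,
               width = 1000, height = 1000, units = "px", scale = 1)
      }
    }
  }
}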

Problem 1: When I run var_look2(data), I get the following error:

> var_look2(data)
# Error:
# ! Problem while setting up layer.
# ℹ Error occurred in the 3rd layer.
# Caused by error in `$<-.data.frame`:
# ! replacement has 1 row, data has 0
# Run `rlang::last_trace()` to see where the error occurred.

By switching the loops on and off one at a time, I narrowed the problem down to this line:

data_subset <- subset(df_cor, Project == project)

because it produces an empty data_subset. Any ideas why?
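A likely cause, assuming df_cor is built exactly as in var_look2(): the two mutate_if() calls turn every character column into a factor and then into numeric codes, so df_cor$Project holds integer codes while projects still holds the labels "P1", "P2", ... taken from data$Project. A comparison such as 1 == "P1" is done as the string comparison "1" == "P1", which is never TRUE, so every subset is empty. The base-R sketch below uses a small hypothetical data frame, toy, to reproduce the mismatch, and then filters the unconverted data on both loop variables at once (in var_look2() the project subset also overwrites the level subset).

```r
# Hypothetical toy data standing in for `data` from the question
toy <- data.frame(Project = c("P1", "P2", "P1"),
                  Level   = c(1, 1, 2),
                  q1      = c(3, 7, 5))

# Same conversion as var_look2(): character -> factor -> numeric codes
toy_num <- toy
toy_num$Project <- as.numeric(as.factor(toy_num$Project))

projects <- unique(toy$Project)  # labels "P1", "P2"
toy_num$Project                  # codes 1, 2, 1

# 1 == "P1" is evaluated as "1" == "P1", which is FALSE for every row
empty_subset <- subset(toy_num, Project == projects[1])

# Filtering the unconverted data on both loop variables at once
level <- 1
project <- projects[1]
ok_subset <- subset(toy, Level == level & Project == project)
```

Here empty_subset has no rows, while ok_subset keeps the single respondent in project "P1" at level 1. Since q1 to q4 are already numeric in the example data, filtering on the original Project labels avoids the mismatch without any conversion.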

Problem 2: If I remove the line data_subset <- subset(df_cor, Project == project), ggsave() does what I expect.

However, what I actually want is to plot these scatterplots grouped by level and/or project, so that readers can make immediate comparisons.

To do this, instead of calling ggsave() at the end, I would like to create a list containing all the plots, named appropriately, that I can eventually feed to ggplot. I tried this:

    p <- ggscatterstats(
      data = data_subset,
      type = "non-parametric",
      x = {{var}},
      y = q1,
      bf.message = FALSE,
      title = paste(project, var, "scatterplot level", level, "N =", n),
      marginal = TRUE)

    plot_list[[paste0(project, " ", var, " ", level)]] <- p

However, when I run

plot_list <- var_look(GES_BGD1)

plot_list is NULL.

I expected plot_list to contain all the scatterplots described above. This is strange to me, because ggsave() does save the scatterplots, so the ggscatterstats() call is not the issue.
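In R a `for` loop evaluates to NULL (invisibly). If the loops are the last expression in the function, the function returns NULL no matter what was assigned into plot_list inside them, and that assignment only changes a local copy anyway. A minimal sketch of a fix is to create the list inside the function and return it as the last line. collect_plots() and its plot_fun argument are hypothetical names introduced here: plot_fun defaults to a ggstatsplot::ggscatterstats() call similar to the one above (with the column name injected via rlang::sym()), and can be replaced by a stub to check the looping logic without drawing anything. The sketch also filters on Level and Project together, using the original labels, and skips empty groups.

```r
# Hypothetical helper: collect plots in a named list and return that list.
collect_plots <- function(data,
                          plot_fun = function(d, var, title) {
                            # default: ggscatterstats() call similar to the question's
                            ggstatsplot::ggscatterstats(
                              data = d,
                              x = !!rlang::sym(var),
                              y = q1,
                              type = "non-parametric",
                              bf.message = FALSE,
                              title = title,
                              marginal = TRUE
                            )
                          }) {
  # compare q1 with every other q column (q2, q3, q4 in the example data)
  var_names <- setdiff(grep("^q", names(data), value = TRUE), "q1")
  levels <- sort(unique(data$Level))
  projects <- unique(data$Project)

  plot_list <- list()  # created inside the function, filled by the loops

  for (var in var_names) {
    for (level in levels) {
      for (project in projects) {
        # filter on both loop variables, using the original character labels
        d <- data[data$Level == level & data$Project == project, , drop = FALSE]
        n <- nrow(d)
        if (n == 0) next  # skip Project/Level groups with no respondents

        title <- paste(project, var, "scatterplot level", level, "N =", n)
        plot_list[[paste0(project, " ", var, " ", level)]] <- plot_fun(d, var, title)
      }
    }
  }

  plot_list  # return explicitly: the for loop itself evaluates to NULL
}
```

With this shape, plot_list <- collect_plots(data) gives a list whose entries can be retrieved by name, for example plot_list[["P1 q2 1"]]. A list of ggplot objects can then, for instance, be arranged in a single grid with patchwork::wrap_plots(plot_list), if the patchwork package is available.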
