How do you compute magnitudes (percentages) that are over and under a specific number in another column?

Question

How do you compute magnitudes (percentages) that are over and under a specific number in another column?

49 views Asked by dampfy At 01 October 2020 at 12:51

I have this data set

study_ID title                  experiment question_ID participant_ID estimate_level estimate correct_answer question                      type   category   age gender
      <dbl> <chr>                       <dbl> <chr>                <int> <chr>             <dbl>          <dbl> <chr>                         <chr>  <chr>    <int> <chr> 
 1       11 Dallacker_Parents'_co…          1 1                        1 individual          3             10   How many sugar cubes does or… unlim… nutriti…    32 Female
 2       11 Dallacker_Parents'_co…          1 2                        1 individual         10             11.5 How many sugar cubes does a … unlim… nutriti…    32 Female
 3       11 Dallacker_Parents'_co…          1 3                        1 individual          7              6.5 How many sugar cubes does a … unlim… nutriti…    32 Female
 4       11 Dallacker_Parents'_co…          1 4                        1 individual          1             16.5 How many sugar cubes does a … unlim… nutriti…    32 Female
 5       11 Dallacker_Parents'_co…          1 5                        1 individual          7             11   How many sugar cubes does a … unlim… nutriti…    32 Female
 6       11 Dallacker_Parents'_co…          1 6                        1 individual          5              2.5 How many sugar cubes does a … unlim… nutriti…    32 Female
 7       11 Dallacker_Parents'_co…          1 1                        2 individual          2             10   How many sugar cubes does or… unlim… nutriti…    29 Female
 8       11 Dallacker_Parents'_co…          1 2                        2 individual         10             11.5 How many sugar cubes does a … unlim… nutriti…    29 Female
 9       11 Dallacker_Parents'_co…          1 3                        2 individual          1.5            6.5 How many sugar cubes does a … unlim… nutriti…    29 Female
10       11 Dallacker_Parents'_co…          1 4                        2 individual          2             16.5 How many sugar cubes does a … unlim… nutriti…    29 Female

There are 6 questions in this data set , each of which has a correct_answer column, and an estimate column. I am trying to compute a magnitude for each question, so that I get a percentage of people who under- or overestimated and who estimated correctly.

For instance, for each of the 6 questions, it would return something like this: 80 percent underestimated, 10 overestimated, and 10 percent answered correctly.

How can I do this? I am stumped. Thanks in advance!

Here is the dput

dput(head(DF, 10))
structure(list(study_ID = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5), title = c("5_Jayles_Debiasing_The_Crowd", 
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd", 
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd", 
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd", 
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd", 
"5_Jayles_Debiasing_The_Crowd"), experiment = c(1, 1, 1, 1, 1, 
1, 1, 1, 1, 1), question_ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), 
    participant_ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), estimate_level = c("individual", 
    "individual", "individual", "individual", "individual", "individual", 
    "individual", "individual", "individual", "individual"), 
    estimate = c(2e+07, 4500000, 21075541, 2e+07, 1e+06, 1.1e+07, 
    2.5e+07, 8e+06, 1.6e+07, 9800000), correct = c(3.8e+07, 3.8e+07, 
    3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 
    3.8e+07), question = c("What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?", 
    "What is the population of Tokyo and its agglomeration?"), 
    type = c("unlimited", "unlimited", "unlimited", "unlimited", 
    "unlimited", "unlimited", "unlimited", "unlimited", "unlimited", 
    "unlimited"), category = c("demographics", "demographics", 
    "demographics", "demographics", "demographics", "demographics", 
    "demographics", "demographics", "demographics", "demographics"
    ), age = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", 
    "NA", "NA"), gender = c("NA", "NA", "NA", "NA", "NA", "NA", 
    "NA", "NA", "NA", "NA")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

Original Q&A

There are 2 answers

David Moore On 01 October 2020 at 14:38

list1 <- lapply(split(DF, DF$question_ID), function (x) {
  overestimated <- 100 * length(which(x$estimate > x$correct)) / length(x$estimate)
  underestimated <- 100 * length(which(x$estimate < x$correct)) / length(x$estimate)
  correct <- 100 * length(which(x$estimate == x$correct)) / length(x$estimate)
  data.frame(overestimated, underestimated, correct)
})
list2 <- mapply(function (x, y) {
  x$question_ID <- y
  return (x)
}, x = list1, y = names(list1), SIMPLIFY = F)
Percent_Data <- do.call("rbind", list2)
Percent_Data <- Percent_Data[, c(which(colnames(Percent_Data) == "question_ID"), which(colnames(Percent_Data) != "question_ID"))]
Percent_Data
#   question_ID overestimated underestimated correct
# 1           1             0            100       0

**Gregor Thomas** · Accepted Answer · 2020-10-01T14:30:31+00:00

Here's a dplyr approach:

library(dplyr)
df %>%
  group_by(question_ID) %>%
  summarize(prop_over = mean(estimate > correct),
            prop_under = mean(estimate < correct),
            prop_correct = mean(estimate == correct)
  )
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 1 x 4
#   question_ID prop_over prop_under prop_correct
#         <dbl>     <dbl>      <dbl>        <dbl>
# 1           1         0          1            0

TechQA.

How do you compute magnitudes (percentages) that are over and under a specific number in another column?

There are 2 answers

Related Questions in R

Related Questions in PERCENTAGE

Related Questions in MAGNITUDE

Popular Questions

Popular Tags

Trending Questions