I have this data set:
study_ID title experiment question_ID participant_ID estimate_level estimate correct_answer question type category age gender
<dbl> <chr> <dbl> <chr> <int> <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <chr>
1 11 Dallacker_Parents'_co… 1 1 1 individual 3 10 How many sugar cubes does or… unlim… nutriti… 32 Female
2 11 Dallacker_Parents'_co… 1 2 1 individual 10 11.5 How many sugar cubes does a … unlim… nutriti… 32 Female
3 11 Dallacker_Parents'_co… 1 3 1 individual 7 6.5 How many sugar cubes does a … unlim… nutriti… 32 Female
4 11 Dallacker_Parents'_co… 1 4 1 individual 1 16.5 How many sugar cubes does a … unlim… nutriti… 32 Female
5 11 Dallacker_Parents'_co… 1 5 1 individual 7 11 How many sugar cubes does a … unlim… nutriti… 32 Female
6 11 Dallacker_Parents'_co… 1 6 1 individual 5 2.5 How many sugar cubes does a … unlim… nutriti… 32 Female
7 11 Dallacker_Parents'_co… 1 1 2 individual 2 10 How many sugar cubes does or… unlim… nutriti… 29 Female
8 11 Dallacker_Parents'_co… 1 2 2 individual 10 11.5 How many sugar cubes does a … unlim… nutriti… 29 Female
9 11 Dallacker_Parents'_co… 1 3 2 individual 1.5 6.5 How many sugar cubes does a … unlim… nutriti… 29 Female
10 11 Dallacker_Parents'_co… 1 4 2 individual 2 16.5 How many sugar cubes does a … unlim… nutriti… 29 Female
There are 6 questions in this data set, each of which has a correct_answer column and an estimate column. For each question, I am trying to compute the percentage of people who underestimated, overestimated, or estimated correctly.
For instance, for each of the 6 questions, it would return something like this: 80 percent underestimated, 10 percent overestimated, and 10 percent answered correctly.
How can I do this? I am stumped. Thanks in advance!
Here is the dput output for the first 10 rows:
dput(head(DF, 10))
structure(list(study_ID = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5), title = c("5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd"), experiment = c(1, 1, 1, 1, 1,
1, 1, 1, 1, 1), question_ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
participant_ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), estimate_level = c("individual",
"individual", "individual", "individual", "individual", "individual",
"individual", "individual", "individual", "individual"),
estimate = c(2e+07, 4500000, 21075541, 2e+07, 1e+06, 1.1e+07,
2.5e+07, 8e+06, 1.6e+07, 9800000), correct = c(3.8e+07, 3.8e+07,
3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07,
3.8e+07), question = c("What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?"),
type = c("unlimited", "unlimited", "unlimited", "unlimited",
"unlimited", "unlimited", "unlimited", "unlimited", "unlimited",
"unlimited"), category = c("demographics", "demographics",
"demographics", "demographics", "demographics", "demographics",
"demographics", "demographics", "demographics", "demographics"
), age = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA",
"NA", "NA"), gender = c("NA", "NA", "NA", "NA", "NA", "NA",
"NA", "NA", "NA", "NA")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
Here's a dplyr approach:
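A minimal sketch of how this could look, not necessarily the exact code originally posted. It assumes the truth column is named correct_answer, as in the printed tibble (in the dput sample it is called correct, so adjust the name to match your data). The small DF below is made-up toy data so the example runs on its own, and the direction, n and percent column names are illustrative choices.

```r
library(dplyr)

# Toy stand-in for DF: two questions, four participants each (illustrative values only)
DF <- data.frame(
  question_ID    = c(1, 1, 1, 1, 2, 2, 2, 2),
  estimate       = c(3, 10, 12, 8, 7, 6.5, 1, 9),
  correct_answer = c(10, 10, 10, 10, 6.5, 6.5, 6.5, 6.5)
)

result <- DF %>%
  # Label each estimate relative to the true value; rows with NA stay NA
  mutate(direction = case_when(
    estimate <  correct_answer ~ "under",
    estimate >  correct_answer ~ "over",
    estimate == correct_answer ~ "correct"
  )) %>%
  # Number of participants per question and direction
  count(question_ID, direction) %>%
  # Convert counts to percentages within each question
  group_by(question_ID) %>%
  mutate(percent = 100 * n / sum(n)) %>%
  ungroup()

print(result)
```

Each question gets one row per direction that actually occurs, with percentages summing to 100 within the question. Note that the "correct" class relies on exact equality, which is fragile for non-integer values; if near misses should count as correct, a tolerance such as abs(estimate - correct_answer) < some_threshold could be used instead (some_threshold being a value you choose).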