This is my current coding for data wrangling.
crime_89 <- crime %>%
filter(state_name %in% c("Georgia", "Pennsylvania", "West Virginia"), !is.na(rape_legacy)) %>%
mutate(treat = case_when(year < 1989 ~ 0,
TRUE ~ 1),
id = case_when(state_name == "Georgia" ~ 1,
state_name == "Pennsylvania" ~ 2,
state_name == "West Virginia" ~ 3),
crime_rate = 100 * (violent_crime + homicide + rape_legacy + robbery + aggravated_assault + property_crime + burglary + larceny + motor_vehicle_theft) / population )
I know that mutate from dplyr is the tool to create new columns.
The problem is that this crime89 is the narrow scope from the original data limited to specific states. I would like to use the original data crime that has all lists of the states. Then, id columns should be different; counting from Alaska as 1 ...... to Wyoming as 51. I don't know how to create id columns with this condition without using case_when typing all states individually.
Plus, I also need to create treat columns depending on the states.
The yardstick is the year the law was launched and '1' means treated status.
For example, "Georgia", "Pennsylvania", "West Virginia" are in 1989 so this is why I typed like this::
mutate(treat = case_when(year < 1989 ~ 0,
TRUE ~ 1),
However, not all of the states have the same year of the law. For example,
"Idaho", "Mississippi", "Oregon" is 1990 so the code should be
mutate(treat = case_when(year < 1990 ~ 0,
TRUE ~ 1))
In the case covering all states, I don't know how to create new columns treat in mutate for different states at once.
This is the original data head(crime) and the year is from 1979 to 2016.
year state_abbr state_name population violent_crime homicide rape_legacy rape_revised robbery aggravated_assault
1 1979 220099000 1208030 21460 76390 NA 480700 629480
2 1979 AK Alaska 406000 1994 54 292 NA 445 1203
3 1979 AL Alabama 3769000 15578 496 1037 NA 4127 9918
4 1979 AR Arkansas 2180000 7984 198 595 NA 1626 5565
5 1979 AZ Arizona 2450000 14528 219 1120 NA 4305 8884
6 1979 CA California 22696000 184087 2952 12239 NA 75767 93129
property_crime burglary larceny motor_vehicle_theft caveats
1 11041500 3327700 6601000 1112800
2 23193 5616 15076 2501
3 144372 48517 83791 12064
4 70949 21457 45267 4225
5 177977 48916 116976 12085
6 1511021 496310 847148 167563
Is this what you want to do?
I've reduced the sample data to only what's needed to address the two questions.
By mutating the state variables to
factoryou can get the order you want. Assuming alphabetical is okay, you can then create an ID column taking the integer of the factoredstate_name.You could use the
state_abbrin thecase_whento reduce the typing, and I'm assumingtreatis then determined by bothyearand matching the state?Created on 2022-05-26 by the reprex package (v2.0.1)