I would like to create new columns for the original table in R

53 views Asked by At

This is my current coding for data wrangling.

crime_89 <- crime %>%
  filter(state_name %in% c("Georgia", "Pennsylvania", "West Virginia"), !is.na(rape_legacy)) %>%
  mutate(treat = case_when(year < 1989 ~ 0, 
                           TRUE ~ 1),
         id = case_when(state_name == "Georgia" ~ 1,
                        state_name == "Pennsylvania" ~ 2,
                        state_name == "West Virginia" ~ 3),
         crime_rate = 100 * (violent_crime + homicide + rape_legacy + robbery + aggravated_assault + property_crime + burglary + larceny + motor_vehicle_theft) / population )

I know that mutate from dplyr is the tool to create new columns.

The problem is that this crime89 is the narrow scope from the original data limited to specific states. I would like to use the original data crime that has all lists of the states. Then, id columns should be different; counting from Alaska as 1 ...... to Wyoming as 51. I don't know how to create id columns with this condition without using case_when typing all states individually.

Plus, I also need to create treat columns depending on the states. The yardstick is the year the law was launched and '1' means treated status. For example, "Georgia", "Pennsylvania", "West Virginia" are in 1989 so this is why I typed like this::

 mutate(treat = case_when(year < 1989 ~ 0, 
                           TRUE ~ 1),

However, not all of the states have the same year of the law. For example, "Idaho", "Mississippi", "Oregon" is 1990 so the code should be

  mutate(treat = case_when(year < 1990 ~ 0, 
                            TRUE ~ 1))

In the case covering all states, I don't know how to create new columns treat in mutate for different states at once.

This is the original data head(crime) and the year is from 1979 to 2016.

 year state_abbr state_name population violent_crime homicide rape_legacy rape_revised robbery aggravated_assault
1 1979                        220099000       1208030    21460       76390           NA  480700             629480
2 1979         AK     Alaska     406000          1994       54         292           NA     445               1203
3 1979         AL    Alabama    3769000         15578      496        1037           NA    4127               9918
4 1979         AR   Arkansas    2180000          7984      198         595           NA    1626               5565
5 1979         AZ    Arizona    2450000         14528      219        1120           NA    4305               8884
6 1979         CA California   22696000        184087     2952       12239           NA   75767              93129
  property_crime burglary larceny motor_vehicle_theft caveats
1       11041500  3327700 6601000             1112800        
2          23193     5616   15076                2501        
3         144372    48517   83791               12064        
4          70949    21457   45267                4225        
5         177977    48916  116976               12085        
6        1511021   496310  847148              167563        
1

There are 1 answers

0
Carl On

Is this what you want to do?

I've reduced the sample data to only what's needed to address the two questions.

By mutating the state variables to factor you can get the order you want. Assuming alphabetical is okay, you can then create an ID column taking the integer of the factored state_name.

You could use the state_abbr in the case_when to reduce the typing, and I'm assuming treat is then determined by both year and matching the state?

library(tidyverse)

crime <- tribble(
  ~year, ~state_abbr, ~state_name,
  1979, "AK", "Alaska", 
  1979, "AL", "Alabama", 
  1979, "AR", "Arkansas", 
  1979, "AZ", "Arizona", 
  1979, "CA", "California", 
  1989, "GA", "Georgia", 
  1989, "PA", "Pennsylvania", 
  1989, "WV", "West Virginia",
  1990, "ID", "Idaho", 
  1990, "MS", "Mississippi", 
  1990, "OR", "Oregon"
)

crime %>%
  mutate(across(starts_with("state_"), factor),
    id = as.integer(state_name),
    treat = case_when(
      year == 1989 & state_abbr %in% c("GA", "PA", "WV") ~ 1,
      year == 1990 & state_abbr %in% c("ID", "MS", "OR") ~ 1,
      TRUE ~ 0
    )
  )
#> # A tibble: 11 × 5
#>     year state_abbr state_name       id treat
#>    <dbl> <fct>      <fct>         <int> <dbl>
#>  1  1979 AK         Alaska            2     0
#>  2  1979 AL         Alabama           1     0
#>  3  1979 AR         Arkansas          4     0
#>  4  1979 AZ         Arizona           3     0
#>  5  1979 CA         California        5     0
#>  6  1989 GA         Georgia           6     1
#>  7  1989 PA         Pennsylvania     10     1
#>  8  1989 WV         West Virginia    11     1
#>  9  1990 ID         Idaho             7     1
#> 10  1990 MS         Mississippi       8     1
#> 11  1990 OR         Oregon            9     1

Created on 2022-05-26 by the reprex package (v2.0.1)