How can I correct a typo within a column in R?

Newbie here.

I am attempting to clean a dataset in R and I came across a zip code that is 9306 instead of 93060. I googled and read a number of tutorials, but none of them updated the data frame. There are over 2,000 observations in this dataset. I already changed the data type of the zip code column from numeric to character.

It sounds like this is supposed to work? The dataset is clean_transport_2022, and Project_Zip is the zip code column.

gsub('9306', '93060', clean_transport_2022$Project_Zip)

I did see that there is a package that can be installed specifically for working with zip codes but for my purposes I really just need to substitute that one bit of data.

Thank you in advance!

There is 1 answer

Hugh

Your gsub code does indeed substitute 9306 with 93060; however, as you are new to R I'm going to guess you made a mistake almost everyone starting out with R makes and forgot to assign the result to the data.

That is, you forgot to do

clean_transport_2022$Project_Zip <- gsub('9306', '93060', clean_transport_2022$Project_Zip)
# ^^^^^^^^^^ This part ^^^^^^^^^^

If you only ran the right-hand side of the assignment above, you asked R to take the contents of the Project_Zip column, substitute 9306 with 93060, and return the result as a new object, without modifying the original data frame.
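A minimal sketch of the difference, using a small made-up data frame (`toy` and its values are hypothetical stand-ins for clean_transport_2022):

```r
# Hypothetical stand-in for clean_transport_2022
toy <- data.frame(Project_Zip = c("9306", "90210"), stringsAsFactors = FALSE)

# Calling gsub() on its own returns a new vector; toy is left untouched
gsub("9306", "93060", toy$Project_Zip)
toy$Project_Zip   # still "9306" "90210"

# Assigning the result back is what updates the data frame
toy$Project_Zip <- gsub("9306", "93060", toy$Project_Zip)
toy$Project_Zip   # now "93060" "90210"
```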

I note you changed the column from numeric to character. That is fair enough if you're using gsub; however, it's not necessary for this particular task. If you leave the column as numeric, you can simply use ifelse:

clean_transport_2022$Project_Zip <- ifelse(clean_transport_2022$Project_Zip == 9306, 93060, clean_transport_2022$Project_Zip)

This approach is preferable because your gsub call matches 9306 anywhere in the string, so it can corrupt zip codes that are already correct. For example, it would change 93060 to 930600.
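To illustrate that pitfall, here is a small sketch with made-up values (`zips` is hypothetical). It also shows two alternatives that are not part of the answer above but would let you keep the column as character: anchoring the pattern with ^ and $, or replacing exact matches by logical indexing.

```r
# Hypothetical character zip codes: one typo, one already correct, one unrelated
zips <- c("9306", "93060", "19306")

# Unanchored gsub() matches "9306" anywhere in the string
unanchored <- gsub("9306", "93060", zips)
# "93060" "930600" "193060"

# Anchoring with ^ and $ only matches strings that are exactly "9306"
anchored <- gsub("^9306$", "93060", zips)
# "93060" "93060" "19306"

# Exact comparison with logical indexing avoids regular expressions altogether
fixed <- zips
fixed[fixed == "9306"] <- "93060"
```

Either alternative only touches entries that are exactly 9306, which is the same guarantee the ifelse version gives for a numeric column.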