How to clean the observation values in a column in R


I have the following data:

head(MS.data.in)
  encounter_id patient_nbr            race gender     age weight admission_type_id
1      2278392     8222157       Caucasian Female  [0-10)      ?                 6
2       149190    55629189       Caucasian Female [10-20)      ?                 1
3        64410    86047875 AfricanAmerican Female [20-30)      ?                 1
4       500364    82442376       Caucasian   Male [30-40)      ?                 1
5        16680    42519267       Caucasian   Male [40-50)      ?                 1
6        35754    82637451       Caucasian   Male [50-60)      ?                 2

I would like to replace each value in the 'age' column with the upper bound of its interval, as shown below:

head(MS.data.in$age)
[1] 10 20 30 40 50 60

There is 1 answer

akrun (best answer)

We can use sub to extract the upper bound. The pattern matches all characters up to and including the - (.*-), captures the digits that follow in a group ((\\d+)), and then matches the remaining characters to the end of the string (.*). The whole match is replaced with the backreference (\\1) to the captured group, and as.numeric converts the result to a number.

MS.data.in$age <- as.numeric(sub(".*-(\\d+).*", "\\1", MS.data.in$age))
MS.data.in$age
#[1] 10 20 30 40 50 60
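
To try the pattern without the full data set, here is a minimal sketch on a small character vector. The vector age and the "[90-100)" entry are made up for illustration and are not taken from the question's data. The second call is an anchored variant that assumes every value has the form "[a-b)"; it pulls out the lower bound instead, in case that is also useful.

```r
# Hypothetical sample mirroring the format of the age column in the question
age <- c("[0-10)", "[10-20)", "[20-30)", "[90-100)")

# Same pattern as in the answer: drop everything through the "-",
# keep the digits that follow, drop the rest
upper <- as.numeric(sub(".*-(\\d+).*", "\\1", age))
upper
#[1]  10  20  30 100

# Anchored variant (assumes the "[a-b)" format): capture the lower bound
lower <- as.numeric(sub("^\\[(\\d+)-\\d+\\)$", "\\1", age))
lower
#[1]  0 10 20 90
```

Note that sub returns its input unchanged when the pattern does not match, so a value such as "?" would become NA (with a warning) after as.numeric.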