Recode A String Variable

1.2k views Asked by At
input VAR1 VAR2
A1 1
A2 0
A3 1
A4 1
A5 1
A6 1
A7 1
A8 1
A9 1
A10 1
A15 1
B7 0
A1 0
A16 1
A17 1
A18 1
A19 1
A20 0
A21 1
end

Say you have data such as the ones shown. I have VAR1 and wish to create from it VAR2 which takes values 1 if VAR1 contains at the beginning: A1, A3-A10, A15-A19, A21 or if not then it is zero. I believe for this you can use strpos(VAR1) but is it possible to say for example: strpos(VAR1, "A1, A3/A10, A15/A19, A21") ?

1

There are 1 answers

9
Bicep On BEST ANSWER

The following works if you have a small number of strings of interest. You may need an alternate approach if you are searching for a larger number of strings, where writing out the ranges of strings (e.g. A3-A10) is unfeasible.

clear
input str3 VAR1 VAR2
A1 1
A2 0
A3 1
A4 1
A5 1
A6 1
A7 1
A8 1
A9 1
A10 1
A15 1
B7 0
A1 1
A16 1
A17 1
A18 1
A19 1
A20 0
A21 1
end

gen wanted = 0

local mystrings = "A1 A3 A4 A5 A6 A7 A8 A9 A10 A15 A16 A17 A18 A19 A21"

foreach string in `mystrings' {
    replace wanted = 1 if strpos(VAR1, "`string'") == 1
}

assert wanted == VAR2

Note that in your example input, the second occurrence of A1 had a value of 0 but should have a value of 1 according to your post.

Here is a more generalisable solution for larger ranges of strings:

gen     A = 0
replace A = 1 if strpos(VAR1,"A") == 1

gen newvar = substr(VAR1,2,.)
destring newvar, replace

gen wanted = 0
replace wanted = 1 if A == 1 & (inlist(newvar,1,21) | inrange(newvar,3,10) | inrange(newvar,15,19))

assert wanted == VAR2