Substr command in SAS not working how it's supposed to

I have a variable called GEOID10 in my SAS table that is made up of 10-11 digits. The first 4-5 digits are the state and county FIPS codes, and the last 6 digits are the census tract. I want to create a new variable called TRACTCE that keeps only the last 6 digits of GEOID10 and drops the first 4-5 digits.

Here are some examples of the GEOID10s I'm working with:

1001020700
1001020900
1001020900
1001020900
1001020900
1001021000
56035000102
56037970500
56037971600
56037971600
56037971600
56037971600
56037971600

I have tried the following code in SAS, but the result never starts at the right digit.

data My.Data;
set My.Data;
TRACTCE = input(substr(GEOID10, 5), 6.);
run;

This code gave me 10207 instead of 020700 for GEOID10 1001020700.

data My.Data;
set My.Data;
TRACTCE = input(substr(GEOID10, 5, 6), 6.);
run;

This code also gave me 10207 instead of 020700 for GEOID10 1001020700.

data My.Data;
set My.Data;
TRACTCE = input(substr(GEOID10, 5), 10.);
run;

This code also gave me 1020700 instead of 020700 for GEOID10 1001020700.

data My.Data;
set My.Data;
TRACTCE = input(compress(substr(GEOID10, 5),, 'kd'), 10.);
run;

This code also gave me 1020700 instead of 020700 for GEOID10 1001020700.

data My.Data;
set My.Data;
TRACTCE = input(substr(GEOID10, 5, 6), 10.);
run;

This code also gave me 10207 instead of 020700 for GEOID10 1001020700.

data My.Data;
set My.Data;
TRACTCE = put(input(substr(GEOID10, 5, 6), 10.), z6.);
run;

This code also gave me 10207 instead of 020700 for GEOID10 1001020700.

There are 2 answers

Tom On

I cannot tell what your string actually has in it from what you have shared.

But based on your first result, I will assume that the first digit 1 appears in the second position, which means that the second digit 1 is in the 5th position.

data test;
  GEOID10=' 1001020700';
  TRACTCE = input(substr(GEOID10, 5), 6.);
  put GEOID10= $quote. TRACTCE= comma12. ;
run;

GEOID10=" 1001020700" TRACTCE=102,070

If you want to split 10- or 11-byte character strings into the last 6 characters and the rest, you can use the LENGTH() function to find the position of the last non-blank character and subtract 5.

data test;
  length geoid10 $11 left $5 right $6;
  GEOID10= '1001020700';
  loc = length(geoid10)-5;
  left = substrn(geoid10,1,loc-1);
  right = substrn(geoid10,loc);
  format _character_ $quote.;
  put (_all_) (=);
run;

geoid10="1001020700" left="1001" right="020700" loc=5
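
Applied to the table from the question, the same idea might look like the sketch below. It assumes GEOID10 is stored as a character variable; the output dataset name `want` and the helper variable `GEOID_clean` are hypothetical names introduced only for illustration.

```sas
data want;                          /* hypothetical output dataset name */
  set My.Data;
  length TRACTCE $6;
  /* LEFT() moves any leading blanks to the end, so LENGTH() counts only the digits */
  GEOID_clean = left(GEOID10);      /* hypothetical helper variable */
  /* start 5 characters before the last digit and take everything to the end */
  TRACTCE = substrn(GEOID_clean, length(GEOID_clean) - 5);
  drop GEOID_clean;
run;
```

Keeping TRACTCE as a character variable is what preserves the leading zero; converting it back to a number with INPUT() would drop it again.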

If GEOID10 is NUMERIC then don't use character functions on it. Just use arithmetic.

data test;
  GEOID10= 1001020700;
  right = mod(geoid10,10**6);
  left = int(geoid10/10**6);
  format geoid10 Z11. right z6. left Z5.;
  put (_all_) (=);
run;

GEOID10=01001020700 right=020700 left=01001
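
A numeric GEOID10 would also explain the 10207 reported in the question. When a number is passed to a character function, SAS converts it automatically using the BEST12. format, which right-aligns the digits in 12 characters, so position 5 is not the fifth digit. The sketch below mimics that conversion explicitly with PUT(); the variable names `as_text` and `wrong` are hypothetical, and the numeric type is an assumption about the asker's data.

```sas
data _null_;
  GEOID10 = 1001020700;                     /* assumed numeric in this sketch */
  as_text = put(GEOID10, best12.);          /* two leading blanks, then the 10 digits */
  wrong   = input(substr(as_text, 5), 6.);  /* reads '010207', so the value is 10207 */
  TRACTCE = put(mod(GEOID10, 10**6), z6.);  /* arithmetic plus Z6. keeps the leading zero */
  put as_text= $quote. wrong= TRACTCE= $quote.;
run;
```

If a six-character TRACTCE with its leading zero is the goal, storing the result of PUT(..., z6.) as a character variable is one way to get it without any character functions on the original number.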
Mr. SATAN On

Can you try the following change and let me know whether the result is different?

last_six = substr(string_variable, length(string_variable)-5, 6);

Or

last_three = substr(string_variable, length(string_variable)-2, 3);

SUBSTR counts positions from the left and does not accept a negative starting position, so the start is computed from LENGTH() to pick up the characters at the end of the string. Adjust the offset and the length to take as many trailing characters as you want.