Tokenizing a phone number in C

I'm trying to tokenize a phone number and split it into two arrays. It starts out in a string in the form of "(515) 555-5555". I'm looking to tokenize the area code, the first 3 digits, and the last 4 digits. The area code I would store in one array, and the other 7 digits in another one. Both arrays are to hold just the numbers themselves.

My code seems to work... sort of. The issue is that when I print the two storage arrays, I find some quirks:

  1. My array aCode: it stores the first 3 digits as I ask it to, but when I print it, some garbage values are tacked on at the end. I walked through it in the debugger, and the array only stores what I'm asking it to store: the 515. So why is it printing those garbage values? What gives?

  2. My array aNum: I can append the tokens I need to the end of it; the only problem is that I end up with an extra space at the front (which makes sense, since I'm appending to an empty array, i.e. appending to empty space). Just to experiment, I changed the code so the array holds only 7 characters and stepped through it in the debugger: it tells me that the array holds an empty space and 6 of the digits I need, with no room for the last one. Yet when I print it, the space AND all 7 digits are printed. How does that happen?

And how could I set up my strtok calls so that they first copy the 3 digits before the "-" and then append the last 4 digits I need? All the tokenization examples I've seen use a while loop, which would mean I'd have to pick either strcat or strcpy for every token. I could add an "if" statement that checks the size of the current token each time, but that seems too crude, and I feel like there's a simpler way to do this. Thanks all!

#include <stdio.h>
#include <string.h>

int main() {

    char phoneNum[]= "(515) 555-5555";
    char aCode[3];
    char aNum[7];

    char *numPtr;

    numPtr = strtok(phoneNum, " ");

    strncpy(aCode, &numPtr[1], 3);
    printf("%s\n", aCode);  

    numPtr = strtok(&phoneNum[6], "-");

    while (numPtr != NULL) {
        strcat(aNum, numPtr);
        numPtr = strtok(NULL, "-");
    }
    printf("%s", aNum);  
}

There are 4 answers

Answer by Sourav Ghosh (score 6, best answer)

I can see two main errors:

  1. aCode is an array of only 3 chars, so after strncpy() copies the 3 digits there is no room left for a null terminator, and aCode is not null-terminated. Using it as the argument for the %s format specifier in printf() invokes undefined behaviour. The same applies to aNum, in a different way: it has room for 7 chars, but 7 digits plus the terminator need 8.

  2. strcat() expects null-terminated strings for both arguments. aNum is uninitialized, so it is not null-terminated when it is first passed to strcat(); this results in UB, too. Always initialize your local variables.

Also, see the other answers for complete, bug-free code.

Answer by Arkku (score 0)

Other answers have already mentioned the major issue, which is insufficient space in aCode and aNum for the terminating NUL character. The sscanf answer also gives the cleanest solution to the problem, but given the restriction of using strtok, here's one possible solution to consider:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char phone_number[]= "(515) 555-1234";
    char area[3+1] = "";
    char digits[7+1] = "";
    const char *separators = " (-)";
    char *p = strtok(phone_number, separators);
    if (p) {
        int len = 0;
        (void) snprintf(area, sizeof(area), "%s", p);
        while ((size_t) len < sizeof(digits) && (p = strtok(NULL, separators))) {
            len += snprintf(digits + len, sizeof(digits) - len, "%s", p);
        }
    }
    (void) printf("(%s) %s\n", area, digits);
    return 0;
}
Answer by Iharob Al Asimi (score 5)

Your code has multiple issues:

  1. You allocate the wrong size for aCode: you should add 1 for the nul terminator byte and initialize the whole array to '\0' so that the string is always terminated.

    char aCode[4] = {'\0'};
    
  2. You don't check whether strtok() returns NULL before using its result:

    numPtr = strtok(phoneNum, " ");
    strncpy(aCode, &numPtr[1], 3);
    
  3. Point 1 also applies to aNum (it needs room for 7 digits plus the terminator), and strcat(aNum, numPtr) invokes undefined behaviour on the first call because aNum has not been initialized yet.

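  4. Subsequent calls to strtok() that continue tokenizing the same string should pass NULL as the first parameter, hence

    numPtr = strtok(&phoneNum[6], "-");

    is not the right way to continue (it only works because the second token happens to start at index 6); it should be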

    numPtr = strtok(NULL, "-");
    
Answer by Sergey Kalinichenko (score 1)

The biggest problem in your code is undefined behavior: since you copy three characters into a three-character array, you leave no space for the null terminator.

Since you are tokenizing a value in a very specific format of fixed length, you could get away with a very concise implementation that employs sscanf:

#include <stdio.h>
int main(void)
{
    const char *phoneNum = "(515) 555-5555";
    char aCode[3+1];
    char aNum[7+1];
    sscanf(phoneNum, "(%3[0-9]) %3[0-9]-%4[0-9]", aCode, aNum, &aNum[3]);
    printf("%s %s\n", aCode, aNum);
}

This solution passes the format (###) ###-#### directly to sscanf and tells the function where each value needs to be placed. The only "trick" used above is passing &aNum[3] as the last argument, which instructs sscanf to place the third segment into the same storage as the second segment, starting at index 3. That overwrites the null terminator written after the second segment, and the %4[0-9] conversion writes a new one at aNum[7].
