Most efficient way to get current word from char array

1.2k views Asked by At

Say I have a string "text", a caret position "caret" and then want to find the current word (seperated by space).

The way I'm currently doing it seems inefficient and I was wondering if anyone had a efficient way of doing it?

const char* text;
int caret;
int initpos;
int start;
int count = 0;
char word[256];

// text and caret values assigned here.

initpos = caret;
while(caret > 0 && text[caret] != ' ') // get start
{
    caret--;
    count++;
}
start = caret;
caret = initpos;

while(text[caret] && text[caret] != ' ') // get end
{
    caret++;
    count++;
}

word = strsub(text, start, count);
3

There are 3 answers

0
Fred Foo On BEST ANSWER

By "seems inefficient", do you mean the code looks inefficient to you or that you've measured and found it too slow for you purposes?

Your method takes O(n) steps where n is the length of the longest word in your input. That's pretty fast unless your words have the size of DNA strings.

A faster method, for some datasets, would be to use an index of word start and end positions. An binary search tree storing intervals would fit this bill, but at the expense of O(lg N) retrieval time, where N is the number of words in your input. Probably not worth it.

2
Richard Pennington On
#include <ctype.h>

...
// Other definitions from above.
char *p = word;
char *q = text + caret;
while(q >= text && !isblank(*q)) {
   q--;
}
if (q < text) q++; // All non-blanks.
while (*q && !isblank(*q)) {
   *p++ = *q++;
}
*p = '\0';
// word now has nul terminated non-blank characters, p points to EOL or blanks.
1
CygnusX1 On

I think it is an efficient approach. I would just suggest checking if the character is a letter, rather than if it is not a space:

while(caret > 0 && ((text[caret]>='A' && text[caret]<='Z') || (text[caret]>='a' && text[caret]<='z')))

This will catch other cases, e.g. when word is terminated by a dot, a digit, a bracket etc.