Scanning long strings is very slow

I use the following to scan quoted strings with Flex:

\"[^\0\"]*\" { return STRING; }

Any character except NUL (and the double quote itself) is allowed between the quotes.

Very long strings are extremely slow to scan, even when they contain no newlines. Profiling shows that most of the time is spent in yy_get_previous_state(). How can I find out where the performance problem comes from, and what are good ways to improve performance?

Why we need good performance on long strings: such strings are unlikely to appear in practical use, but coverage-based fuzzing tends to come up with extreme inputs like these and then times out. I need either a way to improve performance or a way to restrict the maximum string length, enabled only when a particular macro is defined.

So far I have tried using start conditions, so that no other rule is active while the contents of a string are being scanned. This made no difference. Here is the attempt:

%x STRBODY STRCLOSE

\"                      { BEGIN(STRBODY); }
<STRBODY>[^\0"]*        { BEGIN(STRCLOSE); return STRING; }
<STRCLOSE>\"            { BEGIN(INITIAL); }

<*>.|\n                 { return ERROR; }
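
For the fallback of restricting the maximum string length, a minimal sketch along the same lines might look like the following. It is untested against the real grammar and rests on several assumptions: MAX_STRING_LEN is a hypothetical macro (set with something like -DMAX_STRING_LEN=4096 in fuzzing builds), the token codes are placeholders for whatever the parser defines, and the chunk size of 64 is arbitrary. The body is matched in bounded chunks so that no single match grows without limit.

```lex
%option noyywrap

%{
#include <stddef.h>

/* Placeholder token codes; the real ones would come from the parser. */
enum { STRING = 258, ERROR = 259 };

/* Hypothetical macro: define it to a positive value to enable the cap.
   0 means "no limit". */
#ifndef MAX_STRING_LEN
#define MAX_STRING_LEN 0
#endif

static size_t str_len;  /* characters seen so far in the current string body */
%}

%x STRBODY

%%

\"                      { str_len = 0; BEGIN(STRBODY); }

<STRBODY>[^\0"]{1,64}   {
                          /* Each match is a chunk of at most 64 characters. */
                          str_len += (size_t)yyleng;
                          if (MAX_STRING_LEN != 0 && str_len > (size_t)MAX_STRING_LEN) {
                              BEGIN(INITIAL);
                              return ERROR;   /* string exceeds the cap */
                          }
                        }

<STRBODY>\"             { BEGIN(INITIAL); return STRING; }

<*>.|\n                 { return ERROR; }
```

With this layout, yytext holds only the last chunk when STRING is returned, so the chunk action would have to accumulate the text if the parser needs the string contents. Whether chunking changes the time spent in yy_get_previous_state() would have to be measured.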