My objective is to parse strings the way Python does.
Question: how do I write a lexer that supports the following:
"string..."
'string...'
"""multi line string \n \n end"""
'''multi line string \n \n end'''
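For comparison, all four forms can be recognized by one regular expression, so lexer states are not strictly required just to find the literals. This is a minimal sketch using plain `re`, not PLY. The names `STRING_RE`, `DELIM_LEN` and `find_strings` are hypothetical. It assumes Python's rules that only triple-quoted strings may span lines and that a backslash escapes the next character. String prefixes such as `r` or `b` are not handled, and escapes are kept verbatim rather than decoded.

```python
import re

# One pattern, four named alternatives (one per Python string form).
# Triple-quoted forms come first: regex alternation takes the first branch
# that matches, so with the single-quote branch first, ''' would be consumed
# as an empty '' string.
STRING_RE = re.compile(
    r"(?P<triple_single>'''(?:\\.|[^\\])*?''')"   # may span lines
    r'|(?P<triple_double>"""(?:\\.|[^\\])*?""")'  # may span lines
    r"|(?P<single>'(?:\\.|[^\\'\n])*')"           # one line only
    r'|(?P<double>"(?:\\.|[^\\"\n])*")',          # one line only
    re.DOTALL,  # so "\\." also accepts a backslash-newline continuation
)

# Length of the delimiter for each named branch.
DELIM_LEN = {"triple_single": 3, "triple_double": 3, "single": 1, "double": 1}


def find_strings(text):
    """Return (kind, body, lineno) for each string literal found in text."""
    results = []
    for m in STRING_RE.finditer(text):
        kind = m.lastgroup              # name of the branch that matched
        n = DELIM_LEN[kind]
        body = m.group()[n:-n]          # strip the delimiters
        # 1-based line where the literal starts, like t.lexer.lineno in PLY
        lineno = text.count("\n", 0, m.start()) + 1
        results.append((kind, body, lineno))
    return results
```

For example, `find_strings('x = """a\nb"""')` returns `[('triple_double', 'a\nb', 1)]`. The named groups report which of the four cases matched, so the distinction the four states would encode is still available, via `m.lastgroup`.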
Some code:
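```python
# Part of a class-based PLY lexer: each rule's docstring is its regex.
states = (
    ('string', 'exclusive'),
)

# Strings
def t_begin_string(self, t):
    r'(\'{3}|\"{3}|\'|\")'
    # Triple quotes are listed first: regex alternation takes the first
    # branch that matches, so a lone quote would otherwise win.
    t.lexer.push_state('string')

def t_string_end(self, t):
    r'(\'{3}|\"{3}|\'|\")'
    # Any quote still ends the string, whichever one opened it.
    t.lexer.pop_state()

def t_string_newline(self, t):
    r'\n'
    t.lexer.lineno += 1

def t_string_error(self, t):
    # No rule yet for ordinary characters, so they all end up here.
    print("Illegal character in string '%s'" % t.value[0])
    t.lexer.skip(1)
```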
My current idea is to create four separate states, one for each of the four string forms, but I'm wondering whether there is a better approach.
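A single exclusive state can cover all four forms if the lexer remembers which delimiter opened the string and leaves the state only on that same delimiter. This is a minimal sketch of that idea, written as a hand-rolled scanner in plain Python rather than as PLY rules, so it runs without PLY. `DELIMITERS` and `lex_strings` are hypothetical names. In PLY itself the saved delimiter would presumably live as an attribute on `t.lexer`; that wiring is an assumption and is not shown. Escapes are kept verbatim and string prefixes are ignored.

```python
# Opening delimiters, longest first so ''' is not taken as a lone '.
DELIMITERS = ("'''", '"""', "'", '"')


def lex_strings(text):
    """Return (delimiter, body, start_line) for each string literal in text.

    Sketch of a scanner with ONE exclusive 'string' state: the opening
    delimiter is saved on entry, and only that same delimiter leaves the
    state, so four separate states are not needed.
    """
    tokens = []
    lineno = 1
    pos = 0
    state = "INITIAL"
    delim = None          # delimiter saved on entering the 'string' state
    start_line = 0
    body = []
    while pos < len(text):
        ch = text[pos]
        if state == "INITIAL":
            opener = next((d for d in DELIMITERS if text.startswith(d, pos)), None)
            if opener:
                # Equivalent of push_state('string'), plus remembering the opener.
                state, delim, start_line, body = "string", opener, lineno, []
                pos += len(opener)
                continue
            if ch == "\n":
                lineno += 1
            pos += 1
        elif text.startswith(delim, pos):
            # Only the saved delimiter ends the string (the pop_state step).
            tokens.append((delim, "".join(body), start_line))
            state = "INITIAL"
            pos += len(delim)
        elif ch == "\\" and pos + 1 < len(text):
            # Keep escapes verbatim; an escaped quote does not close the string.
            pair = text[pos:pos + 2]
            body.append(pair)
            lineno += pair.count("\n")
            pos += 2
        elif ch == "\n" and len(delim) == 1:
            raise SyntaxError(f"line {lineno}: newline in single-quoted string")
        else:
            if ch == "\n":
                lineno += 1
            body.append(ch)
            pos += 1
    if state == "string":
        raise SyntaxError(f"line {start_line}: unterminated string")
    return tokens
```

With `'x = "it\'s"'` as input it returns `[('"', "it's", 1)]`: the apostrophe no longer closes the string, which is the case a shared end rule matching any quote gets wrong.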
Thanks for your help!
Try the pyparsing module. It lets you build the grammar from Python objects instead of writing regular expressions by hand, which keeps the string rules readable.
The following example should help you parse expressions like "string..." and """string""" as well.