I have an HTML table that needs to be converted into plain text using the Flex utility on Linux.
I've come up with the following list of tokens in my .lex file:
OPENTABLE "<table>"
CLOSETABLE "</table>"
OPENROW "<tr>"
CLOSEROW "</tr>"
OPENHEADING "<th>"
CLOSEHEADING "</th>"
OPENDATA "<td>"
CLOSEDATA "</td>"
STRING [0-9a-zA-Z]+
%%
%%
My CFG (translation scheme included) for the HTML parser looks like:
TABLE --> OPENTABLE ROWLIST CLOSETABLE ;
ROWLIST --> ROWLIST ROW | ^ ;
ROW --> OPENROW DATALIST CLOSEROW printf("\n");
DATALIST --> DATALIST DATA | ^ ;
DATA --> OPENDATA STRING CLOSEDATA printf("%s\t", yytext);
I've seen some examples, but I don't understand what I should write in the rules section of my .lex file.
I spent some time on the basics and figured it out. Flex's info page was a great help. Here is the required file. It works well, but still needs some improvements.
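For illustration only, here is a minimal sketch of how the rules section could be filled in using the token names from the question. It is not necessarily the exact file referred to above. It assumes lowercase tags without attributes and cell contents made up of letters and digits only, and it lets the scanner print directly instead of handing tokens to a parser. The file name `table.lex` and the program name `table2txt` are hypothetical.

```lex
%{
/* Sketch: turn a simple HTML table into tab-separated plain text. */
#include <stdio.h>
%}

%option noyywrap

OPENTABLE     "<table>"
CLOSETABLE    "</table>"
OPENROW       "<tr>"
CLOSEROW      "</tr>"
OPENHEADING   "<th>"
CLOSEHEADING  "</th>"
OPENDATA      "<td>"
CLOSEDATA     "</td>"
STRING        [0-9a-zA-Z]+

%%
{OPENTABLE}|{CLOSETABLE}      { /* table delimiters produce no output */ }
{OPENROW}                     { /* nothing to print when a row starts */ }
{CLOSEROW}                    { putchar('\n');  /* end of row: newline */ }
{OPENHEADING}|{OPENDATA}      { /* nothing to print when a cell starts */ }
{CLOSEHEADING}|{CLOSEDATA}    { putchar('\t');  /* end of cell: tab */ }
{STRING}                      { fputs(yytext, stdout);  /* cell text */ }
"<"[^>]*">"                   { /* assumption: silently skip any other tag */ }
[ \t\r\n]+                    { /* skip whitespace between tags */ }
.                             { /* ignore any other character */ }
%%

int main(void)
{
    yylex();   /* reads stdin until EOF, writes the converted text to stdout */
    return 0;
}
```

Something like `flex table.lex && cc lex.yy.c -o table2txt` should build it, after which `./table2txt < input.html` reads the table from standard input. The literal tags are quoted because `/` is Flex's trailing-context operator, so an unquoted `</table>` would not be read as plain text. As in the translation scheme from the question, every cell is followed by a tab, so each output line ends with a trailing tab.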