I have these tokens defined in my lex file:
(?xi:
ADC|AND|ASL|BIT|BRK|CLC|CLD|CLI|CLV|CMP|CPX|
DEY|EOR|INC|INX|INY|JMP|JSR|LDA|LDX|LDY|LSR|
NOP|ORA|PHA|PHP|PLA|PLP|ROL|ROR|RTI|RTS|SBC|
SEC|SED|SEI|STA|STX|STY|TAX|TAY|TSX|TXA|TXS|
TYA|CPY|DEC|DEX
) {
yylval.str = strdup(yytext);
for(char *ptr = yylval.str; *ptr = tolower(*ptr); ptr++);
return MNEMONIC;
}
[\(\)=Aa#XxYy,:\+\-\<\>] {
return *yytext;
}
\$[0-9a-fA-F]{4} {
yylval.str = strdup(yytext);
return ABSOLUTE;
}
\$[0-9a-fA-F]{2} {
yylval.str = strdup(yytext);
return ZEROPAGE;
}
and this is how I parse them in bison:
struct addr_offset {
char *str;
int offset;
};
%union {
char *str;
int number;
struct addr_offset *ao;
}
%type<str> MNEMONIC
%type<str> ABSOLUTE
%type<ao> zp
%token ZEROPAGE
expression:
MNEMONIC { statement(0, $1, NULL, "i"); }
| MNEMONIC zp { statement(5, $1, $2, }
;
zp:
ZEROPAGE { $$->str = strdup($1); }
| '>' ABSOLUTE { $$->str = strdup($2); }
| '<' ABSOLUTE { $$->str = strdup($2); }
;
Weird thing is, if I add the last two parts to the zp rule, the MNEMONIC is not read correctly in the expression rule.
If you don't set $$ in a rule, bison will by default initialize it with the value of $1. If that is a different %type than $$ is expecting, bad things will happen.
In the case you are describing, it will likely be the value associated with the '<' or '>' token. Since those tokens don't set yylval in the lexer code, it will be whatever happens to be there from the previous token; in this case, the string allocated with strdup for MNEMONIC. So when you assign to $$->str, it will treat the string as if it is a pointer to the data structure in question, and will overwrite 4 or 8 characters in the string with the pointer to another string that is being assigned there.
So the likely result will be some heap corruption, which will manifest as bad/corrupted opcodes when you go to look at them.
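To make the aliasing concrete, here is a minimal C sketch of what the implicit $$ = $1 copy amounts to. It is not bison's generated code: the union semval and the function default_copy_aliases are hypothetical names that mirror the %union from the question, and the sketch only reads the aliased pointer, it never writes through it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct addr_offset {
    char *str;
    int offset;
};

/* Hypothetical stand-in for the semantic value type built from %union. */
union semval {
    char *str;
    int number;
    struct addr_offset *ao;
};

/* Returns 1 if, after copying a <str> value into the result slot, the <ao>
 * member holds the address of the string's bytes; returns -1 if the
 * allocation fails. */
int default_copy_aliases(void)
{
    union semval lval, result;
    int same;

    /* Assumption: both pointer types have the same size, as on mainstream platforms. */
    assert(sizeof(char *) == sizeof(struct addr_offset *));

    lval.str = malloc(4);           /* plays the role of the lexer's strdup */
    if (lval.str == NULL)
        return -1;
    strcpy(lval.str, "lda");

    result = lval;                  /* roughly the implicit $$ = $1 */

    /* result.ao is now the address of "lda", not of a struct addr_offset.
     * Assigning to result.ao->str at this point would overwrite that
     * 4-byte buffer and whatever follows it on the heap. */
    same = ((void *)result.ao == (void *)lval.str);

    free(lval.str);
    return same;
}
```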
So with the addition of the %union/%type declarations, we can see what is happening: you're allocating a string and then treating the string's memory as a struct addr_offset (the <ao> type), which causes heap corruption and undefined behavior.
You need your actions that return a struct addr_offset to actually allocate one, for example with malloc(sizeof(struct addr_offset)), and assign the result to $$ before filling in its fields.
Note that you don't need a strdup here, as the string has already been allocated in the lexer code, and you're just transferring ownership of that string from the token to the new struct addr_offset you're creating.
You might want to encapsulate the creation of the ao object in a function such as new_ao(char *str), which allocates the struct, stores the string, and returns the pointer.
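A minimal sketch of such a helper might look like the following. The name new_ao comes from the action shown in this answer. Its body, the zero default for offset, and the free_ao cleanup function are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct addr_offset {
    char *str;
    int offset;
};

/* Wraps an already-allocated string in a new struct addr_offset.
 * Takes ownership of str, so no strdup is needed here. */
struct addr_offset *new_ao(char *str)
{
    struct addr_offset *ao;

    assert(str != NULL);    /* expects the heap string produced by the lexer */

    ao = malloc(sizeof *ao);
    if (ao == NULL)
        return NULL;        /* caller decides how to report out-of-memory */
    ao->str = str;
    ao->offset = 0;         /* assumption: no offset unless a rule sets one */
    return ao;
}

/* Hypothetical matching cleanup: frees the wrapper and the string it owns. */
void free_ao(struct addr_offset *ao)
{
    if (ao != NULL) {
        free(ao->str);
        free(ao);
    }
}
```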
Then your actions just become, e.g. (with $2 in place of $1 for the '>' ABSOLUTE and '<' ABSOLUTE alternatives):
{ $$ = new_ao($1); }