I am trying to understand bash's parser and lexer mechanism. (My ultimate goal is implementing a bash-like shell).
The first case
$ test='o a'
$ ech$test
a
(^ edit: I removed double quotes for second line. My actual test case was that.)
The expander expanded the command and found the argument after expanding.
Expanded full command: echo a
So, I can assume that the lexer runs after the expansion, because bash understood that "echo a" as a whole is not the command name: "echo" is the command name, and "a" is an argument. (By the way, zsh doesn't do that.)
The second case
$ test="'"
$ echo $test
'
Echo prints only one single quote. However, if we expand this string to: echo ', it is not a valid command because it has an unclosed quote. So, I can assume two things:
1. The lexer first understands what it is, and the expansion happens afterwards.
2. Actually, the value of the 'test' variable is not one single quote. Its value is exactly "'". So, in reality, we don't expand to echo ', we expand to echo "'", which is valid.
But the first assumption and the first test case's assumption do not coincide. So, I assume the second one.
The third case
$ test="'"
$ echo "$test"
'
Echo prints only one single quote again. However, (I assume) it expanded this string to: echo ""'"", which is invalid because we have an unclosed quote.
So, my question is: "How does bash understand what I mean?"
The POSIX specification includes a detailed description of the behavior of the shell command language, including considerable detail about how it processes input. It begins with:
It continues from there with details of how lines are tokenized.
Only after a line has been tokenized can any kind of substitution or expansion be performed, because only at that point can the shell recognize where substitutions and expansions are called for.
I don't believe you. I do not reproduce that result in Bash 4.3, nor would I expect to. The POSIX specifications and the Bash manual are explicit and in agreement on this point: parameter expansions that occur inside double quotes are not subject to word splitting.
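As a rough illustration of that rule, the following sketch counts how many words a command receives when the same expansion is written unquoted and inside double quotes. The helper name count_args is hypothetical (it is not part of bash), and the default IFS is assumed.

```shell
# count_args is a hypothetical helper: it prints how many words it was given.
count_args() { echo "$#"; }

test='o a'

# Unquoted: the word expands to "echo a" and is then split at the space.
unquoted=$(count_args ech$test)

# Double-quoted: the expansion is protected from word splitting.
quoted=$(count_args ech"$test")

echo "unquoted: $unquoted word(s), quoted: $quoted word(s)"
```

With the default IFS this should report two words for the unquoted form and one word for the quoted form.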
With respect to the order of command-line processing, what happens is that ech"$test" is recognized as a single token. Parameter expansion, field splitting, and quote removal apply to that token ("word" in shell jargon), with the overall result that it expands to the single word "echo a", which, by virtue of its position in the fully-expanded command line, is interpreted as a command name.
It would be different if you instead did ech$test, where the $test parameter expansion was not quoted. I suspect that's what you actually did in Bash to get the output a. In this case, the expansion of $test is not protected from word splitting, so after the word is expanded to the (single) word "echo a", that is split into two words at the space. The result is echo as the command name and a as its argument.
Yes, as the spec describes, quote characters are recognized during token recognition. Quote characters introduced into a command by parameter expansion are significant only as themselves.
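To make the point about quote characters concrete, here is a small sketch comparing quotes typed on the command line with quotes that arrive through a parameter expansion. The helper show_args and the variable name value are made up for this example, and the default IFS is assumed.

```shell
# show_args is a hypothetical helper: it prints each argument in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]' "$arg"
    done
    echo
}

value="'a b'"   # the value contains two literal single-quote characters

# Quotes typed in the source are recognized during token recognition.
literal=$(show_args 'a b')

# Quotes that come from an expansion are ordinary characters: they do not
# group words, so field splitting still breaks the value at the space.
expanded=$(show_args $value)

echo "literal:  $literal"
echo "expanded: $expanded"
```

The first call should receive the single argument [a b], while the second should receive two arguments, ['a] and [b'], with the quote characters kept as data.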
Yes, it is. And this can be tested in a variety of ways, such as (your third case) echo "$test", or echo ${#test}, or (in bash) echo "${test:0:1}".
No, it isn't. See above for the actual explanation of the behavior you observed.
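The checks mentioned above for the value of test can be run as a short script. This is only a sketch; the variable names length and printed are introduced here for illustration.

```shell
test="'"

# ${#test} is the length of the value in characters.
length=${#test}

# Inside double quotes the value is passed to echo unchanged.
printed=$(echo "$test")

echo "length: $length"
echo "value:  $printed"
```

A length of 1 and a printed value of a lone single quote are consistent with the variable holding exactly one quote character, not a quoted string.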
Overall, I strongly recommend relying first and foremost on the specifications for the behavior you want to implement. Experimenting is a fine way to try to clarify and solidify your interpretation of the specs, but it is a very unreliable way to determine the details of the required behavior.