I'm trying to write an LPeg pattern to match strings which:
- begin with a letter
- thereafter contain alphanumeric characters
- do not contain two or more consecutive hyphens (e.g. test--string is disallowed)
For reference, the regular expression [a-zA-Z](-?[a-zA-Z0-9])* matches what I'm looking for.
Here's the code I'm working with:
local lpeg = require "lpeg"
P,R,C = lpeg.P,lpeg.R,lpeg.C
dash = P"-"
ucase = R"AZ"
lcase = R"az"
digit = R"09"
letter = ucase + lcase
alphanum = letter + digit
str_match = C(letter * ((dash^-1) * alphanum)^0)
strs = {
    "1too",
    "too0",
    "t-t-t",
    "t-t--t",
    "t--t-t",
    "t-1-t",
    "t--t",
    "t-one1",
    "1-1",
    "t-1",
    "t",
    "tt",
    "t1",
    "1",
}
for _,v in ipairs(strs) do
    if lpeg.match(str_match,v) ~= nil then
        print(v," => match!")
    else
        print(v," => no match")
    end
end
However, much to my frustration, I get the following output:
1too => no match
too0 => match!
t-t-t => match!
t-t--t => match!
t--t-t => match!
t-1-t => match!
t--t => match!
t-one1 => match!
1-1 => no match
t-1 => match!
t => match!
tt => match!
t1 => match!
1 => no match
Despite what the code outputs, t-t--t, t--t-t, and t--t shouldn't match.
In your pattern letter * ((dash^-1) * alphanum)^0, LPeg only tries to match a prefix of the string. In the cases where you didn't expect a match, the pattern still succeeds on a leading part of the subject: t-t for t-t--t, and just the initial t for t--t-t and t--t.
lpeg.match returns the position just after the matched part (a number) if nothing gets captured. In the three cases above, the matching prefix is captured and returned instead, which explains the erroneous output you're seeing.
If you're just matching each string one at a time, you can modify your pattern to check that no characters remain after the parse.
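A minimal sketch of that end-of-input check, assuming the same building blocks as in the question. The names prefix_match and full_match are illustrative only; -P(1) succeeds only when no character is left in the subject.

```lua
local lpeg = require "lpeg"
local P, R, C = lpeg.P, lpeg.R, lpeg.C

local dash     = P"-"
local letter   = R("AZ", "az")
local alphanum = letter + R"09"

-- Unanchored: succeeds on any valid prefix of the subject.
local prefix_match = C(letter * (dash^-1 * alphanum)^0)

-- Anchored: the trailing -P(1) fails if any character remains,
-- so the whole subject has to fit the pattern.
local full_match = C(letter * (dash^-1 * alphanum)^0) * -P(1)

print(prefix_match:match("t-t--t"))  --> t-t
print(full_match:match("t-t--t"))    --> nil
print(full_match:match("t-t-t"))     --> t-t-t
```

With full_match in place of str_match, the loop from the question should report t-t--t, t--t-t, and t--t as non-matching.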
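Similarly, using the lpeg.re module, the same end-of-input check can be written as a negative lookahead on any character (!.) at the end of the pattern.
For stream matching or finding all pattern occurrences in the target string, stack the grammar rules together so that a single pattern walks the whole subject.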
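The two ideas above might look roughly like this with the re module. This is a sketch rather than the original answer's code: the rule names matches and word, and the variables full and all, are assumptions chosen for illustration.

```lua
local re = require "re"

-- Whole-string check: "!." fails if any character is left over.
local full = re.compile[[ { [a-zA-Z] ('-'? [a-zA-Z0-9])* } !. ]]

print(full:match("t-1-t"))  --> t-1-t
print(full:match("t--t"))   --> nil

-- Stream matching: at each position, either capture a word
-- or skip one character, until the subject is exhausted.
local all = re.compile[[
  matches <- (word / .)*
  word    <- { [a-zA-Z] ('-'? [a-zA-Z0-9])* }
]]

print(all:match("t--t-t 1too"))  --> t    t-t    too
print(all:match("123"))          --> 4
```

Note that in stream mode a string such as 1too still yields too, because the scan skips the leading digit and resumes matching at the next letter.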
Any matches will get captured and returned. If there are no matches, you'll get back either nil or a number indicating where in the string the pattern stopped parsing.
Edit: For cases where you need the parse to return nil on no match, a tweak to the grammar that requires at least one successful match should do the trick.
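One possible version of such a tweak, offered as a sketch since the original grammar is not shown: skip characters until the first word, require that word, then continue as before. The names at_least_one, matches, and word are hypothetical.

```lua
local re = require "re"

-- "(!word .)*" skips characters that do not start a word;
-- the mandatory "word" after it makes the whole parse fail
-- (returning nil) when the subject contains no match at all.
local at_least_one = re.compile[[
  matches <- (!word .)* word (word / .)*
  word    <- { [a-zA-Z] ('-'? [a-zA-Z0-9])* }
]]

print(at_least_one:match("123"))     --> nil
print(at_least_one:match("1too"))    --> too
print(at_least_one:match("t--t-t"))  --> t    t-t
```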