Splitting string into substrings in Lua


I am trying to split a string into substrings using Lua. With the pattern in the second for loop below, I expected 4 matches, but I only get 2.

print(words[1]) displays

"###Lorem ipsum dolor sit amet, Gruß consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam \n"

and print(words[2]) displays

"###At vero eos et accusam et justo duo dolores et ea rebum. Stet clita \nkasd gubergren, no sea takimata Gruß sanctus est \n"

Can someone please explain this behavior?

i=0
content = "###Lorem ipsum dolor sit amet, Gruß consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam \n ###voluptua. ###At vero eos et accusam et justo duo dolores et ea rebum. Stet clita \nkasd gubergren, no sea takimata Gruß sanctus est \n###XLorem ipsum dolor sit amet. Lorem ipsum \ndolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor \ninvidunt ut labore et Gruß dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.sdl"
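-- Count the "###" markers
for word in string.gmatch(content, '###') do
  i = i + 1
end

-- Close the last segment with a trailing "###"
if i > 1 then
  content = content .. '###'
end

words = {}
for y in string.gmatch(content, "(###.-)###") do
  table.insert(words, y)
end

print(words[3])  -- prints nil, since only 2 matches are found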

There are 2 answers

Answer by lhf

Your first loop does find four matches. Try this to confirm:

for word in string.gmatch(content, '###([^#]+)') do
  print(word)
end

If that works for you, store each word inside the loop as needed.
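A minimal sketch of that suggestion, collecting the captures into a table instead of printing them. The short test string is made up for illustration. Note that this pattern drops the leading ### from each piece and assumes the text itself never contains a single #, because [^#]+ stops at the first one:

```lua
-- Illustrative input (assumption): four pieces, each introduced by "###".
-- [^#]+ stops at the next '#', so pieces must not contain '#' themselves.
local content = "###aa###bb###cc###dd"

local words = {}
for word in string.gmatch(content, "###([^#]+)") do
  table.insert(words, word)  -- the capture excludes the "###" prefix
end

for i, word in ipairs(words) do
  print(i, word)
end
```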

Answer by Yu Hao

This is a simplified version of your second loop:

content = '###aa###bb###cc###dd###'
words= {}    
for y in string.gmatch(content,"(###.-)###") do  
    print(y)
    table.insert(words, y) 
end

Output:

###aa
###cc
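To make the skipped pieces visible, the sketch below (an illustration, not part of the original answer) adds an empty capture (), which in a Lua pattern returns the position where the match starts:

```lua
local content = '###aa###bb###cc###dd###'

local starts, words = {}, {}
-- "()" is a position capture: it yields the index where each match begins
for pos, y in string.gmatch(content, "()(###.-)###") do
  print(pos, y)
  table.insert(starts, pos)
  table.insert(words, y)
end
```

The first match spans positions 1 to 8 and ends on the ### that precedes bb, so gmatch resumes at position 9 and the next match can only begin at the ### before cc, at position 11.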

The problem is that with the pattern (###.-)###, the closing ### is consumed as part of the match, so it cannot also open the next match. What you need is something like the regex lookahead (###.+?)(?=###). Unfortunately, Lua patterns don't support lookahead. This is one possible workaround:

local left = content
while true do
    local first, last, match = string.find(left, "(###.-)###")
    if not first then break end
    print(match)
    -- Restart at the closing "###" so it can open the next match.
    -- last is the index of its final '#', so the marker begins at last - 2.
    left = left:sub(last - 2)
end
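Another option, sketched here under the assumption that every piece should start with the marker and run up to the next one, avoids both the pattern and the repeated substring copies by passing a start position to string.find. The helper name split_keep_marker is hypothetical:

```lua
-- Hypothetical helper: returns every segment of s that starts with marker,
-- each running up to (not including) the next marker or the end of s.
-- Text before the first marker is ignored.
local function split_keep_marker(s, marker)
  local parts = {}
  -- 4th argument true = plain search, so marker is not treated as a pattern
  local start = string.find(s, marker, 1, true)
  while start do
    local nxt = string.find(s, marker, start + #marker, true)
    -- nxt is nil for the last segment, which then runs to the end of s
    table.insert(parts, string.sub(s, start, (nxt or #s + 1) - 1))
    start = nxt
  end
  return parts
end

local parts = split_keep_marker("###aa###bb###cc###dd", "###")
for i, p in ipairs(parts) do
  print(i, p)
end
```

Because the last piece simply runs to the end of the string, there is no need to append a trailing ### first, as the question's code does.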