This is an extremely basic question and I honestly feel a bit silly writing it.
TL;DR: How can I write a function that uses the parsec library to mimic the behavior of the words function from Data.List? An example of the intended behavior:
wordsReplica "I love lamp" = ["I","love","lamp"]
I just read the first couple of pages of the Parsec chapter from Real World Haskell, and it would be incredibly helpful to understand what constitutes a bare-minimum parsing function (one that does more than return the argument or return nothing). (RWH's introductory example shows how to parse a multi-line CSV file...)
As such, I thought it'd be a useful, basic exercise to rewrite words using parsec... It's turning out to be not so basic (for me)...
The following is my attempt; unfortunately it generates an "unexpected end of input" error (at runtime) no matter what I give it. I've tried reading the descriptions/definitions of the simple functions in the parsec library on haskell.org, but they aren't that illustrative, at least for someone who's never done parsing of any kind before, including in other languages.
import Text.ParserCombinators.Parsec

testParser :: String -> Either ParseError [[String]]
testParser input = parse wordsReplica "(unknown)" input
  where
    wordsReplica = endBy
                     (sepBy
                       (many (noneOf " "))
                       (char ' '))
                     (char ' ')
(Please pardon the lisp-y, non-pointfree presentation - when I'm learning about a new function, it helps me if I make the notation/structure super explicit.)
Update:
Here's something that's a step in the right direction (but still not quite there as it doesn't do numbers):
λ: let wordsReplica = sepBy (many letter) (char ' ')
λ: parse wordsReplica "" "i love lamp 867 5309"
Right ["i","love","lamp",""]
Update 2:
Seems like this function gets the job done, though I'm not sure how idiomatic it is:
λ: let wordsReplica = sepBy (many (satisfy(not . isSpace))) (char ' ')
wordsReplica :: Stream s m Char => ParsecT s u m [[Char]]
λ: parse wordsReplica "" "867 5309 i love lamp %all% !(nonblanks are $$captured$$"
Right ["867","5309","i","love","lamp","%all%","!(nonblanks","are","$$captured$$"]
it :: Either ParseError [[Char]]
It's fine, but it doesn't work as you intend:
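To make the problem concrete, here is a minimal sketch that wraps the parser from "Update 2" unchanged and feeds it an input with two consecutive spaces. The helper name runWords and the sample inputs are my own choices for illustration, not something from the question.

```haskell
import Data.Char (isSpace)
import Text.Parsec (ParseError, char, many, parse, satisfy, sepBy)
import Text.Parsec.String (Parser)

-- The parser from "Update 2" of the question, unchanged.
wordsReplica :: Parser [String]
wordsReplica = sepBy (many (satisfy (not . isSpace))) (char ' ')

-- Hypothetical helper: run the parser with an empty source name.
runWords :: String -> Either ParseError [String]
runWords = parse wordsReplica ""
```

Loaded into GHCi, runWords "I love lamp" gives Right ["I","love","lamp"], but runWords "I  love lamp" (two spaces) gives Right ["I","","love","lamp"]. The empty string appears because many happily matches zero characters between the two separators, whereas words would return ["I","love","lamp"].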
Not quite what you want. After all, a word should consist of at least one character. But if you change many to many1, you will notice another error: input with consecutive spaces no longer parses at all.
That's because your separating parser isn't greedy enough. Instead of parsing a single space, parse as many as you can:
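A minimal sketch of both steps, under a few assumptions: the names word, strictWords, greedyWords, trimmedWords and run are hypothetical, and I've used many1 space as the greedy separator (many1 (char ' ') would work equally well if only literal spaces matter). The last parser is my own variant, not part of the advice above, added because sepBy alone still trips over leading or trailing whitespace.

```haskell
import Data.Char (isSpace)
import Text.Parsec
  (ParseError, char, eof, many, many1, parse, satisfy, sepBy, space, spaces)
import Text.Parsec.String (Parser)

-- A word is one or more non-space characters.
word :: Parser String
word = many1 (satisfy (not . isSpace))

-- many1 with a single-space separator: after consuming one space,
-- a second space makes `word` fail, so the whole parse fails.
strictWords :: Parser [String]
strictWords = sepBy word (char ' ')

-- Greedy separator: consume as much whitespace as is available.
greedyWords :: Parser [String]
greedyWords = sepBy word (many1 space)

-- Assumed variant: also tolerate leading and trailing whitespace
-- and require that the whole input is consumed.
trimmedWords :: Parser [String]
trimmedWords = spaces *> many (word <* spaces) <* eof

-- Hypothetical helper: run any of the parsers with an empty source name.
run :: Parser a -> String -> Either ParseError a
run p = parse p ""
```

With these definitions, run strictWords "I  love lamp" returns a Left, run greedyWords "I  love lamp" returns Right ["I","love","lamp"], and run trimmedWords "  I love  lamp " returns the same list. Note that greedyWords still fails on trailing whitespace, because the separator is consumed before word finds nothing to parse; that is the case trimmedWords is meant to cover.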