Insert a character into parser combinator character stream in Haskell

565 views Asked by At

This question is related to both Parsec and uu-parsinglib. When we write parser combinators, they process characters streams from compiler. Is it somehow possible to parse a character and put it back (or return another character back) to the input stream?

I want for example to parse input "test + 5", parse the t, e, s, t and after recognition of test pattern, put for example v character back into the character stream, so while continuating the parsing process we are matching against v + 5

I do not want to use this in any particular case for now - I want to deeply learn the possibilities.

3

There are 3 answers

0
Doaitse Swierstra On

This is easily done in uu-parsinglib using the pSwitch function. But the question is why you want to do so? Because the v is missing from the input? In that case uu-parsinglib will perform error correction automatically so you do not need something like this. Otherwise you can write

pSwitch :: (st1 -> (st2, st2 -> st1)) -> P st2 a -> P st1 a
pInsert_v = pSwitch (\st1 -> (prepend v st2, id) (pSucceed ())

It depends on your actual state type how the v is actually added, so you will have to define the function

prepend
yourself. I do not know e.g. how such an insertion would influence the current position in the file etc.

Doaitse Swierstra

0
Petr On

I'm not sure if it's possible with these parsers directly, but in general you can accomplish it by combining parsers with some streaming that allows injecting leftovers.

For example, using attoparsec-conduit you can turn a parser into a conduit using

sinkParser :: (AttoparsecInput a, MonadThrow m)
           => Parser a b -> Consumer a m b

where Consumer is a special kind of conduit that doesn't produce any output, only receives input and returns a final value.

Since conduits support leftovers, you can create a helper method that converts a parser that optionally returns a value to be pushed into the stream into a conduit:

import Data.Attoparsec.Types
import Data.Conduit
import Data.Conduit.Attoparsec
import Data.Functor

reinject :: (AttoparsecInput a, MonadThrow m)
    => Parser a (Maybe a, b) -> Consumer a m b
reinject p = do
    (lo, r) <- sinkParser p
    maybe (return ()) leftover lo
    return r

Then you convert standard parsers to conduits using sinkParser and these special parsers using reinject, and then combine conduits instead of parsers.

0
Boldizsár Németh On

I think the simplest way to archive this is to build a multi-layered parser. Think of a lexer + parser combination. This is a clean approach to this problem.

You have to separate the two kind of parsing. The search-and-replace parsing goes to the first parser and the build-the-AST parsing to the second. Or you can create an intermediate token representation.

import Text.Parsec
import Text.Parsec.String

parserLvl1 :: Parser String
parserLvl1 = many (try (string "test" >> return 'v') <|> anyChar)

parserLvl2 :: Parser Plus
parserLvl2 = do text1 <- many (noneOf "+")
                char '+'
                text2 <- many (noneOf "+")
                return $ Plus text1 text2

data Plus = Plus String String
  deriving Show

wholeParse :: String -> Either ParseError Plus
wholeParse source = do res1 <- parse parserLvl1 "lvl1" source
                       res2 <- parse parserLvl2 "lvl2" res1
                       return res2

Now you can parse your example. wholeParse "test+5" results in Right (Plus "v" "5").

Possible variations:

  • Create a class and an instance for combining wrapped parser stages. (Possibly carrying parser state.)
  • Create an intermediate representation, a stream of tokens