Based on the provided example, we can get the length of each line:
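{-# LANGUAGE OverloadedStrings #-}
import Conduit
-- conduit-extra; needed for Data.Conduit.Text.lines in the updates below
import qualified Data.Conduit.Text
import Data.Text (Text, pack)
import Text.Regex.TDFA ((=~), getAllTextMatches)
import Control.Monad.IO.Class (liftIO)

wc :: IO ()
wc = runResourceT
  $ runConduit
  $ sourceFile "input.txt"
  .| decodeUtf8C
  .| peekForeverE (lineC lengthCE >>= liftIO . print)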
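You can see what peekForeverE and lineC are doing without an input.txt. The small sketch below yields each line's length downstream instead of printing it, and runs over an in-memory chunk. lineLengths and demoLengths are hypothetical names and do not come from the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Data.Text (Text)

-- Hypothetical helper: the same peekForeverE/lineC idea as wc, but each
-- line's length is yielded downstream instead of printed.
-- peekForeverE reruns the inner conduit while non-empty input remains;
-- lineC feeds the inner consumer one line and drops the newline.
lineLengths :: Monad m => ConduitT Text Int m ()
lineLengths = peekForeverE (lineC lengthCE >>= yield)

-- Runs purely over an in-memory chunk, so no file is needed.
demoLengths :: [Int]
demoLengths = runConduitPure $ yieldMany ["ab\ncde\n"] .| lineLengths .| sinkList
```

In GHCi, demoLengths should evaluate to [2,3].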
However, how would I get all the matches for a regex, and then write them to a file?
regex :: IO ()
regex = runResourceT
  $ runConduit
  $ sourceFile "input.txt"
  .| decodeUtf8C
  .| do
    line <- mapCE (\l -> getAllTextMatches (l =~ "^foo") :: [Text])
    liftIO $ print $ line
Update:
I figured out there's a built-in lines function, but is there a way to print a line and pass it along without consuming it?
grep :: IO ()
grep = runResourceT
  $ runConduit
  $ yield "foo\ndoo"
  .| decodeUtf8C
  .| Data.Conduit.Text.lines
  .| mapC (\a -> a =~ ("[fd]oo" :: Text))
  .| mapM_C (liftIO . (print :: Text -> IO ()))
  .| encodeUtf8C
  .| stdoutC
The above does print each line, but nothing ever reaches stdoutC:
ghci> grep
"foo"
"doo"
Update 2: I figured out how to print in the middle of a pipeline:
grep :: IO ()
grep = runResourceT
  $ runConduit
  $ yieldMany ["foo\ndoo", "\nduh"]
  .| decodeUtf8C
  .| Data.Conduit.Text.lines
  .| mapC (\a -> a =~ ("[fd]oo" :: Text) :: Text)
  .| log1
  .| unlinesC
  .| encodeUtf8C
  .| stdoutC
But why does the order of await matter?
log1 :: ConduitT Text Text (ResourceT IO) ()
log1 = do
  Just l <- await -- <- has to be first
  liftIO $ print l
  yield l
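On the await question, here is a hedged sketch. await returns Nothing or Just the next upstream value, so a line can only be printed after it has been awaited. log1 as written also handles only the first line: once it returns, everything downstream sees end of input. A looping version built on awaitForever might look like this (logLines and demoLog are hypothetical names):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Data.Text (Text)

-- Hypothetical looping version of log1. awaitForever calls the handler
-- once for every upstream element and finishes when upstream is done.
-- Each line only exists once await has delivered it, which is why the
-- print cannot come before the await.
logLines :: MonadIO m => ConduitT Text Text m ()
logLines = awaitForever $ \l -> do
  liftIO (print l) -- print a Haskell-quoted copy to stdout
  yield l          -- pass the same line downstream unchanged

-- Feeds two lines through logLines and collects what comes out the end.
demoLog :: IO [Text]
demoLog = runConduit $ yieldMany ["foo", "doo"] .| logLines .| sinkList
```

Swapping logLines in for log1 in the pipeline above should print and pass along every line, not just the first.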
It's not that clear from your question what you're trying to do. If you are trying to copy all matching lines from "input.txt" to "output.txt", kind of like the grep command-line utility, then you probably want a conduit that splits the input into lines and keeps only the matching ones.
linesUnboundedC is a function in the "conduit" package that's equivalent to the deprecated lines function from "conduit-extra". Also, filterC is probably a more natural way to keep only the matching lines than your mapC, which generates empty matches for the lines that don't match.
Operating on a text file with two matching lines, such a conduit will copy just those two lines to the output.
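Below is a minimal sketch of that conduit. It also includes a concatMapC variant in case you want the matched substrings themselves rather than whole lines. The [fd]oo pattern is borrowed from the question, and grepFile, allMatches and pat are hypothetical names:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Data.Text (Text)
import Text.Regex.TDFA ((=~), getAllTextMatches)

-- Hypothetical pattern, borrowed from the question.
pat :: Text
pat = "[fd]oo"

-- Copy every line of input.txt that matches pat to output.txt.
grepFile :: IO ()
grepFile = runConduitRes
  $ sourceFile "input.txt"
  .| decodeUtf8C
  .| linesUnboundedC
  .| filterC (=~ pat)   -- keep whole lines that contain a match
  .| unlinesC           -- put the newlines back
  .| encodeUtf8C
  .| sinkFile "output.txt"

-- Alternative: emit every match on every line. concatMapC yields zero or
-- more outputs per input, so lines without a match contribute nothing.
allMatches :: Monad m => ConduitT Text Text m ()
allMatches = linesUnboundedC
  .| concatMapC (\l -> getAllTextMatches (l =~ pat) :: [Text])
```

To write the individual matches to a file instead, allMatches can replace the linesUnboundedC .| filterC (=~ pat) pair in grepFile.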
If you want to write the matching lines to both standard output and output.txt simultaneously, the conduit-friendly method is probably to end your conduit with a sequenceSinks component. A void call is needed to get the return type right, because sequenceSinks returns a list of the sinks' results rather than ().
If you prefer a log conduit that you can insert in the middle of the pipeline to write a copy to stdout, one that writes each line out as plain text ought to work. If you're okay with having the Haskell quoted representations printed (i.e., surrounded by quotation marks with character escaping), one that simply prints each line will do.
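Here are sketches of both log conduits described above. They assume iterMC from the conduit package, which runs an action on each element and then passes the element on unchanged. logC, logShowC and demoIter are hypothetical names:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Data.Text (Text)
import qualified Data.Text.IO as TIO

-- Hypothetical pass-through loggers. iterMC runs an action on each
-- element and then yields that element unchanged, so nothing is consumed.
logC :: MonadIO m => ConduitT Text Text m ()
logC = iterMC (liftIO . TIO.putStrLn) -- each line as plain text

logShowC :: MonadIO m => ConduitT Text Text m ()
logShowC = iterMC (liftIO . print)    -- quoted, with character escaping

-- Lines still come out the far end after being logged by both.
demoIter :: IO [Text]
demoIter = runConduit $ yieldMany ["foo", "doo"] .| logC .| logShowC .| sinkList
```

Either one can go wherever log1 went in the question's pipeline, for example between the filtering step and unlinesC.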
Some code to play around with:
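Here is a minimal sketch to experiment with. It ends the pipeline with sequenceSinks so the matching lines go to both stdout and output.txt. The [fd]oo pattern is borrowed from the question, and grepBoth and pat are hypothetical names:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Control.Monad (void)
import Data.Text (Text)
import Text.Regex.TDFA ((=~))

-- Hypothetical pattern, borrowed from the question.
pat :: Text
pat = "[fd]oo"

-- Copy matching lines of input.txt to stdout and output.txt at once.
-- sequenceSinks hands every chunk to each sink in the list and returns
-- the list of their results ([(), ()] here), so void turns that into ().
grepBoth :: IO ()
grepBoth = void $ runConduitRes
  $ sourceFile "input.txt"
  .| decodeUtf8C
  .| linesUnboundedC
  .| filterC (=~ pat)
  .| unlinesC
  .| encodeUtf8C
  .| sequenceSinks [stdoutC, sinkFile "output.txt"]
```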