Haskell - how to avoid mixing pure code with IO


I am implementing an algorithm in Haskell. The algorithm requires generating some data.

I have a function implementing the algorithm, which takes a generation function as a parameter. For example, suppose the algorithm just multiplies the input data by n:

 algo :: a -> ??? -> [a]
 algo n dgf = map (\x -> x * n) $ dgf

dgf is used to generate the data. How do I write the type signature correctly, given that dgf can be any function with any number of parameters?

Another variant is to accept already-generated data instead of the generation function.

algo :: Num a => a -> [a] -> [a]
algo n d = map (\x -> n * x) d

Now let's imagine I'm generating the data with StdGen, which uses IO. How can I make the function more generic, so that it accepts both IO values and plain values such as [1,2,3]? This also applies to the variant with a function, since that function can also produce IO.

All in all, which solution is better: taking a generation function or taking pre-generated data?

Thanks in advance.


There are 2 answers

2
rampion On BEST ANSWER

For contrast, I'm going to take the opposite approach to dfeuer's answer.

Just use lists.

Consider your first example:

algo :: a -> ??? -> [a]
algo n dgf = map (\x -> x * n) $ dgf

You ask "How to write function header correctly, as dgf can be any function with any number of parameters?"

Well, one way is to use uncurrying.

Normally, Haskell functions are curried. If we have a function like

add :: Int -> Int -> Int
add x y = x + y

and we want a function that adds two to its input, we can just use add 2.

>>> map (add 2) [1..10]
[3,4,5,6,7,8,9,10,11,12]

Because add is not actually a function that takes two arguments, it's a function of one argument that returns a function of one argument.

We could have added parentheses to the type signature of add above to make this clearer:

add :: Int -> (Int -> Int)

In Haskell, all functions are functions of one argument.
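
To make that concrete, here is a small sketch that writes the same function as nested one-argument lambdas and then partially applies it. The names addLambda, addTwo and result are made up for illustration.

```haskell
-- Same behaviour as 'add x y = x + y', written as a chain of
-- one-argument functions.
addLambda :: Int -> (Int -> Int)
addLambda = \x -> (\y -> x + y)

-- Supplying only the first argument yields another function.
addTwo :: Int -> Int
addTwo = addLambda 2

-- Evaluates to [3,4,5].
result :: [Int]
result = map addTwo [1, 2, 3]
```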

However, we can also go the other way - uncurry a function that returns a function to get a function that takes a pair:

>>> :t uncurry
uncurry :: (a -> b -> c) -> (a, b) -> c
>>> :t uncurry add
uncurry add :: (Int, Int) -> Int

This can also be useful, say if we want to find the sum of each pair in a list:

>>> map (uncurry add) [ (1,2), (3,4), (5,6), (7,8), (9,10) ]
[3,7,11,15,19]

In general, we can uncurry any function of type a0 -> a1 -> ... -> aN -> b into a function (a0, a1, ..., aN) -> b, though there might not be a cute library function to do it for us.
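
As a sketch of what such a hand-written helper might look like for three arguments (uncurry3, volume and volumes are hypothetical names, not functions from base):

```haskell
-- Hypothetical helper: base only ships 'uncurry' for pairs.
uncurry3 :: (a -> b -> c -> d) -> (a, b, c) -> d
uncurry3 f (x, y, z) = f x y z

-- An ordinary curried function of three arguments (illustrative).
volume :: Int -> Int -> Int -> Int
volume w h d = w * h * d

-- Apply the uncurried version to a list of triples.
-- Evaluates to [6,8,120].
volumes :: [Int]
volumes = map (uncurry3 volume) [(1, 2, 3), (2, 2, 2), (4, 5, 6)]
```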

With that in mind, we could implement algo by passing it an uncurried function and a tuple of values:

algo :: Num a => a -> (t -> [a]) -> t -> [a]
algo n f t = map (\x -> x * n) $ f t

And then use anonymous functions to uncurry our argument functions:

>>> algo 2 (\(lo,hi) -> enumFromTo lo hi) (5, 10)
[10,12,14,16,18,20]
>>> algo 3 (\(a,b,c,d) -> zipWith (+) [a..b] [c..d]) (1, 5, 10, 14)
[33,39,45,51,57]

Now we could do it this way, but we don't need to. As implemented above, algo is only using f and t once. So why not pass it the list directly?

algo' :: Num a => a -> [a] -> [a]
algo' n ns = map (\x -> x * n) ns

It calculates the same results:

>>> algo' 2 $ (\(lo,hi) -> enumFromTo lo hi) (5, 10)
[10,12,14,16,18,20]
>>> algo' 2 $ enumFromTo 5 10
[10,12,14,16,18,20]
>>> algo' 3 $ (\(a,b,c,d) -> zipWith (+) [a..b] [c..d]) (1, 5, 10, 14)
[33,39,45,51,57]
>>> algo' 3 $ zipWith (+) [1..5] [10..14]
[33,39,45,51,57]

Furthermore, since Haskell is non-strict, the argument to algo' isn't evaluated until it's actually used, so we don't have to worry about "wasting" time computing arguments that won't actually be used:

algo'' :: Num a => a -> [a] -> [a]
algo'' n ns = [n,n,n,n]

algo'' doesn't use the list passed to it, so it's never forced, so whatever computation is used to calculate it never runs:

>>> let isPrime n = n > 2 && null [ i | i <- [2..n-1], n `rem` i == 0 ]
>>> :set +s
>>> isPrime 10000019
True
(6.18 secs, 2,000,067,648 bytes)
>>> algo'' 5 (filter isPrime [1..999999999999999])
[5,5,5,5]
(0.01 secs, 68,936 bytes)
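
The same point can be sketched without timings. In the block below (all names are illustrative), the ignored argument is undefined, which would crash if it were ever evaluated, and an infinite list is consumed only as far as the caller demands:

```haskell
-- Same shape as algo'' above: the list argument is ignored.
ignoresList :: Num a => a -> [a] -> [a]
ignoresList n _ = [n, n, n, n]

-- 'undefined' is never forced, so this evaluates to [5,5,5,5].
safe :: [Int]
safe = ignoresList 5 undefined

-- Only the first three elements of the infinite list are demanded,
-- so this evaluates to [2,4,6].
firstThree :: [Int]
firstThree = take 3 (map (* 2) [1 ..])
```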

Now to the second part of your question - what if your data is being generated within some monad?

Rather than convince algo to operate on monadic values, you could take the stream based approach as dfeuer explains. Or you could just use a list.

Just because you're in a monad doesn't mean that your values suddenly become strict.

For example, want an infinite list of random numbers? No problem.

newRandoms :: Random a => IO [a]
newRandoms = unfoldr (\g -> Just (random g)) <$> newStdGen

Now I can just pass those to some algorithm:

>>> rints <- newRandoms :: IO [Int]
(0.00 secs, 60,624 bytes)
>>> algo'' 5 rints
[5,5,5,5]
(0.00 secs, 68,920 bytes)
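
Putting the pieces together, one way this might look as a standalone module is sketched below. It assumes the random package is installed; scale, randomList and demo are illustrative names. The algorithm stays a pure function on lists, and the caller decides whether the list comes from a literal or from IO:

```haskell
import Data.List (unfoldr)
import System.Random (Random, newStdGen, random)

-- Pure algorithm: knows nothing about IO.
scale :: Num a => a -> [a] -> [a]
scale n = map (* n)

-- Infinite lazy list of random values from a fresh generator.
randomList :: Random a => IO [a]
randomList = unfoldr (Just . random) <$> newStdGen

demo :: IO ()
demo = do
  -- Pure input: just pass the list.
  print (scale 2 [1, 2, 3 :: Int])
  -- Input produced in IO: bind it, then call the same pure function.
  rs <- randomList :: IO [Int]
  print (take 5 (scale 2 rs))   -- only five elements are ever forced
  -- Or lift the pure function over the IO action with <$>.
  doubled <- scale 2 <$> (randomList :: IO [Int])
  print (take 3 doubled)
```

The random values differ on every run, so only the first printed line is predictable.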

For a small program which is just reading input from a file or two, there's no problem with just using readFile and lazy I/O to get a list to operate on.

For example

>>> let grep pat lines = [ line | line <- lines, pat `isInfixOf` line ]
>>> :set +s
>>> dict <- lines <$> readFile "/usr/share/dict/words"
(0.01 secs, 81,504 bytes)
>>> grep "poop" dict
["apoop","epoophoron","nincompoop","nincompoopery","nincompoophood","nincompoopish","poop","pooped","poophyte","poophytic","whisterpoop"]
(0.72 secs, 423,650,152 bytes)
0
dfeuer On

One option is to take a stream rather than a list. If generating the values involves performing IO, and there may be many many values, this is often the best approach. There are several packages that offer streams of some sort, but I'll use the streaming package in this example.

import qualified Streaming.Prelude as S
import Streaming

algo :: (Monad m, Num a) => a -> Stream (Of a) m r -> Stream (Of a) m r
algo a = S.map (a +)

You can read Stream (Of a) m r as "a way to use operations in m to produce successive values of type a and finally a result of type r". This algo function doesn't commit to any particular way of generating the data; the values can be created purely:

algo a (S.each [these, are, my, elements])

or within IO,

algo a $ S.takeWhile (> 3) (S.readLn :: Stream (Of Int) IO ())

or using a randomness monad, or whatever you like.
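
For a version that can be compiled as-is, here is a minimal sketch assuming the streaming package is installed. The names addTo, pureResult and demo are illustrative; addTo follows the same idea as algo above. It runs the same function once over a pure source in Identity and once over a source that performs IO for each element:

```haskell
import Data.Functor.Identity (runIdentity)
import Streaming (Of, Stream)
import qualified Streaming.Prelude as S

-- Works for any monad m that the source stream lives in.
addTo :: (Monad m, Num a) => a -> Stream (Of a) m r -> Stream (Of a) m r
addTo a = S.map (a +)

-- Pure source: run the stream in Identity and collect the elements.
-- Evaluates to [11,12,13].
pureResult :: [Int]
pureResult = runIdentity (S.toList_ (addTo 10 (S.each [1, 2, 3])))

-- Effectful source: each element is produced by an IO action,
-- and S.print consumes the stream element by element.
demo :: IO ()
demo = S.print (addTo 10 (S.replicateM 2 produce))
  where
    produce :: IO Int
    produce = putStrLn "generating" >> pure 1
```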