How to use Data.Text.Lazy.IO to parse JSON files with Aeson


I want to parse all the JSON files in a given directory into a data type Result.

So I have a decode function:

decodeResult :: Data.ByteString.Lazy.ByteString -> Maybe Result
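
For concreteness, Result and decodeResult might look like this; the fields here are made up, since their exact shape doesn't matter for the question:

{-# LANGUAGE DeriveGeneric #-}
import           Data.Aeson           (FromJSON, decode)
import qualified Data.ByteString.Lazy as BL
import           GHC.Generics         (Generic)

-- A made-up Result; any type with a FromJSON instance works the same way.
data Result = Result
  { name  :: String
  , score :: Double
  } deriving (Show, Generic)

instance FromJSON Result

decodeResult :: BL.ByteString -> Maybe Result
decodeResult = decode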

I began with Data.Text.Lazy.IO to load the file into a lazy ByteString:

import qualified Data.ByteString.Lazy as B
import qualified Data.Text.Lazy.IO as T
import qualified Data.Text.Lazy.Encoding as T

getFileContent :: FilePath -> IO B.ByteString
getFileContent path = T.encodeUtf8 `fmap` T.readFile path

It compiled, but I ran into a "too many open files" problem, so I thought maybe I should use withFile.

import System.IO
import System.Directory (listDirectory)
import Data.Maybe (catMaybes)
import Data.String.Utils (endswith)  -- from the MissingH package
import qualified Data.ByteString.Lazy as B
import qualified Data.Text.Lazy.IO as T
import qualified Data.Text.Lazy.Encoding as T

getFileContent :: FilePath -> IO (Maybe Result)
getFileContent path = withFile path ReadMode $ \hnd -> do
   content <- T.hGetContents hnd
   return $ (decodeResult . T.encodeUtf8) content

loadAllResults :: FilePath -> IO [Result]
loadAllResults path = do
   paths <- listDirectory path
   results <- sequence $ fmap getFileContent (fmap (path ++) $ filter (endswith ".json") paths)
   return $ catMaybes results

In this version, the lazy I/O never seems to get evaluated; it always returns an empty list. But if I print content inside the getFileContent function, then everything seems to work correctly.

getFileContent :: FilePath -> IO (Maybe Result)
getFileContent path = withFile path ReadMode $ \hnd -> do
   content <- T.hGetContents hnd
   print content
   return $ (decodeResult . T.encodeUtf8) content

So I am not sure what I am missing; should I use conduit for this type of thing?

1 Answer

Answer by Michael Snoyman (accepted)

Generally speaking I would recommend using a streaming library for parsing arbitrarily sized data like a JSON file. However, in the specific case of parsing JSON with aeson, the concerns of overrunning memory are not as significant IMO, since the aeson library itself will ultimately represent the entire file in memory as a Value type. So given that, you may choose to simply use strict bytestring I/O. I've given an example of using both conduit and strict I/O for parsing a JSON value. (I think the conduit version exists in some libraries already, I'm not sure.)

#!/usr/bin/env stack
{- stack --resolver lts-7.14 --install-ghc runghc
   --package aeson --package conduit-extra
-}
import           Control.Monad.Catch     (MonadThrow, throwM)
import           Control.Monad.IO.Class  (MonadIO, liftIO)
import           Data.Aeson              (FromJSON, Result (..), eitherDecodeStrict',
                                          fromJSON, json, Value)
import           Data.ByteString         (ByteString)
import qualified Data.ByteString         as B
import           Data.Conduit            (ConduitM, runConduitRes, (.|))
import           Data.Conduit.Attoparsec (sinkParser)
import           Data.Conduit.Binary     (sourceFile)

-- Parse a single JSON value from the incoming byte stream, then convert it
-- to the target type, throwing if the conversion fails.
sinkFromJSON :: (MonadThrow m, FromJSON a) => ConduitM ByteString o m a
sinkFromJSON = do
    value <- sinkParser json
    case fromJSON value of
        Error e -> throwM $ userError e
        Success x -> return x

-- Stream the file's bytes into the JSON sink.
readJSONFile :: (MonadIO m, FromJSON a) => FilePath -> m a
readJSONFile fp = liftIO $ runConduitRes $ sourceFile fp .| sinkFromJSON

-- Or using strict I/O
readJSONFileStrict :: (MonadIO m, FromJSON a) => FilePath -> m a
readJSONFileStrict fp = liftIO $ do
    bs <- B.readFile fp
    case eitherDecodeStrict' bs of
        Left e -> throwM $ userError e
        Right x -> return x

main :: IO ()
main = do
    x <- readJSONFile "test.json"
    y <- readJSONFileStrict "test.json"
    print (x :: Value)
    print (y :: Value)
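
For the loadAllResults from the question, a minimal sketch on top of the strict version might look like the following. This assumes listDirectory from the directory package and (</>) from filepath, and that files which fail to parse should simply be skipped, as in the original code; the extra imports would go alongside the ones above.

import           Data.List        (isSuffixOf)
import           Data.Maybe       (mapMaybe)
import           System.Directory (listDirectory)
import           System.FilePath  ((</>))

loadAllResults :: FromJSON a => FilePath -> IO [a]
loadAllResults dir = do
    names <- listDirectory dir
    let jsonFiles = [dir </> name | name <- names, ".json" `isSuffixOf` name]
    -- Strict reads: each file's handle is closed before the next one is opened,
    -- so there is no "too many open files" problem.
    contents <- mapM B.readFile jsonFiles
    return $ mapMaybe (either (const Nothing) Just . eitherDecodeStrict') contents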

EDIT I forgot to mention: I'd strongly recommend against using textual I/O for reading your JSON files. JSON files should be encoded in UTF-8, while the textual I/O functions use whatever character encoding your system settings specify. Relying on Data.ByteString.readFile and similar functions is more reliable. I went into more detail in a recent blog post.
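
To illustrate the difference (this snippet is just an illustration, not from the original post): Data.Text.IO.readFile decodes according to the current locale, while reading raw bytes and decoding explicitly always treats the file as UTF-8.

import qualified Data.ByteString    as B
import           Data.Text          (Text)
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO       as TIO

readLocaleDependent :: FilePath -> IO Text
readLocaleDependent = TIO.readFile                        -- decoding depends on the system locale

readAlwaysUtf8 :: FilePath -> IO Text
readAlwaysUtf8 path = TE.decodeUtf8 <$> B.readFile path   -- always UTF-8; throws on invalid bytes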