I want to parse all the JSON files in a given directory into a data type Result.
So I have a decode function:
decodeResult :: Data.ByteString.Lazy.ByteString -> Maybe Result
I began with Data.Text.Lazy.IO to load each file into a lazy ByteString:
import qualified Data.ByteString.Lazy as B
import qualified Data.Text.Lazy.IO as T
import qualified Data.Text.Lazy.Encoding as T

getFileContent :: FilePath -> IO B.ByteString
getFileContent path = T.encodeUtf8 `fmap` T.readFile path
It compiled, but I ran into a "too many open files" problem, so I thought maybe I should use withFile.
import System.IO
import qualified Data.ByteString.Lazy as B
import qualified Data.Text.Lazy.IO as T
import qualified Data.Text.Lazy.Encoding as T

getFileContent :: FilePath -> IO (Maybe Result)
getFileContent path = withFile path ReadMode $ \hnd -> do
  content <- T.hGetContents hnd
  return $ (decodeResult . T.encodeUtf8) content
loadAllResults :: FilePath -> IO [Result]
loadAllResults path = do
  paths <- listDirectory path
  results <- sequence $ fmap getFileContent (fmap (path ++) $ filter (endswith ".json") paths)
  return $ catMaybes results
In this version, the lazy I/O never seems to get evaluated, and it always returns an empty list. But if I print content inside the getFileContent function, then everything seems to work correctly:
getFileContent :: FilePath -> IO (Maybe Result)
getFileContent path = withFile path ReadMode $ \hnd -> do
  content <- T.hGetContents hnd
  print content
  return $ (decodeResult . T.encodeUtf8) content
So I am not sure what I am missing. Should I use conduit for this kind of thing?
Generally speaking, I would recommend using a streaming library for parsing arbitrarily sized data like a JSON file. However, in the specific case of parsing JSON with aeson, the concern about overrunning memory is not as significant IMO, since the aeson library itself will ultimately represent the entire file in memory as a Value anyway. So given that, you may choose to simply use strict bytestring I/O. Below is an example of using both strict I/O and conduit for parsing a JSON value. (I think the conduit version already exists in some libraries, but I'm not sure.)
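Here is a minimal sketch of the strict-I/O approach, reusing the question's Result type and decodeResult function (B.fromStrict bridges the strict bytes to the lazy-ByteString decoder; the isSuffixOf and </> path handling are my own choices). Because readFile from Data.ByteString reads the whole file and closes the handle before returning, nothing is left half-read the way it is when withFile closes the handle before the lazily read contents are forced; printing the contents forces them while the handle is still open, which is why the print version appears to work.

import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as B
import Data.List (isSuffixOf)
import Data.Maybe (catMaybes)
import System.Directory (listDirectory)
import System.FilePath ((</>))

-- Strict read: the entire file is read and the handle closed before
-- readFile returns, so no handles pile up and nothing stays unevaluated.
getFileContent :: FilePath -> IO (Maybe Result)
getFileContent path = decodeResult . B.fromStrict <$> BS.readFile path

loadAllResults :: FilePath -> IO [Result]
loadAllResults dir = do
  names <- listDirectory dir
  let jsonPaths = map (dir </>) (filter (".json" `isSuffixOf`) names)
  catMaybes <$> traverse getFileContent jsonPaths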
EDIT I forgot to mention: I'd strongly recommend against using textual I/O for reading your JSON files. JSON files should be encoded as UTF-8, while the textual I/O functions will use whatever character encoding your system settings specify. Relying on Data.ByteString.readFile and similar is more reliable. I went into more detail in a recent blog post.
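For completeness, here is a conduit-based sketch of the streaming approach mentioned above, assuming the conduit, conduit-extra, and aeson packages (readJSONFile is just an illustrative helper name; sinkParser feeds the streamed chunks into aeson's attoparsec Value parser):

import Data.Conduit (runConduitRes, (.|))
import Data.Conduit.Binary (sourceFile)        -- conduit-extra
import Data.Conduit.Attoparsec (sinkParser)    -- conduit-extra
import qualified Data.Aeson as A
import Data.Aeson.Parser (json)

-- Stream the file in chunks into aeson's attoparsec parser. The resulting
-- Value still lives in memory, but the raw file is never held as one big string.
readJSONFile :: A.FromJSON a => FilePath -> IO a
readJSONFile fp = do
  value <- runConduitRes $ sourceFile fp .| sinkParser json
  case A.fromJSON value of
    A.Error err -> fail err
    A.Success x -> return x

Unlike the Maybe-returning decode in the question, this throws an exception on a parse failure; wrap the call with try or catch from Control.Exception if you want to skip files that fail to parse.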