I am writing a compiler for a DSL. After the source file is read into a string, all the remaining steps (parsing, type checking, and codegen) are pure code, transforming the program from one representation to another. All is good until the source file has dependencies (think of the `#include` preprocessor directive in C). The parser then needs to read the dependent files and parse them recursively, which makes it impure: I have to change it from returning `AST` to returning `IO AST`. All the subsequent steps (type checking and codegen) then have to return `IO` types as well, which requires significant changes. What is a good way to handle reading dependent files in this case?
P.S. I could use `unsafePerformIO`, but that seems like a hacky solution that can lead to technical debt.
A good solution is to parse into an AST containing dependency information, then resolve the dependencies separately, outside the parser. For example, suppose you have a format in which each line is either an `#include` line or a content line, and a parser `parse :: String -> WithIncludes`, so that each file parses to a representation in which its includes are recorded but not yet resolved.
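As a minimal sketch of such a parser: only `parse`, `WithIncludes`, and `Include` come from the text above, while the `Item` type, the `Content` constructor, and the exact `#include <path>` line syntax are assumptions made for illustration.

```haskell
import Data.List (stripPrefix)

-- One line of a source file (hypothetical representation):
-- either ordinary content or an include that is not yet resolved.
data Item = Content String | Include FilePath
  deriving (Eq, Show)

-- A parsed file that may still contain includes.
newtype WithIncludes = WithIncludes [Item]
  deriving (Eq, Show)

-- Pure parser: it records which files are wanted, but never reads them.
-- Assumes an include line has the form "#include <path>" with one space.
parse :: String -> WithIncludes
parse = WithIncludes . map parseLine . lines
  where
    parseLine :: String -> Item
    parseLine line = case stripPrefix "#include " line of
      Just path -> Include path
      Nothing   -> Content line

-- parse "#include file2\nfoo\n"
--   == WithIncludes [Include "file2", Content "foo"]
-- parse "bar\n"
--   == WithIncludes [Content "bar"]
```

The parser's type stays `String -> WithIncludes`, so the type checker and code generator that follow it can stay pure as well.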
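You can add another type representing a flattened file with the includes resolved. Then, separately from parsing, load and recursively flatten the includes in `IO`. The result is a single flat list of content lines, with each `#include` replaced by the contents of the file it names.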
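The flattening step could look roughly like the following. This is a sketch under assumptions: `Item`, `Content`, and the `Flattened` representation as a list of lines are illustrative choices, and include paths are read relative to the current working directory rather than relative to the including file.

```haskell
import Data.List (stripPrefix)

data Item = Content String | Include FilePath
  deriving (Eq, Show)

newtype WithIncludes = WithIncludes [Item]
  deriving (Eq, Show)

-- A file with every include replaced by the lines it refers to.
newtype Flattened = Flattened [String]
  deriving (Eq, Show)

-- Same pure parser as before; assumes "#include <path>" lines.
parse :: String -> WithIncludes
parse = WithIncludes . map parseLine . lines
  where
    parseLine line = case stripPrefix "#include " line of
      Just path -> Include path
      Nothing   -> Content line

-- The only impure step: read each included file, parse it purely,
-- and flatten it recursively. There is no cycle detection here.
flatten :: WithIncludes -> IO Flattened
flatten (WithIncludes items) = Flattened . concat <$> mapM flattenItem items
  where
    flattenItem :: Item -> IO [String]
    flattenItem (Content line) = pure [line]
    flattenItem (Include path) = do
      source <- readFile path
      Flattened nested <- flatten (parse source)
      pure nested
```

For example, if `file1` contains `#include file2` followed by `foo`, and `file2` contains `bar`, then `flatten (parse "#include file1\nbaz\n")` yields `Flattened ["bar", "foo", "baz"]`.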
Parsing remains pure, and you just have an `IO` wrapper around it driving which files to load. You can even reuse the logic here for loading a single file.

It's also a good idea to add logic here to check for import cycles, for example by adding an accumulator to `flatten` containing a `Set` of canonicalised `FilePath`s, and checking at each `Include` that you haven't seen the same `FilePath` already.

For a more complex AST, you may want to share most of the structure between the unresolved and resolved types. In that case, you can parameterise the type by whether it's resolved, and have the unresolved and resolved types be aliases for the underlying AST type with different arguments.
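The single-file loader and the cycle check described above might be combined as in this sketch. The names `flattenFrom` and `load`, the `Either String` error type, and the decision to track only the files on the current include chain (so that including the same file twice without a cycle is still allowed) are all assumptions. `Data.Set` comes from the `containers` package and `canonicalizePath` from the `directory` package.

```haskell
import Data.List (stripPrefix)
import qualified Data.Set as Set
import System.Directory (canonicalizePath)

data Item = Content String | Include FilePath deriving (Eq, Show)
newtype WithIncludes = WithIncludes [Item] deriving (Eq, Show)
newtype Flattened = Flattened [String] deriving (Eq, Show)

-- Pure parser; assumes "#include <path>" lines.
parse :: String -> WithIncludes
parse = WithIncludes . map parseLine . lines
  where
    parseLine line = case stripPrefix "#include " line of
      Just path -> Include path
      Nothing   -> Content line

-- Flatten with an accumulator holding the canonical paths of the files
-- currently being included. Meeting one of them again means a cycle.
flattenFrom :: Set.Set FilePath -> WithIncludes -> IO (Either String Flattened)
flattenFrom seen (WithIncludes items) = do
  results <- mapM step items
  pure (Flattened . concat <$> sequence results)
  where
    step :: Item -> IO (Either String [String])
    step (Content line) = pure (Right [line])
    step (Include path) = do
      canonical <- canonicalizePath path
      if canonical `Set.member` seen
        then pure (Left ("include cycle at " ++ canonical))
        else do
          source <- readFile canonical
          nested <- flattenFrom (Set.insert canonical seen) (parse source)
          pure (fmap (\(Flattened ls) -> ls) nested)

-- Loading a single file reuses the same logic: treat it as a file
-- whose only line is an include of the requested path.
load :: FilePath -> IO (Either String Flattened)
load path = flattenFrom Set.empty (WithIncludes [Include path])
```

Note that this version keeps processing the remaining lines after it finds a cycle and only reports the first error at the end; a real implementation might prefer to stop early, for instance with `ExceptT`.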
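For the parameterised AST, one possible encoding is sketched below. It assumes the type parameter is the payload of the `Include` constructor, and that `Void` marks the resolved form; `File`, `Item`, `contentLines`, and `resolveWith` are hypothetical names chosen for illustration.

```haskell
import Data.Void (Void, absurd)

-- The AST is shared; only the payload carried by Include varies.
data Item include = Content String | Include include
  deriving (Eq, Show)

newtype File include = File [Item include]
  deriving (Eq, Show)

-- Unresolved: includes name the files still to be loaded.
type WithIncludes = File FilePath

-- Resolved: Void has no values, so no Include can occur.
type Flattened = File Void

-- Because a Flattened file cannot contain an Include, extracting
-- its lines is total; the Include case is provably unreachable.
contentLines :: Flattened -> [String]
contentLines (File items) = map line items
  where
    line (Content text) = text
    line (Include impossible) = absurd impossible

-- Resolve includes with a caller-supplied loader, which can be an
-- IO action in production or a pure lookup in tests.
resolveWith :: Monad m => (FilePath -> m Flattened) -> WithIncludes -> m Flattened
resolveWith loadFile (File items) = File . concat <$> mapM step items
  where
    step (Content text) = pure [Content text]
    step (Include path) = do
      File nested <- loadFile path
      pure nested
```

With this encoding, later passes such as type checking can demand a `Flattened` argument, and the compiler then rules out calling them on a file whose includes are still unresolved.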