Making Attoparsec based parser more efficient

110 views Asked by At

I wrote a simple text STL (Standard Tessellation Library) parser using Attoparsec. STL contains a collection of facets. Each facet contains a normal, and vertices of triangle. Typical STL file can be large (~100 Mb or more).

STL file format is

solid
 facet normal nx ny nz
   outer loop
       vertex x1 y1 z1
       vertex x2 y2 z2
       vertex x3 y3 z3 
 endfacet
endsolid

The parser code is here. Full implementation can be found at STLReader

-- | Point represents a vertex of the triangle
data Point a = Point !a !a !a deriving Show

-- | Vector for a given class
data Vector a = Vector !a !a !a deriving Show

-- | Parse coordinate triplet. 
coordinates :: (Fractional a) => Text -> (a -> a -> a -> b) -> Parser b
coordinates s f = do
  skipSpace
  string s
  !x <- coordinate
  !y <- coordinate
  !z <- coordinate
  return $! f x y z
  where
    coordinate = skipWhile isHorizontalSpace *> fmap realToFrac double
{-# INLINE coordinates #-}

type RawFacet a = (Vector a, Point a, Point a, Point a)

-- | Parse a facet. The facet comprises of a normal, and three vertices
facet  :: Fractional a => Parser (RawFacet a)
facet = (,,,) <$> beginFacet
          <* (skipSpace     *> "outer loop")
          <*> vertexPoint
          <*> vertexPoint
          <*> vertexPoint
          <*  (skipSpace <* "endloop" <* endFacet )
          <?> "facet"
  where
    beginFacet  = skipSpace <* "facet" *> coordinates "normal" Vector
    endFacet    = skipSpace <* string "endfacet"
    vertexPoint = coordinates "vertex" Point
{-# INLINE facet #-}

rawFacets :: Fractional a => Parser [RawFacet a]
rawFacets = beginSolid *> many' facet <* endSolid
      where
            solidName  = option "default" (skipWhile isHorizontalSpace *> fmap T.pack (many1 $ satisfy isAlphaNum) )
            beginSolid = skipSpace <* "solid" *> solidName <?> "start solid"
            endSolid   = skipSpace <* "endsolid" <?> "end solid"

-- | Read text STL file. STL extensions for color etc. are not supported in this version. 
readTextSTL :: Fractional a => FilePath -> IO (Either String [RawFacet a])
readTextSTL path = liftM (Al.eitherResult . Al.parse rawFacets) (TIO.readFile path)

main :: IO Int
main = do
  (path:_) <- getArgs
  putStrLn $ "Parsing STL file: " ++ path
  s <- readTextSTL path
  putStrLn "Parsing complete"
  case s of
   Left error -> putStrLn error
   Right s    -> putStrLn $ "Num facets : " ++ show (length s)
  return 0

I benchmarked this code with 'c' parser supplied with Meshlab. When I tested with 69Mb scan, meshlab completed the job in roughly 13 s, whereas it took 22 s with attoparsec. So though attoparsec enabled me to parse faster than Parsec (with parsec it was about 36 s), I still have a long way to go.

How can I improve this parser further?

0

There are 0 answers