I need to read a large file in Scala and process it in blocks of k bits (typically k = 65536). As a simple example (but not what I want):
file blocks are (f1, f2, ..., fn).
I want to compute SHA256(f1) + SHA256(f2) + ... + SHA256(fn).
Such a computation can be done incrementally using only constant storage and the current block without needing other blocks.
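For instance, the incremental computation can be written as a plain blocking loop using only the JDK (a sketch; the path and chunk size are placeholders, and the digests are summed by treating each one as an unsigned integer):

```scala
import java.io.{BufferedInputStream, FileInputStream}
import java.security.MessageDigest

// Read the file in fixed-size chunks and sum the per-chunk SHA-256 digests,
// keeping only the current chunk and the running total in memory.
def sumOfChunkHashes(path: String, chunkSize: Int = 65536): BigInt = {
  val in = new BufferedInputStream(new FileInputStream(path))
  val buf = new Array[Byte](chunkSize)
  var total = BigInt(0)
  try {
    var read = in.read(buf)
    while (read > 0) {
      // Note: read() may return fewer than chunkSize bytes even mid-file;
      // a production version should loop until each chunk is filled.
      val md = MessageDigest.getInstance("SHA-256")
      md.update(buf, 0, read)
      total += BigInt(1, md.digest()) // 32-byte digest as an unsigned integer
      read = in.read(buf)
    }
  } finally in.close()
  total
}
```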
What is the best way to read the file? (perhaps something that uses continuations?)
EDIT: The linked question kind of solves the problem but not always, as the file I am looking at contains binary data.
Here is an approach using Akka Streams. This uses constant memory and can process the file chunks as they are read.
See the "Streaming File IO" section at the bottom of this page for more info: http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-RC3/scala/stream-io.html
Start with a simple `build.sbt` file that adds the Akka Streams dependency. The interesting parts are the `Source`, `Flow`, and `Sink`. The `Source` is a `SynchronousFileSource` that reads in the large file with a chunk size of 65536. A `ByteString` of chunk size is emitted from the `Source` and consumed by a `Flow`, which calculates a SHA256 hash for each chunk. Lastly, the `Sink` consumes the output from the `Flow` and prints the byte arrays out. You'll want to convert these and sum them using a `fold` to get a total sum.
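A minimal sketch of the pipeline described above, assuming the 1.0-era `akka-stream-experimental` artifact from the linked docs (the exact materializer name varies across the 1.0 release candidates, and modern Akka versions replace `SynchronousFileSource` with `FileIO.fromPath`):

```scala
import java.io.File
import java.security.MessageDigest

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.io.SynchronousFileSource
import akka.stream.scaladsl.{Flow, Sink}
import akka.util.ByteString

object HashChunks extends App {
  implicit val system = ActorSystem("hasher")
  implicit val materializer = ActorMaterializer()
  import system.dispatcher

  // Source: emits ByteStrings of up to 65536 bytes each
  val source = SynchronousFileSource(new File("/path/to/large/file"), chunkSize = 65536)

  // Flow: hash each chunk independently (constant memory per chunk)
  val hash = Flow[ByteString].map { chunk =>
    MessageDigest.getInstance("SHA-256").digest(chunk.toArray)
  }

  // Sink: print each 32-byte digest as hex
  val printDigests = Sink.foreach[Array[Byte]] { bytes =>
    println(bytes.map("%02x".format(_)).mkString)
  }

  source.via(hash).runWith(printDigests).onComplete(_ => system.shutdown())
}
```

To sum the digests instead of printing them, swap the `Sink` for a fold, e.g. `Sink.fold(BigInt(0))((acc, bytes: Array[Byte]) => acc + BigInt(1, bytes))`, which interprets each digest as an unsigned integer and accumulates the total.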