I need to read a large file in Scala and process it in blocks of k bits (typically k = 65536). As a simple example (but not what I want):

file blocks are (f1, f2, ..., fn)

I want to compute SHA256(f1) + SHA256(f2) + ... + SHA256(fn).
Such a computation can be done incrementally using only constant storage and the current block without needing other blocks.
What is the best way to read the file? (perhaps something that uses continuations?)
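For concreteness, here is a straw-man sketch of the incremental computation I have in mind, using plain `java.io` with a fixed-size buffer (the name `sumOfChunkHashes` is made up, and summing the digests as unsigned big integers is just one way to interpret the `+`):

```scala
import java.io.{BufferedInputStream, FileInputStream}
import java.security.MessageDigest

// Read the file in 65536-byte chunks; hash each chunk independently and
// accumulate the digests as unsigned big integers. Constant storage:
// only the current buffer and the running total are held in memory.
def sumOfChunkHashes(path: String, chunkSize: Int = 65536): BigInt = {
  val in  = new BufferedInputStream(new FileInputStream(path))
  val buf = new Array[Byte](chunkSize)
  var total = BigInt(0)
  try {
    var n = in.read(buf)
    while (n != -1) {
      val md = MessageDigest.getInstance("SHA-256")
      md.update(buf, 0, n)              // hash only the bytes actually read
      total += BigInt(1, md.digest())   // digest as an unsigned integer
      n = in.read(buf)
    }
  } finally in.close()
  total
}
```

This works, but it is imperative and awkward to compose, hence the question.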
EDIT: The linked question kind of solves the problem, but not always, since the file I am looking at contains binary data.
Here is an approach using Akka Streams. It uses constant memory and can process the file chunks as they are read.
See "Streaming File IO" at the bottom of this page for more info: http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0-RC3/scala/stream-io.html
Start with a simple `build.sbt` file. The interesting parts are the `Source`, the `Flow`, and the `Sink`. The `Source` is a `SynchronousFileSource` that reads in the large file with a chunk size of 65536. A `ByteString` of chunk size is emitted from the `Source` and consumed by a `Flow`, which calculates a SHA-256 hash for each chunk. Finally, the `Sink` consumes the output of the `Flow` and prints the byte arrays. You'll want to convert these and sum them using a `fold` to get a total sum.